One of the most impressive developments in recent years has been the production of AI systems that can teach themselves to master the rules of a larger system. Notable successes have included experiments with chess and Starcraft. Given that self-teaching capability, it’s tempting to think that computer-controlled systems should be able to teach themselves everything they need to know to operate. Obviously, for a complex system like a self-driving car, we’re not there yet. But it should be much easier with a simpler system, right?
Maybe not. A group of researchers in Amsterdam attempted to take a very simple mobile robot and create a system that would learn to optimize its movement through a learn-by-doing process. While the system the researchers developed was flexible and could be effective, it ran into trouble due to some basic features of the real world, like friction.
The robots in the study were incredibly simple and were formed from a varying number of identical units. Each had an on-board controller, battery, and motion sensor. A pump controlled a piece of inflatable tubing that connected a unit to a neighboring unit. When inflated, the tubing generated a force that pushed the two units apart. When deflated, the tubing would pull the units back together.
Linking these units together created a self-propelled train. Given the proper series of inflation and deflation, individual units could drag and push each other in a coordinated manner, providing a directional movement that pushed the system along like an inchworm. It would be relatively simple to figure out the optimal series of commands sent to the pump that controls the inflation—simple, but not especially interesting. So the researchers behind the new work decided to see if the system could optimize its own movement.
Each unit was allowed to act independently and was given a simple set of rules. Inflation/deflation was set to cycle every two seconds, with the only adjustable parameter being when, within that two-second window, the pump would turn on (it would stay on for less than a second). Each unit in the chain would choose a start time, use it for a few cycles, and then use its on-board sensor to determine how far the robot moved. During an initial learning period, start times were chosen at random; a refinement period followed, during which times close to the best-performing ones were sampled.
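To make the two phases concrete, here is a minimal sketch of what a single unit's search might look like. It is not the researchers' code: the function names, trial counts, refinement width, and the toy speed curve are all assumptions made for illustration, and `measure_speed` stands in for running a few pump cycles and reading the motion sensor.

```python
import math
import random

CYCLE_SECONDS = 2.0  # from the article: inflation/deflation repeats every two seconds


def learn_start_time(measure_speed, explore_trials=20, refine_trials=20,
                     refine_width=0.1, rng=None):
    """Pick a pump start time by random search, then sample near the best one.

    measure_speed(start_time) is a hypothetical callback that stands in for
    running a few cycles and reading the on-board motion sensor.
    """
    rng = rng or random.Random()
    best_time, best_speed = None, -math.inf

    # Learning period: start times drawn at random from the two-second window.
    for _ in range(explore_trials):
        candidate = rng.uniform(0.0, CYCLE_SECONDS)
        speed = measure_speed(candidate)
        if speed > best_speed:
            best_time, best_speed = candidate, speed

    # Refinement period: sample start times close to the best one found so far.
    for _ in range(refine_trials):
        offset = rng.uniform(-refine_width, refine_width)
        candidate = (best_time + offset) % CYCLE_SECONDS
        speed = measure_speed(candidate)
        if speed > best_speed:
            best_time, best_speed = candidate, speed

    return best_time, best_speed


def toy_speed(start_time, peak=1.2):
    """Made-up speed curve (mm/s) that peaks when start_time equals peak."""
    return 2.0 * math.cos(math.pi * (start_time - peak) / CYCLE_SECONDS) ** 2


best_time, best_speed = learn_start_time(toy_speed, rng=random.Random(0))
print(f"start time {best_time:.2f} s, speed {best_speed:.2f} mm/s")
```

In the real robot, each unit would run a loop like this on its own, and the speed it measures depends on what every other unit happens to be doing at the time, which is what makes the emergent coordination notable.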
Critically, each unit in the chain operated completely independently, without knowing what the other units were up to. The coordination needed for forward motion emerged spontaneously.
The researchers started by linking two robots and an inert block into a train and placing the system on a circular track. It only took about 80 seconds for some of the trains to reach the maximum speed possible, a stately pace of just over two millimeters per second. There’s no way for this hardware to go faster, as confirmed by simulations in a model system.
Not so fast
But problems were immediately apparent. Some of the systems got stuck in a local optimum, settling on a speed that was only a quarter of the maximum possible. Things went poorly in a different way when the team added a third robot to the train.
Here again, the system took only a few minutes to approach the maximum speed seen in simulations. But once they reached that speed, most systems seemed to start slowing down. That shouldn’t be possible, as the units always saved the cycle start time associated with the maximum velocity they reached. Since they should never intentionally choose a lower velocity, there’s no reason they should slow down, right?
Fortunately, someone on the team noticed that the systems weren’t experiencing a uniform slowdown. Instead, they came to a near-halt at specific locations on the track, suggesting that they were running into issues with friction at those points. Even though the robots kept performing the actions associated with the maximum speed elsewhere on the track, they were doing so in a location where a different series of actions might power through the friction more effectively.
To fix this issue, the researchers did some reprogramming. Originally, the system just looked for the maximum velocity and stored that and the inflation cycle start time associated with it. After the switch, the system always saved the most recent velocity but only updated the start time if the stored velocity was slower than the more recent one. If the system hit a rough spot and slowed down dramatically, the stored velocity dropped along with it, so the units could find a start time that powered through and then re-optimize for top speed afterward.
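The difference between the two rules is easy to miss in prose, so here is a small sketch of one reading of it. The `Memory` class, the function names, and the measurements are hypothetical and chosen only to show the behavior described above; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    start_time: float  # pump start time currently in use (seconds)
    speed: float       # speed stored alongside it (mm/s)


def update_original(mem, tried_time, measured_speed):
    """Original rule: overwrite memory only when a new all-time best appears."""
    if measured_speed > mem.speed:
        return Memory(tried_time, measured_speed)
    return mem


def update_revised(mem, tried_time, measured_speed):
    """Revised rule: always store the latest speed; switch start time only if faster."""
    new_time = tried_time if measured_speed > mem.speed else mem.start_time
    return Memory(new_time, measured_speed)


# Hypothetical measurements as (start time tried, speed observed).
smooth_track = [(0.5, 2.0)]              # fast on an easy stretch
rough_patch = [(0.5, 0.2), (1.1, 0.6)]   # same start time now crawls; another does better

original = revised = Memory(start_time=0.0, speed=0.0)
for tried_time, speed in smooth_track + rough_patch:
    original = update_original(original, tried_time, speed)
    revised = update_revised(revised, tried_time, speed)

print(original)  # still holds start time 0.5 and the stale speed of 2.0
print(revised)   # has switched to start time 1.1 with speed 0.6
```

Under the original rule, the stale record of 2.0 mm/s blocks any change once the train is stuck. Under the revised rule, the stored speed falls to what the unit is actually achieving, so a start time that copes better with the rough patch can replace the old one.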
This adjustment got the four-car system to move at an average speed of two millimeters per second. Not quite as good as the three-car train, but quite close to it.
The mismatches between expectations and reality did not end there. To test whether the system could learn to recover from failure, the researchers blocked the release valve in one of the units, forcing it into an always-inflated state. The algorithm re-optimized, but the researchers found that it worked even better when the pump still turned on and off, even if the pump wasn’t pushing any air. Apparently, the vibrations helped limit the friction that might otherwise bog the whole system down.
The refinement system, which tried start times close to the maximum, also turned out to be problematic once a train got long enough. With a seven-car example, the system would regularly reach the maximum speed but quickly slow back down. Apparently, the slight variations tested during refinement could be tolerated when a train was small, but they put too many cars out of sync once the train got long enough.
Still, the overall approach was pretty effective, even if it was only tested on a simple robot. It took two simple properties and turned them into a self-learning system that could respond to environmental changes like friction. The system was scalable in that it worked well across a variety of train lengths. And it was robust to damage, such as when the researchers blocked a valve. In a different experiment, the researchers cut the train in half, and both halves re-optimized their speeds.
While simple, the system provides some insights into how we might think about self-teaching systems. And the experiment reminds us that the real world will throw even the best self-teaching system a few curves.
PNAS, 2021. DOI: 10.1073/pnas.2017015118.