Learning to ride a bike without reading the manual

2019, Jun 09    

Why is it easy to ride a bike but difficult to explain to a newbie how to move each little muscle to complete a ride? One possible explanation is that activities like this, which seem simple and intuitive, are hard to describe through a series of detailed instructions. Fortunately, humans don’t learn to ride a bike by reading exhaustive manuals; we simply give it a try, stumble, brush ourselves off, and keep practicing until we get it right.

This learning process, characteristic of humans and other animals, is a cognitive skill that artificial intelligence seeks to imitate so that machines can figure out how to achieve complex goals through trial and error.

There’s still a long way to go before machines can perform arbitrary tasks without human intervention, but in the last decade we’ve taken huge steps toward that goal. Intelligent systems can already interact with the real world and achieve complex goals on their own: think of robotic arms solving Rubik’s cubes, or autonomous vehicles navigating the streets of Silicon Valley. The driving force behind many of these advances is a family of algorithms known as Reinforcement Learning (RL).

RL algorithms are designed to learn, through interaction with the environment, how to make sequences of decisions that maximize a reward function. Finding good sequences of decisions is a hard problem because the feedback that guides the algorithm is often sparse and delayed. Compare, for example, the mechanism by which Amazon decides which items to recommend to you: its algorithm analyzes millions of sales, finds customers with a buying profile similar to yours, chooses products those customers have already bought, and learns from your reaction (buying or not) to improve next time. Here, feedback arrives right after each recommendation. Now think of an artificial agent playing chess. The agent receives no feedback after each individual decision. Is it good or bad to capture pawns? Is it good or bad to sacrifice pieces? The only thing that can really guide the agent is the outcome of the game (did it win or lose?), and that feedback arrives only after about 40 decisions!
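To make the idea of delayed feedback concrete, here is a minimal, illustrative sketch in Python. The environment, policy, and reward scheme are invented for this example (they don’t come from any real RL library or from the chess setting above): the agent makes many moves per game, but the only reward it ever sees is a single win/lose signal at the very end.

```python
import random

class ToyGame:
    """A toy episodic 'game': the agent makes num_moves decisions,
    but only learns at the very end whether it won (+1) or lost (-1)."""

    def __init__(self, num_moves=40):
        self.num_moves = num_moves

    def play_episode(self, policy):
        # The agent never sees a per-move reward; the outcome depends
        # on the whole sequence of moves, not on any single one.
        moves = [policy() for _ in range(self.num_moves)]
        return 1 if sum(moves) > self.num_moves / 2 else -1

def random_policy():
    # 1 = a "good" move, 0 = a "bad" move; the agent has no way of
    # knowing which is which from immediate feedback.
    return random.choice([0, 1])

game = ToyGame()
rewards = [game.play_episode(random_policy) for _ in range(1000)]
print("Average reward over 1000 games:", sum(rewards) / len(rewards))
```

The whole difficulty of RL lies in turning that single end-of-game signal into useful information about each of the 40 decisions that produced it, which is what algorithms such as Q-learning or policy gradients are designed to do.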

What’s most interesting is that, to solve these sophisticated challenges, RL algorithms draw on ideas from areas as diverse as philosophy, neuroscience, probability, statistics, linear algebra, and economics.