Robots playing "Guess Who" to understand people's requests

Mar 10, 2022

Reinforcement Learning (RL) has become the silver bullet behind many advancements in robotics; nowadays, it’s common to see robots learning autonomously, via a trial-and-error process, how to perform real-world tasks, such as sorting items in Amazon warehouses.

Nevertheless, we are still far from seeing robots achieve human-level performance in environments with unexpected operating conditions. One challenge that prevents RL from unleashing its full power in robotics is the need for reward functions that accurately capture the task’s objective. This may sound super weird! But since robots don’t speak Spanish, English, or Chinese, how else could we let them know what we want them to do?
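To make "reward function" concrete, here is a minimal, entirely hypothetical example: a hand-written reward for a reaching task, where the robot earns more reward the closer its gripper gets to a target, with a small penalty for effort. (The function name and the effort coefficient are illustrative choices, not anything from the paper.)

```python
import numpy as np

def reach_reward(gripper_pos, target_pos, action):
    """Hypothetical hand-crafted reward for a reaching task:
    negative distance to the target, minus a small effort penalty."""
    dist = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(target_pos))
    effort = 0.01 * np.sum(np.square(action))
    return -dist - effort
```

Writing such a function by hand is easy for toy tasks but notoriously brittle for real-world ones: every implicit preference ("don't knock over the cup on the way") must be encoded numerically, which is exactly the gap the work below tries to close.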

At Stanford University, Dr. Dorsa Sadigh has a better idea. In her research paper titled “Learning reward functions by integrating human demonstrations and preferences,” Dr. Sadigh shows how a robot can synthesize reward functions automatically by watching a human expert’s demonstrations and asking questions about them.

Allowing the robot to learn the reward function through demonstrations is not new; this is a well-known idea called Inverse Reinforcement Learning (IRL). However, IRL requires a human to operate a robot remotely, which is very hard! A robot’s body is way different from ours (the Fetch robot we see in the header of this post, for example, has an arm with three elbows!). These differences in physiology result in imprecise demonstrations. To address this challenge, Dr. Sadigh complements IRL with a learning technique known as Active Learning, which allows the robot to ask questions about the human’s preferences until it finds the reward function that best encodes the requested task.
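The core IRL idea can be sketched in a few lines. A minimal, simplified version (not the paper's exact method): assume the reward is linear in trajectory features and nudge the weights, perceptron-style, until demonstrated trajectories outscore alternative ones. The feature vectors below are random stand-ins for real trajectory summaries.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES = 3

# Hypothetical data: feature summaries of the expert's demonstrations and of
# alternative trajectories the robot could have taken instead.
demos = rng.normal(loc=0.5, size=(20, N_FEATURES))
alternatives = rng.normal(loc=-0.5, size=(20, N_FEATURES))

# Perceptron-style IRL sketch: adjust w until each demo scores higher than
# its paired alternative under the linear reward R(xi) = w . phi(xi).
w = np.zeros(N_FEATURES)
for _ in range(100):
    for d, a in zip(demos, alternatives):
        if w @ d <= w @ a:  # demo not yet preferred -> move w toward it
            w += d - a
```

If the demonstrations are imprecise (as they are when a human teleoperates a three-elbowed arm), the learned `w` inherits that imprecision, which is why the preference queries described next are needed to refine it.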

The process is simple: the human demonstrates what she wants the robot to do by remotely operating it. The robot generates some trajectories (hypotheses) that achieve what it “thinks” the human is requesting and selects the two hypotheses whose comparison would yield the most information once the human says which one is correct. Once the human has chosen the one that most resembles what she is trying to demonstrate, the robot discards the inconsistent hypotheses and asks a new question. The process continues until the robot is confident that a particular hypothesis is true. Sound familiar? Yes, it does! To understand her request, the robot plays “Guess Who” with the human 🙃
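The "Guess Who" loop above can be sketched as follows. This is a simplified toy version under stated assumptions, not the paper's algorithm: the reward is linear in trajectory features, hypotheses are candidate weight vectors on the unit sphere, the query-selection rule is "pick the pair that splits the remaining hypotheses most evenly," and the human's answers are simulated with a hidden true weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 3

# Hidden human preference (simulated stand-in for the real human's answers).
true_w = np.array([0.8, -0.5, 0.3])
true_w /= np.linalg.norm(true_w)

# Candidate reward hypotheses: weight vectors sampled on the unit sphere.
W = rng.normal(size=(1000, N_FEATURES))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Candidate trajectories, summarized by (random, illustrative) feature vectors.
trajs = rng.normal(size=(50, N_FEATURES))

def query_score(a, b, W):
    """How evenly comparing trajectories a and b splits the hypotheses.
    1.0 means a perfect 50/50 split (the most informative question)."""
    votes = (W @ (trajs[a] - trajs[b])) > 0
    p = votes.mean()
    return 1.0 - 2.0 * abs(p - 0.5)

for step in range(8):
    if len(W) < 4:  # confident enough: few hypotheses remain
        break
    # Ask the question that splits the remaining hypotheses most evenly.
    a, b = max(((i, j) for i in range(len(trajs)) for j in range(i + 1, len(trajs))),
               key=lambda ab: query_score(*ab, W))
    # Simulated human answer: which trajectory has higher true reward?
    prefers_a = true_w @ (trajs[a] - trajs[b]) > 0
    # Discard hypotheses inconsistent with the answer ("flip down the faces").
    keep = ((W @ (trajs[a] - trajs[b])) > 0) == prefers_a
    W = W[keep]

estimate = W.mean(axis=0)
estimate /= np.linalg.norm(estimate)
print("hypotheses left:", len(W))
print("alignment with true preference:", float(true_w @ estimate))
```

Each answered question roughly halves the set of surviving reward hypotheses, which is exactly why the robot picks the query whose answer it can least predict: a question everyone "agrees" on eliminates nothing.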