Reinforcement Learning
Imagine a mouse in a maze trying to find hidden pieces of cheese. The more times we expose the mouse to the maze, the better it gets at finding the cheese. At first, the mouse might move randomly, but after some time, the mouse’s experience helps it realize which actions bring it closer to the cheese.
The mouse’s learning process mirrors what we do when we use Reinforcement Learning (RL) to train an agent, for example one that plays a game. Generally speaking, RL is a machine learning method in which an agent learns from experience. By trying actions in a given environment and observing what happens, the agent learns through trial and error which actions maximize a cumulative reward. In our example, the mouse is the agent and the maze is the environment. The mouse’s possible actions are moving forward, backward, left, or right, and the reward is the cheese.
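To make the agent, environment, actions, and reward concrete, here is a minimal Python sketch of the mouse’s maze. Everything in it is an illustrative assumption rather than part of any library: the hypothetical `Maze` class, its tiny 3×3 layout with a single piece of cheese, the reward of 1 for reaching the cheese, and the hypothetical `discounted_return` helper. That helper adds up rewards with a discount factor `gamma` (a common way to define the cumulative reward; the value 0.9 is an assumed choice).

```python
import random

# The mouse's four actions, as (row change, column change). Illustrative only.
ACTIONS = {
    "forward": (-1, 0),
    "back": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


class Maze:
    """Hypothetical grid maze: '#' is a wall, 'C' is the cheese, other cells are open."""

    def __init__(self, layout, start):
        self.grid = [list(row) for row in layout]
        self.start = start
        self.pos = start

    def reset(self):
        """Put the mouse back at the start and return its state (position)."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Apply one action and return (new_state, reward, done)."""
        dr, dc = ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Moves off the grid or into a wall leave the mouse where it is.
        inside = 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])
        if inside and self.grid[r][c] != "#":
            self.pos = (r, c)
        if self.grid[self.pos[0]][self.pos[1]] == "C":
            return self.pos, 1.0, True  # found the cheese: reward and end of episode
        return self.pos, 0.0, False


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


if __name__ == "__main__":
    maze = Maze(["S..", ".#.", "..C"], start=(0, 0))
    state = maze.reset()
    rewards = []
    for _ in range(50):  # an untrained mouse that simply moves at random
        state, reward, done = maze.step(random.choice(list(ACTIONS)))
        rewards.append(reward)
        if done:
            break
    print("steps:", len(rewards), "return:", round(discounted_return(rewards), 3))
```

Here the mouse acts at random, which is where an untrained agent starts; discounting means that finding the cheese in fewer steps yields a larger cumulative reward, which is what gives the agent a reason to improve.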
You can use RL when you have little to no historical data about a problem because, unlike supervised learning, it doesn’t need a labeled dataset in advance. In an RL framework, the agent generates its own data as it goes by interacting with the environment. Not surprisingly, RL has been especially successful with games, particularly games of “perfect information” like chess and Go. In games, feedback from the environment comes quickly, which allows the model to learn fast. The downside of RL is that training can take a very long time if the problem is complex.
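As an illustration of learning from data generated as you go, the sketch below uses tabular Q-learning, one common RL algorithm (the text above does not name a specific one). It assumes a simplified one-dimensional corridor instead of a full maze, and the function names and parameter values (`alpha`, `gamma`, `epsilon`, number of episodes, `seed`) are hypothetical choices for the example. No dataset is supplied: the table of action values `q` is filled in purely from the mouse’s own trial-and-error experience.

```python
import random


def train_q_learning(n_cells=6, episodes=200, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The mouse starts in cell 0, the cheese is in the last cell, and the
    actions are left (-1) and right (+1). Reaching the cheese gives reward 1.
    """
    rng = random.Random(seed)
    actions = (-1, 1)
    # q[s][a] estimates the discounted cumulative reward of taking action a in cell s.
    q = [[0.0, 0.0] for _ in range(n_cells)]
    goal = n_cells - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: usually exploit the best-known action, but explore
            # at random with probability epsilon (and whenever the values tie).
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + actions[a], 0), goal)  # walls at both ends
            reward = 1.0 if s_next == goal else 0.0
            # The cheese ends the episode, so it has no future value.
            future = 0.0 if s_next == goal else max(q[s_next])
            # Move the estimate toward reward + discounted best next value.
            q[s][a] += alpha * (reward + gamma * future - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best learned action ('L' or 'R') in every cell except the cheese cell."""
    return ["R" if right >= left else "L" for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

With enough episodes, the greedy policy should point toward the cheese in every cell. The small `epsilon` keeps some random exploration going, which is the trial-and-error part; on larger problems, the number of episodes needed grows quickly, which reflects the long training times mentioned above.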
Just as IBM’s Deep Blue beat world chess champion Garry Kasparov in 1997, AlphaGo, an RL-based program, beat Lee Sedol, one of the world’s top Go players, in 2016. AlphaGo was developed by DeepMind, a UK-based team that is among the current pioneers of RL.
In April 2019, OpenAI Five became the first AI system to beat the reigning world champion team at the e-sport Dota 2. Dota 2 is a very complex video game, and the OpenAI Five team chose it because, at the time, no RL algorithm was able to win at it. OpenAI, the organization behind OpenAI Five, also developed a robotic hand that can reorient a block.
You can tell that Reinforcement Learning is an especially powerful form of AI, and we’re likely to see more progress from these teams, but it’s also worth remembering the method’s limitations, such as the long training times that complex problems can require.