A team of researchers at search engine giant Google has recently proposed a new algorithm, named Simulated Policy Learning (SimPLe), which uses game models to learn quality policies for selecting actions. In a paper titled ‘Model-Based Reinforcement Learning for Atari’, Google AI scientists Łukasz Kaiser and Dumitru Erhan noted that, at a high level, the idea behind SimPLe is to alternate between learning a world model of how the game behaves and using that model to optimize a policy (with model-free reinforcement learning) within the simulated game environment. They further noted that the basic principles behind this algorithm are well established and have been employed in numerous recent model-based reinforcement learning methods.
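The alternation the researchers describe can be sketched in a few lines of Python. Everything below is an illustrative toy, not the paper's actual implementation: the `WorldModel` and `Policy` classes, the integer "frames", and all the loop sizes are hypothetical stand-ins chosen only to show the structure of the loop.

```python
# Toy sketch of SimPLe's alternating loop: collect real experience,
# fit a world model, then improve the policy inside the learned model.
# All classes and numbers are illustrative stand-ins.
import random

class WorldModel:
    """Learned simulator: predicts the next observation and reward."""
    def __init__(self):
        self.trained = False
    def fit(self, transitions):
        # Stand-in for supervised training on (obs, action, next_obs, reward).
        self.trained = len(transitions) > 0
    def step(self, obs, action):
        # Stand-in prediction of (next observation, reward).
        return obs + 1, random.random()

class Policy:
    def __init__(self):
        self.updates = 0
    def act(self, obs):
        return random.choice([0, 1])
    def improve(self, rollouts):
        # Stand-in for a model-free RL update (e.g., PPO) on simulated data.
        self.updates += 1

def collect_real_transitions(policy, n=8):
    # Stand-in for interacting with the real game environment.
    return [(obs, policy.act(obs), obs + 1, 0.0) for obs in range(n)]

def simple_loop(iterations=3):
    model, policy = WorldModel(), Policy()
    data = []
    for _ in range(iterations):
        data.extend(collect_real_transitions(policy))  # 1. real experience
        model.fit(data)                                # 2. learn world model
        rollouts = [model.step(0, policy.act(0))       # 3. simulated rollouts
                    for _ in range(16)]
        policy.improve(rollouts)                       # 4. model-free RL in sim
    return model, policy
```

The key design point the loop illustrates is that the expensive real environment is only touched in step 1; the bulk of policy training happens against the cheap learned simulator.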
Moreover, training an AI system to play games requires predicting the target game’s next frame given a sequence of observed frames and commands (e.g., “left,” “right,” “forward,” “backward”), they added. They also pointed out that a successful model can produce trajectories that can be used to train a gaming agent’s policy, which would obviate the need to rely on computationally costly in-game sequences. SimPLe does exactly this, according to the researchers: it takes four frames as input and predicts the next frame along with the reward, and once fully trained, it produces “rollouts” — sample sequences of actions, observations, and outcomes — that are used to improve policies. SimPLe only uses medium-length rollouts to minimize prediction errors, as Kaiser and Erhan noted.
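The rollout mechanism can also be sketched concretely. In the toy below, `predict` is a hypothetical stand-in for the learned video model (real frames would be image tensors, not integers), but it shows the two ideas from the paragraph above: the model conditions on a window of the last four frames to predict the next frame and reward, and rollouts are kept to a fixed medium length so prediction errors do not compound.

```python
# Illustrative rollout generation with a four-frame input window.
# `predict` stands in for SimPLe's learned video model; the plain-integer
# frame encoding is a hypothetical simplification.
from collections import deque

def predict(frames, action):
    # Stand-in for the learned model: returns (next_frame, reward).
    return frames[-1] + 1, 1.0 if action == 1 else 0.0

def generate_rollout(initial_frames, policy, length=50):
    """Produce a medium-length rollout of (frames, action, next_frame, reward).

    Capping `length` limits how far model prediction errors can compound."""
    window = deque(initial_frames, maxlen=4)  # model conditions on 4 frames
    rollout = []
    for _ in range(length):
        action = policy(list(window))
        next_frame, reward = predict(list(window), action)
        rollout.append((list(window), action, next_frame, reward))
        window.append(next_frame)  # slide the window forward
    return rollout
```

A model-free learner would then consume these simulated tuples exactly as if they had come from the real game.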
The motivation behind developing model-based reinforcement learning methods such as SimPLe is to enable new, better, and faster ways to perform multi-task reinforcement learning in settings where interactions are costly, slow, or require human labeling, as is the case for many robotics tasks.