Mapless Navigation Using Reinforcement Learning
Abstract
In this project I explored training an agent to navigate to a goal in different environments using reinforcement learning, without a map of the environment. I used a third-party environment package for OpenAI Gym called MiniWorld. This package came with a CNN policy model for 3D image input, along with three state-of-the-art optimization algorithms: PPO, A2C, and ACKTR. I evaluated A2C and PPO. I tuned the reward functions and trained the agent in five different environments. For the simple environments, Hallway and One Room, the default reward function works well, and I was able to train the agent to navigate to the goal 100% of the time. For the more complicated TMaze environment, adding the distance to the goal to the reward trains the agent to a 99% success rate. For MazeS3, where the agent has to navigate around randomly placed walls, I was able to tune the reward function to achieve a success rate above 50%.
Introduction
In mapless navigation, an agent navigates around obstacles without knowledge of a map of its current environment. Mapless navigation is an important research area because of its practical applicability: in real-world environments, objects can move, so the environment changes over time and a prebuilt map may become outdated.
Reinforcement learning has gained much attention in recent years due to the impressive results of AlphaGo Zero and of agents trained to play Atari games. Reinforcement learning trains a policy neural network through exploration and exploitation. One advantage of reinforcement learning is that, through policy training, it has the potential to develop better heuristics than what we can come up with by hand. One example is OpenAI's hide-and-seek game, where two agents hide and two other agents seek. Through training, the agents were able to find various loopholes in the game physics and develop heuristic policies that exploit those loopholes to win the game [1].
Using reinforcement learning to train an agent to navigate to a goal is therefore a natural choice.
Related Work
One issue with reinforcement learning is the sparse reward problem: the agent only sees a reward after taking multiple steps. During policy-gradient training this becomes the 'credit assignment problem', i.e. when an episode ends with zero reward the agent does not know which steps contributed to the failure. This issue makes it difficult to achieve convergence. Schulman et al. [2] developed PPO to make training reinforcement learning models faster and easier.
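For reference, the clipped surrogate objective that PPO maximizes, as given in [2], is

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\ 1-\epsilon,\ 1+\epsilon\right)\hat{A}_t\right)\right], \quad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the advantage estimate and \epsilon is the clipping parameter. By clipping the probability ratio r_t(\theta), PPO limits how far a single update can move the policy, which makes training more stable.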
Methods
Prof. Guy introduced us to OpenAI Gym and OpenAI Baselines. Gym provides a collection of optimized and tested environments ranging from classic control to Atari games. It makes no assumptions about the agent and provides a uniform API. OpenAI Baselines contains a library of optimized and tested implementations of reinforcement learning algorithms.
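The sketch below illustrates the uniform Gym API with a random policy; the environment id (CartPole-v1) is only an illustration and is not one of the environments used in this project.

import gym

# Minimal sketch of the classic Gym interaction loop:
# every environment exposes reset(), step(action), and an action_space.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()           # random action, for illustration only
    obs, reward, done, info = env.step(action)   # advance the environment one step
    total_reward += reward
env.close()
print("episode return:", total_reward)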
For this project, I chose a third-party environment package called MiniWorld. MiniWorld's observation space is the agent's 3D camera view, an RGB image of size 80x60x3. The action space consists of moving forward, moving backward, turning left, turning right, picking up an object, and putting it down. The step size and unit turn angle are fixed.
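As a rough sketch of how a MiniWorld environment is created and inspected (the exact registered environment id may differ between gym-miniworld versions):

import gym
import gym_miniworld  # importing the package registers the MiniWorld environments with Gym

env = gym.make("MiniWorld-Hallway-v0")  # id assumed here; Hallway is the simplest environment
print(env.observation_space)  # Box of uint8 values: the agent's RGB camera view (80x60 pixels)
print(env.action_space)       # Discrete actions: turn left/right, move forward/back, pickup, drop
obs = env.reset()             # obs is the first-person view as a NumPy array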
MiniWorld has a set of environments ranging from easy to hard. Figure 1 shows the environments relevant to this project. (a) is Hallway, the simplest environment: the agent is randomly placed at one end of a narrow rectangular space and the goal is randomly placed at the other end. (b) is One Room: both the agent and the goal are randomly placed anywhere within a square space, so the agent may have to turn before it can see the goal. (c) is TMaze: the agent is randomly placed in the left branch and the goal is randomly placed at either end of the T. (d) is YMaze, which is very similar to TMaze; its main purpose is to test the transferability of the TMaze model. (e) is MazeS3, a simple maze in which the agent has to navigate around randomly placed walls to reach the goal. The wall configuration varies but the compartment width is maintained, and the agent and the goal can be anywhere in the space. (f) is the full maze; I had not trained an agent in the full maze at the time of this report.