ML Mini Project

Project Demo

Summary

In this project, I implemented a PDE-based agent navigating from a random start position within one region to a goal position within another region. The controller is a two-layer neural network trained with the cross-entropy method (CEM), a reinforcement learning algorithm. Details of the system are described in the following sections.

The implementation is based on the week 3 activity.

System Description

The system contains three parts: a neural network controller that takes the current state (agent position, agent orientation, and goal position) as input and outputs actions (velocity and delta angle); a PDE integrator that updates the agent's position and orientation based on those actions; and a renderer that displays the agent's movement.

Agent State and Rendering

The agent has three state parameters: its current x position, current y position, and current orientation (angle). The scene is drawn at 30 frames per second to give a smooth, continuous feel. The agent's state is updated every 0.1-second time step, and the agent is rendered according to its current state each frame. I chose absolute position as the agent's state because the simulation environment is static. However, a goal-relative representation is probably a better choice, since it would reduce the number of controller inputs from 5 to 2. I plan to try relative positions as future work.
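As a rough illustration, the sketch below shows how a fixed 0.1 s simulation step can be decoupled from a 30 fps render loop. The AgentState class, the controller/integrator/render callables, and the accumulator structure are illustrative assumptions, not the project's Processing code.

```python
from dataclasses import dataclass

FRAME_DT = 1.0 / 30.0   # render at 30 frames per second
SIM_DT = 0.1            # simulation (state update) time step

@dataclass
class AgentState:
    x: float      # absolute x position
    y: float      # absolute y position
    angle: float  # orientation

def run(agent, controller, integrator, render, n_frames):
    """Advance the simulation at 10 Hz while drawing every frame."""
    accumulator = 0.0
    for _ in range(n_frames):          # one iteration per rendered frame
        accumulator += FRAME_DT
        while accumulator >= SIM_DT:   # step the simulation when 0.1 s has elapsed
            action = controller(agent)
            agent = integrator(agent, action, SIM_DT)
            accumulator -= SIM_DT
        render(agent)                  # draw the agent at its current state
    return agent
```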

Motion Model

The motion model in this simulation is fairly simple: position is the integral of velocity, and orientation is the integral of the angle change. Because the system time step is small, a forward Euler integrator is used for its simplicity (Euler integration can accumulate significant error when the time step has to be large). The agent's velocity is capped between 0 and 80, and the agent's angle change is capped between -2π and 2π. Both control limits prevent the agent from getting into an out-of-control state.
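A minimal sketch of this Euler step, assuming the delta angle is integrated over the time step in the same way as the velocity. The caps come from the description above; the function name and the choice to update the heading before the position are assumptions.

```python
import math

DT = 0.1                   # simulation time step (seconds)
V_MAX = 80.0               # velocity cap from the text
DTHETA_MAX = 2 * math.pi   # angle-change cap from the text

def euler_step(x, y, theta, velocity, dtheta):
    """Advance the agent one time step with forward Euler integration."""
    velocity = max(0.0, min(V_MAX, velocity))           # clamp velocity to [0, 80]
    dtheta = max(-DTHETA_MAX, min(DTHETA_MAX, dtheta))  # clamp angle change to [-2*pi, 2*pi]
    theta += dtheta * DT                                # integrate orientation
    x += velocity * math.cos(theta) * DT                # integrate position
    y += velocity * math.sin(theta) * DT
    return x, y, theta
```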

Neural Network Controller

The controller is a two-layer neural network. The network takes 5 inputs: agent position x, agent position y, agent orientation angle, goal position x, and goal position y. The hidden layer has 5 nodes with a leaky ReLU activation function. The network has two output nodes: agent velocity and agent orientation change (delta angle). At each hidden node, the input state parameters are multiplied by their corresponding weights, summed, and passed through the activation function. Each hidden node's output is then multiplied by the next layer's weights and summed at the output nodes. Note that the output layer does not have an activation function.
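A minimal sketch of that forward pass (5 inputs → 5 hidden nodes with leaky ReLU → 2 linear outputs). The weight shapes, the presence of bias terms, and the leak slope are assumptions for illustration, not the project's Processing implementation.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # leak slope alpha is assumed; the text only specifies "leaky ReLU"
    return np.where(x > 0, x, alpha * x)

def controller(state, W1, b1, W2, b2):
    """state: [x, y, angle, goal_x, goal_y] -> [velocity, delta_angle]."""
    hidden = leaky_relu(W1 @ state + b1)   # 5 hidden nodes, leaky ReLU
    return W2 @ hidden + b2                # linear output layer (no activation)

# Example shapes: W1 is (5, 5), b1 is (5,), W2 is (2, 5), b2 is (2,)
```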

I also tried two-layer networks with 10 and with 6 hidden nodes; neither worked better than the 5-hidden-node network. I also tried a three-layer network, which again seemed to perform worse than the simpler two-layer, 5-hidden-node network.

Neural Network Training

As in most motion control systems, reinforcement learning is the preferred training method for the neural network, because multiple actions are required to achieve the desired outcome. Since the consequence of a single action is often unclear, supervised learning is hard to apply. In this project, the cross-entropy method (CEM) is used as the RL algorithm to train the network. It starts with a set of random weight vectors, evaluates each network's performance with a reward function, and picks the best 30% (configurable) as the seed for the next round; the next round's weights are generated from these best weights plus some noise. As the evolution progresses, the weights converge toward an optimal set. For this CEM algorithm to converge, a good reward function is essential.
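A rough sketch of that CEM loop, not the project's Python training script: the elite fraction matches the 30% mentioned above, while the noise schedule and the evaluate(weights) helper (which would roll out the controller and return the total reward) are illustrative assumptions.

```python
import numpy as np

def cem_train(evaluate, n_params, iterations=100, batch_size=800,
              elite_frac=0.3, init_std=1.0, extra_noise=0.1):
    """Cross-entropy method over a flat weight vector of length n_params."""
    mean = np.zeros(n_params)
    std = np.full(n_params, init_std)
    n_elite = int(batch_size * elite_frac)
    for _ in range(iterations):
        # Sample a population of candidate weight vectors around the current mean.
        samples = mean + std * np.random.randn(batch_size, n_params)
        rewards = np.array([evaluate(w) for w in samples])
        # Keep the best-performing fraction as the elite set.
        elite = samples[np.argsort(rewards)[-n_elite:]]
        # Refit the sampling distribution to the elite, adding noise so the
        # search does not collapse too early.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + extra_noise
    return mean
```

Each iteration evaluates every candidate in the batch (and, in this project, each candidate on several start/goal samples), which is why training is slow, as noted below.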

For the evolutionary algorithm to work, a large 'gene pool' is essential; I found that a large batch size helped my training.

To support random start and goal locations, multiple training samples with different start and goal positions are used. However, I found the algorithm does not converge to a solution when the sample positions are too far apart, such as starting on the opposite side of the goal. In the end, I was only able to achieve random starting locations within one quadrant of the simulation space, with the goal in a relatively fixed location.

Training a neural network with CEM is slow. With 100 iterations, a batch size of 800, and 6 training samples, it took approximately 2 hours to train the network.

CEM Reward Function

My CEM reward function has six components. The first is a negative reward proportional to the agent's current distance from the goal; this ensures the actions steer the agent toward the goal. The second is a negative reward proportional to the agent's angle change, which discourages the agent from swirling too much. The third is a negative reward proportional to the velocity that is not in the direction of the goal; this punishes the agent for wandering away from the goal direction. The fourth is a negative reward proportional to the angle between the agent's new orientation and the direction to the goal, which again steers the agent toward the goal. The fifth is a large positive reward when the agent is close to the goal, and the sixth is an even larger reward when the agent is on the goal. The fifth and sixth rewards together ensure the agent stops at the goal.

I tried giving negative feedback to the velocity by itself, but found that combining it with the angle between the agent's new orientation and the direction to the goal works better.
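Purely as an illustration of how these six components could combine, here is a sketch of such a reward function. All of the weights, radii, and bonus magnitudes are made up, and the exact form of each term (for example, using the velocity component perpendicular to the goal direction for the third term, and treating theta as the agent's new orientation after the step) is an assumption rather than the reward actually used in the project.

```python
import math

def reward(x, y, theta, dtheta, velocity, goal_x, goal_y,
           w1=1.0, w2=1.0, w3=1.0, w4=1.0,
           near_bonus=100.0, on_goal_bonus=1000.0,
           near_radius=20.0, on_goal_radius=5.0):
    dx, dy = goal_x - x, goal_y - y
    dist = math.hypot(dx, dy)
    goal_angle = math.atan2(dy, dx)
    # Signed angle between the new orientation and the goal direction, wrapped to [-pi, pi].
    angle_err = math.atan2(math.sin(goal_angle - theta),
                           math.cos(goal_angle - theta))

    r = 0.0
    r -= w1 * dist                                   # 1: distance to goal
    r -= w2 * abs(dtheta)                            # 2: angle change (anti-swirl)
    r -= w3 * abs(velocity * math.sin(angle_err))    # 3: velocity off the goal direction
    r -= w4 * abs(angle_err)                         # 4: heading error toward the goal
    if dist < near_radius:
        r += near_bonus                              # 5: close to the goal
    if dist < on_goal_radius:
        r += on_goal_bonus                           # 6: on the goal
    return r
```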

Code:

https://github.com/CyberHolmes/CSCI8980-3/tree/main/MiniMLProject

I used Processing because it makes rendering simulation scenes easy. I implemented the neural network controller, a matrix library, and the PDE system in Processing. The neural network's weights are optimized (trained) in Python.

Discussion Write-up


- LeCun, Bengio, and Hinton. Deep Learning. Nature (2015)

This article describes several major deep learning techniques and their applications. Convolutional neural networks and recurrent neural networks have produced impressive results in image processing and speech processing, which ignited tremendous interest in deep learning research. Compared to typical research papers, the article is written in layman's terms, which makes it relatively easy to understand.


- Hinton and Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science (2006)

This article describes a pretraining procedure for autoencoder networks. An autoencoder network can efficiently and effectively convert high-dimensional data to low-dimensional codes, but its weights are hard to optimize unless the initial weights are close to a good solution. This pretraining procedure enables efficient fine-tuning of deep networks.


- Szita and Lorincz. Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation (2006)

The major contribution of this article is showing how adding noise to the cross-entropy method (CEM) affects the algorithm's performance. A network trained by CEM with decreasing noise performed an order of magnitude better than networks trained by CEM without noise or with constant noise. Although, in the case of Tetris, the genetic algorithm and hand-coded algorithms perform far better than CEM, this is a lesson worth sharing.