Recently, I’ve been working on some reinforcement learning. I learned about DDDQNs, which are neural networks that learn how to play games. A DDDQN tries to estimate the value of being in a state and the advantage of each action over the other actions; combining the two makes it easy to choose which action to take in an observed state. Neural networks can approximate values across a vast number of states, which makes them especially useful in games where a huge number of different states might be reachable.

For the game I trained my DDDQN implementation to play, the goal was to control a blue box to reach green boxes (+1 reward) and avoid red boxes (-1 reward). The DDDQN uses the screen pixels directly as inputs, so initially it does not understand concepts like being close to a box or even how to get a reward. There are four possible actions for each observed state.
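The value/advantage idea can be sketched in a few lines. This is a minimal illustration, not my actual network code: the numbers are hypothetical, and the aggregation subtracts the mean advantage (the standard dueling trick) so the value and advantage estimates stay identifiable.

```python
import numpy as np

def combine_value_and_advantage(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))."""
    return value + (advantages - advantages.mean())

# hypothetical value and advantage estimates for one state with four actions
q = combine_value_and_advantage(2.0, np.array([0.5, -0.5, 1.0, -1.0]))

# the greedy policy picks the action with the highest Q-value
best_action = int(np.argmax(q))
```

Because the mean advantage is subtracted, shifting all advantages by a constant leaves the resulting Q-values unchanged.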
The architecture includes three convolution layers and two fully connected layers.
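To give a feel for how the convolution stack shrinks a pixel input before it reaches the fully connected layers, here is a small sketch. The 84x84 input size and the kernel/stride choices are assumptions borrowed from the classic DQN setup, not the exact hyperparameters of this project.

```python
def conv_output_size(size, kernel, stride):
    # spatial output size of a convolution with "valid" padding
    return (size - kernel) // stride + 1

# assumed input resolution and three conv layers (illustrative values only)
size = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_output_size(size, kernel, stride)

# `size` is now the spatial dimension of the feature maps that get
# flattened and fed into the fully connected layers
```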
Several techniques were used to make the training process more efficient and help the network converge faster. Double DQN separates action selection and action evaluation into two networks, and Dueling DQN splits the network's output after the convolutions into two separate fully connected stacks, one estimating the state value and one estimating the action advantages. Uniform and proportional prioritized experience replay let the network learn from past experiences and make training more efficient.
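The Double DQN part can be sketched as a target computation: the online network chooses the best next action, and the target network evaluates it, which reduces the overestimation bias of plain Q-learning. The Q-values below are hypothetical, and this is a sketch rather than the project's actual update code.

```python
import numpy as np

def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: select with the online net, evaluate with the target net."""
    best_action = int(np.argmax(next_q_online))       # selection (online network)
    if done:
        return reward                                  # no bootstrap at episode end
    return reward + gamma * next_q_target[best_action]  # evaluation (target network)

# hypothetical Q-value estimates for the next state from both networks
target = double_dqn_target(
    reward=1.0,
    next_q_online=np.array([0.1, 0.9, 0.3, 0.2]),
    next_q_target=np.array([0.2, 0.7, 0.4, 0.1]),
    done=False,
)
```

Note that the action with the highest online estimate (index 1) is scored using the target network's estimate for that same action, not the target network's own maximum.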
The DDDQN is implemented in Python using TensorFlow, NumPy, and Matplotlib.
Here is a video demonstrating the results after ~3 hours of training on a GPU:
More information and the source code are available here.