Deep Q Networks in TensorFlow

This is a side project to learn more about reinforcement learning.
The goal is a relatively simple implementation of Deep Q Networks [1,2] that can learn to play some of the Atari games.
It is not an exact reproduction of the original paper.

Notes

The network architecture from DeepMind’s Nature publication [2] is used.
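As a quick check of what that architecture implies, the sketch below works through the layer shapes and parameter count of the Nature network: three convolutions (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 64 filters 3x3 stride 1) on a stack of four 84x84 frames, then a 512-unit dense layer and one output per action. It assumes unpadded ('VALID') convolutions; the helper names are illustrative and do not come from dqn.py.

```python
# Sketch of the layer shapes of the Nature DQN architecture [2].
# Assumption: all convolutions use 'VALID' padding (no zero padding).

HISTORY = 4          # stacked 84x84 frames per input state
CONV_LAYERS = [      # (filters, kernel size, stride)
    (32, 8, 4),
    (64, 4, 2),
    (64, 3, 1),
]
HIDDEN_UNITS = 512   # fully connected layer before the Q-value outputs


def conv_out(size, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def layer_shapes(input_size=84):
    """(height, width, channels) after each convolutional layer."""
    shapes = []
    size = input_size
    for filters, kernel, stride in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((size, size, filters))
    return shapes


def num_parameters(n_actions, input_size=84):
    """Weights plus biases of the conv stack and the two dense layers."""
    total = 0
    in_channels = HISTORY
    for filters, kernel, _ in CONV_LAYERS:
        total += kernel * kernel * in_channels * filters + filters
        in_channels = filters
    h, w, c = layer_shapes(input_size)[-1]
    flat = h * w * c  # 7 * 7 * 64 = 3136 inputs to the dense layer
    total += flat * HIDDEN_UNITS + HIDDEN_UNITS
    total += HIDDEN_UNITS * n_actions + n_actions
    return total


if __name__ == "__main__":
    print(layer_shapes())     # [(20, 20, 32), (9, 9, 64), (7, 7, 64)]
    print(num_parameters(6))  # 1687206
```

Almost all of the roughly 1.7 million parameters sit in the first dense layer, which is why the spatial size after the last convolution matters.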
Standard DQN (without a target network) [1] and Double DQN [3] are implemented.
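The two variants differ only in how the regression target is formed. Standard DQN bootstraps from the maximum Q-value of the next state; Double DQN lets the online network choose the next action and a second (target) network evaluate it, which reduces over-estimation. The sketch below is a plain-Python illustration on lists of Q-values; the function names and the default discount of 0.99 are assumptions, not values taken from config.py.

```python
# Sketch of the regression targets for standard DQN [1] and Double DQN [3].
# Q-values are plain lists indexed by action; gamma=0.99 is an assumed
# discount factor, not necessarily the value used in config.py.


def dqn_target(reward, terminal, next_q, gamma=0.99):
    """Standard DQN: y = r + gamma * max_a Q(s', a); y = r at terminal states."""
    if terminal:
        return reward
    return reward + gamma * max(next_q)


def double_dqn_target(reward, terminal, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network picks the action, the target network scores it."""
    if terminal:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    online = [1.0, 2.0, 0.5]   # Q(s', .) from the online network
    target = [1.5, 0.8, 3.0]   # Q(s', .) from the target network
    print(dqn_target(1.0, False, online, gamma=0.9))                 # about 2.8
    print(double_dqn_target(1.0, False, online, target, gamma=0.9))  # about 1.72
```

In the example the online network prefers action 1, but the target network rates that action lower, so the Double DQN target (1.72) is smaller than the standard one (2.8).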
Loss clipping from DeepMind’s Nature paper [2] is used: the error term of the update is clipped to the range [-1, 1]. (The implementation mimics [6].)
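Clipping the error term to [-1, 1] is equivalent to training on the Huber loss with threshold 1: the loss is quadratic for small errors and linear for large ones, so its gradient is exactly the clipped error. The sketch below shows this on scalars; it is one common way to realise the clipping, and whether [6] implements it in exactly this form is an assumption.

```python
# Sketch of error clipping as in [2]: the TD error's contribution to the
# gradient is clipped to [-1, 1]. The Huber loss with delta=1 has exactly
# this gradient; whether [6] implements it this way is an assumption.


def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error
    return delta * (abs_error - 0.5 * delta)


def huber_grad(error, delta=1.0):
    """d(huber_loss)/d(error): the error clipped to [-delta, delta]."""
    return max(-delta, min(delta, error))


if __name__ == "__main__":
    for e in (-3.0, -0.5, 0.25, 2.0):
        print(e, huber_loss(e), huber_grad(e))
```

Because the loss becomes linear outside [-1, 1], a single transition with a very large TD error cannot produce an arbitrarily large gradient step.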
Pre-processing consists of:
1. RGB-to-grayscale conversion
2. Rescaling to 84 by 84 pixels (this does not preserve the aspect ratio).
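The two pre-processing steps can be sketched in plain Python on a frame given as nested lists of RGB values. The luminance weights are assumed to match the ones documented for skimage's `rgb2gray`, and nearest-neighbour sampling stands in for skimage's resize, which interpolates by default. The helper names are illustrative only.

```python
# Sketch of the two pre-processing steps on a raw Atari frame, given as
# nested lists of (R, G, B) values in 0..255 (a 210x160 frame in Gym).
# Assumptions: the luminance weights below are the ones documented for
# skimage's rgb2gray; nearest-neighbour sampling replaces skimage's
# interpolating resize to keep the example dependency-free.

GRAY_WEIGHTS = (0.2125, 0.7154, 0.0721)


def to_grayscale(frame):
    """Map every (R, G, B) pixel to one 0..255 intensity."""
    return [
        [min(255, int(round(sum(w * c for w, c in zip(GRAY_WEIGHTS, pixel)))))
         for pixel in row]
        for row in frame
    ]


def resize_nearest(image, out_h=84, out_w=84):
    """Resample to out_h x out_w; the aspect ratio is not preserved."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(frame):
    """Grayscale first, then shrink to 84x84."""
    return resize_nearest(to_grayscale(frame))


if __name__ == "__main__":
    red_frame = [[(255, 0, 0)] * 160 for _ in range(210)]
    small = preprocess(red_frame)
    print(len(small), len(small[0]), small[0][0])  # 84 84 54
```

Converting to grayscale before resizing means the resize only touches one channel instead of three.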
On the Atari games, the replay memory stores frames as uint8 to reduce memory usage.
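A short calculation shows why uint8 storage matters: an 84x84 frame takes 7,056 bytes as uint8 but four times as much as 32-bit floats. The sketch below uses Python's `array` module to do that arithmetic and to quantise frames to bytes and back. The one-million-frame figure is only an example size, and the helper names are hypothetical rather than taken from replay.py.

```python
# Sketch: why uint8 storage matters for the replay memory.
# Assumption: one 84x84 pre-processed frame per stored entry; the
# 1,000,000-entry figure is only an example size.
from array import array

FRAME_PIXELS = 84 * 84


def replay_bytes(num_frames, typecode):
    """Bytes needed for num_frames frames stored with the given array typecode."""
    return num_frames * FRAME_PIXELS * array(typecode).itemsize


def to_uint8(frame):
    """Quantise intensities in [0, 1] to unsigned bytes 0..255."""
    return array("B", (min(255, max(0, int(round(v * 255)))) for v in frame))


def to_float(frame_u8):
    """Undo the quantisation when a minibatch is fed to the network."""
    return [v / 255.0 for v in frame_u8]


if __name__ == "__main__":
    print(replay_bytes(1_000_000, "B"))  # 7056000000 bytes, about 7 GB
    print(replay_bytes(1_000_000, "f"))  # 4x that on platforms with 4-byte floats
    print(to_float(to_uint8([0.0, 0.5, 1.0])))
```

The quantisation loses at most half a grey level per pixel, which is far below what the network can exploit, while the memory saving is a factor of four.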
The Atari games are accessed through OpenAI Gym [5], but not through the default environments:
1. PongDeterministic-v3 and BreakoutDeterministic-v3 are used.
   These use deterministic frame skipping and action repeating, similar to [2].
   As a result, the agent learns about 4 times faster than on the less deterministic Pong-v0 environment.
2. Losing a life results in a terminal state. Mnih et al. did the same in [2].
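Both environment modifications above can be sketched as one stepping function: repeat the chosen action a fixed number of times while summing rewards, and report a terminal state as soon as a life is lost, even though the game itself continues. The sketch assumes the classic Gym API where `env.step` returns `(observation, reward, done, info)` and that the remaining lives appear as `info["ale.lives"]`. The Deterministic-v3 environments already repeat actions internally, so this loop only illustrates the idea and should not be stacked on top of them.

```python
# Sketch of action repeating plus "life loss = terminal state".
# Assumptions: the classic Gym API where env.step returns
# (observation, reward, done, info), and the ALE exposing the remaining
# lives as info["ale.lives"]. The Deterministic-v3 environments already
# repeat actions internally, so this loop only illustrates the idea.


def step_with_repeat(env, action, lives, repeat=4):
    """Apply `action` up to `repeat` times and sum the rewards.

    Returns (observation, total_reward, done, terminal, lives):
    `done` means the episode is over and the env must be reset, while
    `terminal` is the flag stored in the replay memory and is also set
    when a life is lost.
    """
    total_reward = 0.0
    observation, done, life_lost = None, False, False
    for _ in range(repeat):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        new_lives = info.get("ale.lives", lives)
        life_lost = new_lives < lives
        lives = new_lives
        if done or life_lost:
            break  # stop repeating once the state is terminal
    return observation, total_reward, done, done or life_lost, lives
```

Keeping `done` and `terminal` separate lets training stop bootstrapping after a lost life without resetting the whole game.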

Content

train_agent.py contains the code to train and save the model. It writes summaries of the training reward per episode, the validation reward, the MSE, the regularisation parameter and the mean target Q-value.
evaluate_agent.py loads a trained model and lets it run indefinitely.
The script visualises the game, the Q-function, and the history of values and rewards.
dqn.py contains the deep Q-network implemented in TensorFlow. The code supports standard DQN [1] and Double DQN [3].
agent.py contains a class for interacting with the environment.
replay.py contains the replay memory implementation.
config.py contains the parameter settings for CartPole, Pong and Breakout.
util.py contains some basic helper functions.
saves/ contains checkpoints of networks that work reliably.
log/ is the directory to which the TensorBoard summaries and the checkpoints are written.

Dependencies

TensorFlow 1.0
OpenAI gym
Matplotlib
Numpy
scikit-image (skimage) for grayscale conversion and resizing

References