Project author: pikinder

Project description:
Deep Q-Networks in tensorflow
Language: Python
Repository: git://github.com/pikinder/DQN.git
Created: 2017-03-21T08:52:30Z
Project page: https://github.com/pikinder/DQN

License:


Deep Q Networks in tensorflow

This is a side project to learn more about reinforcement learning.
The goal is to have a relatively simple implementation of Deep Q Networks [1,2] that can learn on some of the Atari games.
It is not an exact reproduction of the original paper.

Notes

  • The architecture from DeepMind’s Nature publication [2] is used.
  • Standard DQN (without target network) [1] and Double DQN [3] are implemented.
  • Loss clipping from DeepMind’s Nature paper [2] is used. (The implementation mimics [6].)
  • Pre-processing is done by
    1. RGB to grayscale conversion
    2. Rescaling to 84 by 84 (this does not preserve the aspect ratio).
  • On the Atari games, the replay memory stores frames as uint8 to reduce memory usage.
  • The Atari games are accessed through OpenAI Gym [5], but not using the default environments.
    1. PongDeterministic-v3 and BreakoutDeterministic-v3 are used.
      1. These environments use deterministic frame skipping and action repeat, similar to [2].
      2. Consequently, the agent learns about 4 times faster than on the less deterministic Pong-v0 environment.
    2. The loss of a life results in a terminal state. This was used by Mnih et al. in [2].
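
The pre-processing and uint8 storage described in the notes can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the project lists skimage for grayscale conversion and resizing, whereas this sketch swaps in plain NumPy with standard luminance weights and a nearest-neighbour resize so that it runs without extra dependencies. The names `to_grayscale`, `resize_nearest`, `preprocess` and `Uint8Replay` are hypothetical, and the memory layout (one frame per transition) is an assumption.

```python
import numpy as np


def to_grayscale(frame_rgb):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) float image in [0, 255]."""
    # Standard luminance weights (an assumption; they sum to 1.0).
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb.astype(np.float64) @ weights


def resize_nearest(image, out_h=84, out_w=84):
    """Nearest-neighbour resize to out_h x out_w, ignoring the aspect ratio."""
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]


def preprocess(frame_rgb):
    """RGB frame -> 84x84 grayscale frame stored as uint8."""
    small = resize_nearest(to_grayscale(frame_rgb))
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)


class Uint8Replay:
    """Hypothetical ring buffer that keeps frames as uint8 (1 byte per pixel)."""

    def __init__(self, capacity, frame_shape=(84, 84)):
        self.capacity = capacity
        self.frames = np.zeros((capacity,) + frame_shape, dtype=np.uint8)
        self.actions = np.zeros(capacity, dtype=np.int32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=bool)
        self.next_index = 0
        self.size = 0

    def add(self, frame, action, reward, terminal):
        """Store one transition, overwriting the oldest entry when full."""
        i = self.next_index
        self.frames[i] = frame
        self.actions[i] = action
        self.rewards[i] = reward
        self.terminals[i] = terminal
        self.next_index = (i + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample_frames(self, batch_size, rng):
        """Sample frames uniformly and convert to float32 in [0, 1] only at use time."""
        idx = rng.integers(0, self.size, size=batch_size)
        return self.frames[idx].astype(np.float32) / 255.0
```

Storing frames as uint8 and converting to float only when a batch is sampled uses a quarter of the memory that a float32 buffer of the same capacity would need.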

Content

  • train_agent.py contains the code to train and save the model. It writes summaries of the training reward per episode, the validation reward, the MSE, the regularisation parameter, and the mean target Q value.
  • evaluate_agent.py has code to load a trained model and let it run indefinitely.
    The script shows a visualisation of the game, the Q-function, and the value history plus reward.
  • dqn.py: the deep Q network implemented in TensorFlow. The code supports standard DQN [1] and Double DQN [3].
  • agent.py: class for interacting with the environment.
  • replay.py: replay memory implementation.
  • config.py: contains the parameter settings for CartPole, Pong and Breakout.
  • util.py: some basic helper functions.
  • saves/: checkpoints of networks that work reliably.
  • log/: directory where the TensorBoard summaries and the checkpoints are written.
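
To make the difference between the two supported variants concrete, here is a minimal NumPy sketch of the standard DQN target [1], the Double DQN target [3], and a clipped loss. It is an illustration under stated assumptions rather than the contents of dqn.py: the function names, the default `gamma`, and the `clip` value are hypothetical, and the clipped loss uses the quadratic-then-linear form commonly used for error clipping (equivalent to a Huber loss), which may differ in detail from what the repository does.

```python
import numpy as np


def dqn_targets(rewards, terminals, q_next, gamma=0.99):
    """Standard DQN target: r + gamma * max_a Q(s', a).

    q_next has shape (batch, n_actions). Without a target network it comes
    from the same network that is being trained.
    """
    bootstrap = q_next.max(axis=1)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - terminals) * bootstrap


def double_dqn_targets(rewards, terminals, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network picks the action,
    the target network evaluates it."""
    best_actions = q_next_online.argmax(axis=1)
    bootstrap = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - terminals) * bootstrap


def clipped_loss(q_taken, targets, clip=1.0):
    """Per-sample loss: quadratic for |error| <= clip, linear beyond it."""
    diff = np.abs(targets - q_taken)
    quadratic = np.minimum(diff, clip)
    linear = diff - quadratic
    return 0.5 * quadratic ** 2 + clip * linear
```

Decoupling action selection from action evaluation is what reduces the over-estimation of Q values in Double DQN; the linear tail of the loss keeps the gradient magnitude bounded by `clip` for large errors.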

Dependencies

  • TensorFlow 1.0
  • OpenAI Gym
  • Matplotlib
  • NumPy
  • scikit-image (skimage) for grayscale conversion and resizing

References

  1. Mnih et al. Playing Atari with Deep Reinforcement Learning
  2. Mnih et al. Human-level control through deep reinforcement learning
  3. van Hasselt et al. Deep Reinforcement Learning with Double Q-learning
  4. Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
  5. OpenAI Gym
  6. Nathan Sprague’s Theano DQN implementation