Solves CartPole-v0 in just 184 episodes using Double Q-Networks with experience replay, implemented in TensorFlow 2.
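
As a rough illustration of the two ideas named above, here is a minimal sketch of an experience replay buffer and the Double Q-learning target. It is not taken from this project's code: `ReplayBuffer`, `double_q_targets`, and the `gamma` default are hypothetical names and values, and plain NumPy arrays stand in for the outputs of the online and target TensorFlow models.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples.

    Hypothetical helper for illustration; not the project's actual class.
    """

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions of the same episode
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states), np.array(actions), np.array(rewards),
                np.array(next_states), np.array(dones))

    def __len__(self):
        return len(self.buffer)


def double_q_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute Double Q-learning targets for a batch.

    q_next_online / q_next_target: arrays of shape (batch, n_actions) holding
    the Q-values of the next states under the online and target networks.
    gamma=0.99 is an assumed discount factor.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # The online network selects the greedy action...
    best_actions = np.argmax(q_next_online, axis=1)
    # ...and the target network evaluates that action
    next_values = q_next_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done == 1) get no bootstrapped value
    return rewards + gamma * next_values * (1.0 - dones)
```

In a training loop, the Q-value arrays would come from calling the online and target models on the sampled `next_states`, and the returned targets would be regressed against the online network's Q-value for the action actually taken. Splitting action selection from action evaluation is what distinguishes Double Q-learning from a standard DQN target, which takes the maximum over the target network's own Q-values.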