Train double-jointed arms to reach target locations using Proximal Policy Optimization (PPO) in PyTorch
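
For readers new to PPO, the core of the method is its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. Below is a minimal, framework-free sketch of that loss in plain Python. It is not this project's implementation: the function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negative mean clipped surrogate objective for a batch.

    All names and the default clip_eps are illustrative assumptions,
    not taken from this repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)


# One sample whose action became twice as likely, with a positive advantage:
# the ratio of 2.0 is clipped to 1.2 before it is multiplied by the advantage.
loss = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])
```

In a PyTorch training loop the same arithmetic would typically be written on tensors with `torch.exp`, `torch.clamp`, and `torch.min`, so that gradients flow through the new log-probabilities.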