AWS virtual infrastructure simulator for training reinforcement learning based cloud capacity management systems
This is an AWS virtual infrastructure simulator for building automated infrastructure control systems. It is primarily designed for training reinforcement learning controllers but can be used with other approaches as well. In 2016, Google DeepMind built a system for a similar problem of managing datacenter capacity.
The simulated business case is the following. You have an application, such as an e-commerce website or an online game, and host it on AWS. You use AWS EC2 and Spot instances. They all sit behind an AWS load balancer so that you can grow or shrink your cloud capacity.
However, this setup involves many uncertainties.
The simulator models running such a setup with all these uncertainties. You can run it in an automated mode, where the simulation takes a traffic pattern from a real cloud application, or in a manual mode, where you control these uncertainties through a simple user interface.
To start, please download the repository and run:
$ python control.py
You can also run the controller from inside your own code. To do that, specify a simulator as the interface to the AWS cloud and choose whether to simulate the behavior of a real cloud application, which is provided along with the simulator:
from simulator import interfaceSimulator
import controller

generator = interfaceSimulator(files=['data/full_balancer_model_normal.csv'],  # a file describing user behavior
                               timeframe=10,  # number of minutes per simulator step
                               initialServers=[4, 4])  # initial number of EC2 and Spot virtual machines
ctrl = controller.controller(interface=generator,  # interface to the AWS cloud or a simulator
                             plotHistory=30,  # number of steps to be shown on a plot
                             mode='A')  # the mode of operation
ctrl.control(numSteps=50000,  # number of steps to run
             verbose=5,  # how much information to show
             delay=0.)  # additional delay between steps
If you create a controller object with mode='M', the system will give you a visual “Cloud Controller” control interface to control the simulation manually in real time.
Here are the methods of the controller that you may need to implement your own control logic.
You can start with the controller.estimateBestAction() method. It implements a simple control algorithm that starts one EC2 machine if the load uses more than 80% of the current resources and stops one EC2 machine if the load falls below 50%. Otherwise, it does nothing.
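The threshold rule described above can be sketched as a standalone function. This is only an illustration, not the code of controller.estimateBestAction(): the function name threshold_action, its parameters, and the +1 / 0 / -1 return convention are assumptions made for this example.

```python
def threshold_action(load, capacity, upper=0.8, lower=0.5):
    """Hypothetical helper: return +1 to start one EC2 machine,
    -1 to stop one EC2 machine, or 0 to do nothing."""
    if capacity <= 0:
        # Assumption: with no capacity at all, the only sensible move is to start a machine.
        return 1
    utilization = load / capacity
    if utilization > upper:
        return 1   # load uses more than 80% of the current resources
    if utilization < lower:
        return -1  # load fell below 50% of the current resources
    return 0       # utilization is inside the comfortable band


if __name__ == "__main__":
    for load in (90, 60, 40):
        print(load, threshold_action(load=load, capacity=100))
```

Keeping the thresholds as parameters makes it easy to try a wider or narrower band before replacing the rule with a learned policy.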
The simulator is not tied to a particular reinforcement learning algorithm. It can be used with a range of them, including deep reinforcement learning and PEGASUS.
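To give a feel for how a reinforcement learning controller interacts with a capacity environment, here is a minimal tabular Q-learning sketch. It does not use this simulator's API: the toy environment (the step function, the cost of 1 per server and 5 per unserved load unit, the random-walk load) and all names are assumptions invented for the example.

```python
import random

ACTIONS = (-1, 0, 1)  # stop one server, do nothing, start one server
MAX_SERVERS = 8       # toy limit, also used as the maximum load level


def step(servers, load, action):
    """Toy environment step: apply the action, return (new_servers, reward)."""
    servers = min(MAX_SERVERS, max(1, servers + action))
    overload = max(0, load - servers)          # load units that were not served
    reward = -1.0 * servers - 5.0 * overload   # pay for servers, pay more for overload
    return servers, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {}  # maps (servers, load, action) -> estimated value
    for _ in range(episodes):
        servers = rng.randint(1, MAX_SERVERS)
        load = rng.randint(1, MAX_SERVERS)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((servers, load, a), 0.0))
            new_servers, reward = step(servers, load, action)
            # The load follows a bounded random walk, standing in for real traffic.
            new_load = min(MAX_SERVERS, max(1, load + rng.choice((-1, 0, 1))))
            best_next = max(q.get((new_servers, new_load, a), 0.0) for a in ACTIONS)
            key = (servers, load, action)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + gamma * best_next - old)
            servers, load = new_servers, new_load
    return q


def greedy_action(q, servers, load):
    """Pick the action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q.get((servers, load, a), 0.0))


q_table = train()
```

After training, greedy_action(q_table, servers, load) returns the learned choice for a given state. With the real simulator, the toy step function would be replaced by calls to the controller and interface described above.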
The environment consists of the following components.
MIT License. Copyright (c) 2018 Sergii Shelpuk.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.