Deploy your apps to a supercloud in a few clicks

This Engineering Education program is supported by Section. Instantly deploy your GitHub apps, Docker containers or K8s namespaces to a supercloud.

Try It For Free

Leveraging OpenAI Gym and the Anytrading Environment for Trading

February 8, 2022

The open AI Gym Anytrading environment is a custom trading environment that you can use to trade a bunch of stocks, forex, cryptocurrencies, equities, and securities.


To follow along with this tutorial, you need to be familiar with:

  • Reinforcement Learning and its algorithms.
  • Trading with cryptocurrencies, stocks, forex, equities, and securities.
  • Google Colab or Jupyter Notebook.

We will use Google Colab for this tutorial.

Table of contents

Getting started with Gym Anytrading environment

Open AI’s Gym Anytrading environment is a custom trading environment that you can use to trade a bunch of stocks, forex, cryptocurrencies, equities, and securities. We will leverage the environment to perform trading on Ethereum cryptocurrency data.

Installing and importing the dependencies

!pip install tensorflow-gpu==1.15.0 tensorflow==1.15.0 gym-anytrading gym stable-baselines

If you run the command above, it’ll go ahead and install five crucial dependencies:

  • tensorflow-gpu==1.15.0 gives us the GPU version of Tensorflow. We have selected Tensorflow version 1.15.0 as stable-baselines only works with versions 1.8.0 to 1.15.0. It does not support Tensorflow version 2.0.0 and above.
  • tensorflow==1.15.0 gives us the non-GPU version of TensorFlow to ensure we’ve got both bases covered.
  • gym-anytrading gives us our trading environment.
  • stable-baselines give us the different reinforcement learning algorithms.
  • gym gives us OpenAI’s Gym which is the base framework that Gym Anytrading is built on.

Let’s go ahead and import them into our notebook. We begin by importing the environments where our trading bot will learn how to trade.

import gym_anytrading
import gym

Our next imports include the RL algorithm and helpers imported from stable baselines. The algorithm that we will use is A2C. You could try out other RL algorithms, i.e., DQN and PPO2. To learn more about them, please refer to this link.

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

Next, we import our three main processing libraries.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
  • Pandas is used to read in our trading data.
  • Numpy is used in the evaluation phase of the tutorial to perform math.
  • matplotlib helps us to visualize our trades.

Loading historical data from MarketWatch

We will work with the Ethereum data from Market Watch. This data contains Ethereum’s Open, High, Low, and Close prices between the days 12/13/2021 and 01/12/2022. If you want to use the same data for reproducibility, download it here. After downloading, you can upload it into your notebook.

Remember, the upload gets deleted after runtime is recycled.

To load our historical data, we will be relying on the Pandas library.

df = pd.read_csv('ETHUSD.csv') 

This command will load in our data. To view it, we type in the following command:



     Date        Open         High       Low         Close
0   01/12/2022  3,231.84    3,412.64    3,208.15    3,370.80
1   01/11/2022  3,074.72    3,264.19    3,054.64    3,232.52
2   01/10/2022  3,188.77    3,199.13    2,933.19    3,073.50
3   01/09/2022  3,076.44    3,210.00    3,060.38    3,188.77
4   01/08/2022  3,216.12    3,245.65    2,999.08    3,080.64

Let’s perform some preprocessing to set this data up for trading. If we take a look at the data types using the command, df.dtypes, we get:

Date     object
Open     object
High     object
Low      object
Close    object
dtype: object

To get our data to work with the Gym Anytrading environment, we need to convert our Date from an object to a date-type format. We’ll use Pandas to_datetime function to perform the conversion.

df ['Date'] = pd.to_datetime(df ['Date'])

The command above should enable that conversion. To confirm, you need to type in df.dtypes. In addition, we need to set this column to be the index. This is a requirement when working with Gym Anytrading environment. To do that, we write the following command:

df.set_index('Date', inplace=True)

Lastly, we need to convert the other columns from objects to floats. Failure to perform this conversion will give you a TypeError.

df['Open'] = df['Open'].apply(lambda x: float(x.replace(",", "")))
df['High'] = df['High'].apply(lambda x: float(x.replace(",", "")))
df['Low'] = df['Low'].apply(lambda x: float(x.replace(",", "")))
df['Close'] = df['Close'].apply(lambda x: float(x.replace(",", "")))

When you type the code df ['Open'].unique() on the terminal, you’ll find that the values are in a string and have commas. The code above removes these commas by replacing them with empty spaces and converts the values into float types. By typing df.head(), you can confirm that the commas have been replaced. Besides, the df.dtypes command will help you confirm the conversion of all the columns to float64.

That’s all the preprocessing you need to do to pass this data to the Gym Anytrading environment. The environment also expects the Open, High, Close, and Close columns.

Ideally, the data that you use should mimic the frequency that you want to trade. For example, if you want the RL agent to trade daily data, use the daily data to train the agent and not hourly data.

Let’s create the environment and pass this data into the trading environment.

env = gym.make('stocks-v0', df=df, frame_bound=(5,30), window_size=5)

The window_size specifies how many previous timesteps our trading bot has as reference points when it makes its next trade. The frame_bound specifies the start and end of our df. The first parameter on the frame_bound should always be equal to the window_size so that it has the five sets of previous data. For the second parameter, you can adjust it to any value of your choice depending on your data.

Building the test environment

If we look at the actions we can take using environment.action_space, we’ll notice that we only have two actions we can take. We can only Short or Long. In other algorithms, you can Hold. But, this algorithm is only limited to two actions. If you’re not familiar with these crucial terminologies, read this documentation.

state = env.reset()
while True: 
    action = env.action_space.sample()
    n_state, reward, done, info = env.step(action)
    if done: 
        print("info", info)

If you’ve worked with OpenAI Gym before, this first part of the code must be familiar to you. It is the typical code you write when building your test environment.

The code below visualizes this environment using matplotlib.


Random step visualization

To summarize this section, we are taking a bunch of random steps in our environment and visualizing it.

Now, we can start building our RL agent to try and trade profitably in this environment.

Training an RL agent to trade using the Gym environment

We begin by wrapping our environment inside the dummy vectorized environment wrapper, DummyVecEnv.

env_build = lambda: gym.make('stocks-v0', df=df, frame_bound=(5,30), window_size=5)
env = DummyVecEnv([env_build])

We are creating an env_build function. We are taking that function and putting it inside the DummyVecEnv. Finally, we save the result inside the env variable so that when we start building our training model. We’ll now use the env variable.

Next, let’s set up our algorithm and kick off our training.

model_train = A2C('MlpLstmPolicy', env, verbose=1) 

With these two lines of code, our model will start training.

A2C is the algorithm that we will use for this run. A2C is an acronym that stands for Advantage Actor-Critic. It is an RL algorithm that combines Policy-Based and Value-Based RL techniques. You can read more about it here.

We are using the MlpLstmPolicy which is a deep neural network policy with an LSTM layer. It is an important policy as it allows a neural network to keep context and learn about the previous history within its neurons.

The last line of code kicks off training with 100000 timesteps. Ideally, what you need to observe while training is the explained_variance value. You want it to be as high as possible. We are looking at values between 0 and 1. A number close to 1 denotes a high value while that close to 0, a low value. You should also make sure the value_loss is as low as possible.

You can stop the model training when the explained_variance is 0.966. This should give us a model that performs reasonably well. However, finding a perfect solution is never a thing with algorithms. Keep experimenting until you find that sweet spot.

Let’s test it out and see how it performs.

Testing the RL-agent

env = gym.make('stocks-v0', df=df, frame_bound=(25,35), window_size=5)
obs = env.reset()
while True: 
    obs = obs[np.newaxis, ...]
    action, _states = model_train.predict(obs)
    obs, rewards, done, info = env.step(action)
    if done:
        print("info", info)

The code above is similar to the one we wrote above. The core difference is, instead of taking random actions, we are using our model to predict which action it should take, i.e., buy or sell.

Let’s visualize the results:


Trade evaluation

These are the results from training our agent operating from day 25 to 35. Our model has predicted new actions. We can note that it has made a few good trades and bad ones too. The green dots represent buying while the red dots represent selling cryptocurrencies.

For example, on its sixth trade, it was right to buy when the price was low. But, it should have sold on the eighth trade, but instead, it bought more, which was a mistake. Also, on the tenth trade, it sold Ethereum when it was at its lowest price making a loss. This is a wrong trade.

We made a profit of 0.9861862828347474, which is about 8.6%. It is not much. But, at least you get the idea of how the model is working.

The Google Colab link for this tutorial is available here.

Wrapping up

The model isn’t perfect. It has made some long and short trades. Some good and bad trades. You can play around with the code and see how it performs. With a few tweaks, this model can be trained to trade with stocks, forex, equities, and securities. All the code in this tutorial is for educational purposes only. If you are going to deploy such a model, please do your thorough research. In addition, this isn’t financial advice. It is only meant to show you what is possible with the technology.

Further reading

Peer Review Contributions by: Collins Ayuya