Prophet is a library developed by Facebook that is ideal for performing time series forecasting. It is used to forecast anything that has a time series trend, such as the weather and sales.
This tutorial uses the library to forecast sales trends. We will use the Python programming language for this build.
To follow along, you need to be familiar with Python and a notebook environment such as Google Colab. This tutorial covers:
- Installing and importing required dependencies
- Loading data into our notebook
- Data preprocessing
- Training the time series model
- Making predictions and evaluating performance
- Wrapping up
- Further reading
Installing and importing required dependencies
Let’s begin by installing the model.
!pip install prophet
After installing it, we need to import it into our notebook.
import pandas as pd
from prophet import Prophet
pandas allows us to read in tabular data.
prophet provides the Prophet class that we will use for forecasting in our Google Colab notebook.
Let’s bring our data into the notebook. We will use store sales transaction data from Kaggle.
The dataset includes dates, store and product information, and sales numbers. It contains four years’ worth of sales data from Favorita stores located in Ecuador. You’ll need to download the data and upload it to your Colab session.
Loading data into our notebook
We will use the pandas library to read in our CSV file.
dataframe = pd.read_csv('transactions.csv')
We load our data and save it in a variable called dataframe. We can check the first five rows of data using the pandas head() method. You can use the tail() method to check the last five rows.
dataframe.head()
         date  store_nbr  transactions
0  2013-01-01         25           770
1  2013-01-02          1          2111
2  2013-01-02          2          2358
3  2013-01-02          3          3487
4  2013-01-02          4          1922
Let’s take a look at the data types for these columns.
dataframe.dtypes
date            object
store_nbr        int64
transactions     int64
From these results, we can see that the date column is a string. The model cannot accept it as it is. It needs to be converted into a date-time format for it to work with the model.
Let’s perform some preprocessing.
Data preprocessing
Whenever you work with time-series data, you need a date or timestamp column. Prophet requires one in order to forecast trends.
Using the pandas to_datetime() function, we will convert the date column from a string to a date-time format.
dataframe['date'] = pd.to_datetime(dataframe['date'])
dataframe.dtypes
date            datetime64[ns]
store_nbr                int64
transactions             int64
We have converted our date column into a date-time format.
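As a side note, pandas can also parse the date column while reading the file. The sketch below compares both routes on three rows copied from the output above; the in-memory buffer stands in for transactions.csv, and the names sample and parsed are made up for this illustration.

```python
import io

import pandas as pd

# Three rows copied from the tutorial's output; the buffer stands in for
# transactions.csv so the sketch runs without the Kaggle download.
csv_text = """date,store_nbr,transactions
2013-01-01,25,770
2013-01-02,1,2111
2013-01-02,2,2358
"""

# Route 1: read the dates as strings, then convert (the approach used above).
sample = pd.read_csv(io.StringIO(csv_text))
sample["date"] = pd.to_datetime(sample["date"])

# Route 2: ask read_csv to parse the column while loading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(sample.dtypes)
# Both routes should yield the same timestamps.
print((parsed["date"] == sample["date"]).all())
```

Either route should leave you with a date-time column that Prophet can work with.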
We need to drop the store_nbr column because, for this data to work with the Prophet model, we only need two columns, i.e., a ds column and a y column. We need to rename our date column to ds and the transactions column to y.
dataframe.drop('store_nbr', axis=1, inplace=True)
We are dropping only the store_nbr column. The axis=1 argument tells pandas to drop a column rather than a row.
        date  transactions
0 2013-01-01           770
1 2013-01-02          2111
2 2013-01-02          2358
3 2013-01-02          3487
4 2013-01-02          1922
We now have only two columns. As mentioned above, we need to rename the date column to ds and the transactions column to y.
dataframe.columns = ['ds', 'y']
dataframe.head()
          ds     y
0 2013-01-01   770
1 2013-01-02  2111
2 2013-01-02  2358
3 2013-01-02  3487
4 2013-01-02  1922
Using the command above, we have successfully renamed our columns. That’s the last preprocessing step. We can now go ahead and create the time series model.
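One optional extra step, which is not part of the walkthrough above: the frame still holds several rows per date (one per store). If you would rather forecast total daily transactions across all stores, a possible approach is to sum per date before renaming. The sketch below uses the five rows shown earlier; raw and daily are hypothetical names.

```python
import pandas as pd

# The five rows shown in the tutorial's head() output.
raw = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-01-01", "2013-01-02", "2013-01-02", "2013-01-02", "2013-01-02",
    ]),
    "store_nbr": [25, 1, 2, 3, 4],
    "transactions": [770, 2111, 2358, 3487, 1922],
})

# Sum transactions across stores so that each date appears once,
# then rename to the ds / y layout that Prophet expects.
daily = (
    raw.groupby("date", as_index=False)["transactions"]
    .sum()
    .rename(columns={"date": "ds", "transactions": "y"})
)

print(daily)
```

Whether to aggregate depends on what you want the forecast to represent: per-store activity or the chain-wide total.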
Training the time series model
We begin by creating an instance p of the Prophet class.
p = Prophet(interval_width=0.92, daily_seasonality=True)
The interval_width argument sets the width of the uncertainty interval reported around the forecast. We’ve set ours to 0.92, which requests a 92% interval; Prophet’s default is 0.8.
daily_seasonality=True forces Prophet to fit a daily seasonal component, which is mainly useful for a sub-daily time series. If you don’t set this parameter, Prophet decides automatically which of the yearly, weekly, and daily seasonalities to fit based on the data.
You can play around with these values to check how they affect the results obtained after training.
We can now train our model.
model = p.fit(dataframe)
After running the command above, the model will be trained on the data.
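The tutorial trains on the full dataset. If you want to measure forecast accuracy later, one common variation is to hold out the most recent days before fitting. The sketch below shows only the split, using made-up data in the ds / y layout; horizon_days, train, and test are names chosen for this illustration.

```python
import pandas as pd

# Toy data in Prophet's expected layout; the values are made up.
data = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=30, freq="D"),
    "y": range(30),
})

horizon_days = 7  # hypothetical holdout length
cutoff = data["ds"].max() - pd.Timedelta(days=horizon_days)

# Everything up to the cutoff is used for training; the rest is held out.
train = data[data["ds"] <= cutoff]
test = data[data["ds"] > cutoff]

print(len(train), len(test))
# The model would then be fit on `train` only, for example p.fit(train),
# and its forecast compared against `test`.
```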
Making predictions and evaluating performance
Let’s go ahead and make predictions.
future = p.make_future_dataframe(periods=200, freq='D')
future.tail()
             ds
1877 2018-02-27
1878 2018-02-28
1879 2018-03-01
1880 2018-03-02
1881 2018-03-03
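The dates in this tail can be reproduced with plain pandas, which may help clarify what the future dataframe holds: the historical dates followed by 200 new daily dates. This is only a sketch. The history end date of 2017-08-15 is an assumption inferred from the tail above (2018-03-03 minus 200 days), and the five-day history is a stand-in for the real one.

```python
import pandas as pd

# Assumed last date in the training data (inferred, not stated in the tutorial).
history_end = pd.Timestamp("2017-08-15")

# Short stand-in for the historical dates.
history = pd.date_range(end=history_end, periods=5, freq="D")

# 200 daily dates starting the day after the history ends.
future_only = pd.date_range(
    start=history_end + pd.Timedelta(days=1), periods=200, freq="D"
)

# History followed by the new dates, in a single ds column.
future_sketch = pd.DataFrame({"ds": history.append(future_only)})

print(future_sketch.tail())
```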
From the results, we can see that the future dataframe extends 200 days past the last date in the data, using a daily frequency. If you want to forecast over a longer period, you can change the value of the periods argument.
To predict, we use the predict() method and pass in the future dataframe as shown:
forecast_prediction = p.predict(future)
forecast_prediction.tail()
Along with the predicted yhat column, the output contains many other columns, such as the trend, the seasonal components, and the lower and upper bounds of the uncertainty interval. The most important column is yhat, as it represents your sales forecast.
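To put a number on forecast quality, you can join the actual values to the forecast on ds and compute a few error metrics. The sketch below uses made-up numbers: actuals stands in for dataframe and forecast for forecast_prediction. The column names yhat, yhat_lower, and yhat_upper match Prophet's output.

```python
import pandas as pd

# Hypothetical values for illustration only.
actuals = pd.DataFrame({
    "ds": pd.to_datetime(["2017-08-11", "2017-08-12", "2017-08-13", "2017-08-14"]),
    "y": [100.0, 120.0, 90.0, 110.0],
})
forecast = pd.DataFrame({
    "ds": pd.to_datetime(
        ["2017-08-11", "2017-08-12", "2017-08-13", "2017-08-14", "2017-08-15"]
    ),
    "yhat": [110.0, 115.0, 100.0, 105.0, 108.0],
    "yhat_lower": [95.0, 100.0, 95.0, 90.0, 93.0],
    "yhat_upper": [125.0, 130.0, 115.0, 120.0, 123.0],
})

# Keep only the dates for which an actual value exists.
scored = actuals.merge(forecast, on="ds", how="inner")

abs_error = (scored["y"] - scored["yhat"]).abs()
mae = abs_error.mean()                          # mean absolute error
mape = (abs_error / scored["y"]).mean() * 100   # mean absolute percentage error

# Share of actual values that fall inside the uncertainty interval.
coverage = scored["y"].between(scored["yhat_lower"], scored["yhat_upper"]).mean()

print(f"MAE={mae:.2f}  MAPE={mape:.2f}%  coverage={coverage:.2f}")
```

Errors measured on the same rows the model was trained on tend to look better than errors on unseen data, so these metrics are more informative when computed on held-out dates.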
We can visualize these predictions.
plot1 = p.plot(forecast_prediction)
If you take a keen look at the plot, you’ll notice that the predicted sales trend mimics the actual data’s trend. We could take this plotting even a step further and plot the individual components that make up the above plot.
plot2 = p.plot_components(forecast_prediction)
This plot can tell you a lot more about the sales data. For instance, more sales are made between Friday and Monday. The stores also seem to make a lot of sales between November and February. During the rest of the year, sales are average.
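A weekly pattern like this can be cross-checked against the raw data by averaging y per day of the week. The sketch below uses made-up values in the ds / y layout; toy and by_weekday are names chosen for this illustration.

```python
import pandas as pd

# Two weeks of made-up daily values; 2017-01-02 is a Monday.
toy = pd.DataFrame({
    "ds": pd.date_range("2017-01-02", periods=14, freq="D"),
    "y": [10, 8, 8, 9, 12, 15, 14, 11, 7, 9, 8, 13, 16, 15],
})

# Average y for each day of the week.
by_weekday = toy.groupby(toy["ds"].dt.day_name())["y"].mean()

print(by_weekday.sort_values(ascending=False))
```

If the averages from the real data rank the days in roughly the same order as the weekly component plot, that is some reassurance that the model picked up a real pattern.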
You can find the complete code for this tutorial here.
That’s sales forecasting using the Prophet model in a nutshell.
Wrapping up
This tutorial introduced time series forecasting using Prophet. It is meant only as an introduction to using the model in a project and is not intended for production use.
To use the model in production, you’ll need to research it further. You can also read about the NeuralProphet library. It is an extension of Prophet that adds neural networks to the mix.
Peer Review Contributions by: Wilkister Mumbi