## EngEd Community

Section’s Engineering Education (EngEd) Program fosters a community of university students in Computer Science-related fields of study to research and share topics that are relevant to engineers in the modern technology landscape. You can find more information and program guidelines in the GitHub repository. If you're currently enrolled in a Computer Science-related field of study and are interested in participating in the program, please complete this form.

# Building a Time Series Weather Forecasting Application in Python

##### January 14, 2022

This tutorial will look at how we can forecast the weather using a time series package known as Neural Prophet.

In this tutorial, we will be going through a couple of key things:

• We’ll start by preprocessing our data fetched from Kaggle using the Pandas library.
• We’ll train a time series forecasting model with Neural Prophet to predict the temperature.
• We’ll learn how to forecast the temperature into the future.

### Prerequisites

To follow along with this tutorial, you need to:

• Be familiar with Machine Learning modeling.
• Use either Google Colab or Jupyter Notebook.

Neural Prophet is a time-series model built on top of AR-Net and Facebook Prophet, and you can think of it as an upgraded version of Facebook Prophet. It uses PyTorch as its backend, it is beginner-friendly, and you can get started with a quick `pip` install.

It combines traditional statistical models with neural networks for time series modeling and can be used for both forecasting and anomaly detection. It generates high-quality forecasts for time series data with multiple seasonalities and linear or non-linear growth.
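To make this more concrete, the NeuralProphet documentation describes the model as a sum of separate components. With the default settings used later in this tutorial (no events, regressors, or lags configured), only the trend and seasonality terms are active.

$$
\hat{y}_t = T(t) + S(t) + E(t) + F(t) + A(t) + L(t)
$$

Here $T$ is the trend, $S$ the seasonality, $E$ the effects of events and holidays, $F$ the regression effects of future-known exogenous variables, $A$ the auto-regression on past values (the AR-Net part), and $L$ the regression effects of lagged exogenous variables.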

We will use the model to forecast the future temperature of Austin, Texas, given past temperature data of the same location.

### Installing and importing the required dependencies

The main package we need to install is the `neuralprophet` package.

```
!pip install neuralprophet
```

We need to import the necessary dependencies into our notebook. We will import `Pandas`, `Neural Prophet`, and `Matplotlib`.

```python
import pandas as pd
from neuralprophet import NeuralProphet
from matplotlib import pyplot as plt
```

• `Pandas` will help us read our data into our notebook.
• `NeuralProphet` is the class we will use to predict the future temperature.
• `Matplotlib` will be used in plotting.

The next step is to import our data.

We will use the Austin Weather dataset from Kaggle. Although the dataset contains historical temperature, precipitation, humidity, and wind speed data for Austin, Texas, we will only predict the temperature, so we will only work with the temperature data. Download the dataset and upload the `austin_weather.csv` file into your notebook.

```python
df = pd.read_csv('austin_weather.csv')
df.tail()
```

We used the Pandas `read_csv()` method to load our dataset, and the `tail()` method to view the last five rows.

Let us do a bit of exploratory data analysis on the data.

```python
df.Date.unique()
```

When you run the code above, you’ll see that the dates in our dataset, which we’ll use for training, range from `2013-12-21` to `2017-07-31`. That’s roughly three and a half years’ worth of data.

Output:

```
array(['2013-12-21', '2013-12-22', '2013-12-23', ..., '2017-07-29',
       '2017-07-30', '2017-07-31'], dtype=object)
```
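Because we will later train with a daily frequency, it is worth checking that no calendar days are missing between the first and last date. Here is a minimal sketch with pandas; the `dates` series is a small made-up stand-in for `df['Date']`.

```python
import pandas as pd

# Hypothetical sample standing in for df['Date'] (note that 2013-12-23 is absent)
dates = pd.Series(['2013-12-21', '2013-12-22', '2013-12-24', '2013-12-25'])

parsed = pd.to_datetime(dates)
# Every calendar day between the first and last date, inclusive
expected = pd.date_range(parsed.min(), parsed.max(), freq='D')
# Days that daily data should have but the sample does not
missing_days = expected.difference(parsed)

print(f"{len(parsed)} rows spanning {(parsed.max() - parsed.min()).days} days")
print("Missing days:", list(missing_days.strftime('%Y-%m-%d')))
```

On the real data you would pass `df['Date']` instead. For reference, 2013-12-21 to 2017-07-31 spans 1,318 days, so a gap-free daily record has 1,319 rows.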

Let’s take a look at all the columns available in our dataset.

```python
df.columns
```

Output:

```
Index(['Date', 'TempHighF', 'TempAvgF', 'TempLowF', 'DewPointHighF',
       'DewPointAvgF', 'DewPointLowF', 'HumidityHighPercent',
       'HumidityAvgPercent', 'HumidityLowPercent',
       'SeaLevelPressureHighInches', 'SeaLevelPressureAvgInches',
       'SeaLevelPressureLowInches', 'VisibilityHighMiles',
       'VisibilityAvgMiles', 'VisibilityLowMiles', 'WindHighMPH', 'WindAvgMPH',
       'WindGustMPH', 'PrecipitationSumInches', 'Events'],
      dtype='object')
```

From here on, we will focus only on the `TempAvgF` column, the average daily temperature in degrees Fahrenheit.

Let’s now do a bit of preprocessing.

### Preprocessing the data

We begin by checking the data types of the columns.

```python
df.dtypes
```

Output:

```
Date                          object
TempHighF                      int64
TempAvgF                       int64
TempLowF                       int64
DewPointHighF                 object
DewPointAvgF                  object
DewPointLowF                  object
HumidityHighPercent           object
HumidityAvgPercent            object
HumidityLowPercent            object
SeaLevelPressureHighInches    object
SeaLevelPressureAvgInches     object
SeaLevelPressureLowInches     object
VisibilityHighMiles           object
VisibilityAvgMiles            object
VisibilityLowMiles            object
WindHighMPH                   object
WindAvgMPH                    object
WindGustMPH                   object
PrecipitationSumInches        object
Events                        object
dtype: object
```

We need to convert the `Date` column from `object` (plain strings) to `datetime`, because Neural Prophet expects the date column to have a datetime type.

```python
df['Date'] = pd.to_datetime(df['Date'])
df.tail()
```

We’ve converted the `Date` column from `object` to a datetime type. If you run `df.dtypes` again, you will see that the type has changed.

Output (first three columns shown):

```
Date                          datetime64[ns]
TempHighF                              int64
TempAvgF                               int64
dtype: object
```

This is a requirement whenever you’re working with Neural Prophet.
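As an alternative to converting after loading, pandas can parse dates while reading the file, through the `parse_dates` argument of `read_csv()`. The sketch below reads a tiny in-memory CSV whose values are made up, standing in for `austin_weather.csv`.

```python
import io
import pandas as pd

# A few hypothetical rows in the same layout as austin_weather.csv (values made up)
csv_text = """Date,TempHighF,TempAvgF,TempLowF
2017-07-29,100,89,77
2017-07-30,99,88,77
2017-07-31,99,88,77
"""

# parse_dates converts the Date column to datetime64 while the file is read,
# so no separate pd.to_datetime() call is needed afterwards
df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(df_sample.dtypes)
```

With the real file this becomes `df = pd.read_csv('austin_weather.csv', parse_dates=['Date'])`.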

Neural Prophet expects a DataFrame with exactly two columns: `ds`, which holds the timestamps, and `y`, which holds the numeric values we want to predict. In our case, `ds` will be `Date` and `y` will be `TempAvgF`.
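Before training, it can help to check that the frame really has this shape. The function below is a hypothetical helper, not part of Neural Prophet. It tests the column names and types described above, plus a duplicate-timestamp check that is our own addition.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_neuralprophet_input(frame: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems with a ds/y frame (empty list means OK)."""
    problems = []
    if list(frame.columns) != ['ds', 'y']:
        problems.append(f"expected columns ['ds', 'y'], got {list(frame.columns)}")
        return problems  # the remaining checks rely on these column names
    if not is_datetime64_any_dtype(frame['ds']):
        problems.append(f"'ds' should be datetime, got {frame['ds'].dtype}")
    if not is_numeric_dtype(frame['y']):
        problems.append(f"'y' should be numeric, got {frame['y'].dtype}")
    if frame['ds'].duplicated().any():
        problems.append("'ds' contains duplicate timestamps")
    return problems


good = pd.DataFrame({'ds': pd.to_datetime(['2017-07-30', '2017-07-31']), 'y': [88, 88]})
bad = pd.DataFrame({'Date': ['2017-07-30'], 'TempAvgF': [88]})
print(check_neuralprophet_input(good))  # []
print(check_neuralprophet_input(bad))
```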

Let’s use `matplotlib` to plot our temperature over time.

```python
plt.plot(df['Date'], df['TempAvgF'])
plt.show()
```

Result: *[Plot of `TempAvgF` against `Date`: the average daily temperature over time]*

To plot the graph above, we’ve used the `plt.plot()` method from Matplotlib, passing `df['Date']` as the x variable and `df['TempAvgF']` as the y variable.

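Always check whether your data has missing values, since you should not pass data with missing values to Neural Prophet. In our case, the data looks good: the temperature columns were loaded as `int64`, a type pandas cannot use for a column that contains `NaN`.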
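A quick way to run this check in pandas is `isna().sum()`, which counts the missing values in each column. The sketch below uses a hypothetical three-row frame with one missing temperature; on the tutorial data you would call `df[['Date', 'TempAvgF']].isna().sum()`.

```python
import pandas as pd

# Hypothetical frame with one missing temperature reading
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-29', '2017-07-30', '2017-07-31']),
    'TempAvgF': [89, None, 88],
})

# Number of missing values per column
print(sample.isna().sum())

# Rows that contain at least one missing value
print(sample[sample.isna().any(axis=1)])
```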

Next, we will keep only the two columns we need. As mentioned earlier, Neural Prophet only expects two columns.

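```python
# Keep only the date and target columns, dropping rows with missing values.
# dropna() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
new_column = df[['Date', 'TempAvgF']].dropna()
# Rename the columns to the names Neural Prophet expects
new_column.columns = ['ds', 'y']
new_column.tail()
```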

When you run the code above, you’ll notice that our dataset now has only two columns: `ds` (formerly `Date`) and `y` (formerly `TempAvgF`).

If you want to forecast something else, such as `HumidityAvgPercent` or `DewPointAvgF`, you can change the second column name in `df[['Date', 'TempAvgF']]` to your desired target. For example, `df[['Date', 'HumidityAvgPercent']]`.
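Unlike the temperature columns, `HumidityAvgPercent` and `DewPointAvgF` were loaded as `object` (see the `df.dtypes` output earlier), so they need converting to numbers before they can serve as `y`. One option, sketched below on made-up values, is `pd.to_numeric()` with `errors='coerce'`, which turns anything non-numeric into `NaN`. The `'-'` entry is an assumed placeholder, so inspect your column to see what it actually contains.

```python
import pandas as pd

# Hypothetical raw values as they might appear in an `object` column;
# '-' is an assumed non-numeric placeholder
raw = pd.Series(['68', '-', '71'], name='HumidityAvgPercent')

# errors='coerce' turns anything that cannot be parsed as a number into NaN
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric.tolist())  # [68.0, nan, 71.0]
```

On the real data this would be `df['HumidityAvgPercent'] = pd.to_numeric(df['HumidityAvgPercent'], errors='coerce')`, after which the `dropna()` step removes the affected rows.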

We can now go ahead and train our model.

### Training the forecasting model

First, we create a new instance of Neural Prophet using the `NeuralProphet()` class we imported earlier and store it in a variable `n`. Then we use the `fit()` method to train the model.

```python
n = NeuralProphet()
# fit() trains `n` in place and returns a DataFrame of training metrics (e.g. MAE)
metrics = n.fit(new_column, freq='D', epochs=5000)
```

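We will train our model for `5000` epochs. You can use fewer or more epochs depending on the accuracy you get. The `freq='D'` argument tells Neural Prophet that our data has a daily frequency. Training runs on PyTorch; the AR-Net auto-regression component only comes into play if you enable lags (for example, through the `n_lags` parameter), which we don't do here.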
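The error reported during training is computed on the same data the model learns from, so it can look better than the error on unseen days. To judge whether more or fewer epochs actually help, you can hold out the most recent part of the series. Below is a minimal pandas sketch of a time-ordered split; `chronological_split` is a hypothetical helper and the ten-day frame is made-up data. NeuralProphet also provides its own splitting utility (`split_df()`), so check the documentation for your installed version before rolling your own.

```python
import pandas as pd


def chronological_split(frame: pd.DataFrame, valid_frac: float = 0.2):
    """Hypothetical helper: split a ds/y frame into train/validation by time, without shuffling."""
    frame = frame.sort_values('ds').reset_index(drop=True)
    cut = int(len(frame) * (1 - valid_frac))
    return frame.iloc[:cut], frame.iloc[cut:]


# Ten days of made-up temperatures for illustration
toy = pd.DataFrame({
    'ds': pd.date_range('2017-07-22', periods=10, freq='D'),
    'y': [85, 86, 88, 87, 89, 90, 88, 89, 88, 88],
})
train, valid = chronological_split(toy)
print(len(train), len(valid))  # 8 2
```

You would then fit on `train` and compare the predictions for the dates in `valid` with their actual values. Splitting by time rather than at random keeps future information out of the training set.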

After training for 5000 epochs, the Mean Absolute Error (MAE) on the training data is about `1.74` °F.
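MAE is the average absolute difference between the actual values $y_i$ and the predictions $\hat{y}_i$, that is $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. Because it is expressed in the same units as the target (°F here), it reads directly as "how many degrees off, on average". A small NumPy version, shown on made-up numbers:

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up actual vs. predicted temperatures (°F): errors of 1, 2 and 0 degrees
print(mean_absolute_error([70, 72, 75], [71, 70, 75]))  # 1.0
```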

Up until now, we’ve been doing preprocessing and training. Let’s go ahead and perform some forecasting.

### Forecasting the temperature into the future

```python
future = n.make_future_dataframe(new_column, periods=1500)
forecast = n.predict(future)
forecast.tail()
```

We use `make_future_dataframe()` to create 1500 future periods, which means 1500 days into the future, since our data has a daily frequency. We then use the `n.predict()` method to predict the values for those dates, and the `tail()` method to display the last five rows. The last row is our 1500th prediction, dated `2021-09-08`. Remember, our dataset only has values up to `2017-07-31`.
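We can confirm the date of the final prediction with plain pandas date arithmetic. This is only a check of the calendar math, not how Neural Prophet builds its future frame internally.

```python
import pandas as pd

last_observed = pd.Timestamp('2017-07-31')  # last date in austin_weather.csv
periods = 1500                              # same value passed to make_future_dataframe()

# Daily timestamps starting the day after the last observation
future_dates = pd.date_range(last_observed + pd.Timedelta(days=1), periods=periods, freq='D')
print(future_dates[0].date(), future_dates[-1].date())  # 2017-08-01 2021-09-08
```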

Let’s visualize these predictions.

```python
plot = n.plot(forecast)
```

Result: *[Neural Prophet forecast plot of the predicted temperature for the next 1500 days]*

From these results, we can see that the forecast follows the same yearly pattern as the historical data we plotted earlier: temperatures peak in the middle of the year, between June and August, and are lowest between November and February.
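You can back up this reading of the plot with numbers by averaging the forecast per calendar month. The sketch assumes the predictions are in a column named `yhat1`, which is how NeuralProphet names its forecast column. Since running the real model is out of scope here, `forecast_sample` is a synthetic stand-in built from a sine wave.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the forecast DataFrame: one value per day for two years,
# following a yearly cosine wave that is coldest in mid-January (made-up numbers)
ds = pd.date_range('2018-01-01', '2019-12-31', freq='D')
day = ds.dayofyear.to_numpy()
yhat1 = 70 - 20 * np.cos(2 * np.pi * (day - 15) / 365.25)
forecast_sample = pd.DataFrame({'ds': ds, 'yhat1': yhat1})

# Average predicted temperature per calendar month (1 = January, 12 = December)
monthly = forecast_sample.groupby(forecast_sample['ds'].dt.month)['yhat1'].mean().round(1)
print(monthly)
print('Warmest month:', monthly.idxmax(), '| Coldest month:', monthly.idxmin())
```

On the real output you would run `forecast.groupby(forecast['ds'].dt.month)['yhat1'].mean()` instead.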

You can find the complete code for this tutorial here.

### Wrapping up

That wraps up how to forecast the weather with Neural Prophet. We explored and preprocessed our data, trained our model, and finally made predictions with only a few lines of code. Feel free to try it out yourself.

Happy coding!

Peer Review Contributions by: Willies Ogola