Building a Time Series Weather Forecasting Application in Python

January 14, 2022

This tutorial will look at how we can forecast the weather using a time series package known as Neural Prophet.

In this tutorial, we will be going through a couple of key things:

  • We’ll start by preprocessing our data fetched from Kaggle using the Pandas library.
  • We’ll train a time series forecasting model with Neural Prophet to predict temperature.
  • We’ll learn how to forecast the temperature into the future.

Prerequisites

To follow along with this tutorial, you need to:

  • Be familiar with Machine Learning modeling.
  • Use either Google Colab or Jupyter Notebook.

About Neural Prophet

Neural Prophet is a time-series model built on top of AR-Net and Facebook Prophet, and it can be seen as an upgraded version of Facebook Prophet. It uses the PyTorch framework as its backend. It is beginner-friendly, and you can get started with a quick pip install.

It combines traditional statistical models and neural network models for time series modeling, and it can be used for forecasting and anomaly detection. The model generates forecasts for time series data that have multiple seasonalities with linear or non-linear growth.

We will use the model to forecast the future temperature of Austin, Texas, given past temperature data of the same location.

Installing and importing the required dependencies

The main package that we will install is the Neural Prophet package.

!pip install neuralprophet

We need to import the necessary dependencies into our notebook. We will import Pandas, Neural Prophet, and Matplotlib.

import pandas as pd
from neuralprophet import NeuralProphet
from matplotlib import pyplot as plt
  • Pandas will help us read our data into our notebook.
  • NeuralProphet is the class we will use to predict the future temperature.
  • Matplotlib will be used in plotting.

The next step involves us importing our data.

Loading the dataset

We will use the Austin Weather dataset from Kaggle. Although the dataset contains the historical temperature, precipitation, humidity, and wind speed for Austin, Texas, we will only predict the temperature. That means we will only work with the temperature data from the dataset. You need to download it and upload the austin_weather.csv file into your notebook.

df = pd.read_csv('austin_weather.csv')
df.tail()

We have used the Pandas read_csv() function to load our dataset. In addition, we’ve used the tail() method to view the last five rows in our dataset.

Let us do a bit of exploratory data analysis on the data.

df.Date.unique()

When you run the code above, you’ll see that the dates in our dataset that’ll be used for training range from 2013-12-21 to 2017-07-31. That’s a little over three and a half years’ worth of data.

Output:

array(['2013-12-21', '2013-12-22', '2013-12-23', ..., '2017-07-29',
       '2017-07-30', '2017-07-31'], dtype=object)
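If you only want the first and last dates rather than the full array, taking the minimum and maximum is more compact. The snippet below is a minimal sketch that uses a small hand-made sample (hypothetical rows standing in for austin_weather.csv) so it runs on its own. With the real data you would pass df['Date'] instead.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: only the first and last
# two dates from the output above are included here.
sample = pd.DataFrame(
    {"Date": ["2013-12-21", "2013-12-22", "2017-07-30", "2017-07-31"]}
)

# Parse the strings so that min/max and subtraction work on real dates.
dates = pd.to_datetime(sample["Date"])
first_day, last_day = dates.min(), dates.max()

# Number of days between the first and last observation.
span_days = (last_day - first_day).days

print(first_day.date(), last_day.date(), span_days)
```

Dividing span_days by 365.25 gives the length of the record in years.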

Let’s take a look at all the columns available in our dataset.

df.columns

Output:

Index(['Date', 'TempHighF', 'TempAvgF', 'TempLowF', 'DewPointHighF',
       'DewPointAvgF', 'DewPointLowF', 'HumidityHighPercent',
       'HumidityAvgPercent', 'HumidityLowPercent',
       'SeaLevelPressureHighInches', 'SeaLevelPressureAvgInches',
       'SeaLevelPressureLowInches', 'VisibilityHighMiles',
       'VisibilityAvgMiles', 'VisibilityLowMiles', 'WindHighMPH', 'WindAvgMPH',
       'WindGustMPH', 'PrecipitationSumInches', 'Events'],
      dtype='object')

As we advance, we will only be focusing on the TempAvgF column.

Let’s now do a bit of preprocessing.

Preprocessing the data

We begin by checking the data types of the columns.

df.dtypes

Output:

Date                          object
TempHighF                      int64
TempAvgF                       int64
TempLowF                       int64
DewPointHighF                 object
DewPointAvgF                  object
DewPointLowF                  object
HumidityHighPercent           object
HumidityAvgPercent            object
HumidityLowPercent            object
SeaLevelPressureHighInches    object
SeaLevelPressureAvgInches     object
SeaLevelPressureLowInches     object
VisibilityHighMiles           object
VisibilityAvgMiles            object
VisibilityLowMiles            object
WindHighMPH                   object
WindAvgMPH                    object
WindGustMPH                   object
PrecipitationSumInches        object
Events                        object
dtype: object

We will need to convert the Date column from the object type to a datetime type. The model only accepts datetime values in the date column.

df['Date'] = pd.to_datetime(df['Date'])
df.tail()

We’ve converted our date column from an object to a date-time type. If you type in df.dtypes, you will see that the formatting has changed.

Output (first columns only):

Date                          datetime64[ns]
TempHighF                              int64
TempAvgF                               int64
dtype: object

This conversion is a requirement whenever you’re working with Neural Prophet.

Neural Prophet requires you to give it only two columns: a ds column, which holds the timestamp, and a y column, which holds the numeric value we want to predict. In this case, our ds will be Date while our y will be TempAvgF.

Let’s use matplotlib to plot our temperature over time.

plt.plot(df['Date'], df['TempAvgF'])
plt.show()

Result:

Plot

To plot the graph above, we’ve used the plt.plot() function from Matplotlib. We’ve passed df['Date'] as the x variable and df['TempAvgF'] as the y variable.

Always check whether your data has missing values, as you should not pass data with missing values to Neural Prophet. In our case, the data looks good.
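One way to run that check is to count the missing values per column. The snippet below is a minimal sketch on a small made-up sample (the values are hypothetical, and one of them is deliberately missing). With the real data you would run the same two lines on df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the two columns we care about, with one
# temperature deliberately set to NaN to show what the check reports.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2017-07-29", "2017-07-30", "2017-07-31"]),
        "TempAvgF": [88.0, np.nan, 89.0],
    }
)

# Count the missing values in each column.
missing_per_column = sample[["Date", "TempAvgF"]].isna().sum()

# True if at least one column contains a missing value.
has_missing = bool(missing_per_column.any())

print(missing_per_column)
print("Missing values present:", has_missing)
```

If every count is zero, the columns are ready to pass on to the next step.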

Next, we will keep only the two columns we need. As mentioned earlier, Neural Prophet only expects two columns.

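# Keep only the date and target columns, dropping rows with missing values.
new_column = df[['Date', 'TempAvgF']].dropna()
# Rename the columns to the names Neural Prophet expects.
new_column.columns = ['ds', 'y']
new_column.tail()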

When you run the code above, you’ll notice that our dataset has been filtered down to only two columns, ds and y. Our Date column is now ds, and TempAvgF is now y.

If you want to forecast something else, such as HumidityAvgPercent or DewPointAvgF, change the second column name in df[['Date', 'TempAvgF']] to your desired target. For example, df[['Date', 'HumidityAvgPercent']]. Note that the df.dtypes output above lists these columns as object rather than a numeric type, so they may need to be converted to numbers first.
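Because the df.dtypes output lists HumidityAvgPercent as object, switching targets will likely need a numeric conversion first. The snippet below is a minimal sketch of that step on a made-up sample. The "-" placeholder for a missing reading is an assumption for illustration, so inspect the real column to see what it actually contains.

```python
import pandas as pd

# Hypothetical stand-in for the real data. The "-" placeholder is an
# assumption about how a missing reading might be stored as text.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2017-07-29", "2017-07-30", "2017-07-31"]),
        "HumidityAvgPercent": ["61", "-", "58"],
    }
)

target = sample[["Date", "HumidityAvgPercent"]].copy()

# Turn text into numbers; anything that cannot be parsed becomes NaN.
target["HumidityAvgPercent"] = pd.to_numeric(
    target["HumidityAvgPercent"], errors="coerce"
)

# Drop the rows that could not be parsed, then rename to ds and y.
target = target.dropna()
target.columns = ["ds", "y"]

print(target)
```

The resulting frame has the same ds and y layout as new_column, so the rest of the tutorial applies unchanged.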

We can now go ahead and train our model.

Training the forecasting model

We first need to create a new instance of Neural Prophet using the NeuralProphet class we imported earlier. We store this instance in a variable n. We then use the fit() method to train it.

n = NeuralProphet()
# fit() trains n in place and returns the training metrics.
metrics = n.fit(new_column, freq='D', epochs=5000)

We will be training our model for 5000 epochs. You can choose to train yours for fewer or more epochs depending on the accuracy you get. Training runs on the PyTorch backend mentioned earlier. The freq='D' argument denotes that we’re using a daily frequency.

After training for 5000 epochs, we get a Mean Absolute Error of 1.74.
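To make that number concrete, the Mean Absolute Error is the average size of the gap between the actual and the predicted values. The snippet below is a minimal sketch of the calculation in plain Python. The temperatures are made up for illustration and are not taken from the model, and mean_absolute_error is a helper defined here rather than part of Neural Prophet.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical temperatures in degrees Fahrenheit, for illustration only.
actual = [70.0, 72.0, 75.0, 71.0]
predicted = [71.5, 70.0, 76.0, 73.5]

mae = mean_absolute_error(actual, predicted)
print(mae)
```

Since our target is TempAvgF, an error of 1.74 means the fitted values are off by about 1.74 degrees Fahrenheit on average over the training data.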

Up until now, we’ve been doing preprocessing and training. Let’s go ahead and perform some forecasting.

Forecasting the temperature into the future

future = n.make_future_dataframe(new_column, periods=1500)
forecast = n.predict(future)
forecast.tail()

We use the make_future_dataframe() method to extend our data by 1500 periods, that is, 1500 days into the future. We then use the n.predict() method to predict the future values. Finally, we use the tail() method to list the last five rows. You’ll notice that the last row is our 1500th prediction, dated 2021-09-08. Remember, our dataset only has values up to 2017-07-31.
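You can confirm the date of the last prediction with a little date arithmetic. The snippet below is a minimal sketch using only the standard library. It does not call Neural Prophet and simply adds 1500 days to the last date in the dataset.

```python
from datetime import date, timedelta

# Last date present in the dataset.
last_observed = date(2017, 7, 31)

# Number of daily periods we asked for.
horizon_days = 1500

# With a daily frequency, the final forecast lands this many days later.
last_forecast = last_observed + timedelta(days=horizon_days)

print(last_forecast)  # 2021-09-08
```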

Let’s visualize these predictions.

plot = n.plot(forecast)

Result:

Visualizing our prediction

From these results, we can expect the highest temperatures in the middle of the year, between June and August, and the coldest temperatures between November and February. This matches the seasonal pattern in the historical data we plotted earlier.
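Besides reading the plot, you can check the seasonal pattern numerically by averaging the predictions per calendar month. The snippet below is a minimal sketch on synthetic data: the cosine curve is a made-up stand-in for a forecast, and the yhat1 column name is an assumption about how the prediction column is labelled, so check forecast.columns in your own notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a forecast: a yearly cosine that peaks around
# mid-July. These are NOT real predictions from the model.
ds = pd.date_range("2018-01-01", "2019-12-31", freq="D")
day_of_year = ds.dayofyear.to_numpy()
yhat = 70 + 15 * np.cos(2 * np.pi * (day_of_year - 200) / 365.25)

# "yhat1" is the assumed name of the prediction column.
forecast_like = pd.DataFrame({"ds": ds, "yhat1": yhat})

# Average predicted temperature for each calendar month (1 to 12).
monthly_mean = forecast_like.groupby(forecast_like["ds"].dt.month)["yhat1"].mean()

warmest_month = int(monthly_mean.idxmax())
coldest_month = int(monthly_mean.idxmin())

print(monthly_mean.round(1))
print("Warmest month:", warmest_month, "Coldest month:", coldest_month)
```

Running the same groupby on the real forecast frame should show the summer months at the top and the winter months at the bottom if the model has captured the yearly cycle.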

You can find the complete code for this tutorial here.

Wrapping up

That wraps up how to generate weather forecasts into the future. We performed some exploratory data analysis on our data, trained our model, and finally made the predictions with only a few lines of code. Feel free to try it out yourself.

Happy coding!


Peer Review Contributions by: Willies Ogola