Applied Data Science Notebook in Python for Beginners to Professionals

Data Science Project – A Guide to Deep Learning (LSTM) for Time Series Forecasting in Python

Machine Learning for Beginners - A Guide to Deep Learning (LSTM) for Time Series Forecasting in Python

For more projects visit: https://setscholars.net

  • There are 5,000+ free end-to-end applied machine learning and data science projects available to download at SETScholars. SETScholars is a community of Science, Engineering and Technology Scholars.
In [3]:
# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In this notebook, we will learn how to build Deep Learning (LSTM) models for Time Series Forecasting in Python.

Python Code

Load the dataset

In [4]:
from pandas import read_csv
from matplotlib import pyplot

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)

print(series.head())

series.plot(figsize = (15,8))
pyplot.show()
         Sales
Month         
1960-01   6550
1960-02   8728
1960-03  12026
1960-04  14395
1960-05  14587

Deep Learning (LSTM) models for Time Series Forecasting in Python

In [5]:
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
import numpy
import warnings
warnings.filterwarnings("ignore")
Using TensorFlow backend.
In [6]:
# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')
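The raw shampoo-sales file (loaded later in this notebook) stores dates like '1-01', meaning year 1, month 1, so the parser prepends '190' to recover a full year. A quick sanity check, assuming `from datetime import datetime`:

```python
from datetime import datetime

# date-time parsing function, as defined above
def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

print(parser('1-01'))  # 1901-01-01 00:00:00
print(parser('3-12'))  # 1903-12-01 00:00:00
```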
In [7]:
# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df
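The framing above can be sanity-checked on a toy series (hypothetical values 1 to 5): with lag=1 each row pairs the previous observation (the input X) with the current observation (the target y), and the first row's missing lag is filled with 0.

```python
from pandas import DataFrame, concat

# same framing as timeseries_to_supervised above, applied to a toy series
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag + 1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df

framed = timeseries_to_supervised([1, 2, 3, 4, 5], lag=1)
print(framed.values.tolist())
# [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
```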
In [8]:
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)
In [9]:
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
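Differencing and its inverse form an exact round-trip: adding a difference back onto the prior observation recovers the original value. A check on toy numbers (not the sales data):

```python
from pandas import Series

# create a differenced series (as in difference() above)
def difference(dataset, interval=1):
    return Series([dataset[i] - dataset[i - interval]
                   for i in range(interval, len(dataset))])

# invert a differenced value by adding back the prior observation
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

raw = [10, 15, 13, 18]
diff = difference(raw, 1)
print(list(diff))  # [5, -2, 5]
# the last difference plus the prior raw value recovers the last observation
print(inverse_difference(raw[:-1], diff.iloc[-1], 1))  # 18
```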
In [10]:
# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    
    return scaler, train_scaled, test_scaled
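A small illustration of this scaling on toy two-column arrays (not the sales data): the MinMaxScaler is fit on the training set only, so test rows are mapped with the train min/max per column, and test values outside the train range would land outside [-1, 1].

```python
import numpy
from sklearn.preprocessing import MinMaxScaler

train = numpy.array([[10.0, 20.0], [30.0, 40.0]])
test = numpy.array([[20.0, 30.0]])

# fit on train only; test reuses the train min/max per column
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(train_scaled.tolist())  # [[-1.0, -1.0], [1.0, 1.0]]
print(test_scaled.tolist())   # [[0.0, 0.0]] -- the column midpoints
```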
In [11]:
# inverse scaling for a forecasted value
def invert_scale(scaler, X, value):
    new_row = [x for x in X] + [value]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]
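invert_scale rebuilds a full row (the inputs plus the forecast) because inverse_transform expects the same number of columns the scaler was fit on, then keeps only the last column. A round-trip check with toy numbers:

```python
import numpy
from sklearn.preprocessing import MinMaxScaler

# inverse scaling for a forecasted value (same logic as invert_scale above)
def invert_scale(scaler, X, value):
    new_row = [x for x in X] + [value]
    array = numpy.array(new_row).reshape(1, -1)
    return scaler.inverse_transform(array)[0, -1]

train = numpy.array([[10.0, 100.0], [30.0, 300.0]])
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(train)

# scale a row, then recover the target (last column) from its scaled value
row = scaler.transform(numpy.array([[20.0, 200.0]]))[0]
print(invert_scale(scaler, row[:-1], row[-1]))  # 200.0
```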
In [12]:
# fit an LSTM network to training data
def fit_lstm(train, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
    return model
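The reshape inside fit_lstm follows the input shape Keras LSTMs expect, [samples, timesteps, features]: each supervised row contributes one sample with a single timestep and a single lag feature. A shape-only sketch with toy scaled rows (no Keras required):

```python
import numpy

# each supervised row is [lagged value, target]; toy scaled rows for illustration
train = numpy.array([[0.1, 0.2], [0.2, 0.3], [0.3, 0.4]])
X, y = train[:, 0:-1], train[:, -1]

# Keras LSTMs consume 3D input: [samples, timesteps, features]
X = X.reshape(X.shape[0], 1, X.shape[1])
print(X.shape)     # (3, 1, 1)
print(y.tolist())  # [0.2, 0.3, 0.4]
```

Because the network is stateful with batch_input_shape fixed at batch_size=1, both training and prediction must use that same batch size, which is why fit_lstm and forecast_lstm pass it through explicitly.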
In [13]:
# make a one-step forecast
def forecast_lstm(model, batch_size, X):
    X = X.reshape(1, 1, len(X))
    yhat = model.predict(X, batch_size=batch_size)
    return yhat[0,0]
In [15]:
# load dataset -- note: this walkthrough switches to the monthly shampoo-sales
# data, whose dates (e.g. '1-01') require the parser defined above
# (in pandas >= 2.0, replace squeeze=True with .squeeze('columns'))
series = read_csv('shampoo.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)

# transform data to be stationary
raw_values = series.values
diff_values = difference(raw_values, 1)

# transform data to be supervised learning
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values

# split data into train and test-sets
train, test = supervised_values[0:-12], supervised_values[-12:]

# transform the scale of the data
scaler, train_scaled, test_scaled = scale(train, test)

# repeat experiment
repeats = 30
error_scores = list()
for r in range(repeats):
    # fit the model
    lstm_model = fit_lstm(train_scaled, 1, 3000, 4)
    
    # forecast the entire training dataset to build up state for forecasting
    train_reshaped = train_scaled[:, 0].reshape(len(train_scaled), 1, 1)
    lstm_model.predict(train_reshaped, batch_size=1)
    
    # walk-forward validation on the test data
    predictions = list()
    for i in range(len(test_scaled)):
        # make one-step forecast
        X, y = test_scaled[i, 0:-1], test_scaled[i, -1]
        yhat = forecast_lstm(lstm_model, 1, X)
        
        # invert scaling
        yhat = invert_scale(scaler, X, yhat)
        
        # invert differencing
        yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
        
        # store forecast
        predictions.append(yhat)
    # report performance
    rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
    print('%d) Test RMSE: %.3f' % (r+1, rmse))
    error_scores.append(rmse)

# ------------------------------------------------------------
# summarize results
# ------------------------------------------------------------
results = DataFrame()
results['rmse'] = error_scores
print(results.describe())
results.boxplot()
pyplot.show()
1) Test RMSE: 148.775
2) Test RMSE: 179.256
3) Test RMSE: 274.149
4) Test RMSE: 254.404
5) Test RMSE: 96.564
6) Test RMSE: 279.073
7) Test RMSE: 87.597
8) Test RMSE: 188.766
9) Test RMSE: 173.797
10) Test RMSE: 128.270
11) Test RMSE: 85.749
12) Test RMSE: 131.992
13) Test RMSE: 132.496
14) Test RMSE: 148.621
15) Test RMSE: 132.210
16) Test RMSE: 134.550
17) Test RMSE: 311.875
18) Test RMSE: 117.185
19) Test RMSE: 166.472
20) Test RMSE: 109.171
21) Test RMSE: 98.844
22) Test RMSE: 103.560
23) Test RMSE: 130.320
24) Test RMSE: 111.780
25) Test RMSE: 96.834
26) Test RMSE: 191.685
27) Test RMSE: 95.192
28) Test RMSE: 152.945
29) Test RMSE: 421.191
30) Test RMSE: 109.285
             rmse
count   30.000000
mean   159.753562
std     77.504800
min     85.748603
25%    109.199319
50%    132.352959
75%    177.891580
max    421.190656

Summary

In this coding recipe, we discussed how to build Deep Learning (LSTM) models for Time Series Forecasting in Python.

Specifically, we have learned the following:

  • How to build Deep Learning (LSTM) models for Time Series Forecasting in Python.