Applied Data Science Notebook in Python for Beginners to Professionals¶

Data Science Project – A Guide to Using Weight Regularization for Time Series Forecasting with LSTM Networks in Python¶

# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

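In this notebook, we will learn how to use weight regularization for time series forecasting with LSTM networks in Python. We apply L1, L2, and combined L1L2 penalties to the bias, input, and recurrent weights of an LSTM layer and compare each configuration against an unregularized baseline.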

Python Code¶

Load the dataset¶

from pandas import read_csv
from matplotlib import pyplot

series = read_csv('shampoo.csv', header=0, index_col=0)

print(series.head())

series.plot(figsize = (15,8))
pyplot.show()

       Sales
Month       
1-01   266.0
1-02   145.9
1-03   183.1
1-04   119.3
1-05   180.3

Weight Regularization for Time Series Forecasting with LSTM Networks in Python¶

A Baseline LSTM Model¶

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.regularizers import L1L2
from math import sqrt
import matplotlib
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:,:]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons)
        # forecast test dataset
        test_reshaped = test_scaled[:,0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i,0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) RMSE val: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    # load dataset
    # read the CSV, squeeze the single Sales column to a Series, then parse the Month index
    series = read_csv('shampoo.csv', header=0, index_col=0).squeeze('columns')
    series.index = series.index.map(parser)
    # configure the experiment
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    # run the experiment
    results = DataFrame()
    results['results'] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons)
    # summarize results
    print(results.describe())
    # show boxplot
    results.boxplot(figsize=(14,10))
    plt.show()

# ---------------------------
# Run the experiments
# ---------------------------
run()

1) RMSE val: 99.410
2) RMSE val: 89.378
3) RMSE val: 101.760
4) RMSE val: 91.143
5) RMSE val: 93.206
6) RMSE val: 107.911
7) RMSE val: 101.214
8) RMSE val: 108.116
9) RMSE val: 104.330
10) RMSE val: 107.179
          results
count   10.000000
mean   100.364827
std      6.990164
min     89.378131
25%     94.757305
50%    101.487400
75%    106.466608
max    108.116262
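
The baseline experiment above differences the series before training and later undoes that step with inverse_difference(raw_values, yhat, len(test_scaled)+1-i). The offset can be hard to read, so here is a minimal sketch of the round trip on a toy series. It assumes only the first five Sales values printed earlier and a hypothetical hold-out of two points (n_test). It stands in for the model by feeding the true differences back in, so the restored values should equal the last two observations.

```python
# Toy series: the first five Sales values printed earlier in this notebook.
raw = [266.0, 145.9, 183.1, 119.3, 180.3]


def difference(data, interval=1):
    """Change between each value and the one `interval` steps before it."""
    return [data[i] - data[i - interval] for i in range(interval, len(data))]


def inverse_difference(history, yhat, interval=1):
    """Add a differenced value back onto the observation it was taken from."""
    return yhat + history[-interval]


diff = difference(raw)      # four month-to-month changes
n_test = 2                  # hypothetical hold-out size for this toy case
test_diff = diff[-n_test:]  # stand-in for model forecasts (true differences)

# Same offset as in the experiment: len(test) + 1 - i
restored = [
    inverse_difference(raw, test_diff[i], n_test + 1 - i)
    for i in range(n_test)
]

for expected, got in zip(raw[-n_test:], restored):
    print(f"expected {expected:.1f}, restored {got:.1f}")
```

With a real model, test_diff[i] would be replaced by the forecast after inverse scaling, and the same offset adds it to the observation just before the one being forecast.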

Bias Weight Regularization¶

Weight regularization can be applied to the bias weights within the LSTM units.

In Keras, this is specified with the bias_regularizer argument when creating an LSTM layer. The regularizer is defined as an instance of one of the L1, L2, or L1L2 classes.

In this experiment, we will compare L1, L2, and L1L2 regularization, each with a coefficient of 0.01, against the baseline model. All four configurations can be specified with the L1L2 class, as follows:

L1L2(0.0, 0.0) [i.e. baseline]
L1L2(0.01, 0.0) [i.e. L1]
L1L2(0.0, 0.01) [i.e. L2]
L1L2(0.01, 0.01) [i.e. L1L2 or elasticnet]
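
To make the four configurations concrete, the sketch below computes the penalty that an L1L2 regularizer adds to the training loss, assuming the usual definition l1 * sum(|w|) + l2 * sum(w^2). The helper l1l2_penalty and the three example bias values are hypothetical and written in plain Python; they are not taken from Keras or from the trained models in this notebook.

```python
def l1l2_penalty(weights, l1=0.0, l2=0.0):
    """Penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


# Hypothetical bias values, chosen only for illustration.
bias = [0.5, -1.0, 2.0]

# The four configurations compared in this notebook: (l1, l2)
configs = {
    "baseline": (0.0, 0.0),
    "L1": (0.01, 0.0),
    "L2": (0.0, 0.01),
    "L1L2": (0.01, 0.01),
}

penalties = {
    name: l1l2_penalty(bias, l1, l2) for name, (l1, l2) in configs.items()
}

for name, value in penalties.items():
    print(f"{name}: {value:.4f}")
```

The baseline adds nothing to the loss, while the L1L2 penalty is simply the sum of the separate L1 and L2 penalties.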

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.regularizers import L1L2
from math import sqrt
import matplotlib
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons, reg):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, bias_regularizer=reg))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg):
    
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:,:]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons, reg)
        # forecast test dataset
        test_reshaped = test_scaled[:,0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i,0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) RMSE val: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    
    # load dataset
    # read the CSV, squeeze the single Sales column to a Series, then parse the Month index
    series = read_csv('shampoo.csv', header=0, index_col=0).squeeze('columns')
    series.index = series.index.map(parser)
    
    # configure the experiment
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    regularizers = [ L1L2(l1=0.0, l2=0.0), 
                     L1L2(l1=0.01, l2=0.0), 
                     L1L2(l1=0.0, l2=0.01), 
                     L1L2(l1=0.01, l2=0.01)
                   ]
    
    # run the experiment
    results = DataFrame()
    for reg in regularizers:
        name = ('l1 %.2f,l2 %.2f' % (reg.l1, reg.l2))
        
        print("Regularizers used in the run: ")
        print(name)
        results[name] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg)
    
    # summarize results
    print(results.describe())
    
    # show boxplot
    results.boxplot(figsize=(14,10))
    plt.show()

# ---------------------------
# Run the experiments
# ---------------------------
run()

Regularizers used in the run: 
l1 0.00,l2 0.00
1) RMSE val: 113.721
2) RMSE val: 98.775
3) RMSE val: 97.609
4) RMSE val: 113.385
5) RMSE val: 97.823
6) RMSE val: 89.244
7) RMSE val: 96.995
8) RMSE val: 96.312
9) RMSE val: 106.991
10) RMSE val: 97.559
Regularizers used in the run: 
l1 0.01,l2 0.00
1) RMSE val: 105.032
2) RMSE val: 105.824
3) RMSE val: 99.492
4) RMSE val: 109.527
5) RMSE val: 102.398
6) RMSE val: 101.815
7) RMSE val: 98.012
8) RMSE val: 101.675
9) RMSE val: 104.529
10) RMSE val: 102.031
Regularizers used in the run: 
l1 0.00,l2 0.01
1) RMSE val: 102.629
2) RMSE val: 105.578
3) RMSE val: 103.251
4) RMSE val: 101.373
5) RMSE val: 105.116
6) RMSE val: 98.596
7) RMSE val: 105.049
8) RMSE val: 94.546
9) RMSE val: 99.949
10) RMSE val: 107.708
Regularizers used in the run: 
l1 0.01,l2 0.01
1) RMSE val: 93.686
2) RMSE val: 99.119
3) RMSE val: 101.643
4) RMSE val: 104.990
5) RMSE val: 98.187
6) RMSE val: 104.990
7) RMSE val: 99.503
8) RMSE val: 99.632
9) RMSE val: 122.370
10) RMSE val: 93.148
       l1 0.00,l2 0.00  l1 0.01,l2 0.00  l1 0.00,l2 0.01  l1 0.01,l2 0.01
count        10.000000        10.000000        10.000000        10.000000
mean        100.841511       103.033606       102.379615       101.726658
std           7.926447         3.311153         3.896657         8.262018
min          89.243788        98.012214        94.546328        93.147983
25%          97.135967       101.710162       100.304814        98.419722
50%          97.716267       102.214801       102.940360        99.567306
75%         104.937480       104.906012       105.099260       104.153009
max         113.721155       109.526658       107.707873       122.369514

Input Weight Regularization¶

We can also apply regularization to the input connections of each LSTM unit.

In Keras, this is achieved by passing a regularizer instance as the kernel_regularizer argument.

We will test the same regularizer configurations as in the previous section, specifically:

L1L2(0.0, 0.0) [i.e. baseline]
L1L2(0.01, 0.0) [i.e. L1]
L1L2(0.0, 0.01) [i.e. L2]
L1L2(0.01, 0.01) [i.e. L1L2 or elasticnet]
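
It can help to see how many weights each regularizer argument actually touches. The sketch below assumes the standard Keras LSTM layout, in which the four gates are concatenated along the last axis, so the input kernel has shape (input_dim, 4 * units), the recurrent kernel (units, 4 * units), and the bias (4 * units,). The helper lstm_weight_shapes is hypothetical and only does arithmetic; it does not build a model.

```python
import math


def lstm_weight_shapes(input_dim, units):
    """Assumed shapes of the three weight tensors in a Keras LSTM layer."""
    return {
        "kernel": (input_dim, 4 * units),        # kernel_regularizer
        "recurrent_kernel": (units, 4 * units),  # recurrent_regularizer
        "bias": (4 * units,),                    # bias_regularizer
    }


# Configuration used in this notebook: one lagged input, three LSTM units.
shapes = lstm_weight_shapes(input_dim=1, units=3)
counts = {name: math.prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

for name in shapes:
    print(f"{name}: shape={shapes[name]}, weights={counts[name]}")
print(f"total LSTM weights: {total}")
```

Under this assumed layout, the recurrent kernel holds more weights than the input kernel and the bias combined for the three-unit layer used here, so the same coefficient can produce penalties of quite different size depending on which argument receives it.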

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.regularizers import L1L2
from math import sqrt
import matplotlib
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons, reg):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, kernel_regularizer=reg))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:,:]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons, reg)
        # forecast test dataset
        test_reshaped = test_scaled[:,0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i,0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) RMSE val: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    # load dataset
    # read the CSV, squeeze the single Sales column to a Series, then parse the Month index
    series = read_csv('shampoo.csv', header=0, index_col=0).squeeze('columns')
    series.index = series.index.map(parser)
    # configure the experiment
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    regularizers = [ L1L2(l1=0.0, l2=0.0), 
                     L1L2(l1=0.01, l2=0.0), 
                     L1L2(l1=0.0, l2=0.01), 
                     L1L2(l1=0.01, l2=0.01)
                   ]
    # run the experiment
    results = DataFrame()
    
    for reg in regularizers:
        name = ('l1 %.2f,l2 %.2f' % (reg.l1, reg.l2))
        
        print("Regularizers used in the run: ")
        print(name)
        results[name] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg)
    
    # summarize results
    print(results.describe())
    
    # show boxplot
    results.boxplot(figsize=(14,10))
    plt.show()

# ---------------------------
# Run the experiments
# ---------------------------
run()

Regularizers used in the run: 
l1 0.00,l2 0.00
1) RMSE val: 100.547
2) RMSE val: 96.594
3) RMSE val: 106.752
4) RMSE val: 96.934
5) RMSE val: 104.360
6) RMSE val: 101.920
7) RMSE val: 105.481
8) RMSE val: 106.521
9) RMSE val: 97.801
10) RMSE val: 96.115
Regularizers used in the run: 
l1 0.01,l2 0.00
1) RMSE val: 111.856
2) RMSE val: 109.974
3) RMSE val: 111.684
4) RMSE val: 114.306
5) RMSE val: 98.431
6) RMSE val: 112.468
7) RMSE val: 106.749
8) RMSE val: 115.526
9) RMSE val: 115.930
10) RMSE val: 115.901
Regularizers used in the run: 
l1 0.00,l2 0.01
1) RMSE val: 109.793
2) RMSE val: 115.098
3) RMSE val: 105.789
4) RMSE val: 110.877
5) RMSE val: 105.344
6) RMSE val: 121.167
7) RMSE val: 123.624
8) RMSE val: 103.270
9) RMSE val: 99.540
10) RMSE val: 118.681
Regularizers used in the run: 
l1 0.01,l2 0.01
1) RMSE val: 115.250
2) RMSE val: 113.963
3) RMSE val: 123.822
4) RMSE val: 117.805
5) RMSE val: 119.990
6) RMSE val: 126.099
7) RMSE val: 125.902
8) RMSE val: 113.027
9) RMSE val: 120.484
10) RMSE val: 112.914
       l1 0.00,l2 0.00  l1 0.01,l2 0.00  l1 0.00,l2 0.01  l1 0.01,l2 0.01
count        10.000000        10.000000        10.000000        10.000000
mean        101.302495       111.282466       111.318290       118.925652
std           4.283099         5.368617         8.091202         5.142308
min          96.115106        98.431217        99.540014       112.914493
25%          97.150795       110.401260       105.455209       114.284718
50%         101.233263       112.162069       110.334868       118.897517
75%         105.201111       115.220699       117.784998       122.987675
max         106.751542       115.929888       123.624288       126.099069

Recurrent Weight Regularization¶

We can apply regularization to the recurrent connections of each LSTM unit as well.

In Keras, this is achieved by passing a regularizer instance as the recurrent_regularizer argument.

We will test the same regularizer configurations as in the previous sections, specifically:

L1L2(0.0, 0.0) [i.e. baseline]
L1L2(0.01, 0.0) [i.e. L1]
L1L2(0.0, 0.01) [i.e. L2]
L1L2(0.01, 0.01) [i.e. L1L2 or elasticnet]
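
Since the same L1 and L2 coefficients are reused here, it may help to see how the two terms pull on a single weight. The sketch below takes one plain gradient-descent step on the penalty alone, assuming the penalty l1 * |w| + l2 * w^2. The helper names, the learning rate of 1.0, and the two example weights are hypothetical. The experiments in this notebook use the Adam optimizer on the full loss, so this only illustrates the direction and relative size of the pull, not the actual updates.

```python
def penalty_gradient(w, l1=0.0, l2=0.0):
    """Derivative of l1 * |w| + l2 * w**2 with respect to w (0 at w == 0)."""
    sign = (w > 0) - (w < 0)
    return l1 * sign + 2.0 * l2 * w


def one_step(w, l1, l2, lr=1.0):
    """One plain gradient-descent step on the penalty term only."""
    return w - lr * penalty_gradient(w, l1, l2)


small, large = 0.05, 2.0  # hypothetical weights

l1_small = one_step(small, l1=0.01, l2=0.0)
l1_large = one_step(large, l1=0.01, l2=0.0)
l2_small = one_step(small, l1=0.0, l2=0.01)
l2_large = one_step(large, l1=0.0, l2=0.01)

print(f"L1: {small} -> {l1_small:.3f}, {large} -> {l1_large:.3f}")
print(f"L2: {small} -> {l2_small:.3f}, {large} -> {l2_large:.3f}")
```

In this sketch the L1 term moves both weights by the same amount, whereas the L2 term moves the large weight much more than the small one.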

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.regularizers import L1L2
from math import sqrt
import matplotlib
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons, reg):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, recurrent_regularizer=reg))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:,:]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons, reg)
        # forecast test dataset
        test_reshaped = test_scaled[:,0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i,0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) Test RMSE: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    
    # load dataset
    # read the CSV, squeeze the single Sales column to a Series, then parse the Month index
    series = read_csv('shampoo.csv', header=0, index_col=0).squeeze('columns')
    series.index = series.index.map(parser)
    
    # configure the experiment
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    regularizers = [L1L2(l1=0.0, l2=0.0), L1L2(l1=0.01, l2=0.0), L1L2(l1=0.0, l2=0.01), L1L2(l1=0.01, l2=0.01)]
    
    # run the experiment
    results = DataFrame()
    for reg in regularizers:
        name = ('l1 %.2f,l2 %.2f' % (reg.l1, reg.l2))
        
        print("Regularizers used in the run: ")
        print(name)
        results[name] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, reg)
    # summarize results
    print(results.describe())
    
    # show boxplot
    results.boxplot(figsize=(14,10))
    plt.show()

# ---------------------------
# Run the experiments
# ---------------------------
run()

Regularizers used in the run: 
l1 0.00,l2 0.00
1) Test RMSE: 99.371
2) Test RMSE: 89.924
3) Test RMSE: 112.027
4) Test RMSE: 94.876
5) Test RMSE: 102.328
6) Test RMSE: 89.908
7) Test RMSE: 94.745
8) Test RMSE: 111.192
9) Test RMSE: 96.867
10) Test RMSE: 104.519
Regularizers used in the run: 
l1 0.01,l2 0.00
1) Test RMSE: 96.270
2) Test RMSE: 96.126
3) Test RMSE: 98.450
4) Test RMSE: 98.470
5) Test RMSE: 104.504
6) Test RMSE: 109.942
7) Test RMSE: 99.782
8) Test RMSE: 99.368
9) Test RMSE: 102.928
10) Test RMSE: 94.652
Regularizers used in the run: 
l1 0.00,l2 0.01
1) Test RMSE: 96.609
2) Test RMSE: 100.758
3) Test RMSE: 92.968
4) Test RMSE: 100.248
5) Test RMSE: 96.126
6) Test RMSE: 97.325
7) Test RMSE: 102.187
8) Test RMSE: 97.015
9) Test RMSE: 101.580
10) Test RMSE: 101.561
Regularizers used in the run: 
l1 0.01,l2 0.01
1) Test RMSE: 97.661
2) Test RMSE: 100.628
3) Test RMSE: 93.242
4) Test RMSE: 96.248
5) Test RMSE: 98.229
6) Test RMSE: 95.977
7) Test RMSE: 100.053
8) Test RMSE: 100.690
9) Test RMSE: 93.786
10) Test RMSE: 102.404
       l1 0.00,l2 0.00  l1 0.01,l2 0.00  l1 0.00,l2 0.01  l1 0.01,l2 0.01
count        10.000000        10.000000        10.000000        10.000000
mean         99.575717       100.049083        98.637678        97.891861
std           7.898025         4.599360         3.051612         3.079836
min          89.908292        94.651752        92.968235        93.241839
25%          94.777980        96.814860        96.710676        96.044759
50%          98.118776        98.918966        98.786464        97.944854
75%         103.971530       102.141389       101.360091       100.484410
max         112.026554       109.941504       102.186714       102.404434
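
The describe() table above can be reproduced by hand, which is a useful check when comparing configurations. The sketch below recomputes the mean and sample standard deviation for two of the recurrent-regularization runs, using the test RMSE values exactly as printed above (rounded to three decimals, so the results differ from the table only in the later digits). The variable names are hypothetical.

```python
from statistics import mean, stdev

# Test RMSE values copied from the printed output above.
baseline = [99.371, 89.924, 112.027, 94.876, 102.328,
            89.908, 94.745, 111.192, 96.867, 104.519]
l1l2 = [97.661, 100.628, 93.242, 96.248, 98.229,
        95.977, 100.053, 100.690, 93.786, 102.404]

# stdev() uses the n - 1 denominator, the same convention as describe()
summary = {
    "l1 0.00,l2 0.00": (mean(baseline), stdev(baseline)),
    "l1 0.01,l2 0.01": (mean(l1l2), stdev(l1l2)),
}

for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.3f}, std={s:.3f}")
```

With only ten repeats per configuration, differences in mean RMSE that are small relative to the standard deviations should be read with caution.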

Summary¶

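In this coding recipe, we discussed how to use weight regularization for time series forecasting with LSTM networks in Python.

Specifically, we learned the following:

How to regularize the bias weights of an LSTM layer with the bias_regularizer argument.
How to regularize the input weights with the kernel_regularizer argument.
How to regularize the recurrent weights with the recurrent_regularizer argument.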