Applied Data Science Notebook in Python for Beginners to Professionals

Data Science Project – A Guide to Making Baseline Predictions for Time Series Forecasting with Python

For more projects visit: https://setscholars.net

  • There are 5000+ free end-to-end applied machine learning and data science projects available to download at SETScholars, a Science, Engineering and Technology Scholars community.
In [1]:
# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In this notebook, we will learn how to make baseline predictions for Time Series Forecasting with Python.

Python Code

Load the dataset

In [2]:
from pandas import read_csv
from matplotlib import pyplot

series = read_csv('shampoo.csv', header=0, index_col=0)

print(series.head())

series.plot(figsize = (15,8))
pyplot.show()
       Sales
Month       
1-01   266.0
1-02   145.9
1-03   183.1
1-04   119.3
1-05   180.3
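
The month labels in the output above (`1-01`, `1-02`, ...) are plain strings. As a minimal sketch, assuming the first number is a year offset and the second is the month (the same assumption the parser in the next cell makes by prefixing `190`), they can be converted into a proper date index. The inline CSV reuses only the five rows printed above, so the example runs without `shampoo.csv`; the names `csv_text` and `sample` are illustrative.

```python
from io import StringIO
import pandas as pd

# The five rows printed by series.head() above, inlined for a self-contained run
csv_text = """Month,Sales
1-01,266.0
1-02,145.9
1-03,183.1
1-04,119.3
1-05,180.3
"""

sample = pd.read_csv(StringIO(csv_text), header=0, index_col=0)

# Assumption: a label such as "1-01" means year 1901, month 01
labels = ["190" + str(label) for label in sample.index]
sample.index = pd.to_datetime(labels, format="%Y-%m")

print(sample.index[0], sample["Sales"].iloc[0])
```

With a date index in place, pandas can label the x-axis of `sample.plot()` with real dates instead of raw strings.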

Make baseline predictions using Time Series Data in Python

In [3]:
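from datetime import datetime
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
from sklearn.metrics import mean_squared_error

# Parse labels such as '1-01' as year 1901, month 01
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# Read the CSV, convert the Month labels to dates, and squeeze the single column into a Series
series = read_csv('shampoo.csv', header=0, index_col=0)
series.index = [parser(x) for x in series.index]
series = series.squeeze('columns')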

# Create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t-1', 't+1']

print()
print(dataframe.head(5))

# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:,0], train[:,1]
test_X, test_y = test[:,0], test[:,1]

# persistence model
def model_persistence(x):
    return x

# walk-forward validation
predictions = list()
for x in test_X:
    yhat = model_persistence(x)
    predictions.append(yhat)
    
test_score = mean_squared_error(test_y, predictions)
print('Test MSE: %.3f' % test_score)

# plot predictions and expected results
pyplot.figure(figsize=(16,10))
pyplot.plot(train_y)
pyplot.plot([None for i in train_y] + [x for x in test_y])
pyplot.plot([None for i in train_y] + [x for x in predictions])
pyplot.show()
     t-1    t+1
0    NaN  266.0
1  266.0  145.9
2  145.9  183.1
3  183.1  119.3
4  119.3  180.3
Test MSE: 17730.518
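
The MSE above is expressed in squared sales units, which can be hard to interpret. As a rough sketch, the persistence forecast can also be produced in one step with `shift(1)`, and the error reported as RMSE so that it is in the same units as the data. The example uses only the first five observations printed above, not the full dataset, so its numbers will differ from the test score.

```python
import numpy as np
import pandas as pd

# First five observations from the output above (not the full dataset)
sales = pd.Series([266.0, 145.9, 183.1, 119.3, 180.3])

# Persistence: the forecast for each step is the previous observation
forecast = sales.shift(1)

# Drop the first step, which has no previous value to persist
actual = sales.iloc[1:].to_numpy()
predicted = forecast.iloc[1:].to_numpy()

mse = float(np.mean((actual - predicted) ** 2))
rmse = float(np.sqrt(mse))
print('Sample MSE: %.3f' % mse)
print('Sample RMSE: %.3f' % rmse)
```

Taking the square root of the test MSE reported above (17730.518) gives an RMSE of roughly 133.16, in the same units as the Sales column.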

Summary

In this coding recipe, we discussed how to make baseline predictions using Time Series Data in Python.

Specifically, we have learned the following:

  • The importance of establishing a baseline and the persistence algorithm that you can use.
  • How to implement the persistence algorithm in Python from scratch.
  • How to evaluate the forecasts of the persistence algorithm and use them as a baseline.
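
To illustrate the last point, here is one possible way to use the persistence score as a baseline: evaluate a candidate model on the same test points and check whether its error is lower. The candidate `mean_of_last_three` and the helper `walk_forward_mse` are hypothetical names introduced only for this sketch, and the five values are the ones printed earlier, so the comparison says nothing about the full dataset.

```python
from statistics import mean

# First five observations from the notebook output (illustrative only)
sales = [266.0, 145.9, 183.1, 119.3, 180.3]

def persistence(history):
    # Baseline: predict the most recent observation
    return history[-1]

def mean_of_last_three(history):
    # Hypothetical candidate model, not part of the original notebook
    return mean(history[-3:])

def walk_forward_mse(series, model, start):
    # Forecast each point from `start` onward using only earlier observations
    errors = []
    for i in range(start, len(series)):
        yhat = model(series[:i])
        errors.append((series[i] - yhat) ** 2)
    return mean(errors)

baseline_mse = walk_forward_mse(sales, persistence, start=3)
candidate_mse = walk_forward_mse(sales, mean_of_last_three, start=3)
print('Baseline MSE:  %.3f' % baseline_mse)
print('Candidate MSE: %.3f' % candidate_mse)
print('Candidate beats baseline:', candidate_mse < baseline_mse)
```

A candidate that cannot beat the persistence score on the same test points is adding no skill over simply repeating the last observation.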