Data Wrangling in Python – How to Group A Time Series With pandas

Hits: 19

Group A Time Series With pandas

Import required modules


import pandas as pd
import numpy as np

Create a dataframe

df = pd.DataFrame()

df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
df.index = pd.date_range('1/1/2014', periods=100, freq='H')

df.head()
german_army allied_army
2014-01-01 00:00:00 21413 37604
2014-01-01 01:00:00 25913 21144
2014-01-01 02:00:00 22418 34201
2014-01-01 03:00:00 20704 37313
2014-01-01 04:00:00 27859 24467

Truncate the dataframe

df.truncate(before='1/2/2014', after='1/3/2014')
german_army allied_army
2014-01-02 00:00:00 28783 22407
2014-01-02 01:00:00 27530 23106
2014-01-02 02:00:00 27351 36703
2014-01-02 03:00:00 28062 39590
2014-01-02 04:00:00 27691 35282
2014-01-02 05:00:00 22498 22229
2014-01-02 06:00:00 26470 34484
2014-01-02 07:00:00 22123 38835
2014-01-02 08:00:00 20056 30520
2014-01-02 09:00:00 22560 28191
2014-01-02 10:00:00 20335 26722
2014-01-02 11:00:00 28207 28571
2014-01-02 12:00:00 21135 31793
2014-01-02 13:00:00 25570 36780
2014-01-02 14:00:00 26743 39214
2014-01-02 15:00:00 28496 35278
2014-01-02 16:00:00 26156 23902
2014-01-02 17:00:00 21795 39038
2014-01-02 18:00:00 25840 34204
2014-01-02 19:00:00 22582 26021
2014-01-02 20:00:00 26145 21035
2014-01-02 21:00:00 25084 21895
2014-01-02 22:00:00 29835 27199
2014-01-02 23:00:00 29916 26703
2014-01-03 00:00:00 23252 23883

Set the dataframe’s index

df.index = df.index + pd.DateOffset(months=4, days=5)

View the dataframe

df.head()
german_army allied_army
2014-05-06 00:00:00 21413 37604
2014-05-06 01:00:00 25913 21144
2014-05-06 02:00:00 22418 34201
2014-05-06 03:00:00 20704 37313
2014-05-06 04:00:00 27859 24467

Lead a variable 1 hour

df.shift(1).head()
german_army allied_army
2014-05-06 00:00:00 NaN NaN
2014-05-06 01:00:00 21413.0 37604.0
2014-05-06 02:00:00 25913.0 21144.0
2014-05-06 03:00:00 22418.0 34201.0
2014-05-06 04:00:00 20704.0 37313.0

Lag a variable 1 hour

df.shift(-1).tail()
german_army allied_army
2014-05-09 23:00:00 26615.0 35368.0
2014-05-10 00:00:00 20410.0 21218.0
2014-05-10 01:00:00 24404.0 29038.0
2014-05-10 02:00:00 21190.0 31730.0
2014-05-10 03:00:00 NaN NaN

Aggregate into days by summing up the value of each hourly observation

df.resample('D').sum()
german_army allied_army
2014-05-06 596133 715399
2014-05-07 610963 729702
2014-05-08 604796 717520
2014-05-09 618359 701690
2014-05-10 92619 117354

Aggregate into days by averaging up the value of each hourly observation

df.resample('D').mean()
german_army allied_army
2014-05-06 24838.875000 29808.291667
2014-05-07 25456.791667 30404.250000
2014-05-08 25199.833333 29896.666667
2014-05-09 25764.958333 29237.083333
2014-05-10 23154.750000 29338.500000

Aggregate into days by taking the min value up the value of each hourly observation

df.resample('D').min()
german_army allied_army
2014-05-06 20331 21144
2014-05-07 20056 21035
2014-05-08 20475 21209
2014-05-09 21071 20475
2014-05-10 20410 21218

Aggregate into days by taking the median value of each day’s worth of hourly observation

df.resample('D').median()
german_army allied_army
2014-05-06 24747.5 28327.5
2014-05-07 26150.5 29545.5
2014-05-08 25035.5 29677.0
2014-05-09 25694.5 28969.0
2014-05-10 22797.0 30384.0

Aggregate into days by taking the first value of each day’s worth of hourly observation

df.resample('D').first()
german_army allied_army
2014-05-06 21413 37604
2014-05-07 28783 22407
2014-05-08 23252 23883
2014-05-09 25644 24035
2014-05-10 26615 35368

Aggregate into days by taking the last value of each day’s worth of hourly observation

df.resample('D').last()
german_army allied_army
2014-05-06 22406 38633
2014-05-07 29916 26703
2014-05-08 24882 29425
2014-05-09 28307 36548
2014-05-10 21190 31730

Aggregate into days by taking the first, last, highest, and lowest value of each day’s worth of hourly observation

df.resample('D').ohlc()
german_army allied_army
open high low close open high low close
2014-05-06 21413 29377 20331 22406 37604 39906 21144 38633
2014-05-07 28783 29916 20056 29916 22407 39590 21035 26703
2014-05-08 23252 29591 20475 24882 23883 37465 21209 29425
2014-05-09 25644 29924 21071 28307 24035 39892 20475 36548
2014-05-10 26615 26615 20410 21190 35368 35368 21218 31730

 

Python Example for Beginners

Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning:This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.  

Google –> SETScholars