Data Wrangling in Python – Pandas Data Structures

Hits: 34

Pandas Data Structures

Import modules


import pandas as pd

Series 101

Series are one-dimensional arrays (like R’s vectors)

Create a series of the number of floodingReports


floodingReports = pd.Series([5, 6, 2, 9, 12])
floodingReports
0     5
1     6
2     2
3     9
4    12
dtype: int64

Note that the first column of numbers (0 to 4) are the index.

Set county names to be the index of the floodingReports series


floodingReports = pd.Series([5, 6, 2, 9, 12], index=['Cochise County', 'Pima County', 'Santa Cruz County', 'Maricopa County', 'Yuma County'])
floodingReports
Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

View the number of floodingReports in Cochise County

floodingReports['Cochise County']
5

View the counties with more than 6 flooding reports

floodingReports[floodingReports > 6]
Maricopa County     9
Yuma County        12
dtype: int64

Create a pandas series from a dictionary

Note: when you do this, the dict’s key’s will become the series’s index

/* Create a dictionary */
fireReports_dict = {'Cochise County': 12, 'Pima County': 342, 'Santa Cruz County': 13, 'Maricopa County': 42, 'Yuma County' : 52}

/* Convert the dictionary into a pd.Series, and view it */
fireReports = pd.Series(fireReports_dict); fireReports
Cochise County        12
Maricopa County       42
Pima County          342
Santa Cruz County     13
Yuma County           52
dtype: int64

Change the index of a series to shorter names

fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]
fireReports
Cochice        12
Pima           42
Santa Cruz    342
Maricopa       13
Yuma           52
dtype: int64

DataFrame 101

DataFrames are like R’s Dataframes

Create a dataframe from a dict of equal length lists or numpy arrays

data = {'county': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'], 
        'year': [2012, 2012, 2013, 2014, 2014], 
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
df
county reports year
0 Cochice 4 2012
1 Pima 24 2012
2 Santa Cruz 31 2013
3 Maricopa 2 2014
4 Yuma 3 2014

Set the order of the columns using the columns attribute

dfColumnOrdered = pd.DataFrame(data, columns=['county', 'year', 'reports'])
dfColumnOrdered
county year reports
0 Cochice 2012 4
1 Pima 2012 24
2 Santa Cruz 2013 31
3 Maricopa 2014 2
4 Yuma 2014 3

Add a column

dfColumnOrdered['newsCoverage'] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
dfColumnOrdered
county year reports newsCoverage
0 Cochice 2012 4 42.3
1 Pima 2012 24 92.1
2 Santa Cruz 2013 31 12.2
3 Maricopa 2014 2 39.3
4 Yuma 2014 3 30.2

Delete a column

del dfColumnOrdered['newsCoverage']
dfColumnOrdered
county year reports
0 Cochice 2012 4
1 Pima 2012 24
2 Santa Cruz 2013 31
3 Maricopa 2014 2
4 Yuma 2014 3

Transpose the dataframe

dfColumnOrdered.T
0 1 2 3 4
county Cochice Pima Santa Cruz Maricopa Yuma
year 2012 2012 2013 2014 2014
reports 4 24 31 2 3

Python Example for Beginners

Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning:This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.  

Google –> SETScholars