Applied Data Science Coding in Python: How to Get Descriptive Statistics of a Dataset


Descriptive statistics is the branch of statistics that deals with summarizing and describing a dataset. It helps you understand the characteristics of the data, such as its central tendency (mean, median), spread (standard deviation, range, percentiles), and shape (skewness, kurtosis).

There are several ways to get the descriptive statistics of a dataset in Python:

Using the describe() method in the pandas library: describe() calculates several summary statistics in one call. It is called on a pandas DataFrame or Series and, for numeric data, returns the count of non-missing values, the mean, standard deviation, minimum, and maximum, as well as the 25th, 50th (median), and 75th percentiles. By default, a DataFrame with mixed column types is summarized over its numeric columns only.
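As a minimal sketch of the pandas approach, the example below calls describe() on a small DataFrame. The data and the column names (height_cm, weight_kg, group) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0, 90.0],
    "group": ["a", "b", "a", "b", "a"],
})

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# include="all" also summarizes non-numeric columns (unique, top, freq).
full_summary = df.describe(include="all")
print(full_summary)
```

The rows of the default result are count, mean, std, min, 25%, 50%, 75%, and max, so a single statistic can be read with, for example, summary.loc["mean", "height_cm"].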

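Using the mean(), std(), min(), max(), median(), and related functions in the numpy library: each of these functions calculates one specific summary statistic. They accept a NumPy array or any array-like input, such as a list or a pandas Series, and return the corresponding value. Note that numpy.std() computes the population standard deviation (ddof=0) by default, whereas the pandas std() method computes the sample standard deviation (ddof=1), so the two can give different results on the same data.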
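The individual NumPy functions can be sketched as follows. The input array is an arbitrary illustrative sample, and the variable names are assumptions chosen for this example.

```python
import numpy as np

# Arbitrary illustrative sample.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(values)
median = np.median(values)
minimum = np.min(values)
maximum = np.max(values)

# np.std uses ddof=0 (population standard deviation) by default.
std_population = np.std(values)
# Pass ddof=1 for the sample standard deviation, matching the pandas default.
std_sample = np.std(values, ddof=1)

# 25th and 75th percentiles (linear interpolation by default).
q1, q3 = np.percentile(values, [25, 75])

print(mean, median, minimum, maximum)
print(std_population, std_sample)
print(q1, q3)
```

If the data may contain missing values, the NaN-aware variants such as np.nanmean() and np.nanstd() are usually the safer choice, because the plain functions return NaN when any element is NaN.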

Using the scipy library: the scipy.stats module provides a wide range of statistical functions, including scipy.stats.describe(), which returns the number of observations, the minimum and maximum, the mean, the variance, the skewness, and the kurtosis of a dataset.
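A short sketch of the SciPy approach, again on an arbitrary illustrative sample:

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative sample.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# describe() returns a named tuple with the fields
# nobs, minmax, mean, variance, skewness, and kurtosis.
result = stats.describe(values)

print("observations:", result.nobs)
print("min, max:", result.minmax)
print("mean:", result.mean)
# The variance uses ddof=1 (sample variance) by default.
print("variance:", result.variance)
print("skewness:", result.skewness)
print("kurtosis:", result.kurtosis)
```

Unlike the pandas describe() method, this result contains no percentiles or median, so the two summaries complement each other.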

In summary, you can use the describe() method from the pandas library, the mean(), std(), min(), max(), and median() functions from the numpy library, or the scipy.stats.describe() function from the scipy library to get the descriptive statistics of a dataset in Python. Descriptive statistics provide a quick overview of the data and can help flag outliers, missing values, and other characteristics of the dataset.
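To illustrate how a summary can point to missing values and outliers, the sketch below compares the count reported by describe() with the number of rows, and then applies the common 1.5 x IQR rule of thumb. The data, the column name score, and the choice of the IQR rule are assumptions made for this example; other outlier criteria are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one unusually large value.
df = pd.DataFrame({"score": [10.0, 12.0, 11.0, 13.0, np.nan, 12.0, 95.0]})

summary = df["score"].describe()

# describe() counts only non-missing values, so the gap to the
# number of rows is the number of missing values.
n_missing = len(df) - int(summary["count"])

# 1.5 x IQR rule of thumb, built from the quartiles in the summary.
q1, q3 = summary["25%"], summary["75%"]
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Comparisons with NaN are False, so the missing value is not flagged here.
outliers = df.loc[(df["score"] < lower) | (df["score"] > upper), "score"]

print("missing values:", n_missing)
print("outlier bounds:", lower, upper)
print("outliers:", outliers.tolist())
```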

 

In this Applied Machine Learning & Data Science Recipe, the reader will learn how to get descriptive statistics of a dataset.



 
