Applied Data Science Coding in Python: How to normalise data

Normalizing data is a way of scaling it so that it falls within a specific range, most commonly 0 to 1. It is useful when the features of a dataset are measured on very different scales, because algorithms that rely on distances or gradients (for example k-nearest neighbours, k-means, or neural networks trained with gradient descent) can otherwise be dominated by the features with the largest values. Min-Max normalization transforms the data so that the minimum value becomes 0 and the maximum value becomes 1: subtract the minimum value from each data point, then divide by the range (maximum value - minimum value), i.e. x' = (x - min) / (max - min).
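To make the formula concrete, here is a minimal plain-Python sketch of Min-Max normalization on a single list of numbers. The function name `min_max_normalize` is hypothetical, and mapping a constant list (zero range) to all zeros is an assumption chosen to avoid division by zero; it mirrors what scikit-learn's MinMaxScaler does for constant features.

```python
def min_max_normalize(values):
    """Hypothetical helper: rescale numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: all values are identical, so map every value to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


data = [10, 20, 15, 30, 25]
print(min_max_normalize(data))  # → [0.0, 0.5, 0.25, 1.0, 0.75]
```

Here the minimum (10) maps to 0, the maximum (30) maps to 1, and every other value lands proportionally in between.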

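Normalization can be done using different techniques, such as Min-Max normalization, L1 normalization and L2 normalization. Min-Max normalization, the most common of these, rescales each feature (column) to a fixed range. L1 and L2 normalization instead rescale each sample (row) so that its L1 norm (sum of absolute values) or L2 norm (Euclidean length) equals 1. To apply Min-Max normalization in Python with scikit-learn, use the MinMaxScaler class from the sklearn.preprocessing module. Its fit_transform() method takes the input data (a 2-D array of shape (n_samples, n_features)) as an argument, learns each feature's minimum and maximum, and returns the normalized data. For L1 and L2 normalization, the same module provides the Normalizer class.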
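The sketch below contrasts the column-wise Min-Max scaling of MinMaxScaler with the row-wise L1/L2 scaling of Normalizer. The small feature matrix is made up purely for illustration, and the example assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Illustrative feature matrix: 4 samples (rows) x 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

# Min-Max: each column is rescaled independently to [0, 1]
scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_minmax = scaler.fit_transform(X)
print(X_minmax)
print(scaler.data_min_, scaler.data_max_)  # per-column min and max learned by fit

# L1 / L2: each row is rescaled to unit norm (a per-sample operation)
X_l1 = Normalizer(norm="l1").fit_transform(X)
X_l2 = Normalizer(norm="l2").fit_transform(X)
print(np.abs(X_l1).sum(axis=1))        # absolute values in each row sum to 1
print(np.linalg.norm(X_l2, axis=1))    # each row has Euclidean length 1
```

Both columns of X_minmax come out as 0, 0.25, 0.5, 1 even though the raw columns differ by a factor of 200, which is exactly the scale-equalising effect Min-Max normalization is used for.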

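It's important to note that normalization should be done after handling missing values and after splitting the dataset into train and test sets: fit the scaler on the training set only, then use it to transform both the training and the test sets. Fitting the scaler on the full dataset lets the test set's minimum and maximum leak into the training process, which makes evaluation results overly optimistic.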
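A minimal sketch of this ordering, assuming synthetic data (the random feature matrix, the single missing entry and the labels are invented for illustration) and mean imputation with SimpleImputer as the missing-value strategy. Wrapping the imputer and scaler in a pipeline ensures both learn their statistics from the training split only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data for illustration: 20 samples, 3 features, one missing value
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(20, 3))
X[3, 1] = np.nan
y = rng.integers(0, 2, size=20)

# 1. Split first, so the test set plays no part in fitting the preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 2. Impute missing values, then normalize (both fitted on training data only)
preprocess = make_pipeline(SimpleImputer(strategy="mean"), MinMaxScaler())
X_train_scaled = preprocess.fit_transform(X_train)

# 3. Reuse the training mean/min/max to transform the test set
X_test_scaled = preprocess.transform(X_test)
print(X_test_scaled.min(), X_test_scaled.max())
```

Because the test set is scaled with the training minimum and maximum, its values can fall slightly outside [0, 1]; this is expected and is the price of keeping the test data unseen.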


In this Applied Machine Learning & Data Science Recipe, the reader will learn how to normalise data.
