Applied Data Science Coding in Python: scatter plots
A scatter plot is a graphical representation of two-dimensional data, where each point on the plot represents a pair of (x, y) values. It is used to visualize the relationship between two continuous variables. Scatter plots can be used to identify patterns in the data, such as linear …
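As a minimal sketch of the idea in this excerpt, a scatter plot can be drawn with matplotlib's `Axes.scatter`. The data here is made up for illustration (a noisy linear relationship) and does not come from the original post.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: y depends roughly linearly on x, plus random noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 1.5, size=50)

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, alpha=0.7)  # one marker per (x, y) pair
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of y against x")
```

Calling `plt.show()` afterwards displays the figure in an interactive session; `fig.savefig("scatter.png")` would write it to disk instead.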

# Month: July 2019

Applied Data Science Coding in Python: histogram plots
A histogram is a graphical representation of the distribution of a dataset. It estimates the probability distribution of a continuous variable by showing how often values in the dataset fall within given ranges. The histogram groups the values into bins, and the height …
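The binning described above can be sketched with matplotlib's `Axes.hist`, which returns the per-bin counts alongside the bin edges. The ten values and the choice of four bins are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample of ten observations.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

fig, ax = plt.subplots()
# bins=4 splits the range [1, 9] into four equal-width bins.
counts, bin_edges, patches = ax.hist(values, bins=4, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")

print(counts)     # number of observations in each bin
print(bin_edges)  # the five edges that delimit the four bins
```

Each bar's height is the count for that bin, so the counts always sum to the number of observations.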

Applied Data Science Coding in Python: How to generate density plots
Density plots, also known as probability density plots, are used to visualize the probability density function of a continuous random variable. They give an idea of the distribution of the data and help to identify patterns, such as skewness or outliers. In Python, there …
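One possible way to draw a density plot is sketched here. Since the excerpt is cut off before it names a method, the use of `scipy.stats.gaussian_kde` is an assumption, not necessarily the approach the post takes. The sample is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Hypothetical sample drawn from a standard normal distribution.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Fit a Gaussian kernel density estimate and evaluate it on a grid.
kde = gaussian_kde(sample)
grid = np.linspace(sample.min() - 1, sample.max() + 1, 200)
density = kde(grid)

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.fill_between(grid, density, alpha=0.3)
ax.set_xlabel("value")
ax.set_ylabel("estimated density")
```

The area under the curve is approximately 1, which is what distinguishes a density estimate from a raw frequency histogram.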

Applied Data Science Coding in Python: How to generate Correlation Matrix
A correlation matrix is a table that shows the correlation coefficients between multiple variables. It is a useful tool for understanding the relationships between different variables in a dataset. A correlation coefficient ranges from -1 to 1, indicating the strength and direction of the …
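A small sketch of a correlation matrix using pandas' `DataFrame.corr`, which computes Pearson correlation by default. The three columns are invented so that the expected extremes are easy to see.

```python
import pandas as pd

# Hypothetical data: b rises with a, c falls as a rises.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # exactly 2 * a
    "c": [5, 4, 3, 2, 1],   # exactly 6 - a
})

corr = df.corr()  # pairwise Pearson correlation between all columns
print(corr)
```

The result is a square, symmetric table with ones on the diagonal, since every variable is perfectly correlated with itself.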

Applied Data Science Coding in Python: How to visualise data with Boxplot
A boxplot, also known as a box-and-whisker plot, is a powerful tool for visualizing the distribution of a dataset. It is particularly useful for identifying outliers and understanding the spread and skewness of the data. In Python, the matplotlib library provides several functions …
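As a rough illustration, matplotlib's `Axes.boxplot` draws the box, whiskers, and outlier markers in one call. The ten values are hypothetical, with one deliberately extreme point to show how an outlier appears.

```python
import matplotlib.pyplot as plt

# Hypothetical data with one extreme value (30).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

fig, ax = plt.subplots()
# boxplot returns a dict of the drawn artists: boxes, medians, whiskers, fliers...
parts = ax.boxplot(data)
ax.set_ylabel("value")

median = parts["medians"][0].get_ydata()[0]
outliers = list(parts["fliers"][0].get_ydata())
print("median:", median)
print("outliers:", outliers)
```

With the default whisker length of 1.5 times the interquartile range, the value 30 falls outside the upper whisker and is drawn as a separate flier point.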

Applied Data Science Coding in Python: How to get descriptive statistics of Dataset
Descriptive statistics is a branch of statistics that deals with summarizing and describing a dataset. It helps you understand the characteristics of the data, such as its central tendency, spread, and shape. There are several ways to get the …
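One of those ways, sketched here as an assumption since the excerpt is truncated, is pandas' `DataFrame.describe`. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical dataset with two numeric features.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 60, 70, 80, 90],
})

# count, mean, std, min, quartiles, and max for every numeric column
summary = df.describe()
print(summary)
```

The `50%` row of the summary is the median, so comparing it with the `mean` row gives a quick hint about skew.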

How to get SKEW statistics of Dataset
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. In other words, it measures how far the data leans toward one side of the distribution. A positive skew means that the tail on the right …
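A short sketch of computing skewness with pandas' `DataFrame.skew`. The two columns are hypothetical: one symmetric, one with a long right tail.

```python
import pandas as pd

# Hypothetical data: one symmetric column, one with a long right tail.
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_tailed": [1, 1, 2, 2, 10],
})

skewness = df.skew()  # sample skewness of each numeric column
print(skewness)
```

The symmetric column has a skew of zero, while the column with the single large value has a positive skew.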

How to get dimension of Dataset
When working with datasets in Python, it’s important to know the dimensions of your dataset, that is, the number of rows and columns in your data. Knowing them tells you how much data you have and helps you decide how to work with it. There …
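For a pandas DataFrame, the row and column counts are available through the `shape` attribute. The small frame here is invented for illustration.

```python
import pandas as pd

# Hypothetical dataset: 3 rows, 2 columns.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "species": ["setosa", "setosa", "virginica"],
})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```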

How to get data types of each feature in Data
When working with data in Python, it’s important to know the data type of each feature (column) in your dataset. The data type of a feature determines how it should be handled and which operations can be performed on it. There are …
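One common way to inspect column types, shown here as a sketch with a made-up frame, is the `dtypes` attribute of a pandas DataFrame.

```python
import pandas as pd

# Hypothetical dataset mixing integer, float, and text columns.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [40000.0, 52000.5, 61000.0],
    "city": ["Oslo", "Lima", "Pune"],
})

types = df.dtypes  # a Series mapping each column name to its dtype
print(types)

# Helper predicates from pandas.api.types test for broad type families.
numeric_columns = [
    name for name in df.columns if pd.api.types.is_numeric_dtype(df[name])
]
print(numeric_columns)
```

Checking the type family instead of an exact dtype name keeps the code independent of platform and pandas version differences in how integers and strings are stored.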

How to get correlation coefficient
The correlation coefficient is a measure of the strength of the relationship between two variables. It ranges from -1 to 1, where -1 represents a perfect negative correlation, 0 represents no correlation, and 1 represents a perfect positive correlation. In other words, it tells us how much two variables change together. …
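For a single pair of variables, the coefficient can be sketched with NumPy's `corrcoef`, which computes the Pearson correlation. The two arrays are hypothetical measurements that rise together.

```python
import numpy as np

# Hypothetical paired measurements that increase together.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r for (x, y).
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))

# Reversing the direction of one variable flips the sign of r.
r_negative = np.corrcoef(x, -y)[0, 1]
```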

Applied Data Science Coding: How to get class distribution in Data
Class distribution refers to the number of instances or samples that belong to each class in a dataset. In machine learning, class distribution is an important aspect to consider, as it can affect the performance of a model. For example, if a dataset …
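A minimal sketch of counting instances per class with pandas' `Series.value_counts`. The column name `label` and the class names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical labelled dataset with an imbalanced target column.
df = pd.DataFrame({
    "message_length": [120, 35, 48, 60, 200, 41],
    "label": ["spam", "ham", "ham", "ham", "spam", "ham"],
})

counts = df["label"].value_counts()                     # instances per class
proportions = df["label"].value_counts(normalize=True)  # share of each class
print(counts)
print(proportions)
```

Looking at the proportions as well as the raw counts makes an imbalance easier to spot before training a model.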

How to Load Data From url using Pandas
Loading data from a URL is a common task in data analysis and machine learning. To load tabular data from a web page using pandas, you can use the pandas.read_html() function. This function reads the tables found in an HTML page and returns a list of DataFrame objects, one …
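The behaviour described above can be sketched without network access by parsing an inline HTML string. The table content is invented, and `read_html` needs an HTML parser such as `lxml` to be installed. To load from a real page, a URL string can be passed as the first argument instead of the `StringIO` object.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML page containing a single table.
html = """
<table>
  <thead><tr><th>name</th><th>score</th></tr></thead>
  <tbody>
    <tr><td>alice</td><td>90</td></tr>
    <tr><td>bob</td><td>85</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

Because the return value is always a list, the table of interest has to be selected by position, even when the page contains only one.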