Data Analyst’s recipe | How to create a scatter plot in Python

Creating a scatter plot in Python takes only a few lines of code with a plotting library such as Matplotlib. This recipe walks through the steps, from installing the library to adding a trendline.
Step 1: Install Matplotlib
Matplotlib is a popular data visualization library in Python. To install it, open your command prompt or terminal and run the following command (pandas is included because Step 3 uses it to load the dataset):
pip install matplotlib pandas
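To confirm that the installation worked, a quick check along these lines can be run in a Python session. It only imports the library and prints whichever version pip installed:

```python
import matplotlib

# Prints the installed Matplotlib version string
print(matplotlib.__version__)
```

If the import raises ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.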
Step 2: Import Matplotlib and NumPy
To use Matplotlib for data visualisation, you need to import its pyplot module into your Python code. We also import NumPy, which is used later in this recipe to fit a trendline:
import matplotlib.pyplot as plt
import numpy as np
Step 3: Load your dataset
In this example, we will use the “Iris” dataset from the UCI Machine Learning Repository. The file has no header row, so we pass header=None and pandas labels the columns 0 to 4. We can load the dataset using the pandas library as follows:
import pandas as pd
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
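Because the file has no header row, the columns come back with integer labels, which makes later code harder to read. A minimal sketch of attaching readable names is shown here. The column names are assumptions chosen for this illustration (they are not stored in the file), and a few inline rows in the same comma-separated format stand in for the download so the snippet runs offline:

```python
import io

import pandas as pd

# Assumed column names; iris.data itself contains no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# A few rows in the same format as iris.data, inlined for illustration
# so the sketch does not need a network connection.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# header=None says "no header row"; names= supplies the labels instead
iris = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
print(iris.dtypes)
```

Passing the same names= argument to the pd.read_csv call on the URL should label the full dataset in the same way, after which columns can be selected by name rather than by position.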
Step 4: Create Scatter Plot
To create a scatter plot using Matplotlib, we need to provide x and y coordinates for each point we want to plot. In this example, we will plot the sepal length (column 0) against the sepal width (column 1) for the first 50 rows of the dataset.
# Extract sepal length and sepal width for the first 50 rows
X = df.iloc[0:50, 0]
Y = df.iloc[0:50, 1]
# Create a scatter plot
plt.scatter(X, Y)
# Set the title and labels for the axes
plt.title('Scatter Plot of Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
# Show the plot
plt.show()
This creates a scatter plot of sepal length against sepal width for the first 50 rows of the Iris dataset, all of which are Iris setosa samples.
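When more than one species is plotted, it often helps to draw each group as its own series so that the legend can tell them apart. The following is a rough sketch of that idea, not part of the original recipe: scatter_by_group is a hypothetical helper, the column names are assumptions, and the small DataFrame holds illustrative rows rather than the full dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_group(data, x_col, y_col, group_col):
    """Draw one scatter series per distinct value in group_col (hypothetical helper)."""
    fig, ax = plt.subplots()
    # groupby yields (group value, sub-DataFrame) pairs in sorted order
    for name, group in data.groupby(group_col):
        ax.scatter(group[x_col], group[y_col], label=name)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title=group_col)
    return fig, ax


# Illustrative rows only; column names are assumed for this sketch.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

fig, ax = scatter_by_group(sample, "sepal_length", "sepal_width", "species")
fig.savefig("iris_by_species.png")  # write the figure to a file instead of showing it
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and plt.show() called in place of fig.savefig.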

Here is another example, this time using the red wine quality dataset, whose CSV file uses semicolons as delimiters:
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', delimiter=';')
# Extract the data we want to plot
X = df['alcohol']
Y = df['volatile acidity']
# Create a scatter plot
plt.scatter(X, Y)
# Set the title and labels for the axes
plt.title('Scatter Plot of Alcohol Content vs Volatile Acidity')
plt.xlabel('Alcohol Content')
plt.ylabel('Volatile Acidity')
# Show the plot
plt.show()
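With well over a thousand points, markers in a plot like this tend to overlap and hide where the data is densest. One common remedy is to shrink the markers and make them partly transparent using the s and alpha parameters of scatter. The sketch below uses synthetic stand-in data (not the wine dataset) so that it runs offline; the specific values s=10 and alpha=0.4 are arbitrary starting points.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data, NOT the wine dataset: 1,500 loosely related points
rng = np.random.default_rng(seed=0)
x_demo = rng.normal(loc=10.0, scale=1.0, size=1500)
y_demo = 0.5 - 0.02 * x_demo + rng.normal(scale=0.15, size=1500)

fig, ax = plt.subplots()
# s sets the marker area in points squared; alpha sets opacity (0 = invisible, 1 = solid)
points = ax.scatter(x_demo, y_demo, s=10, alpha=0.4)
ax.set_xlabel("x (synthetic)")
ax.set_ylabel("y (synthetic)")
fig.savefig("scatter_alpha.png")
```

The same two keyword arguments can be added to the plt.scatter(X, Y) call in the wine example.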

To add a trendline, fit a straight line to the data with np.polyfit and draw it over the scatter plot:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Load the dataset
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', delimiter=';')
# Extract the data we want to plot
X = df['alcohol']
Y = df['volatile acidity']
# Fit a straight line (degree-1 polynomial) to the data
slope, intercept = np.polyfit(X, Y, 1)
# Evaluate the fitted line over the observed alcohol range
x = np.linspace(X.min(), X.max(), 50)
y = slope * x + intercept
# Create a scatter plot
plt.scatter(X, Y)
# Add the trendline
plt.plot(x, y, color='red')
# Set the title and labels for the axes
plt.title('Scatter Plot of Alcohol Content vs Volatile Acidity')
plt.xlabel('Alcohol Content')
plt.ylabel('Volatile Acidity')
# Show the plot
plt.show()
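The fitting step can be wrapped in a small function so that it works for any pair of columns. The sketch below is one way to do that, under the assumption that a straight line is an appropriate model: linear_trend is a hypothetical helper that returns the fitted slope and intercept, the two endpoints of the line over the observed x range, and the Pearson correlation coefficient as a rough measure of how well a line describes the data. The demo arrays are made up for illustration.

```python
import numpy as np


def linear_trend(x, y):
    """Fit y = slope * x + intercept by least squares (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polyfit returns coefficients from highest power down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    # Two endpoints are enough to draw a straight line across the data range
    line_x = np.array([x.min(), x.max()])
    line_y = slope * line_x + intercept
    # Pearson correlation coefficient between x and y
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, line_x, line_y, r


# Made-up demo data lying exactly on the line y = 2x + 1
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = 2.0 * x_demo + 1.0

slope, intercept, line_x, line_y, r = linear_trend(x_demo, y_demo)
print(slope, intercept, r)
```

The returned line_x and line_y can be passed directly to plt.plot(line_x, line_y, color='red') after the scatter call.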
