Data Visualisation for Beginners: How to create a scatter plot in R

A scatter plot shows the relationship between two continuous variables by drawing one point per observation. This tutorial walks through creating a scatter plot in R with the ggplot2 package, then customizing its colors, labels, and point shapes.

1. Loading the data

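First, we need to load the data for our scatter plot. This tutorial uses the iris dataset, which ships with R in the datasets package. Because the datasets package is attached by default, iris is available even without an explicit call; data(iris) simply makes the loading step visible.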

# Load the iris dataset
data(iris)
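
Before plotting, it can help to confirm which columns are numeric, since a scatter plot needs two continuous variables. The following is a minimal sketch using base R only; the variable name numeric_cols is introduced here for illustration and is not part of the original recipe.

```r
# Load the iris dataset and take a quick look at its structure
data(iris)

dim(iris)    # number of rows and columns
str(iris)    # column names and types
head(iris)   # first six rows

# Flag the numeric columns; these are the candidates for the x and y axes
numeric_cols <- sapply(iris, is.numeric)
names(iris)[numeric_cols]
```

The one non-numeric column, Species, is a factor, which makes it a natural choice for grouping points by color or shape later on.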

2. Creating a scatter plot using ggplot2

Next, we will create a scatter plot using the ggplot2 package. We will use the ggplot() function to create the basic plot object and then add layers to customize the plot. If ggplot2 is not yet installed, install it once with install.packages("ggplot2").

# Load the ggplot2 package
library(ggplot2)

# Create a basic scatter plot
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point()

In the above code, we first loaded the ggplot2 package using the library() function. Then, we created a basic scatter plot using the ggplot() function and specified the iris dataset as the data source. We used the aes() function to specify the variables to be plotted on the x and y axes. Finally, we added a layer to the plot using the geom_point() function to create the scatter plot itself.
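
To make the idea of a plot object plus layers more concrete, the same plot can be built in two steps. This is only a sketch of one way to write it; the names p and p_points are chosen here for illustration.

```r
library(ggplot2)

# Base plot object: data and axis mappings only, no points drawn yet
p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# Adding geom_point() with + returns a new plot object that has one layer
p_points <- p + geom_point()

# Printing the object is what actually draws the plot
print(p_points)
```

Storing the base plot in a variable makes it easy to try different layers on the same data without repeating the ggplot() call.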

3. Customizing the scatter plot

Now that we have created a basic scatter plot, we can customize it to make it more visually appealing and informative. Here are a few examples:

Changing the color of the points based on a third variable

# Create a scatter plot with points colored by Species
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + 
  geom_point()

In the above code, we added a new argument to the aes() function to specify that the points should be colored by the Species variable. This creates a scatter plot where each species is represented by a different color.
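
A related point that often trips up beginners is the difference between mapping color to a variable (inside aes()) and setting a single fixed color (outside aes()). The sketch below contrasts the two; the color "steelblue" and the names p_fixed and p_mapped are arbitrary choices for illustration.

```r
library(ggplot2)

# Setting: every point gets the same colour, given outside aes()
p_fixed <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

# Mapping: colour depends on Species, given inside aes(); a legend is added
p_mapped <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

print(p_fixed)
print(p_mapped)
```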

Adding a title and axis labels

# Create a scatter plot with a title and axis labels
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + 
  geom_point() + 
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width")

In the above code, we added the labs() function to the plot to specify the title and axis labels.

Changing the point shape

# Create a scatter plot with different point shapes based on Species
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, shape = Species)) + 
  geom_point() + 
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width")

In the above code, we added a new argument to the aes() function to specify that the point shapes should be based on the Species variable. This creates a scatter plot where each species is represented by a different point shape.
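
Color and shape can also be mapped to the same variable at once, which keeps groups distinguishable in greyscale printouts. The following is a minimal sketch combining the earlier examples; the size value and the name p_both are illustrative choices, not part of the original recipe.

```r
library(ggplot2)

# Map both colour and shape to Species; ggplot2 merges them into one legend
p_both <- ggplot(data = iris,
                 aes(x = Sepal.Length, y = Sepal.Width,
                     color = Species, shape = Species)) +
  geom_point(size = 2) +
  labs(title = "Sepal Length vs. Sepal Width",
       x = "Sepal Length", y = "Sepal Width")

print(p_both)
```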

Another example

The same approach works for any data frame with two numeric columns. Here we use the mtcars dataset, which is also included in the datasets package in R.

# Load the mtcars dataset
data(mtcars)

# Create a scatter plot of mpg vs. wt
ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  labs(title = "MPG vs. Weight", x = "Weight (1000 lbs)", y = "Miles Per Gallon")

In the above code, we created a scatter plot of mpg (miles per gallon) vs. wt (weight, recorded in units of 1000 lbs) using the mtcars dataset. We added a title and axis labels using the labs() function.
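
As an optional extension, a fitted straight line can make the direction of the relationship easier to see. The sketch below adds geom_smooth() with a linear model to the mtcars plot; choosing a linear fit is an assumption made here for illustration, and the name p_trend is not part of the original recipe.

```r
library(ggplot2)

# Scatter plot of mpg vs. wt with a straight trend line on top
p_trend <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" fits a linear model; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs. Weight", x = "Weight (1000 lbs)", y = "Miles Per Gallon")

print(p_trend)
```

The line slopes downward, reflecting that heavier cars in this dataset tend to have lower fuel economy.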
