Beginners Guide to R – R Scatter Plot – Base Graph

R Scatter Plot – Base Graph

A scatter plot is a graphical display of the relationship between two sets of data.

They are useful when you want to visualize how two variables are correlated. That’s why they are also called correlation plots.
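
As a rough numeric companion to the picture, the strength of a linear relationship can be summarized with a correlation coefficient. The sketch below uses base R's cor() on two columns of the built-in iris data set (introduced later in this guide); the variable name r is only illustrative.

```r
# Pearson correlation between two numeric variables of the built-in iris data
r <- cor(iris$Petal.Length, iris$Petal.Width)
print(round(r, 2))

# Values close to 1 or -1 suggest a strong linear relationship,
# values close to 0 suggest little or no linear relationship
```

A scatter plot of the same two variables shows whether that single number hides curvature, clusters, or outliers.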

The plot() function

The basic plot() function is a generic function that can be used for a variety of different purposes. For the time being, however, you can use the plot() function to create scatter plots.

It has many options and arguments to control many things, such as the plot type, labels, titles and colors.

Syntax

The syntax for the plot() function is:

plot(x, y, type, main, xlab, ylab, pch, col, las, bty, bg, cex, ...)

Parameters

Parameter Description
x The x coordinates of points in the plot
y The y coordinates of points in the plot
type The type of plot to be drawn
main An overall title for the plot
xlab The label for the x axis
ylab The label for the y axis
pch The shape of points
col The foreground color of symbols as well as lines
las The axis label style
bty The type of box drawn around the plot area
bg The background (fill) color of symbols (only for pch 21 through 25)
cex The amount by which plotting text and symbols are scaled
... Other graphical parameters
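
As an illustration of how several of these arguments combine in one call, here is a minimal sketch. It assumes the built-in iris data set that this guide introduces below; the title, labels, and color choices are arbitrary placeholders.

```r
# Two numeric vectors of equal length to plot against each other
x <- iris$Petal.Length
y <- iris$Petal.Width

plot(x, y,
     type = "p",                    # "p" draws points
     main = "Petal size",           # overall title
     xlab = "Petal length (cm)",    # x-axis label
     ylab = "Petal width (cm)",     # y-axis label
     pch = 21,                      # filled circle, so bg applies
     col = "dodgerblue4",           # outline color of the symbols
     bg = "lightskyblue",           # fill color (pch 21 through 25 only)
     las = 1,                       # axis labels always horizontal
     bty = "l",                     # box with left and bottom sides only
     cex = 0.9)                     # draw symbols slightly smaller
```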

Create a Scatter Plot

To get started with plot, you need a set of data to work with. Let’s consider the built-in iris flower data set as an example data set.

Here are the first six observations of the data set.

# First six observations of the 'Iris' data set
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Iris data set

The iris data set contains 150 observations, 50 for each of three species of iris flower: setosa, versicolor and virginica. Every observation contains four measurements of the flower: petal length, petal width, sepal length and sepal width.
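
A quick way to check the shape of the data yourself is sketched below, using only base R functions; the variable names n_obs, n_vars, and per_species are illustrative.

```r
# Number of observations (rows) and variables (columns)
n_obs <- nrow(iris)
n_vars <- ncol(iris)
print(c(rows = n_obs, columns = n_vars))

# Observations per species
per_species <- table(iris$Species)
print(per_species)

# Column types: four numeric measurements and one factor
str(iris)
```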

To create a scatter plot, pass any two variables of the data set to the plot() function.

# Plot petal width against petal length from the 'iris' data set
plot(iris$Petal.Length, iris$Petal.Width)

If you have your data contained in a data frame, you can use one of the following approaches to get at the variables; they all produce a similar result.

# $ syntax
plot(iris$Petal.Length, iris$Petal.Width)

# with() function
with(iris, plot(Petal.Length, Petal.Width))

# attach() function
attach(iris)
plot(Petal.Length, Petal.Width)
detach(iris)

# formula syntax
plot(Petal.Width ~ Petal.Length, data=iris)

The formula syntax requires your variables to be in the order y ~ x, which is the opposite of the standard syntax plot(x, y).

Change the Shape and Size of the Points

You can use the pch (plotting character) argument to specify symbols to use when plotting points.

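R provides 26 built-in symbols, selected with the integer codes 0 through 25. You can also pass a single character such as "+" to use it as the symbol.

With the cex (character expansion) argument, you can change the size of the plotted symbols.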

# Change the shape of the points and scale them down by 0.6
plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     cex=0.6)
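
To see what each pch code looks like, one option is to draw all of them in a grid. The following is a small sketch rather than a standard recipe; the grid layout and the variable names pch_codes, grid_x, and grid_y are assumptions made for this illustration.

```r
# Draw the 26 built-in plotting symbols (pch 0 to 25) in a grid
pch_codes <- 0:25
grid_x <- pch_codes %% 7          # column position, 0 to 6
grid_y <- 3 - pch_codes %/% 7     # row position, top row first

plot(grid_x, grid_y,
     pch = pch_codes, cex = 2,
     col = "dodgerblue1", bg = "brown1",   # bg only affects pch 21 to 25
     xlim = c(-0.5, 6.5), ylim = c(-1, 3.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "pch symbols 0 to 25")

# Label every symbol with its code, just below the symbol
text(grid_x, grid_y, labels = pch_codes, pos = 1, offset = 1, cex = 0.8)
```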

Changing the Color

You can change the foreground color of symbols using the col argument.

# Change the color of symbols to blue
plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     col="dodgerblue1")

R has a number of predefined colors that you can use in graphics. Use the colors() function to get a complete list of available names for colors.

# List of predefined colors in R
colors()
[1] "white"         "aliceblue"     "antiquewhite" 
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
...

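You can specify colors by index, name, hexadecimal code, or RGB value. For example, col=1, col="black", and col="#000000" are equivalent under the default palette, because index 1 refers to the first palette color, which is black.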
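
The sketch below shows one way to move between these representations using base R's col2rgb() and rgb(). The variable names rgb_values and hex_code are illustrative.

```r
# Start from a color name and derive its other representations
rgb_values <- col2rgb("dodgerblue1")   # 3 x 1 matrix: red, green, blue (0 to 255)
hex_code <- rgb(rgb_values[1], rgb_values[2], rgb_values[3],
                maxColorValue = 255)   # hexadecimal string of the form "#RRGGBB"
print(rgb_values)
print(hex_code)

# Both calls draw the points in the same color
plot(Petal.Width ~ Petal.Length, data = iris, pch = 16, col = "dodgerblue1")
points(Petal.Width ~ Petal.Length, data = iris, pch = 16, col = hex_code)
```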

Adding Titles and Axis Labels

You can add your own title and axis labels easily by specifying the following arguments.

Argument Description
main Main plot title
xlab x-axis label
ylab y-axis label

# Add a title and axis labels
plot(Petal.Width ~ Petal.Length, data=iris,
     pch=16,
     col="dodgerblue1",
     main = "Iris Flower Data Set",
     xlab = "Petal Length (cm)",
     ylab = "Petal Width (cm)")

Creating a Scatter Plot of Multiple Groups

When a scatter plot contains several groups drawn with the same symbol and color, the groups cannot be told apart. The graphic is far more informative if you distinguish one group from another.

The following example uses the col and pch arguments to plot each point with a different color and plotting character, according to the parallel factor “Species”.

# A scatter plot that shows the points in groups according to their "species"
plot(Petal.Width ~ Petal.Length, data=iris,
     col=c("brown1","dodgerblue1","limegreen")[as.integer(Species)],
     pch=c(1,2,3)[as.integer(Species)])

legend(x="topleft",
       legend=c("setosa","versicolor","virginica"),
       col=c("brown1","dodgerblue1","limegreen"),
       pch=c(1,2,3))

With the legend() function, you can add a legend to your plot: a little box that decodes the graphic for the viewer.

The position of the legend can be specified using the following keywords: “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
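
A variation on the grouped plot is sketched below: it keeps colors and symbols in named vectors keyed by species, so the points and the legend are built from the same source. The names species_cols, species_pch, point_cols, and point_pch are illustrative, not part of any API.

```r
# Named lookup vectors: one color and one symbol per species
species_cols <- c(setosa = "brown1", versicolor = "dodgerblue1", virginica = "limegreen")
species_pch  <- c(setosa = 1, versicolor = 2, virginica = 3)

# Look up the style of every observation by its species name
point_cols <- unname(species_cols[as.character(iris$Species)])
point_pch  <- unname(species_pch[as.character(iris$Species)])

plot(Petal.Width ~ Petal.Length, data = iris,
     col = point_cols, pch = point_pch)

# The legend reuses the same vectors, so it stays in sync with the points
legend("topleft",
       legend = names(species_cols),
       col = species_cols,
       pch = species_pch)
```

Looking values up by name rather than by integer position means the mapping still holds if the factor levels are reordered.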

Plotting the Regression Line

To add a regression line (line of best fit) to the existing plot, you first need to estimate a linear regression model using the lm() function.

The result is an object of class lm. You can simply pass the lm object to the abline() function to draw the regression line directly.

m <- lm(Petal.Width ~ Petal.Length, data=iris)
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
abline(m, col="brown2")
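
If you also want the numbers behind the line, the fitted model can be inspected with coef() and summary(). The sketch below annotates the plot with the fitted equation; the label format and the names intercept, slope, r_squared, and eq_label are choices made for this illustration.

```r
# Fit the model and pull out the estimated intercept and slope
m <- lm(Petal.Width ~ Petal.Length, data = iris)
intercept <- coef(m)[["(Intercept)"]]
slope <- coef(m)[["Petal.Length"]]
r_squared <- summary(m)$r.squared   # share of variance explained by the line

plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")
abline(m, col = "brown2")

# Show the fitted equation in the top-left corner, without a box
eq_label <- sprintf("y = %.2f + %.2f x", intercept, slope)
legend("topleft", legend = eq_label, bty = "n")
```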

Plotting the Lowess Line

The lowess() function performs the computations for locally weighted scatter plot smoothing (LOWESS).

Its result can be passed to the lines() function to add a lowess line to the existing plot.

plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
lines(lowess(iris$Petal.Length, iris$Petal.Width), col = "brown2")
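
How closely the curve follows the data is controlled by the f argument of lowess(), the smoother span. The sketch below compares the default span with a smaller one; the value 0.3 is an arbitrary choice for illustration.

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

# f is the proportion of points that influence each fitted value
smooth_default <- lowess(x, y)            # default span, f = 2/3
smooth_tight   <- lowess(x, y, f = 0.3)   # smaller span follows the data more closely

plot(x, y, col = "dodgerblue1", xlab = "Petal.Length", ylab = "Petal.Width")
lines(smooth_default, col = "brown2")
lines(smooth_tight, col = "limegreen", lty = 2)
legend("topleft",
       legend = c("f = 2/3", "f = 0.3"),
       col = c("brown2", "limegreen"),
       lty = c(1, 2))
```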

Scatterplot Matrix

If your data set contains a large number of variables, finding relationships between them is difficult. In R, you can create scatter plots of all pairs of variables at once.

The following example plots all columns of the iris data set, producing a matrix of scatter plots (pairs plot).

plot(iris,
     col=rgb(0,0,1,.15),
     pch=19)

By default, the plot() function takes all the columns in a data frame and creates a matrix of scatter plots. This becomes messy if you have many columns.

You can choose which columns you want to display by using the formula notation.

# Use formula notation to create customized pairs plots
plot(~ Petal.Length + Petal.Width + Sepal.Width,
     col=rgb(0,0,1,.15),
     pch=19,
     data=iris)
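
The same kind of display can be produced by calling pairs() directly, which is the base function that plot() hands a multi-column data frame to. The sketch below restricts the matrix to the four numeric columns and colors the points by species; the color values and the names measurements and species_cols are assumptions for this example.

```r
# Keep only the four numeric measurement columns
measurements <- iris[, 1:4]

# One semi-transparent color per species, so overlapping points stay visible
species_cols <- c(rgb(1, 0, 0, 0.3), rgb(0, 0, 1, 0.3), rgb(0, 0.6, 0, 0.3))

pairs(measurements,
      col = species_cols[as.integer(iris$Species)],
      pch = 19)
```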

Coplots (conditioning scatter plots)

Often your dataset contains a mixture of both continuous and discrete variables. It can be quite tedious to find how a relationship between a pair of variables differs among groups.

Information of that nature can be gained using conditioning plots (or coplots).

A conditioning scatter plot is a multipanel display, where each panel contains a scatter plot for one group.

coplot(Petal.Length ~ Petal.Width | Species,
       data=iris,
       columns=3,
       bar.bg=c(fac="lightskyblue"),
       col="dodgerblue1")
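
coplot() can also condition on a continuous variable, in which case it slices that variable into overlapping intervals. The sketch below is one possible setup, not the only one: the choice of Sepal.Length as the conditioning variable, 4 intervals, and 10% overlap are all assumptions made for illustration.

```r
# Intervals of Sepal.Length used for conditioning: 4 groups, 10% overlap
intervals <- co.intervals(iris$Sepal.Length, number = 4, overlap = 0.1)
print(intervals)

# One panel per interval of Sepal.Length
coplot(Petal.Length ~ Petal.Width | Sepal.Length,
       data = iris,
       number = 4,
       overlap = 0.1,
       col = "dodgerblue1")
```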

3D scatter plots – scatterplot3d package

There are many packages in R (such as scatterplot3d, rgl, lattice, …) for creating 3D plots. Of these, the scatterplot3d package is among the simplest to use.

To create a 3D scatter plot, use the scatterplot3d() function and pass in three variables representing the x, y, and z coordinates.

library(scatterplot3d)
attach(iris)
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length)

You can alter the appearance of your 3D scatter plot by using the following parameters.

Parameter Description
type The type of item to plot:
‘p’ for points,
‘l’ for lines,
‘h’ for line segments from z = 0
color The color to be used for plotted items
pch Plotting symbol to use
angle Angle between x and y axis
xlab, ylab, zlab Labels for the coordinates
main, sub Title and subtitle

# Changing the appearance of the 3D scatterplot
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
              pch = 16,
              type="h",
              angle = 45,
              xlab = "Sepal length",
              ylab = "Sepal width",
              zlab = "Petal length",
              color = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])

legend("top",
       pch = 16,
       cex = 0.8,
       horiz = TRUE,
       legend = levels(iris$Species),
       col =  c("brown1","dodgerblue1","limegreen"))
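
If you prefer not to leave the iris columns attached to the search path, the same plot can be written with with(), as in the earlier 2D examples. This is a sketch under the assumption that the scatterplot3d package is installed; the requireNamespace() guard simply skips the plot when it is not.

```r
# Colors in the same order as the levels of iris$Species
species_cols <- c("brown1", "dodgerblue1", "limegreen")

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  # with() evaluates the call inside the data frame, so no attach() is needed
  with(iris,
       scatterplot3d::scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
                                    pch = 16,
                                    type = "h",
                                    color = species_cols[as.integer(Species)]))

  legend("top",
         pch = 16,
         cex = 0.8,
         horiz = TRUE,
         legend = levels(iris$Species),
         col = species_cols)
}
```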

3D scatter plots – rgl package

When it comes to 3D plots, it’s important to be able to view them from different angles.

The rgl package offers some simple functions to create 3D plots that you can rotate and zoom in/out. rgl utilizes OpenGL to render the graphics on your computer screen.

To create a 3D scatter plot, use plot3d() of rgl and pass in three variables representing the x, y, and z coordinates.

# Create a spinning 3D scatter plot
library(rgl)
attach(iris)
plot3d(Sepal.Length, Sepal.Width, Petal.Length)

You can rotate the plot by clicking and dragging with the mouse, and zoom in and out with the scroll wheel.

You can alter the appearance of your 3D scatter plot by using the following parameters.

Parameter Description
type The type of item to plot:
‘p’ for points,
‘s’ for spheres,
‘l’ for lines,
‘h’ for line segments from z = 0,
‘n’ for nothing
col The color to be used for plotted items
size The size for plotted points
xlab, ylab, zlab Labels for the coordinates
main, sub Title and subtitle

# Changing the appearance of the 3D scatterplot
plot3d(Sepal.Length, Sepal.Width, Petal.Length,
       pch = 16,
       size = 1,
       type = "s",
       xlab = "Sepal length",
       ylab = "Sepal width",
       zlab = "Petal length",
       col = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])

legend3d("topright",
         col=c("brown1","dodgerblue1","limegreen"),
         legend=levels(Species),
         pch=16)

 
