# R Scatter Plot – Base Graph

A scatter plot is a graphical display of the relationship between two sets of data.

They are useful when you want to visualize how two variables are correlated, which is why they are also called correlation plots.

## The plot() function

The basic `plot()` function is a generic function that can be used for a variety of purposes. For the time being, however, you can use it to create scatter plots.

It has many arguments that control things such as the plot type, labels, titles and colors.

### Syntax

The syntax for the `plot()` function is:

`plot(x, y, type, main, xlab, ylab, pch, col, las, bty, bg, cex, …)`

### Parameters

| Parameter | Description |
|---|---|
| x | The x coordinates of points in the plot |
| y | The y coordinates of points in the plot |
| type | The type of plot to be drawn |
| main | An overall title for the plot |
| xlab | The label for the x axis |
| ylab | The label for the y axis |
| pch | The shape of points |
| col | The foreground color of symbols as well as lines |
| las | The axis label style |
| bty | The type of box around the plot area |
| bg | The background (fill) color of symbols (only for pch 21 through 25) |
| cex | The amount by which plotting text and symbols are scaled |
| … | Other graphical parameters |
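As a rough illustration of how these parameters combine, the sketch below sets most of them in a single call. The data (`x` and `y`) are made up for this example and are not part of the tutorial's data set.

```r
# Illustrative data (made up for this sketch)
x <- 1:10
y <- x^2

plot(x, y,
     type = "b",             # "p" points, "l" lines, "b" both
     main = "Squares of 1 to 10",
     xlab = "x",
     ylab = "x squared",
     pch  = 21,              # fillable circle, so bg applies
     col  = "dodgerblue4",   # outline color
     bg   = "lightskyblue",  # fill color (pch 21 to 25 only)
     las  = 1,               # horizontal axis tick labels
     bty  = "l",             # L-shaped box around the plot
     cex  = 1.2)             # symbols 20% larger than the default
```

Changing one argument at a time and redrawing is usually the quickest way to see what each one does.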

## Create a Scatter Plot

To get started with `plot()`, you need a data set to work with. Let’s use the built-in iris flower data set as an example.

Here are the first six observations of the data set.

```
# First six observations of the 'Iris' data set
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
```

**Iris data set**

The iris data set contains 150 observations on three species of iris flower: setosa, versicolor and virginica (50 of each). Every observation contains four measurements of the flower, in centimeters: petal length, petal width, sepal length and sepal width.

To create a scatter plot, just specify any two variables of the data set in the `plot()` function.
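The size and balance of the data set can be checked directly in R. This is a minimal sketch using base R only; the names `n_obs` and `per_species` are introduced here just for illustration.

```r
# Number of observations (rows) in the data set
n_obs <- nrow(iris)
n_obs

# Number of observations for each species
per_species <- table(iris$Species)
per_species

# Compact overview of the column names and types
str(iris)
```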

```
# Plot the ‘Iris’ data set
plot(iris$Petal.Length, iris$Petal.Width)
```

If you have your data contained in a data frame, you can use one of the following approaches to get at the variables; they all produce a similar result.

```
# $ syntax
plot(iris$Petal.Length, iris$Petal.Width)
# with() function
with(iris, plot(Petal.Length, Petal.Width))
# attach() function
attach(iris)
plot(Petal.Length, Petal.Width)
detach(iris)
# formula syntax
plot(Petal.Width ~ Petal.Length, data=iris)
```

The formula syntax requires your variables to be in the order `y ~ x`, which is the opposite of the standard syntax `plot(x, y)`.

## Change the Shape and Size of the Points

You can use the pch (plotting character) argument to specify the symbols to use when plotting points.

The integer values 0 through 25 select R’s built-in symbols; symbols 21 through 25 can also be filled with a background color (bg).
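If you want to see the built-in symbols, one option is to draw them yourself. The sketch below lays out pch values 0 through 25 on a small grid; the layout (nine symbols per row) and the variable names are arbitrary choices made for this example.

```r
# The 26 built-in plotting symbols
symbol_ids <- 0:25

# Arrange them in rows of nine, first row on top
grid_x <- symbol_ids %% 9
grid_y <- 2 - symbol_ids %/% 9

plot(grid_x, grid_y,
     pch = symbol_ids, cex = 2,
     bg = "gold",                      # only visible for pch 21 to 25
     xlim = c(-0.5, 8.5), ylim = c(-0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Built-in pch symbols")

# Label each symbol with its pch number
text(grid_x, grid_y - 0.35, labels = symbol_ids, cex = 0.8)
```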

With the cex (character expansion) argument, you can change the size of the plotted characters.

```
# Change the shape of the points and scale them down by 0.6
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
cex=0.6)
```

## Changing the Color

You can change the foreground color of symbols using the col argument.

```
# Change the color of symbols to blue
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
col="dodgerblue1")
```

R has a number of predefined colors that you can use in graphics. Use the `colors()` function to get a complete list of the available color names.

```
# List of predefined colors in R
colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
...
```


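You can specify colors by name, hexadecimal code, or RGB value. For example, `col="white"`, `col="#FFFFFF"`, and `col=rgb(1,1,1)` are equivalent. You can also give a numeric index, which refers to the current `palette()` rather than to the `colors()` list; with the default palette, `col=1` is black.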
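The different ways of writing a color can be compared with `col2rgb()`, which converts any color specification to its red, green and blue components. The sketch below is only a check, and the variable names are illustrative. Note that a numeric `col` value is looked up in `palette()`, not in `colors()`.

```r
# Convert each color specification to red/green/blue components
by_name <- col2rgb("white")
by_hex  <- col2rgb("#FFFFFF")
by_rgb  <- col2rgb(rgb(1, 1, 1))   # rgb() builds the string "#FFFFFF"

# All three describe the same color
all(by_name == by_hex)
all(by_name == by_rgb)

# A numeric index refers to the current palette
first_palette_color <- palette()[1]
first_palette_color                # "black" with the default palette

# The first entry of colors() is a different thing
first_named_color <- colors()[1]
first_named_color                  # "white"
```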

## Adding Titles and Axis Labels

You can add your own title and axis labels easily by specifying the following arguments.

| Argument | Description |
|---|---|
| main | Main plot title |
| xlab | x-axis label |
| ylab | y-axis label |

```
plot(Petal.Width ~ Petal.Length, data=iris,
pch=16,
col="dodgerblue1",
main = "Iris Flower Data Set",
xlab = "Petal Length (cm)",
ylab = "Petal Width (cm)")
```

## Creating a Scatter Plot of Multiple Groups

Plotting multiple groups in one scatter plot with identical symbols creates an uninformative mess. The graphic is far more informative if you distinguish one group from another.

The following example uses the col and pch arguments to give each point a color and a plotting character according to the factor “Species”.

```
# A scatter plot that shows the points in groups according to their "species"
plot(Petal.Width ~ Petal.Length, data=iris,
col=c("brown1","dodgerblue1","limegreen")[as.integer(Species)],
pch=c(1,2,3)[as.integer(Species)])
legend(x="topleft",
legend=c("setosa","versicolor","virginica"),
col=c("brown1","dodgerblue1","limegreen"),
pch=c(1,2,3))
```

With the `legend()` function, you can add a legend to your plot: a little box that decodes the graphic for the viewer.

The position of the legend can be specified using the following keywords: “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
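The grouping example relies on indexing a lookup vector with the integer codes of a factor. The sketch below spells that step out; the names `group_colors`, `group_symbols`, `species_index`, `point_colors` and `point_symbols` are introduced here for illustration only.

```r
# One color and one symbol per species, in the order of the factor levels
group_colors  <- c("brown1", "dodgerblue1", "limegreen")
group_symbols <- c(1, 2, 3)

# as.integer() on a factor returns the level number of each observation
species_index <- as.integer(iris$Species)

# Look up the color and symbol for every point
point_colors  <- group_colors[species_index]
point_symbols <- group_symbols[species_index]

# Cross-tabulate to confirm that each species maps to one color
table(iris$Species, point_colors)

plot(iris$Petal.Length, iris$Petal.Width,
     col = point_colors, pch = point_symbols)

# levels() keeps the legend in the same order as the lookup vectors
legend("topleft", legend = levels(iris$Species),
       col = group_colors, pch = group_symbols)
```

Taking the legend labels from `levels()` rather than typing them avoids a mismatch between the legend and the plotted groups if the factor levels are ever reordered.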

## Plotting the Regression Line

To add a regression line (line of best fit) to an existing plot, you first need to estimate a linear regression model using the `lm()` function.

The result is an object of class **lm**. You can simply pass the lm object to the `abline()` function to draw the regression line directly.

```
m <- lm(Petal.Width ~ Petal.Length, data=iris)
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
abline(m, col="brown2")
```
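It can help to look at the numbers behind the line. The sketch below extracts the intercept and slope from the fitted model and draws the line from them explicitly, which should give the same line as `abline(m)`. The names `coefs` and `r_squared` are illustrative.

```r
# Fit the same linear model as above
m <- lm(Petal.Width ~ Petal.Length, data = iris)

# Named vector holding the intercept and the slope
coefs <- coef(m)
coefs

# Share of the variance in petal width explained by the model
r_squared <- summary(m)$r.squared
r_squared

# Draw the line from the coefficients instead of the model object
plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")
abline(a = coefs[["(Intercept)"]], b = coefs[["Petal.Length"]], col = "brown2")
```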

## Plotting the Lowess Line

The `lowess()` function performs the computations for locally weighted scatterplot smoothing (LOWESS).

Its result can be passed to the `lines()` function to add a lowess line to the existing plot.

```
plot(Petal.Width ~ Petal.Length, data=iris, col="dodgerblue1")
lines(lowess(iris$Petal.Length, iris$Petal.Width), col = "brown2")
```
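How closely the lowess line follows the points is controlled by the smoother span `f`, the proportion of points that influence the smooth at each value (the default is 2/3). The sketch below compares the default with a smaller span; the value 0.3 is an arbitrary choice for illustration.

```r
plot(Petal.Width ~ Petal.Length, data = iris, col = "dodgerblue1")

# Default span: a smoother curve
smooth_default <- lowess(iris$Petal.Length, iris$Petal.Width)

# Smaller span: the curve follows local features more closely
smooth_tight <- lowess(iris$Petal.Length, iris$Petal.Width, f = 0.3)

lines(smooth_default, col = "brown2")
lines(smooth_tight, col = "limegreen", lty = 2)

legend("topleft", legend = c("f = 2/3 (default)", "f = 0.3"),
       col = c("brown2", "limegreen"), lty = c(1, 2))
```

`lowess()` returns a list with components `x` and `y`, which is why it can be passed straight to `lines()`.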

## Scatterplot Matrix

If your data set contains a large number of variables, finding the relationships between them is difficult. In R, you can create scatter plots of all pairs of variables at once.

The following example plots all columns of the iris data set, producing a matrix of scatter plots (pairs plot).

```
plot(iris,
col=rgb(0,0,1,.15),
pch=19)
```

By default, the `plot()` function takes all the columns in a data frame and creates a matrix of scatter plots. This becomes messy if you have many columns.

You can choose which columns to display by using the formula notation.

```
# Use formula notation to create customized pairs plots
plot(~ Petal.Length + Petal.Width + Sepal.Width,
col=rgb(0,0,1,.15),
pch=19,
data=iris)
```
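An alternative to the formula notation is to subset the data frame yourself and call `pairs()`, the base R function that draws scatterplot matrices. The sketch below does that and also colors the points by species; the name `selected` and the color choices are illustrative.

```r
# Pick the columns to show by name
selected <- iris[, c("Petal.Length", "Petal.Width", "Sepal.Width")]

# Draw the scatterplot matrix, one color per species
pairs(selected,
      col = c("brown1", "dodgerblue1", "limegreen")[as.integer(iris$Species)],
      pch = 19,
      main = "Selected iris measurements")
```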

## Coplots (conditioning scatter plots)

Often your data set contains a mixture of continuous and discrete variables. It can be quite tedious to find out how the relationship between a pair of variables differs among groups.

Information of that nature can be gained using **conditioning plots** (or coplots).

A conditioning plot is a multipanel display in which each panel contains a scatter plot for one group.

```
coplot(Petal.Length ~ Petal.Width | Species,
data=iris,
columns=3,
bar.bg=c(fac="lightskyblue"),
col="dodgerblue1")
```

## 3D scatter plots – scatterplot3d package

There are many packages in R (such as scatterplot3d, rgl, lattice, …) for creating 3D plots. Of these, the scatterplot3d package is one of the simplest to use.

To create a 3D scatter plot, use the `scatterplot3d()` function and pass in three variables representing the x, y, and z coordinates.

```
library(scatterplot3d)
attach(iris)
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length)
```

You can alter the appearance of your 3D scatter plot by using the following parameters.

| Parameter | Description |
|---|---|
| type | The type of item to plot: ‘p’ for points, ‘l’ for lines, ‘h’ for vertical lines to the x-y plane |
| color | The color to be used for plotted items |
| pch | Plotting symbol to use |
| angle | Angle between the x and y axes |
| xlab, ylab, zlab | Labels for the coordinates |
| main, sub | Title and subtitle |

```
# Changing the appearance of the 3D scatterplot
scatterplot3d(Sepal.Length, Sepal.Width, Petal.Length,
pch = 16,
type="h",
angle = 45,
xlab = "Sepal length",
ylab = "Sepal width",
zlab = "Petal length",
color = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])
legend("top",
pch = 16,
cex = 0.8,
horiz = TRUE,
legend = levels(iris$Species),
col = c("brown1","dodgerblue1","limegreen"))
```

## 3D scatter plots – rgl package

When it comes to 3D plots, it’s important to be able to view them from different angles.

The **rgl** package offers some simple functions to create 3D plots that you can rotate and zoom in/out. rgl utilizes OpenGL to render the graphics on your computer screen.

To create a 3D scatter plot, use the `plot3d()` function of rgl and pass in three variables representing the x, y, and z coordinates.

```
# Create an interactive 3D scatter plot
library(rgl)
attach(iris)
plot3d(Sepal.Length, Sepal.Width, Petal.Length)
```

You can rotate the plot by clicking and dragging with the mouse, and zoom in and out with the scroll wheel.

You can alter the appearance of your 3D scatter plot by using the following parameters.

| Parameter | Description |
|---|---|
| type | The type of item to plot: ‘p’ for points, ‘s’ for spheres, ‘l’ for lines, ‘h’ for line segments from z = 0, ‘n’ for nothing |
| col | The color to be used for plotted items |
| size | The size for plotted points |
| xlab, ylab, zlab | Labels for the coordinates |
| main, sub | Title and subtitle |

```
# Changing the appearance of the 3D scatterplot
plot3d(Sepal.Length, Sepal.Width, Petal.Length,
pch = 16,
size = 1,
type = "s",
xlab = "Sepal length",
ylab = "Sepal width",
zlab = "Petal length",
col = c("brown1","dodgerblue1","limegreen")[as.integer(Species)])
legend3d("topright",
col=c("brown1","dodgerblue1","limegreen"),
legend=levels(Species),
pch=16)
```

