Beginners Guide to R – R Box-whisker Plot – Base Graph

The box-whisker plot (or a boxplot) is a quick and easy way to visualize complex data where you have multiple samples.

A box plot is a good way to get an overall picture of the data set in a compact manner.

The boxplot() function

You can use the boxplot() function to create box-whisker plots.

It has many arguments that control the plot, such as drawing it horizontally and adding labels, titles, and colors.

Syntax

The syntax for the boxplot() function is:

boxplot(x, names, xlab, ylab, border, col, notch, horizontal, add, ...)

Parameters

Parameter    Description
x            A numeric vector of values from which the box plots are produced
names        Group labels printed under each box plot
xlab         The label for the x-axis
ylab         The label for the y-axis
border       A vector of colors for the outlines of the box plots
col          The fill color (or a vector of fill colors) for the box bodies
notch        If TRUE, a notch is drawn in each side of the boxes
horizontal   Set to TRUE to draw the box plot horizontally
add          Set to TRUE to add the box plot to the current plot
...          Other graphical parameters
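
The add argument is the least obvious entry in the table, so here is a rough sketch of how it can be combined with the others. It uses the built-in ToothGrowth data set (described in the next section) and draws the two delivery methods as two layers on one plot. The subset, at, boxwex, xlim, ylim and axes arguments are not listed in the table above; they are additional arguments accepted by boxplot(), and the variable names vc and oj are only illustrative.

```r
# First layer: ascorbic acid (VC) boxes, shifted slightly to the left
vc <- boxplot(len ~ dose, data = ToothGrowth, subset = supp == "VC",
              at = 1:3 - 0.2, boxwex = 0.25,
              col = "lightblue1", border = "dodgerblue3",
              xlim = c(0.5, 3.5), ylim = c(0, 35),
              xlab = "Vitamin C dose (mg/day)", ylab = "Tooth length")

# Second layer: orange juice (OJ) boxes, added to the existing plot.
# axes = FALSE avoids drawing a second set of axes over the first.
oj <- boxplot(len ~ dose, data = ToothGrowth, subset = supp == "OJ",
              at = 1:3 + 0.2, boxwex = 0.25,
              col = "orange1", border = "darkorange3",
              add = TRUE, axes = FALSE)
```

Without add = TRUE, the second call would start a new plot and replace the first one.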

Create a Box-Whisker Plot

To get started with plotting, you need a data set to work with. Let’s use the built-in ToothGrowth data set as an example.

Here are the first six observations of the data set.

# First six observations of the 'ToothGrowth' data set
head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

ToothGrowth data set

The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).

To create a box plot, pass any numeric variable of the data set to the boxplot() function.

boxplot(ToothGrowth$len)
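
To see the numbers behind the picture, the following minimal sketch calls boxplot.stats(), the helper from the grDevices package that computes the values a box plot displays. The variable name s is only illustrative.

```r
# Summary statistics that a box plot displays for a numeric vector
s <- boxplot.stats(ToothGrowth$len)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$n      # number of non-missing observations
s$out    # values beyond the whiskers, which are drawn as individual points
```

The box spans the two hinges, the line inside it marks the median, and the whiskers extend to the most extreme values that are not flagged as outliers.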

Horizontal Box Plot

You can also draw the box-plot horizontally by setting the horizontal argument to TRUE.

boxplot(ToothGrowth$len,
        horizontal = TRUE)

Notched Box Plot

The notched box plot helps you assess whether the medians of different groups differ. If the notches of two boxes do not overlap, there is strong evidence (at roughly the 95% confidence level) that their medians differ.

You add notches to a box plot by setting the notch argument to TRUE.

# Add notches to a box plot
boxplot(ToothGrowth$len,
        notch = TRUE)
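
Notches are most informative when there are at least two boxes to compare. As a minimal sketch, the code below draws notched boxes for the two delivery methods and then reads the notch limits from the conf component of the list that boxplot() returns. The variable names b, notches and overlap are only illustrative.

```r
# Notched box plots, one per delivery method (OJ and VC)
b <- boxplot(len ~ supp, data = ToothGrowth, notch = TRUE)

# b$conf has one column per group: row 1 is the lower notch limit,
# row 2 is the upper notch limit
notches <- b$conf
colnames(notches) <- b$names
notches

# TRUE if the two notch intervals overlap
overlap <- notches[1, 1] <= notches[2, 2] && notches[1, 2] <= notches[2, 1]
overlap
```

If overlap is FALSE, the notches are disjoint, which is the visual cue described above.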

Side-by-Side Box Plots

Often your data set contains a numeric variable (quantitative variable) and a factor (categorical variable). It can be quite tedious to find whether the numeric variable changes according to the level of the factor.

Information of that nature can be gained by plotting box plots side by side.

In R, you can do this by using the boxplot() function with a formula:

boxplot(x ~ f)

Here, x is the numeric variable and f is the factor.

# Creating one box plot for each factor level (dose)
boxplot(len ~ dose, data = ToothGrowth)
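
The formula interface also gives access to the numbers it plots. As a small sketch (the names med_by_dose and b are only illustrative), the per-dose medians can be computed with tapply() and compared with the middle row of the stats matrix returned by boxplot():

```r
# Median tooth length for each dose level
med_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, median)
med_by_dose

# Same grouping through the formula interface, without drawing anything
b <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)
b$names        # group labels, in plotting order
b$stats[3, ]   # row 3 of the stats matrix holds the medians
```

The two sets of medians should agree, which confirms that each box summarizes one level of the factor.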

Grouped Box Plot

A grouped box plot is used when you have a numerical variable, several groups and subgroups.

You can create a grouped box plot by putting the interaction of two categorical variables on the x-axis and a numeric variable on the y-axis.

The interaction of two variables is indicated by separating their names with an asterisk (*).

# Box plot of length based on interaction of two variables (supplement and dose)
boxplot(len ~ supp*dose, data = ToothGrowth)
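
With six boxes on one axis, it can help to color the subgroups. The sketch below relies on the order in which the boxes are drawn (the first variable in the formula varies fastest) and on the fact that a short col vector is recycled across the boxes. The axis labels and the variable name b are assumptions made for illustration.

```r
# Two colors are recycled across the six boxes: OJ, VC, OJ, VC, OJ, VC
b <- boxplot(len ~ supp * dose, data = ToothGrowth,
             col = c("orange1", "dodgerblue1"),
             xlab = "Supplement and dose", ylab = "Tooth length")

# Box labels in plotting order, to confirm the color assignment
b$names

# Legend matching the two fill colors
legend("topleft", legend = c("OJ", "VC"),
       fill = c("orange1", "dodgerblue1"))
```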

Change Group Names

To change the names of the groups of boxes, use the names argument.

boxplot(len ~ dose, data = ToothGrowth,
        names=c("0.5 mg","1 mg","2 mg"))
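
Hard-coded labels can drift out of sync with the data. As a minimal sketch, the labels can instead be built from the dose values themselves; the variable name dose_labels is only illustrative.

```r
# Build one label per dose level, in the same sorted order boxplot() uses
dose_labels <- paste(sort(unique(ToothGrowth$dose)), "mg")
dose_labels

boxplot(len ~ dose, data = ToothGrowth,
        names = dose_labels)
```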

Change Colors

Use the col argument to change the fill color of the boxes.

boxplot(len ~ dose, data = ToothGrowth,
        col = "dodgerblue1")

You can change the colors of individual boxes by passing a vector of colors to the col argument.

boxplot(len ~ dose, data = ToothGrowth,
        col = c("orange1", "dodgerblue1", "olivedrab2"))

By using the border argument, you can even change the color used for the border of the boxes.

boxplot(len ~ dose, data = ToothGrowth,
        col="lightblue1",
        border="dodgerblue3")

Adding Titles and Axis Labels

You can add your own title and axis labels by specifying the following arguments.

Argument   Description
main       Main plot title
xlab       x-axis label
ylab       y-axis label

boxplot(len ~ dose, data = ToothGrowth,
        main="Tooth Growth in Guinea Pigs",
        xlab="Vitamin C dose (mg/day)",
        ylab="Length of odontoblasts")

Add Means to a Box Plot

The horizontal line in the middle of a box plot is the median, not the mean.

The median alone does not show whether a distribution is symmetric. Adding a mean marker to each box helps: when the mean sits well away from the median, the data in that group are likely skewed.

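# Draw the box plots first
boxplot(len ~ dose, data = ToothGrowth,
        col = "dodgerblue1")
# Mean tooth length for each dose level
meanval <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
# Overlay the means as white diamonds at box positions 1, 2, 3
points(seq_along(meanval), meanval, col = "white", pch = 18)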
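
The same comparison can be made numerically. The following sketch tabulates the mean and median of each dose group side by side; the data frame name center and its column names are only illustrative.

```r
# Mean and median tooth length per dose level, side by side
center <- data.frame(
  dose   = sort(unique(ToothGrowth$dose)),
  mean   = as.numeric(tapply(ToothGrowth$len, ToothGrowth$dose, mean)),
  median = as.numeric(tapply(ToothGrowth$len, ToothGrowth$dose, median))
)

# A difference far from zero suggests a skewed group
center$diff <- center$mean - center$median
center
```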

 

