R Box-whisker Plot – Base Graph
The box-whisker plot (or a boxplot) is a quick and easy way to visualize complex data where you have multiple samples.
A box plot is a good way to get an overall picture of the data set in a compact manner.
The boxplot() function
You can use the boxplot() function to create box-whisker plots.
It has many arguments that control the appearance of the plot, such as drawing it horizontally and adding labels, titles and colors.
The boxplot() function is called as boxplot(x, ...). Its most commonly used arguments are:
|Argument|Description|
|---|---|
|x|A numeric vector (or a formula such as y ~ group) from which the box plots are produced|
|names|Group labels to be printed under each box plot|
|xlab|The label for the x axis|
|ylab|The label for the y axis|
|border|A vector of colors for the outlines of the box plots|
|col|A vector of colors used to fill the boxes|
|notch|If TRUE, a notch is drawn in each side of the boxes|
|horizontal|Set it to TRUE to draw the box plot horizontally|
|add|Set it to TRUE to add the box plot to the current plot|
|...|Other graphical parameters|
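As a rough illustration of how several of these arguments combine in a single call, here is a minimal sketch. The `scores` vector and the labels are made up for this example and are not part of the tutorial's data.

```r
# Hypothetical sample values, invented for illustration only
scores <- c(12, 15, 11, 19, 14, 13, 30, 16, 15, 14)

# Combine several arguments in one call; boxplot() invisibly returns
# the statistics it used to draw the plot
res <- boxplot(scores,
               horizontal = TRUE,          # draw the box sideways
               col = "lightblue1",         # fill color of the box
               border = "dodgerblue3",     # outline color
               xlab = "Score",             # label of the value axis
               main = "Hypothetical scores")

res$stats  # whisker ends, hinges and median used for the drawing
res$out    # 30 lies beyond the upper whisker and is drawn as an outlier
```

Keeping the return value is optional; it is handy when you want to check which points were flagged as outliers.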
Create a Box-Whisker Plot
To get started, you need a set of data to work with. Let’s use the built-in ToothGrowth data set as an example.
Here are the first six observations of the data set.
# First six observations of the 'ToothGrowth' data set
head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5
ToothGrowth data set
The ToothGrowth data set contains observations on the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).
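Before plotting, it can help to confirm the structure described above. The following is a small sketch using only base R functions; the variable name `design` is chosen here for illustration.

```r
# Column types: len is numeric, supp is a factor, dose is numeric
str(ToothGrowth)

# Number of animals for each combination of delivery method and dose
design <- table(ToothGrowth$supp, ToothGrowth$dose)
design
```

The table shows how the 60 animals are spread over the two delivery methods and three dose levels.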
To create a box plot, just pass a numeric variable of the data set to the boxplot() function, for example boxplot(ToothGrowth$len).
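A minimal sketch of a basic box plot of tooth length, together with the numbers behind it. The name `res` is just a placeholder for the value boxplot() returns.

```r
# Draw a box plot of tooth length and keep the returned statistics
res <- boxplot(ToothGrowth$len)

# Five values per box: lower whisker, lower hinge, median,
# upper hinge, upper whisker
res$stats

# Observations drawn individually as outliers (empty if there are none)
res$out

# Tukey's five-number summary, for comparison with res$stats
fivenum(ToothGrowth$len)
```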
Horizontal Box Plot
You can also draw the box-plot horizontally by setting the horizontal argument to TRUE.
boxplot(ToothGrowth$len, horizontal = TRUE)
Notched Box Plot
The notched box plot helps you assess whether the medians of two groups differ. If the notches of two boxes do not overlap, there is strong evidence (at roughly the 95% confidence level) that their medians differ.
You add notches to a box plot by setting the notch argument to TRUE.
# Add notches to a box plot
boxplot(ToothGrowth$len, notch = TRUE)
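Notches are most useful when there are at least two boxes to compare. The sketch below draws one notched box per dose level and then reads the notch limits from the value boxplot() returns. According to the R documentation for boxplot.stats(), the notches extend to the median plus or minus 1.58 * IQR / sqrt(n). The name `nb` is a placeholder.

```r
# One notched box per dose level
nb <- boxplot(len ~ dose, data = ToothGrowth, notch = TRUE)

# Row 1 holds the lower notch limit and row 2 the upper limit,
# with one column per dose level
nb$conf

# TRUE if the notch of the lowest dose ends below the start of
# the notch of the middle dose, i.e. the two notches do not overlap
nb$conf[2, 1] < nb$conf[1, 2]
```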
Side-by-Side Box Plots
Often your data set contains a numeric variable (quantitative variable) and a factor (categorical variable). It can be quite tedious to find whether the numeric variable changes according to the level of the factor.
Information of that nature can be gained by plotting box plots side by side.
In R, you can do this by using the boxplot() function with a formula:
boxplot(x ~ f)
Here, x is the numeric variable and f is the factor.
# Creating one box plot for each factor level (dose)
boxplot(len ~ dose, data = ToothGrowth)
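To compare the groups numerically as well as visually, you can ask boxplot() for its statistics without drawing anything. This is a sketch that relies on the plot = FALSE argument; `by_dose` is a placeholder name.

```r
# Compute the per-group statistics without drawing a plot
by_dose <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)

by_dose$names       # group labels taken from the dose values
by_dose$n           # number of observations in each group
by_dose$stats[3, ]  # third row of the statistics: the group medians
```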
Grouped Box Plot
A grouped box plot is used when you have a numerical variable, several groups and subgroups.
You can create a grouped box plot by putting the interaction of two categorical variables on the x-axis and a numeric variable on the y-axis.
The interaction of two variables is indicated by separating their names with an asterisk (*).
# Box plot of length based on interaction of two variables (supplement and dose)
boxplot(len ~ supp*dose, data = ToothGrowth)
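In a grouped box plot it is often easier to tell the subgroups apart when each delivery method has its own color. The following sketch assumes the boxes alternate between OJ and VC, so a two-color vector is recycled across the six boxes. The colors, labels and the name `grp` are illustrative choices.

```r
# Two colors, recycled across the boxes: OJ, VC, OJ, VC, ...
supp_cols <- c("orange1", "dodgerblue1")

grp <- boxplot(len ~ supp * dose, data = ToothGrowth,
               col = supp_cols,
               xlab = "Supplement and dose",
               ylab = "Tooth length")

# Legend that maps each color to its delivery method
legend("topleft", legend = levels(ToothGrowth$supp), fill = supp_cols)

# Labels of the six boxes, one per supplement and dose combination
grp$names
```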
Change Group Names
To change the names of the groups of boxes, use the names argument.
boxplot(len ~ dose, data = ToothGrowth, names=c("0.5 mg","1 mg","2 mg"))
Use the col argument to change the fill color used for the boxes.
boxplot(len ~ dose, data = ToothGrowth, col = "dodgerblue1")
You can change the colors of individual boxes by passing a vector of colors to the col argument.
boxplot(len ~ dose, data = ToothGrowth, col = c("orange1", "dodgerblue1", "olivedrab2"))
By using the border argument, you can even change the color used for the border of the boxes.
boxplot(len ~ dose, data = ToothGrowth, col="lightblue1", border="dodgerblue3")
Adding Titles and Axis Labels
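You can easily add your own title and axis labels by specifying the following arguments.
|Argument|Description|
|---|---|
|main|Main plot title|
|xlab|The label for the x axis|
|ylab|The label for the y axis|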
boxplot(len ~ dose, data = ToothGrowth, main="Tooth Growth in Guinea Pigs", xlab="Vitamin C dose (mg/day)", ylab="Length of odontoblasts")
Add Means to a Box Plot
The horizontal line in the middle of a box plot is the median, not the mean.
The median alone does not show whether a distribution is symmetric. Adding a marker for the mean helps: when the mean sits well away from the median, the data is skewed and therefore unlikely to be normally distributed.
boxplot(len ~ dose, data=ToothGrowth, col="dodgerblue1")
# Mean tooth length for each dose level
meanval <- by(ToothGrowth$len, ToothGrowth$dose, mean)
# Overlay the means as white diamonds, one per box
points(meanval, col="white", pch=18)
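To see how far each mean lies from its median in numbers rather than on the plot, the two can be tabulated side by side. This is a small sketch using tapply(); the names `means`, `medians` and `comparison` are placeholders.

```r
# Mean and median tooth length for each dose level
means   <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
medians <- tapply(ToothGrowth$len, ToothGrowth$dose, median)

# One row per dose level; a large difference hints at a skewed group
comparison <- round(cbind(mean = means,
                          median = medians,
                          difference = means - medians), 2)
comparison
```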