(Basic Statistics for Citizen Data Scientist)

Creating Box Plots in Excel

Another way to characterize a distribution or a sample is via a box plot (aka a box and whiskers plot). Specifically, a box plot provides a pictorial representation of the following statistics: maximum, 75^th percentile, median (50^th percentile), mean, 25^th percentile and minimum.

Box plots are especially useful when comparing samples and testing whether data is distributed symmetrically.

Real Statistics Data Analysis Tool: To generate a box plot, you can use the Box Plot option of the Descriptive Statistics and Normality data analysis tool found in the Real Statistics Resource Pack, as described in the following example. See also Special Charting Capabilities for how to create the box plot manually using Excel’s charting capabilities.

Example 1: A market research company asks 30 people to evaluate three brands of tablet computers using a questionnaire. The 30 people are divided at random into 3 groups of 10 people each, where the first group evaluates Brand A, the second evaluates Brand B and the third evaluates Brand C. The questionnaire scores from these groups are summarized in Figure 1.

Figure 1 – Sample data

To generate the box plots for these three groups, press Ctrl-m and select the Descriptive Statistics and Normality data analysis tool. A dialog box will appear. Select the Box Plot option and insert A3:C13 in the Input Range. Check Headings included with the data and uncheck Use exclusive version of quartile.

The resulting chart is shown in Figure 2.

Figure 2 – Box Plot

Note too that the data analysis tool also generates a table, which may be located behind the chart. For those who are interested, this table contains the information in Figure 3, as explained further in Special Charting Capabilities.

For each sample, the box plot consists of a rectangular box with one line extending upward and another extending downward (usually called whiskers). The box itself is divided into two parts. In particular, the meaning of each element in the box plot is described in Figure 3.

Element	Meaning
Top of upper whisker	Maximum value of the sample
Top of box	75th percentile of the sample
Line through the box	Median of the sample
Bottom of the box	25th percentile of the sample
Bottom of the lower whisker	Minimum of the sample
× markers	Mean of the sample

Figure 3 – Box Plot elements

There are two versions of this table, depending on whether you check or uncheck the Use exclusive version of quartile field. If checked then the QUARTILE.EXC version of the 25^th and 75^th percentile is used (or QUARTILE_EXC for Excel 2007 users), while if this field is unchecked then the QUARTILE (or equivalently the QUARTILE.INC) version is used. See Ranking Functions in Excel for more details about the difference between these two versions.

From the box plot (see Figure 2) we can see that the scores for Brand C tend to be higher than for the other brands and those for Brand B tend to be lower. We also see that the distribution of Brand A is pretty symmetric at least in the range between the 1^st and 3^rd quartiles, although there is some asymmetry for higher values (or potentially there is an outlier). Brands B and C look less symmetric. Because of the long upper whisker (especially with respect to the box), Brand B may have an outlier (see Outliers and Robustness for a discussion of outliers).

Another indication of symmetry is whether the × marker for the mean coincides with the median.

We can also convert the box plot to a horizontal representation of the data (as shown in Figure 4) by first deleting the markers for the means (by clicking on any of these markers and pressing the backspace key) and then clicking on the chart and selecting Insert > Charts|Bar > Stacked Bar.

Figure 4 – Horizontal Box Plot

Observation: When a data set has a negative value, the y-axis will be shifted upward by -MIN(R1) where R1 is the data range containing the data. Thus if R1 ranges from -10 to 20, the range in the chart will range from 0 to 30.

Example 2: Create the box plot for the data in Figure 5.9.1 where cell B11 is changed to -300 and the exclusive version of the quartile function.

The procedure is the same as for Example 1, except that this time we check the Use exclusive version of quartile option. The output is shown in Figure 5.

The key difference is that since the smallest data value is -300 (the value in cell F13), all the box plot values are shifted up by 300. This is evident by noting that the lower tail for Brand B is at 0 instead of -300 (and that cell G6 contains 0 instead of -300).

Figure 5 – Box plot for negative data

Note that two y-axes are displayed. The one on left is based on the displacement of 300 units, while the one on the right shows the correct units.

You can remove the y-axis on the left by following the following steps:

Select the y-axis on the left and then right click.
Choose the Format Axis… option from the menu that appears.
When the menu of option appears as shown in Figure 5.9.6, change the Label Position option from Next to Axis to None.

Figure 6 – Remove left y-axis

Note that if you change any of the data elements, the box chart will still be correct, although the right y-axis will not change and will still reflect the original data, and so you will need to rely on the left y-axis (you can remove the right y-axis as described above for the left y-axis).

How to create BOX chart in R

Statistics for Beginners with Excel – Creating Box Plots

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Latest end-to-end Learn by Coding Projects (Jupyter Notebooks) in Python and R:

All Notebooks in One Bundle: Data Science Recipes and Examples in Python & R.

End-to-End Python Machine Learning Recipes & Examples.

End-to-End R Machine Learning Recipes & Examples.

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

There are 2000+ End-to-End Python & R Notebooks are available to build Professional Portfolio as a Data Scientist and/or Machine Learning Specialist. All Notebooks are only $29.95. We would like to request you to have a look at the website for FREE the end-to-end notebooks, and then decide whether you would like to purchase or not.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Towards Advanced Analytics Specialist & Analytics Engineer

Statistics for Beginners with Excel – Creating Box Plots

(Basic Statistics for Citizen Data Scientist)

Creating Box Plots in Excel

Statistics for Beginners with Excel – Creating Box Plots

Personal Career & Learning Guide for Data Analyst, Data Engineer and Data Scientist

Applied Machine Learning & Data Science Projects and Coding Recipes for Beginners

95% Discount on “Projects & Recipes, tutorials, ebooks”

Projects and Coding Recipes, eTutorials and eBooks: The best All-in-One resources for Data Analyst, Data Scientist, Machine Learning Engineer and Software Developer

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Related Posts

ETL vs. ELT: Navigating Data Integration Techniques in Data Warehousing

Unlocking the Power of Mixed Models in Statistical Analysis

Mastering Analysis of Covariance (ANCOVA): A Comprehensive Statistical Guide with Python and R Examples