Statistics for Beginners in Excel – Box Plots with Outliers

(Basic Statistics for Citizen Data Scientist)

Box Plots with Outliers

Excel 2016 has added a Box and Whiskers chart capability. To access this capability for Example 1 of Creating Box Plots in Excel, highlight the data range A2:C11 (from Figure 1) and select Insert > Charts|Statistical > Box and Whiskers. The chart shown on the right side of Figure 1 will appear.

 

Box plot Excel 2016

Figure 1 – Excel’s Box and Whiskers chart

 

You can add a legend as well as chart and axis titles as usual. The box part of the chart is as described above, except that the mean is shown as an ×. The whiskers extend up from the top of the box to the largest data element that is less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the smallest data element that is larger than 1.5 times the IQR. Values outside this range are considered to be outliers and are represented by dots.

The boundaries of the box and whiskers are as calculated by the values and formulas shown in Figure 2. The only outlier is the value 1850 for Brand B, which is higher than the upper whisker, and so is shown as a dot.

 

Box and Whiskers formulas

Figure 2 – Formulas for the Box Plot

 

Note that we could also use the array formula

=MAX(IF(C2:C11<=H7,C2:C11,MIN(C2:C11)))

to calculate the value in cell H9, and the array formula

=MIN(IF(C2:C11>=H8,C2:C11,MAX(C2:C11)))

to calculate a value for cell H10. In fact, since the Excel Box Plot is only available in Excel 2016, we can also use the Excel 2016 (non-array) formulas =MAXIFS(C2:C11,”<=”&H7) and =MINIFS(C2:C11,”>=”&H8).

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack also provides a way of generating box plots with outliers. To produce such a box plot, proceed as in Example 1 of Creating Box Plots in Excel, except that this time you should select the Box Plots with Outliers option of the Descriptive Statistics and Normality data analysis tool. The output for Example 1 of Creating Box Plots in Excel is shown in Figure 3.

 

Box Plots with Outliers

Figure 3 – Output from Box Plots with Outliers tool

 

As you can see, the output is similar to that shown in Figure 1, except that this version is available in other releases of Excel besides Excel 2016. Also the Outlier Multiplier is not fixed at 1.5, but can be set to another value by the user (in the dialog box for the Descriptive Statistics and Normality data analysis tool).

The Outlier Multiplier is shown in cell F2 of the output displayed in Figure 3. This value is used in calculating the Min and Max values (which are the values at the bottom of the lower whisker and the top of the upper whisker). E.g. cell F12 contains the array formula =MIN(IF(ISBLANK(A4:A13),””,IF(A4:A13>=F13-$F2*(F15-F13),A4:A13,””))) and cell F16 contains the formula =MAX(IF(ISBLANK(A4:A13),””,IF(A4:A13<=F15+$F2*(F15-F13),A4:A13,””))).

If the Percentage option is set on the Configuration dialog box, then you should enter a value 100 times the desired value in the Outlier Multiplier field; e.g. enter 150 if you want a 1.5 outlier multiplier factor. Note too that if you leave this field blank, the outlier multiplier factor defaults to 2.2.

Negative numbers are handled in a manner similar to that for Box Plots without Outliers (often using a second y axis). Keep in mind, though, that a second y axis is only employed when the lower whisker of at least one of the box plots is negative. If some outlier is negative but none of the lower whiskers are negative, then a second y axis is not needed.

 

 

Statistics for Beginners in Excel – Box Plots with Outliers

Personal Career & Learning Guide for Data Analyst, Data Engineer and Data Scientist

Applied Machine Learning & Data Science Projects and Coding Recipes for Beginners

A list of FREE programming examples together with eTutorials & eBooks @ SETScholars

95% Discount on “Projects & Recipes, tutorials, ebooks”

Projects and Coding Recipes, eTutorials and eBooks: The best All-in-One resources for Data Analyst, Data Scientist, Machine Learning Engineer and Software Developer

Topics included: Classification, Clustering, Regression, Forecasting, Algorithms, Data Structures, Data Analytics & Data Science, Deep Learning, Machine Learning, Programming Languages and Software Tools & Packages.
(Discount is valid for limited time only)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Please do not waste your valuable time by watching videos, rather use end-to-end (Python and R) recipes from Professional Data Scientists to practice coding, and land the most demandable jobs in the fields of Predictive analytics & AI (Machine Learning and Data Science).

The objective is to guide the developers & analysts to “Learn how to Code” for Applied AI using end-to-end coding solutions, and unlock the world of opportunities!