(Basic Statistics for Citizen Data Scientist)
Box Plots with Outliers
Excel 2016 has added a Box and Whiskers chart capability. To access this capability for Example 1 of Creating Box Plots in Excel, highlight the data range A2:C11 (from Figure 1) and select Insert > Charts|Statistical > Box and Whiskers. The chart shown on the right side of Figure 1 will appear.
Figure 1 – Excel’s Box and Whiskers chart
You can add a legend as well as chart and axis titles as usual. The box part of the chart is as described above, except that the mean is shown as an ×. The whiskers extend up from the top of the box to the largest data element that is less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the smallest data element that is larger than 1.5 times the IQR. Values outside this range are considered to be outliers and are represented by dots.
The boundaries of the box and whiskers are as calculated by the values and formulas shown in Figure 2. The only outlier is the value 1850 for Brand B, which is higher than the upper whisker, and so is shown as a dot.
Figure 2 – Formulas for the Box Plot
Note that we could also use the array formula
=MAX(IF(C2:C11<=H7,C2:C11,MIN(C2:C11)))
to calculate the value in cell H9, and the array formula
=MIN(IF(C2:C11>=H8,C2:C11,MAX(C2:C11)))
to calculate a value for cell H10. In fact, since the Excel Box Plot is only available in Excel 2016, we can also use the Excel 2016 (non-array) formulas =MAXIFS(C2:C11,”<=”&H7) and =MINIFS(C2:C11,”>=”&H8).
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack also provides a way of generating box plots with outliers. To produce such a box plot, proceed as in Example 1 of Creating Box Plots in Excel, except that this time you should select the Box Plots with Outliers option of the Descriptive Statistics and Normality data analysis tool. The output for Example 1 of Creating Box Plots in Excel is shown in Figure 3.
Figure 3 – Output from Box Plots with Outliers tool
As you can see, the output is similar to that shown in Figure 1, except that this version is available in other releases of Excel besides Excel 2016. Also the Outlier Multiplier is not fixed at 1.5, but can be set to another value by the user (in the dialog box for the Descriptive Statistics and Normality data analysis tool).
The Outlier Multiplier is shown in cell F2 of the output displayed in Figure 3. This value is used in calculating the Min and Max values (which are the values at the bottom of the lower whisker and the top of the upper whisker). E.g. cell F12 contains the array formula =MIN(IF(ISBLANK(A4:A13),””,IF(A4:A13>=F13-$F2*(F15-F13),A4:A13,””))) and cell F16 contains the formula =MAX(IF(ISBLANK(A4:A13),””,IF(A4:A13<=F15+$F2*(F15-F13),A4:A13,””))).
If the Percentage option is set on the Configuration dialog box, then you should enter a value 100 times the desired value in the Outlier Multiplier field; e.g. enter 150 if you want a 1.5 outlier multiplier factor. Note too that if you leave this field blank, the outlier multiplier factor defaults to 2.2.
Negative numbers are handled in a manner similar to that for Box Plots without Outliers (often using a second y axis). Keep in mind, though, that a second y axis is only employed when the lower whisker of at least one of the box plots is negative. If some outlier is negative but none of the lower whiskers are negative, then a second y axis is not needed.
Statistics for Beginners in Excel – Box Plots with Outliers
Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.
Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners
Latest end-to-end Learn by Coding Projects (Jupyter Notebooks) in Python and R:
All Notebooks in One Bundle: Data Science Recipes and Examples in Python & R.
End-to-End Python Machine Learning Recipes & Examples.
End-to-End R Machine Learning Recipes & Examples.
Applied Statistics with R for Beginners and Business Professionals
Data Science and Machine Learning Projects in Python: Tabular Data Analytics
Data Science and Machine Learning Projects in R: Tabular Data Analytics
Python Machine Learning & Data Science Recipes: Learn by Coding
R Machine Learning & Data Science Recipes: Learn by Coding
Comparing Different Machine Learning Algorithms in Python for Classification (FREE)
There are 2000+ End-to-End Python & R Notebooks are available to build Professional Portfolio as a Data Scientist and/or Machine Learning Specialist. All Notebooks are only $29.95. We would like to request you to have a look at the website for FREE the end-to-end notebooks, and then decide whether you would like to purchase or not.