Statistics for Beginners with Excel – Histograms

(Basic Statistics for Citizen Data Scientist)

Histograms

A histogram is a graphical representation of the output of the FREQUENCY function (as described in Frequency Tables).

Example 1: Create a histogram for the data and bin selection for Example 1 from Frequency Tables.

We start by replicating the data and bin section for Example 1 in Figure 1.

Data example 1

Figure 1 – Data for Example 1

You can use Excel’s chart tool to graph the data in Figure 1, or alternatively you can use the Histogram data analysis tool to accomplish this directly, as described next.

Excel Data Analysis Tool: To use Excel’s Histogram data analysis tool, you must first establish a bin array (as for the FREQUENCY function described in Frequency Tables) and then select the Histogram data analysis tool. In the dialog box that is displayed you next specify the input data (Input Range) and bin array (Bin Range). You can optionally include the labels for these ranges (in which case you check the Labels check box).

For Example 1, the Input Range is A4:B14 and the Bin Range is D4:D7 (with the Labels check box unchecked). The output is displayed in Figure 2.

 

Histogram data analysis tool

Figure 2 – Histogram data analysis tool

 

Observation: Caution must be exercised when creating histograms to present the data in a clear and accurate way. For most purposes it is important that the intervals be equal in size (except for an unbounded first and/or last interval). Otherwise a distorted picture of the data may be presented.

To avoid this problem equally-spaced intervals can be used. This is the approach illustrated in Example 4 of Frequency Tables using the FREQTABLE supplemental function. Alternatively, the Real Statistics Histogram with Normal Curve Overlay data analysis tool can be used.

Real Statistics Data Analysis Tool: The Histogram with Normal Curve Overlay data analysis tool provided in the Real Statistics Resource Pack can be used to create a frequency table and histogram as illustrated in the following example.

Example 2: Create a frequency table and histogram for the 22 data elements in the range A4:B14 of Figure 1 based on bins of size 15.

Enter Ctrl-m and select the Histogram with Normal Curve Overlay option. Fill in the the dialog box that appears as shown in Figure 3.

Histogram dialog box

Figure 3 – Dialog box for Histogram data analysis tool

After clicking on the OK button, the output as shown in Figure 4 is displayed.

Frequency table histogram

Figure 4 – Frequency Table and Histogram

Note that if 100 is inserted in the Maximum bin value (or blank) field of Figure 3 then the output would be as shown in Figure 5.

Histogram frequency table

Figure 5 – Frequency Table and Histogram (revised)

If the Curve chart option is chosen as shown in the dialog box in Figure 3 then the output is as in Figure 6.

Smoothed Line Curve

Figure 6 – Smoothed Line Curve

Observation: What is the ideal number of bins or bin width? Historically, Sturgis’s Rule was used.

Sturgis' Rule

This approach works pretty well for small n (up to about 200). Better approaches were supplied by Scott, namely

Scott's Rule

where s = the standard deviation, or Freedman and Diaconis, namely

Freedman-Diaconis Rule

where IQR is the interquartile range. The latter is probably more appropriate for data that is not normally distributed.

For Example 1, since n = 22, s = 22.43 and IQR = 21, the appropriate bin width is 28 per Scott or 15 per Freedman and Diaconis; or about 5.5 bins per Sturgis.

Observation: The histogram (or straight line chart or smoothed line curve) produced by this data analysis tool can also be overlayed by a normal curve to help determine whether the data is normally distributed. See Histogram with Normal Curve Overlay for more details.

Observation: You can also produce a frequency table and histogram of the type described in Example 3 of Discrete Probability Distributions (i.e. without specifying any bins) via the Histogram with Normal Curve Overlay data analysis tool. In this case you would leave the Bin size field blank in the dialog box shown in Figure 3.

Statistics for Beginners with Excel – Histograms

Personal Career & Learning Guide for Data Analyst, Data Engineer and Data Scientist

Applied Machine Learning & Data Science Projects and Coding Recipes for Beginners

A list of FREE programming examples together with eTutorials & eBooks @ SETScholars

95% Discount on “Projects & Recipes, tutorials, ebooks”

Projects and Coding Recipes, eTutorials and eBooks: The best All-in-One resources for Data Analyst, Data Scientist, Machine Learning Engineer and Software Developer

Topics included: Classification, Clustering, Regression, Forecasting, Algorithms, Data Structures, Data Analytics & Data Science, Deep Learning, Machine Learning, Programming Languages and Software Tools & Packages.
(Discount is valid for limited time only)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Please do not waste your valuable time by watching videos, rather use end-to-end (Python and R) recipes from Professional Data Scientists to practice coding, and land the most demandable jobs in the fields of Predictive analytics & AI (Machine Learning and Data Science).

The objective is to guide the developers & analysts to “Learn how to Code” for Applied AI using end-to-end coding solutions, and unlock the world of opportunities!