(Basic Statistics for Citizen Data Scientist)
Descriptive Statistics Tools
Excel provides a data analysis tool called Descriptive Statistics which produces a summary of the key statistics for a data set.
Example 1: Provide a table of the most common descriptive statistics for the scores in column A of Figure 1.
Figure 1 – Output from Descriptive Statistics data analysis tool
The output from the tool is shown in the right side of Figure 1. To use the tool, select Data > Analysis|Data Analysis and choose the Descriptive Statistics option. A dialog box appears as in Figure 2.
Figure 2 – Dialog box for Excel’s data analysis tool
Now click on Input Range and highlight the scores in column A (i.e. cells A3:A14). If you include the heading, as is done here, check Labels in first row. Since we want the output to start in cell C3, click the Output Range radio button and insert C3 (or click on cell C3). Finally click the Summary statistics checkbox and press the OK button.
Note that if we had also checked the Kth Largest checkbox, the output would also contain the value for LARGE(A4:A14, k) where k is the number we insert in the box to the right of the label Kth Largest. Similarly, checking the Kth Smallest checkbox outputs SMALL(A4:A14, k). The option Confidence Interval for Mean option generates a confidence interval using the t distribution.
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides the supplemental Descriptive Statistics and Normality data analysis tool which outputs the above statistics plus GEOMEAN, HARMEAN, MAD, AAD and IQR. But instead of just generating the numerical value of each statistic, as is in Excel’s Descriptive Statistics data analysis tool, the Real Statistics tool outputs the appropriate Excel formula for computing each statistic (see Figure 4 below). Thus whenever the input data values change, the output values will change automatically as well.
Both Excel’s Descriptive Statistics and the Real Statistics Descriptive Statistics and Normality data analysis tools allow you to report on multiple sets of data at the same time, as shown in the following example.
Example 2 – Use Excel’s Descriptive Statistics data analysis tool as well as the Real Statistics Descriptive Statistics and Normality data analysis tool to show the descriptive statistics for the two samples on the left side of Figure 3.
Figure 3 – Output from Excel’s Descriptive Statistics data analysis tool
The output from the Excel Descriptive Statistics data analysis tool is given on the right side of Figure 3.
To use the Real Statistics data analysis tool, enter Ctrl-m and select the Descriptive Statistics and Normality option. A dialog box will now appear as shown in Figure 4. Insert A5:B18 into the Input Range, make sure Column headings included with data is checked and select the Descriptive Statistics option.
Figure 4 – Real Statistics Descriptive Statistics and Normality dialog box
After clicking the OK button, the output shown in Figure 5 is displayed.
Figure 5 – Real Statistics Descriptive Statistics data analysis tool
As described above, the tool actually generates formulas instead of the numerical values.
Statistics for Beginners – Introduction to Excel Data Analysis Tools
Statistics for Beginners with Excel – Descriptive Statistics Tools
Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.
Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners
Latest end-to-end Learn by Coding Projects (Jupyter Notebooks) in Python and R:
All Notebooks in One Bundle: Data Science Recipes and Examples in Python & R.
End-to-End Python Machine Learning Recipes & Examples.
End-to-End R Machine Learning Recipes & Examples.
Applied Statistics with R for Beginners and Business Professionals
Data Science and Machine Learning Projects in Python: Tabular Data Analytics
Data Science and Machine Learning Projects in R: Tabular Data Analytics
Python Machine Learning & Data Science Recipes: Learn by Coding
R Machine Learning & Data Science Recipes: Learn by Coding
Comparing Different Machine Learning Algorithms in Python for Classification (FREE)
There are 2000+ End-to-End Python & R Notebooks are available to build Professional Portfolio as a Data Scientist and/or Machine Learning Specialist. All Notebooks are only $29.95. We would like to request you to have a look at the website for FREE the end-to-end notebooks, and then decide whether you would like to purchase or not.