Statistics for Beginners in Excel – Basic Probability Concepts

(Basic Statistics for Citizen Data Scientist)

Coefficient of Variation Testing

One Sample Testing

In Measures of Variability, we describe the unitless measure of dispersion called the coefficient of variation. It turns out that s/x̄ is a biased estimator for the population coefficient of variation σ/μ. A nearly unbiased estimator is

$$V = \left(1 + \frac{1}{4n}\right)\frac{s}{\bar{x}}$$

where n is the sample size.
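As a minimal sketch, the raw and nearly unbiased coefficients of variation can be computed in Python using only the standard library. The function name and the sample values are hypothetical, chosen only for illustration; the correction factor is the usual (1 + 1/(4n)) small-sample adjustment.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Return (raw CV, nearly unbiased CV) for a sample with a positive mean."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = mean(data)
    if x_bar <= 0:
        raise ValueError("the sample mean must be positive")
    s = stdev(data)                # sample standard deviation (n - 1 divisor)
    cv = s / x_bar                 # raw (biased) estimate s / x-bar
    v = (1 + 1 / (4 * n)) * cv     # small-sample bias correction
    return cv, v

# Hypothetical data, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
cv, v = coefficient_of_variation(sample)
print(f"raw CV = {cv:.4f}, nearly unbiased CV = {v:.4f}")
```

The correction shrinks toward 1 as n grows, so the two estimates differ noticeably only for small samples.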

When the coefficient of variation is calculated from a sample drawn from a normal population, then the standard error can be calculated by

[Formula image: standard error of the coefficient of variation]

Using the unbiased sample coefficient of variation, we get

[Formula image: standard error based on the unbiased coefficient of variation]

For normally distributed data, we can use the following test statistic

[Formula image: test statistic for the coefficient of variation]

Example 1: Determine whether the population coefficient of variation for the data in range A4:A13 of Figure 1 (representing the length of certain biological organisms) is significantly different from 0. Also find the 95% confidence interval for the population coefficient of variation.

 

Figure 1 – Test of Coefficient of Variation

We see from the figure that p-value < alpha, and so the coefficient of variation is significantly different from zero. The 95% confidence interval is (.1079, .3403).
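An interval of this general kind can be sketched in Python as estimate ± critical value × standard error. The sketch below rests on two assumptions that are not confirmed by the text: it uses the commonly quoted large-sample approximation s.e. ≈ V/√(2n) for normal data, and a normal rather than a t critical value. The function name and data are hypothetical, and the output is not expected to reproduce Figure 1.

```python
import math
from statistics import NormalDist, mean, stdev

def cv_confidence_interval(data, alpha=0.05):
    """Approximate 1 - alpha interval for the population CV (normal data assumed)."""
    n = len(data)
    v = (1 + 1 / (4 * n)) * stdev(data) / mean(data)  # nearly unbiased CV
    se = v / math.sqrt(2 * n)       # ASSUMED approximation: s.e. ~ V / sqrt(2n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # ASSUMED normal critical value
    return v, se, (v - z_crit * se, v + z_crit * se)

# Hypothetical data, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
v, se, (lower, upper) = cv_confidence_interval(sample)
print(f"V = {v:.4f}, s.e. = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

With small samples a t critical value would give a somewhat wider interval than the normal one used here.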

Two Sample Testing

For two samples you can test whether their populations have the same coefficient of variation (i.e. H0: σ1/μ1 = σ2/μ2) when the two samples are taken from normal distributions with positive means. The test statistic is

[Formula image: two-sample test statistic]

where V1 and V2 are the coefficients of variation for the two samples of size n1 and n2 and the pooled coefficient of variation is

[Formula image: pooled coefficient of variation]

The 1 – α confidence interval for the difference between the population coefficients of variation is

[Formula image: confidence interval for the difference between the coefficients of variation]

The test works best when both sample sizes are at least 10 and the population coefficients of variation are at most 0.33.
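A two-sample comparison along these lines can be sketched in Python. The description above (a z statistic, a pooled coefficient of variation, and the sample-size and 0.33 guidelines) appears to correspond to the asymptotic test usually attributed to Miller, so the sketch uses that form; this correspondence is an assumption, as are the use of raw (uncorrected) sample CVs, the reuse of the pooled standard error for the confidence interval, and the hypothetical function name and data.

```python
import math
from statistics import NormalDist, mean, stdev

def cv_two_sample_test(x, y, alpha=0.05):
    """Asymptotic z test of equal CVs for two independent normal samples (sketch)."""
    df1, df2 = len(x) - 1, len(y) - 1
    v1 = stdev(x) / mean(x)                    # raw CV of sample 1
    v2 = stdev(y) / mean(y)                    # raw CV of sample 2
    vp = (df1 * v1 + df2 * v2) / (df1 + df2)   # pooled CV, weighted by degrees of freedom
    # ASSUMED standard error of v1 - v2 (Miller-type asymptotic form)
    se = math.sqrt((vp**2 / df1 + vp**2 / df2) * (0.5 + vp**2))
    z = (v1 - v2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = v1 - v2
    return {
        "cv1": v1, "cv2": v2, "pooled_cv": vp, "z": z, "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),  # ASSUMED: same s.e. as the test
    }

# Hypothetical data, for illustration only
weights = [61, 72, 58, 80, 66, 75, 69, 84, 63, 77]
heights = [158, 171, 165, 180, 162, 175, 168, 183, 160, 177, 170]
result = cv_two_sample_test(weights, heights)
print(result)
```

Because the coefficient of variation is unitless, the two samples can be measured in different units, as with weight and height here.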

Example 2: Determine whether there is a significant difference between the population coefficients of variation for weight and height based on the two independent samples in range A3:B14 of Figure 2. Also find the 95% confidence interval for the difference between the population coefficients of variation.

 

Figure 2 – Two sample test for coefficient of variation

As you can see from Figure 2, there is no significant difference between the two coefficients of variation (p-value = .18), and the 95% confidence interval for the difference between the coefficients is (-.1614, .2306).

Real Statistics Support

Real Statistics Functions: The Real Statistics Resource Pack provides the following array functions.

CVTEST(R1, lab, alpha): returns an array with the values from the one sample coefficient of variation (CV) test on the data in R1: sample CV, unbiased CV, standard error, p-value, and the lower and upper limits of the 1 – alpha confidence interval

CV2TEST(R1, R2, lab, alpha): returns an array with the values from the two sample coefficient of variation (CV) test on the data in R1 and R2: sample 1 CV, sample 2 CV, pooled CV, z-stat, p-value, and the lower and upper limits of the 1 – alpha confidence interval

alpha is the significance level of the test (default .05). If lab = TRUE (default FALSE) then a column of labels is appended to the output.

The output for Example 1 is shown on the left side of Figure 3, as calculated by the array formula =CVTEST(A4:A13,TRUE). The output for Example 2 is shown on the right side of the figure, as calculated by the array formula =CV2TEST(A4:A13,B4:B14,TRUE).

 

Figure 3 – Real Statistics output
