(Basic Statistics for Citizen Data Scientist)
Coefficient of Variation Testing
One Sample Testing
In Measures of Variability, we describe the unitless measure of dispersion called the coefficient of variation. It turns out that s/x̄ is a biased estimator for the population coefficient of variation σ/μ. A nearly unbiased estimator is
V = (1 + 1/(4n)) · s/x̄
where n is the sample size.
When the coefficient of variation is calculated from a sample drawn from a normal population, then the standard error can be calculated by
Using the unbiased sample coefficient of variation, we get
For normally distributed data, we can use the following test statistic
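The one-sample quantities described above can be sketched in Python. This is a minimal sketch under stated assumptions, not necessarily the calculation behind Figure 1: it assumes the small-sample bias correction V = (1 + 1/(4n)) · s/x̄ and the commonly cited large-sample standard error V/√(2n) for normal data, and it builds the confidence interval from the standard normal distribution. The function name `one_sample_cv` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_cv(data, alpha=0.05):
    """Estimate the CV of a sample with an approximate normal-theory interval."""
    n = len(data)
    cv = stdev(data) / mean(data)        # raw sample CV, s / x-bar
    v = (1 + 1 / (4 * n)) * cv           # assumed bias correction for small n
    se = v / math.sqrt(2 * n)            # assumed large-sample standard error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "cv": cv,
        "unbiased_cv": v,
        "std_err": se,
        "lower": v - z_crit * se,
        "upper": v + z_crit * se,
    }


# Hypothetical lengths, NOT the data from Figure 1.
lengths = [12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 8.9, 12.8, 11.6, 10.2]
result = one_sample_cv(lengths)
print(result)
```

Because the standard error used here is only an approximation, the interval may differ somewhat from the one reported by a spreadsheet implementation.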
Example 1: Determine whether the population coefficient of variation for the data in range A4:A13 of Figure 1 (representing the length of certain biological organisms) is significantly different from 0. Also find the 95% confidence interval for the population coefficient of variation.
Figure 1 – Test of Coefficient of Variation
We see from the figure that p-value < alpha, and so the coefficient of variation is significantly different from zero. The 95% confidence interval is (.1079, .3403).
Two Sample Testing
For two samples you can test whether their populations have the same coefficient of variation (i.e. H0: σ1/μ1 = σ2/μ2) when the two samples are taken from normal distributions with positive means. The test statistic is
where V1 and V2 are the coefficients of variation for the two samples of size n1 and n2 and the pooled coefficient of variation is
The 1 – α confidence interval for the difference between the population coefficients of variation is
The test works best when both sample sizes are at least 10 and both population coefficients of variation are at most .33.
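The two-sample procedure can be sketched in Python as follows. The formulas are an assumption: the sketch uses the asymptotic z test usually attributed to Miller, which matches the conditions quoted above, with pooled coefficient Vp = ((n1 − 1)V1 + (n2 − 1)V2) / (n1 + n2 − 2) and standard error √[(Vp²/(n1 − 1) + Vp²/(n2 − 1)) · (0.5 + Vp²)]. It uses the raw sample coefficients s/x̄ for V1 and V2 and builds the confidence interval from the same pooled standard error. The function name `two_sample_cv_test` and the data are hypothetical, so the output is not expected to reproduce Figure 2.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sample_cv_test(x, y, alpha=0.05):
    """Approximate z test of H0: sigma1/mu1 = sigma2/mu2 for normal samples."""
    df1, df2 = len(x) - 1, len(y) - 1
    v1 = stdev(x) / mean(x)              # raw CV of sample 1
    v2 = stdev(y) / mean(y)              # raw CV of sample 2
    # Pooled CV: average of the two CVs weighted by degrees of freedom.
    vp = (df1 * v1 + df2 * v2) / (df1 + df2)
    # Assumed standard error of V1 - V2 under H0.
    se = math.sqrt((vp**2 / df1 + vp**2 / df2) * (0.5 + vp**2))
    z = (v1 - v2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = v1 - v2
    return {
        "v1": v1, "v2": v2, "pooled_cv": vp, "z": z, "p_value": p_value,
        "lower": diff - z_crit * se, "upper": diff + z_crit * se,
    }


# Hypothetical independent samples with positive means.
weights = [61.0, 72.5, 58.3, 80.1, 66.4, 75.2, 69.8, 90.3, 55.7, 77.9]
heights = [160.2, 171.5, 158.0, 182.3, 165.1, 174.8, 168.9, 177.4, 162.6, 170.3, 185.0]
out = two_sample_cv_test(weights, heights)
print(out)
```

Weighting the pooled coefficient by degrees of freedom means the larger sample contributes more to the estimate of the common coefficient of variation assumed under the null hypothesis.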
Example 2: Determine whether there is a significant difference between the population coefficient of variation for weight and height based on the two independent samples in range of A3:B14 of Figure 2. Also find the 95% confidence interval for the difference between the population coefficients of variation.
Figure 2 – Two sample test for coefficient of variation
As you can see from Figure 2, there is no significant difference between the two coefficients of variation (p-value = .18), and the 95% confidence interval for the difference between the coefficients is (-.1614, .2306).
Real Statistics Support
Real Statistics Functions: The Real Statistics Resource Pack provides the following array functions.
CVTEST(R1, lab, alpha): returns an array with the values from the one sample coefficient of variation (CV) test on the data in R1: sample CV, unbiased CV, standard error, p-value, lower and upper 1-alpha confidence interval
CV2TEST(R1, R2, lab, alpha): returns an array with the values from the two sample coefficient of variation (CV) test on the data in R1 and R2: sample 1 CV, sample 2 CV, pooled CV, z-stat, p-value, lower and upper 1-alpha confidence interval
alpha is the significance level of the test (default .05). If lab = TRUE (default FALSE) then a column of labels is appended to the output.
The output for Example 1 is shown on the left side of Figure 3, as calculated by the array formula =CVTEST(A4:A13,TRUE). The output for Example 2 is shown on the right side of the figure, as calculated by the array formula =CV2TEST(A4:A13,B4:B14,TRUE).
Figure 3 – Real Statistics output