Statistics for Beginners in Excel – Two-sample hypothesis testing

(Basic Statistics for Citizen Data Scientist)

Two-sample Proportion Testing

Theorem 1: Let x1 and x2 be random variables with proportion distributions with means π1 and π2 respectively. Let p1 be the proportion of successes in n1 trials of the first distribution and let p2 be the proportion of successes in n2 trials of the second distribution. When the numbers of trials n1 and n2 are sufficiently large, usually when ni πi ≥ 5 and ni (1 – πi) ≥ 5 for i = 1, 2, the difference between the sample proportions p1 – p2 will be approximately normal with mean π1 – π2 and standard deviation

$\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$
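As a rough illustration of Theorem 1, the sketch below computes the standard deviation of p1 – p2 and checks the large-sample rule of thumb. The function names `large_sample_ok` and `diff_proportion_sd` are hypothetical, and the values π1 = 0.5, π2 = 0.6, n1 = n2 = 100 are assumed purely for illustration.

```python
from math import sqrt


def large_sample_ok(n, pi, threshold=5):
    """Rule of thumb from Theorem 1: n*pi >= 5 and n*(1 - pi) >= 5."""
    return n * pi >= threshold and n * (1 - pi) >= threshold


def diff_proportion_sd(pi1, n1, pi2, n2):
    """Standard deviation of p1 - p2 for two independent samples."""
    return sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)


# Assumed illustrative values, not taken from the theorem itself.
pi1, n1 = 0.5, 100
pi2, n2 = 0.6, 100

print("normal approximation reasonable:",
      large_sample_ok(n1, pi1) and large_sample_ok(n2, pi2))
print("sd of p1 - p2:", round(diff_proportion_sd(pi1, n1, pi2, n2), 4))
```

With small samples or proportions near 0 or 1, `large_sample_ok` returns False, which signals that the normal approximation may not be appropriate.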

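Proof: Based on Theorem 2 of the Binomial Distribution, the sample proportion pi has approximately the normal distribution (with the second parameter being the standard deviation)

$N\left(\pi_i, \sqrt{\frac{\pi_i(1-\pi_i)}{n_i}}\right)$

Since the two samples are independent, by the linear transformation property of the normal distribution, p1 – p2 has approximately the distribution

$N\left(\pi_1 - \pi_2, \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}\right)$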
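The claim in the proof can also be checked empirically. The following is a minimal simulation sketch, not part of the original argument: it draws many pairs of independent samples and compares the simulated mean and standard deviation of p1 – p2 with the values given by Theorem 1. The function name `simulate_differences`, the seed, the number of repetitions and the values π1 = 0.5, π2 = 0.6, n1 = n2 = 100 are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_differences(pi1, n1, pi2, n2, reps, seed=0):
    """Simulate p1 - p2 for `reps` pairs of independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Each trial is a success with probability pi; True counts as 1.
        p1 = sum(rng.random() < pi1 for _ in range(n1)) / n1
        p2 = sum(rng.random() < pi2 for _ in range(n2)) / n2
        diffs.append(p1 - p2)
    return diffs


pi1, n1, pi2, n2 = 0.5, 100, 0.6, 100  # assumed illustrative values
diffs = simulate_differences(pi1, n1, pi2, n2, reps=5000)

theoretical_mean = pi1 - pi2
theoretical_sd = sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)

print("mean: simulated", round(mean(diffs), 3), "theory", round(theoretical_mean, 3))
print("sd:   simulated", round(stdev(diffs), 3), "theory", round(theoretical_sd, 3))
```

The simulated values should land close to the theoretical ones, with the gap shrinking as the number of repetitions grows.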

Example 1: A company that manufactures long-lasting light bulbs sells halogen and compact fluorescent bulbs. It ran an experiment in which 100 halogen and 100 fluorescent bulbs were kept on continuously for 250 days. After 250 days, half of the halogen bulbs were still working while 60% of the fluorescent bulbs were still operating. Is there a significant difference between the two types of bulbs?

Let x1 = the proportion of halogen bulbs that are functional after 250 days and x2 = the proportion of fluorescent bulbs that are functional after 250 days. The presumption is that each of these has a proportion distribution. We now test the following null hypothesis:

H0: π1 = π2

Assuming the null hypothesis is true, by Theorem 1, x1 – x2 will be approximately normal with mean π1 – π2 = 0 and standard deviation

$\sqrt{\frac{2\pi(1-\pi)}{n}}$

where the common value of the mean is denoted π and both samples are of size n. Since the value of π is unknown, we estimate it from the pooled sample, namely 50 + 60 = 110 successes out of 200, i.e. π ≈ 0.55. Thus, the mean of x1 – x2 is 0 (based on the null hypothesis) and the standard deviation is approximately $\sqrt{\frac{2(.55)(.45)}{100}}$ = .0704. The observed difference between the sample proportions is |x1 – x2| = .60 – .50 = .10, and so we have (two-tail test):

cumulative probability = NORMDIST(.1, 0, .0704, TRUE) = .922 < .975 = 1 – α/2

Thus, we can’t reject the null hypothesis and so we cannot conclude there is a significant difference between the two types of bulbs. More precisely, the two-tailed p-value is

p-value = 2*(1–NORM.DIST(.1, 0, .0704, TRUE)) = .155 > .05 = α
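The same calculation can be sketched outside Excel. The example below is a minimal Python version using only the standard library, where `NormalDist(0, se).cdf(x)` plays the role of NORM.DIST(x, 0, se, TRUE). The function name `two_proportion_test` is hypothetical. When both samples have the same size n, the pooled standard deviation it computes reduces to sqrt(2π(1 – π)/n).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(successes1, n1, successes2, n2):
    """Two-tailed test of H0: pi1 = pi2 using the pooled proportion."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Under H0 both samples share one proportion, estimated from all trials.
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    # Two-tailed p-value: probability of a difference at least this extreme.
    p_value = 2 * (1 - NormalDist(0, se).cdf(diff))
    return diff, se, p_value


# Example 1: 50 of 100 halogen and 60 of 100 fluorescent bulbs still working.
diff, se, p_value = two_proportion_test(50, 100, 60, 100)
print("observed difference:", round(diff, 2))
print("standard deviation: ", round(se, 4))
print("p-value:            ", round(p_value, 3))
print("reject H0 at alpha = .05:", p_value < 0.05)
```

Since the p-value is above .05, the sketch reaches the same conclusion as the Excel formula: the null hypothesis is not rejected.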

Alternatively, we can reach the same conclusion via the following test:

critical value of |x1 – x2| = NORMINV(.975, 0, .0704) = .138 > .1 = observed value of |x1 – x2|
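The critical-value approach can be sketched the same way, with `NormalDist(0, se).inv_cdf(q)` standing in for NORMINV(q, 0, se). The variable names are assumptions for illustration; the numbers come from Example 1.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
pooled = 110 / 200                          # pooled estimate of pi under H0
se = sqrt(2 * pooled * (1 - pooled) / 100)  # both samples have n = 100
observed = abs(0.50 - 0.60)                 # observed difference in proportions

# Largest difference still consistent with H0 in a two-tailed test.
critical = NormalDist(0, se).inv_cdf(1 - alpha / 2)
reject = observed > critical

print("critical value:", round(critical, 3))
print("observed value:", round(observed, 2))
print("reject H0:", reject)
```

Because the observed difference is smaller than the critical value, this check agrees with the p-value approach.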

 
