(Basic Statistics for Citizen Data Scientist)

Two-sample Proportion Testing

Theorem 1: Let x₁ and x₂ be random variables with proportional distributions with means π₁ and π₂, respectively. Let p₁ be the proportion of successes in n₁ trials of the first distribution and let p₂ be the proportion of successes in n₂ trials of the second distribution. When the numbers of trials n₁ and n₂ are sufficiently large (usually when n_i π_i ≥ 5 and n_i(1 – π_i) ≥ 5), the difference between the sample proportions p₁ – p₂ is approximately normal with mean π₁ – π₂ and standard deviation

$$\sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

Proof: By Theorem 2 of the Binomial Distribution, x_i is approximately normal with mean π_i and standard deviation

$$\sigma_{x_i} = \sqrt{\frac{\pi_i(1-\pi_i)}{n_i}}$$

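Since x₁ and x₂ are independent, their variances add, and by the linear transformation property of the normal distribution, x₁ – x₂ is approximately normal with mean π₁ – π₂ and standard deviation

$$\sigma_{x_1 - x_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$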
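The mean and standard deviation in Theorem 1 can be checked empirically. The following Python sketch is an illustration, not part of the original post. It simulates many pairs of samples, computes p₁ – p₂ for each pair, and compares the simulated mean and standard deviation with π₁ – π₂ and the standard deviation given by Theorem 1. The values π₁ = .5, π₂ = .6 and n₁ = n₂ = 100 are assumptions chosen to mirror Example 1, and the function names are hypothetical. The simulation only checks the first two moments; it does not test normality itself.

```python
import math
import random
import statistics


def theoretical_sd(pi1: float, n1: int, pi2: float, n2: int) -> float:
    """Standard deviation of p1 - p2 from Theorem 1 (normal approximation)."""
    return math.sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)


def simulate_differences(pi1: float, n1: int, pi2: float, n2: int,
                         reps: int = 5000, seed: int = 1) -> list[float]:
    """Draw `reps` independent pairs of samples and return p1 - p2 for each."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    diffs = []
    for _ in range(reps):
        # Count successes: each trial succeeds with probability pi_i
        s1 = sum(rng.random() < pi1 for _ in range(n1))
        s2 = sum(rng.random() < pi2 for _ in range(n2))
        diffs.append(s1 / n1 - s2 / n2)
    return diffs


if __name__ == "__main__":
    pi1, n1, pi2, n2 = 0.5, 100, 0.6, 100  # assumed illustrative values
    diffs = simulate_differences(pi1, n1, pi2, n2)
    print("simulated mean:", round(statistics.fmean(diffs), 4),
          "theory:", round(pi1 - pi2, 4))
    print("simulated sd:  ", round(statistics.stdev(diffs), 4),
          "theory:", round(theoretical_sd(pi1, n1, pi2, n2), 4))
```

With these values the theoretical standard deviation is √(.0025 + .0024) = .07, and the simulated values should land close to –.10 and .07.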

Example 1: A company that manufactures long-lasting light bulbs sells halogen and compact fluorescent bulbs. In an experiment, they ran 100 halogen and 100 fluorescent bulbs continuously for 250 days. After 250 days, half of the halogen bulbs were still working, while 60% of the fluorescent bulbs were still operating. Is there a significant difference between the two types of bulbs?

Let x₁ = the proportion of halogen bulbs that are still functional after 250 days and x₂ = the proportion of fluorescent bulbs that are still functional after 250 days. We presume that each of these has a proportional distribution, i.e. each bulb is an independent trial with the same probability of still working after 250 days. We now test the following null hypothesis:

H₀: π₁ = π₂
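Before relying on Theorem 1, one can check the rule-of-thumb conditions n_i π_i ≥ 5 and n_i(1 – π_i) ≥ 5. Since π₁ and π₂ are unknown, this Python sketch plugs in the observed sample proportions (.50 and .60) as stand-ins. That substitution is our assumption, and the helper name `large_sample_ok` is hypothetical.

```python
def large_sample_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Rule of thumb from Theorem 1: n*p >= 5 and n*(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Example 1 data: (sample size, observed proportion still working)
samples = {"halogen": (100, 0.50), "fluorescent": (100, 0.60)}

for name, (n, p) in samples.items():
    print(f"{name}: n*p = {n * p:.0f}, n*(1-p) = {n * (1 - p):.0f}, "
          f"ok = {large_sample_ok(n, p)}")
```

Here every product (50, 50, 60 and 40) is well above 5, so the normal approximation is reasonable for this example.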

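Assuming the null hypothesis is true, by Theorem 1, x₁ – x₂ is approximately normal with mean π₁ – π₂ = 0 and standard deviation

$$\sigma_{x_1 - x_2} = \sqrt{\frac{\pi(1-\pi)}{n} + \frac{\pi(1-\pi)}{n}} = \sqrt{\frac{2\pi(1-\pi)}{n}}$$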

where the common value of the mean is denoted π and both samples are of size n. Since the value of π is unknown, we estimate it from the pooled sample, namely 50 + 60 = 110 successes out of 200, i.e. π ≈ .55. Thus, the mean of x₁ – x₂ is 0 (based on the null hypothesis) and the standard deviation is approximately $\sqrt{\frac{2(.55)(.45)}{100}}$ = .0704. The observed value of x₁ – x₂ is .50 – .60 = –.10. Because the null distribution is symmetric about 0, we can work with its absolute value .10, and so we have (two-tail test, α = .05):

cumulative probability = NORMDIST(.1, 0, .0704, TRUE) = .922 < .975 = 1 – α/2

Thus, we cannot reject the null hypothesis, and so we cannot conclude that there is a significant difference between the two types of bulbs. More precisely, computing the two-tailed p-value directly:

p-value = 2*(1–NORM.DIST(.1, 0, .0704, TRUE)) = .155 > .05 = α
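The Excel calculations above can be reproduced in Python with the standard library's `statistics.NormalDist`, whose `cdf` method plays the role of `NORM.DIST(..., TRUE)`. This is a sketch that assumes Python 3.8 or later, and the function name `two_proportion_z_test` is hypothetical. It uses the pooled standard error √(π̂(1 – π̂)(1/n₁ + 1/n₂)), which reduces to √(2π̂(1 – π̂)/n) when n₁ = n₂ = n, as in this example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes1: int, n1: int, successes2: int, n2: int):
    """Two-tailed test of H0: pi1 = pi2 using the pooled normal approximation."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Pooled estimate of the common proportion pi under H0
    pooled = (successes1 + successes2) / (n1 + n2)
    # Standard deviation of p1 - p2 under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    null_dist = NormalDist(mu=0.0, sigma=se)
    # Two-tailed p-value: probability of a difference at least this extreme
    p_value = 2 * (1 - null_dist.cdf(abs(p1 - p2)))
    return pooled, se, p_value


if __name__ == "__main__":
    # Example 1: 50 of 100 halogen and 60 of 100 fluorescent bulbs still working
    pooled, se, p_value = two_proportion_z_test(50, 100, 60, 100)
    alpha = 0.05
    print(f"pooled = {pooled:.2f}, se = {se:.4f}, p-value = {p_value:.3f}")
    print("reject H0" if p_value < alpha else "cannot reject H0")
```

For the bulb data this gives a pooled estimate of .55, a standard error of about .0704 and a p-value of about .155, so H₀ is not rejected at α = .05.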

Alternatively, we can reach the same conclusion via the following test:

critical value of x₁ – x₂ = NORMINV(.975, 0, .0704) = .138 > .1 = |observed value of x₁ – x₂|
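The critical-value version of the test corresponds to `NormalDist.inv_cdf`, the Python standard-library counterpart of `NORMINV`. In this sketch the standard error .0704 is taken from the example above, and the helper name `critical_difference` is hypothetical.

```python
from statistics import NormalDist


def critical_difference(se: float, alpha: float = 0.05) -> float:
    """Two-tailed critical value of |x1 - x2| under H0 (mean 0, sd = se)."""
    return NormalDist(mu=0.0, sigma=se).inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    se = 0.0704                    # standard error from Example 1
    observed = abs(0.50 - 0.60)    # |x1 - x2|
    crit = critical_difference(se)
    print(f"critical value = {crit:.3f}, observed = {observed:.2f}")
    # Reject H0 only if the observed difference exceeds the critical value
    print("reject H0" if observed > crit else "cannot reject H0")
```

Since .10 does not exceed the critical value of about .138, this approach reaches the same conclusion as the p-value approach.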
