Hits: 22 (Basic Statistics for Citizen Data Scientist) Independence Testing The method described in Goodness of Fit can also be used to determine whether two sets of data are independent of each other. Such data are organized in what are called contingency tables, as described in Example 1. In these cases df = (row count – 1) (column count – …

Hits: 51 (Basic Statistics for Citizen Data Scientist) Goodness of Fit Basic Concepts Observation: Suppose the random variable x has binomial distribution B(n, p) and define z as z = (x − np) / √(np(1 − p)). By Corollary 1 of Relationship between Binomial and Normal Distributions, provided n is large enough, generally if np ≥ 5 and n(1 − p) ≥ 5, then z is approximately normally distributed with mean 0 and standard deviation 1. Thus by Corollary 1 …
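
The standardization in this excerpt can be illustrated with a minimal Python sketch. It assumes the usual form z = (x − np) / √(np(1 − p)); the function name `binomial_z` and the sample numbers are hypothetical and do not come from the original article.

```python
import math

def binomial_z(x, n, p):
    """Standardize a binomial count x ~ B(n, p) via the normal approximation."""
    # Rule of thumb quoted in the excerpt: np >= 5 and n(1 - p) >= 5.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("sample too small for the normal approximation")
    mean = n * p                      # mean of B(n, p)
    sd = math.sqrt(n * p * (1 - p))   # standard deviation of B(n, p)
    return (x - mean) / sd

# Hypothetical data: 60 successes in 100 trials with p = 0.5.
z = binomial_z(60, 100, 0.5)
print(round(z, 4))
```

The guard clause only mirrors the rule of thumb; it is a convenience for the sketch, not a formal validity test.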

Hits: 14 (Basic Statistics for Citizen Data Scientist) Power of One Sample Variance Testing Let σ0² represent the hypothetical variance and s² the observed variance. Let x+crit be the right critical value (based on the null hypothesis with significance level α/2) and x-crit be the left critical value (two-tailed test), i.e. x-crit = CHIINV(1−α/2, n−1) and x+crit = CHIINV(α/2, n−1). Let δ = σ0²/s². Then …

Hits: 422 (Basic Statistics for Citizen Data Scientist) One Sample Hypothesis Testing of the Variance Based on Theorem 2 of Chi-square Distribution and its corollaries, we can use the chi-square distribution to test the variance of a distribution. Example 1: A company produces metal pipes of a standard length. Twenty years ago it tested its production quality and …

Hits: 41 (Basic Statistics for Citizen Data Scientist) Chi-square Distribution Definition 1: The chi-square distribution with k degrees of freedom, abbreviated χ²(k), has probability density function f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)) for x > 0, and f(x) = 0 otherwise. Here k does not have to be an integer and can be any positive real number. Except for …
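
As a rough illustration of Definition 1, the sketch below evaluates the standard chi-square density f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)) using only the Python standard library. The function name `chi2_pdf` is hypothetical, and the boundary x = 0 is not treated specially here.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom."""
    if k <= 0:
        raise ValueError("k must be a positive real number")
    if x <= 0:
        # Simplification for this sketch: the density is reported as 0 at x <= 0.
        return 0.0
    numerator = x ** (k / 2 - 1) * math.exp(-x / 2)
    denominator = 2 ** (k / 2) * math.gamma(k / 2)
    return numerator / denominator

# k need not be an integer, as the excerpt notes.
for k in (1, 2, 2.5, 5):
    print(k, chi2_pdf(2.0, k))
```

For large k the direct use of `math.gamma` can overflow; working with `math.lgamma` on the log scale would be one way around that.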

Hits: 201 (Basic Statistics for Citizen Data Scientist) Equivalence Testing (TOST) The objective of a two-sample equivalence test is to determine whether the means of two populations are equivalent based on two independent samples from these populations; here “equivalent” means that the two means differ by a small pre-defined amount. This margin of equivalence is …

Hits: 14 (Basic Statistics for Citizen Data Scientist) Coefficient of Variation Testing One Sample Testing In Measures of Variability, we describe the unitless measure of dispersion called the coefficient of variation. It turns out that s/x̄ is a biased estimator for the population coefficient of variation σ/μ. A nearly unbiased estimator is (1 + 1/(4n)) · s/x̄, where n is the sample size. When the coefficient …

Hits: 344 (Basic Statistics for Citizen Data Scientist) Paired Sample t Test In paired sample hypothesis testing, a sample from the population is chosen and two measurements for each element in the sample are taken. Each set of measurements is considered a sample. Unlike the hypothesis testing studied so far, the two samples are not independent of …

Hits: 157 (Basic Statistics for Citizen Data Scientist) Two Sample t Test: unequal variances Theorem 1: Let x̄ and ȳ be the sample means and sx and sy be the sample standard deviations of two sets of data of size nx and ny respectively. If x and y are normal, or nx and ny are sufficiently large for the Central Limit Theorem to hold, then the random variable t = ((x̄ − ȳ) − (μx − μy)) / √(sx²/nx + sy²/ny) has distribution T(m), where m = (sx²/nx + sy²/ny)² / ((sx²/nx)²/(nx − 1) + (sy²/ny)²/(ny − 1)). Observation: The nearest integer …
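
A minimal sketch of the statistic in Theorem 1, assuming the standard Welch form of the test and a null hypothesis of equal population means (μx − μy = 0). The function name `welch_t` and the summary statistics are hypothetical; the degrees of freedom m are left unrounded.

```python
import math

def welch_t(mean_x, mean_y, sx, sy, nx, ny):
    """Return (t, m): Welch's t statistic and its approximate degrees of freedom."""
    vx = sx ** 2 / nx   # estimated variance of the sample mean of x
    vy = sy ** 2 / ny   # estimated variance of the sample mean of y
    # Null hypothesis assumed here: the two population means are equal.
    t = (mean_x - mean_y) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    m = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, m

# Hypothetical summary statistics for two samples.
t, m = welch_t(mean_x=5.0, mean_y=4.0, sx=2.0, sy=2.0, nx=10, ny=10)
print(t, m)
```

When the two samples have equal size and equal standard deviation, m reduces to nx + ny − 2, which gives a quick sanity check on the formula.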

Hits: 59 (Basic Statistics for Citizen Data Scientist) Two Sample t Test: equal variances We now consider an experimental design where we want to determine whether there is a difference between two groups within the population. For example, let’s suppose we want to test whether there is any difference between the effectiveness of a new …

Hits: 99 (Basic Statistics for Citizen Data Scientist) One Sample t Test The t distribution provides a good way to perform one-sample tests on the mean when the population variance is not known provided the population is normal or the sample is sufficiently large so that the Central Limit Theorem applies. It turns out that the t distribution provides good …

Hits: 124 (Basic Statistics for Citizen Data Scientist) Basic Concepts of t Distribution The one-sample hypothesis test based on the normal distribution, described in Hypothesis Testing using the Central Limit Theorem, is fine when one knows the standard deviation of the population distribution and the population is either normally distributed or the sample is sufficiently large that …

Hits: 29 (Basic Statistics for Citizen Data Scientist) Required Sample Size for the Binomial Testing We now show how to determine the sample size required to achieve a specified power objective. Example 1: A company has made a major improvement in their manufacturing process and wants to test whether this improvement will result in 80% of …

Hits: 28 (Basic Statistics for Citizen Data Scientist) Statistical Power for the Binomial Distribution Power of one-tailed test Example 1: What is the power of the test in Example 3 of Hypothesis Testing for the Binomial Distribution? For this example we found 13 successes in a sample of size 24 and used a one-tailed test with α …

Hits: 80 (Basic Statistics for Citizen Data Scientist) Poisson Distribution Basic Concepts Definition 1: The Poisson distribution has a probability distribution function (pdf) given by f(x) = μ^x e^(−μ) / x! for x = 0, 1, 2, and so on. The parameter μ is often replaced by λ. A chart of the pdf of the Poisson distribution for λ = 3 is shown in Figure 1. Figure 1 – Poisson Distribution Observation: Some key statistical properties of …
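
The chart for λ = 3 mentioned in this excerpt can be approximated numerically. The sketch below assumes the standard Poisson probability function f(x) = λ^x e^(−λ) / x!; the function name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """Probability that a Poisson(lam) variable equals the non-negative integer x."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        return 0.0
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Tabulate the probabilities for lambda = 3, the value used in the excerpt's chart.
for x in range(11):
    print(x, round(poisson_pmf(x, 3), 4))
```

With an integer λ the probabilities at x = λ − 1 and x = λ coincide, so the tabulated values for λ = 3 peak jointly at x = 2 and x = 3.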