(Basic Statistics for Citizen Data Scientist) Distribution Property Functions In the descriptions of the distributions throughout the website, we have provided formulas for the distribution mean and variance. Real Statistics provides the following functions to carry out these calculations. Real Statistics Functions: The Real Statistics Resource Pack contains the following functions. MEAN_DIST(dist, …

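(Basic Statistics for Citizen Data Scientist) Laplace Distribution The pdf of the Laplace distribution (aka the double exponential distribution) with location parameter μ and scale parameter β is $f(x) = \frac{1}{2\beta} e^{-|x-\mu|/\beta}$, where β > 0. The cdf is $F(x) = \frac{1}{2} e^{(x-\mu)/\beta}$ for x < μ and $F(x) = 1 - \frac{1}{2} e^{-(x-\mu)/\beta}$ for x ≥ μ. The inverse of the Laplace distribution is $F^{-1}(p) = \mu + \beta \ln(2p)$ for p < 0.5 and $F^{-1}(p) = \mu - \beta \ln(2(1-p))$ for p ≥ 0.5. Key statistical properties of the Laplace distribution are shown in Figure 1. Figure 1 – Statistical …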
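The three Laplace functions named in the excerpt can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook forms of the pdf, cdf, and inverse cdf; the function names `laplace_pdf`, `laplace_cdf`, and `laplace_inv` are hypothetical and are not Real Statistics functions.

```python
import math

def laplace_pdf(x, mu, beta):
    """Density of the Laplace distribution; assumes beta > 0."""
    return math.exp(-abs(x - mu) / beta) / (2 * beta)

def laplace_cdf(x, mu, beta):
    """Cumulative distribution function, split at the location mu."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / beta)
    return 1 - 0.5 * math.exp(-(x - mu) / beta)

def laplace_inv(p, mu, beta):
    """Inverse cdf (quantile function); assumes 0 < p < 1."""
    if p < 0.5:
        return mu + beta * math.log(2 * p)
    return mu - beta * math.log(2 * (1 - p))
```

Applying `laplace_cdf` to the output of `laplace_inv` with the same μ and β should return the original probability, which is a quick sanity check on both branches.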

(Basic Statistics for Citizen Data Scientist) Gumbel Distribution The Gumbel distribution is used to model the largest value from a relatively large set of independent elements from distributions whose tails decay relatively fast, such as a normal or exponential distribution. As a result, it can be used to analyze annual maximum daily rainfall …

(Basic Statistics for Citizen Data Scientist) Logistic Distribution The pdf of the logistic distribution with location parameter μ and scale parameter β is $f(x) = \frac{e^{-(x-\mu)/\beta}}{\beta \left(1 + e^{-(x-\mu)/\beta}\right)^2}$, where β > 0. The cdf is $F(x) = \frac{1}{1 + e^{-(x-\mu)/\beta}}$. The inverse of the logistic distribution is $F^{-1}(p) = \mu + \beta \ln\frac{p}{1-p}$. The standard logistic distribution is the case where μ = 0 and β = 1. Key statistical properties of the logistic distribution are shown in Figure …
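As a rough sketch, the logistic pdf, cdf, and inverse cdf can be written in Python as follows. The formulas are the standard textbook forms, and the function names are hypothetical; they are not part of the Real Statistics Resource Pack.

```python
import math

def logistic_pdf(x, mu=0.0, beta=1.0):
    """Density of the logistic distribution; assumes beta > 0."""
    e = math.exp(-(x - mu) / beta)
    return e / (beta * (1 + e) ** 2)

def logistic_cdf(x, mu=0.0, beta=1.0):
    """Cumulative distribution function."""
    return 1 / (1 + math.exp(-(x - mu) / beta))

def logistic_inv(p, mu=0.0, beta=1.0):
    """Inverse cdf (quantile function); assumes 0 < p < 1."""
    return mu + beta * math.log(p / (1 - p))
```

The default arguments μ = 0 and β = 1 give the standard logistic distribution.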

(Basic Statistics for Citizen Data Scientist) Weibull Distribution Definition 1: The Weibull distribution has the probability density function (pdf) $f(x) = \frac{\beta}{\alpha} \left(\frac{x}{\alpha}\right)^{\beta-1} e^{-(x/\alpha)^\beta}$ for x ≥ 0. Here β > 0 is the shape parameter and α > 0 is the scale parameter. The cumulative distribution function (cdf) is $F(x) = 1 - e^{-(x/\alpha)^\beta}$. The inverse cumulative distribution function is $I(p) = \alpha \left(-\ln(1-p)\right)^{1/\beta}$. Observation: There is also a three-parameter version of the Weibull distribution. Observation: If x represents …
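One common use of the inverse cdf I(p) is inverse-transform sampling: feed uniform random numbers on [0, 1) through I(p) to obtain Weibull-distributed values. The sketch below assumes the standard two-parameter forms with scale α and shape β; the function names and the seed are illustrative choices, not taken from the original.

```python
import math
import random

def weibull_cdf(x, alpha, beta):
    """Cdf for x >= 0 with scale alpha > 0 and shape beta > 0."""
    return 1 - math.exp(-((x / alpha) ** beta))

def weibull_inv(p, alpha, beta):
    """Inverse cdf I(p); assumes 0 <= p < 1."""
    return alpha * (-math.log(1 - p)) ** (1 / beta)

def weibull_sample(n, alpha, beta, seed=0):
    """Draw n values by inverse-transform sampling (seed is arbitrary)."""
    rng = random.Random(seed)
    return [weibull_inv(rng.random(), alpha, beta) for _ in range(n)]
```

Python's standard library also offers `random.weibullvariate(alpha, beta)` with the same scale and shape convention, which is the simpler choice when only samples are needed.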

(Basic Statistics for Citizen Data Scientist) Uniform Distribution When you ask for a random set of, say, 100 numbers between 1 and 10, you are looking for a sample from a continuous uniform distribution, where α = 1 and β = 10 according to the following definition. Definition 1: The continuous uniform distribution has probability density function (pdf) given by $f(x) = \frac{1}{\beta - \alpha}$ for α ≤ x ≤ β, where α and β are …

(Basic Statistics for Citizen Data Scientist) Exponential Distribution The exponential distribution can be used to determine the probability that it will take a given number of trials to arrive at the first success in a Poisson distribution; i.e. it describes the inter-arrival times in a Poisson process. It is the continuous counterpart to the geometric …

(Basic Statistics for Citizen Data Scientist) Gamma Distribution The gamma distribution has the same relationship to the Poisson distribution that the negative binomial distribution has to the binomial distribution. We aren’t going to study the gamma distribution directly, but it is related to the exponential distribution and especially to the chi-square distribution which will receive a lot more attention in this website. …

(Basic Statistics for Citizen Data Scientist) Two Sample Hypothesis Testing to Compare Variances Theorem 1 of F Distribution can be used to test whether the variances of two populations are equal, using the Excel functions and tools that follow. In order to deal exclusively with the right tail of the distribution, when taking ratios of sample …

(Basic Statistics for Citizen Data Scientist) F Distribution The F-distribution is primarily used to compare the variances of two populations, as described in Hypothesis Testing to Compare Variances. This is particularly relevant in the analysis of variance testing (ANOVA) and in regression analysis. Definition 1: The F-distribution with n1, n2 degrees of freedom is defined by Theorem 1: If we …

(Basic Statistics for Citizen Data Scientist) Fisher’s Exact Test When the conditions for Pearson’s chi-square test are not met, especially when one or more of the cells have exp_i < 5, an alternative approach with 2 × 2 contingency tables is to use Fisher’s exact test. Since this method is more computationally intensive, it is best used …
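To make the idea concrete, here is a minimal sketch of Fisher's exact test for a 2 × 2 table with cells a, b (first row) and c, d (second row). It assumes the common two-sided convention of summing the hypergeometric probabilities of all tables, with the same margins, that are no more likely than the observed one; the truncated excerpt does not say which convention Real Statistics uses, and the function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, row1, row2, col1):
    """Probability that the top-left cell equals k, given fixed margins."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    probs = [hypergeom_pmf(k, row1, row2, col1) for k in range(lo, hi + 1)]
    # Small relative tolerance so ties with p_obs survive float rounding.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))
```

For the table [[3, 1], [1, 3]] the feasible tables have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided p-value is 34/70.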

(Basic Statistics for Citizen Data Scientist) Independence Testing The method described in Goodness of Fit can also be used to determine whether two sets of data are independent of each other. Such data are organized in what are called contingency tables, as described in Example 1. In these cases df = (row count – 1) (column count – …
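The degrees-of-freedom rule and the expected counts for a contingency table can be sketched as follows. This assumes the usual Pearson chi-square statistic, with each expected count computed as row total × column total / grand total; the function name and the sample table are made up for illustration.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table of observed counts.
stat, df = chi_square_independence([[10, 20], [20, 10]])
```

The p-value is then the right-tail probability of a chi-square distribution with `df` degrees of freedom evaluated at `stat`.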

(Basic Statistics for Citizen Data Scientist) Goodness of Fit Basic Concepts Observation: Suppose the random variable x has binomial distribution B(n, p) and define z as $z = \frac{x - np}{\sqrt{np(1-p)}}$. By Corollary 1 of Relationship between Binomial and Normal Distributions, provided n is large enough, generally if np ≥ 5 and n(1–p) ≥ 5, then z is approximately normally distributed with mean 0 and standard deviation 1. Thus by Corollary 1 …
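The standardization described above can be sketched as a small helper. It assumes the usual form z = (x − np) / √(np(1 − p)) and returns a flag for the rule of thumb np ≥ 5 and n(1 − p) ≥ 5; the function name and the example numbers are hypothetical.

```python
import math

def binomial_z(x, n, p):
    """Return (z, ok): the z-score of x under B(n, p) and whether the
    normal approximation rule of thumb holds."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    ok = mean >= 5 and n * (1 - p) >= 5
    return (x - mean) / sd, ok

# Example: 60 successes in 100 trials with p = 0.5.
z, ok = binomial_z(60, 100, 0.5)
```

Here the mean is 50 and the standard deviation is 5, so z = 2.0 and the rule of thumb is satisfied.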

(Basic Statistics for Citizen Data Scientist) Power of One Sample Variance Testing Let $\sigma_0^2$ represent the hypothetical variance and $s^2$ the observed variance. Let x+crit be the right critical value (based on the null hypothesis with significance level α/2) and x-crit be the left critical value (two-tailed test), i.e. x-crit = CHIINV(1−α/2,n−1) and x+crit = CHIINV(α/2,n−1). Let δ = $\sigma_0^2 / s^2$. Then …