Statistics for Beginners in Excel – Basic Concepts of t Distribution

(Basic Statistics for Citizen Data Scientist)

Basic Concepts of t Distribution

The one sample hypothesis test described in Hypothesis Testing using the Central Limit Theorem using the normal distribution is fine when one knows the standard deviation of the population distribution and the population is either normally distributed or the sample is sufficiently large that the Central Limit Theorem applies.

The problem is that the standard deviation of the population is generally not known. One approach for addressing this is to use the standard deviation s of the sample as an approximation for the standard deviation σ for the population. In fact, as is described below, such an approach is possible using the t distribution.

Definition 1: The (Student’st distribution with k degrees of freedom, abbreviated T(k) has probability distribution function given by

Observations: Key statistical properties of the t distribution are:

  • Mean = 0 for k > 0
  • Median = 0
  • Mode = 0
  • Range = (-∞, ∞)
  • Variance = k ⁄ (k – 2) for k > 2
  • Skewness = 0 for k > 3
  • Kurtosis = 6 ⁄ (k – 4) for k > 4

The overall shape of the probability density function of the t distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the t distribution approaches the standard normal distribution, and in fact the approximation is quite close for k ≥ 30.

 

t distribution chart

Figure 1 – Chart of t distribution by degrees of freedom

 

Theorem 1: If x has normal distribution N(μ, σ), then for samples of size n, the random variable

image674

has distribution T(n – 1).

Corollary 1: For samples of sufficiently large size n, the random variable

image674

has distribution T(– 1).

Observation: The test statistic in the theorem and corollary are the same as

image676

from Central Limit Theorem with the population standard deviation σ replaced by the sample standard deviation s. What makes this useful is that usually the standard deviation of the population is unknown while the standard deviation of the sample is known.

Excel Functions: Excel provides the following functions regarding the t distribution:

TDIST(x, dftails) = the right tail at x of the Student’s t cumulative probability distribution function with df degrees of freedom when tails = 1 (for a one-tailed test). When tails = 2 (for a two-tailed test), TDIST(x, df, tails) is the sum of the right and left tails.

Since the t distribution is symmetric about x = 0, TDIST(x, df, 2) is simply 2 * TDIST(xdf, 1). Also note that x must be non-negative, but since the t distribution is symmetric about x = 0, the left tail when x < 0 is TDIST(-x, dftails). Thus we can use the formula TDIST(ABS(x), dftails) for any x. The cumulative probability distribution function is given by 1 – TDIST(x, df, 1) when x ≥ 0 and by TDIST(-x, df, 1) when x < 0.

TINV(p, df) = x such that TDIST(x, df, 2) = p; i.e. TINV is the inverse of TDIST in the two-tailed case. For the one-tailed case simply double p; i.e. TINV(2*p, df) = x such that TDIST(x, df, 1) = p.

With Excel 2010/2013/2016 there are a number of new functions (T.DIST, T.INV, T.DIST.RT, T.INV.RT and T.INV.2T) that provide equivalent functionality to TDIST and TINV, but whose syntax is more consistent with other distribution functions. These functions are described in Built-in Statistical Functions.

Real Statistics Function: In all these Excel functions that support the t distribution, the value of df is rounded down to the next lower integer. Thus, df = 3.7 is treated the same as df = 3. Furthermore, for versions of Excel prior to Excel 2010 there is no function equivalent to T.DIST(x, df, FALSE), i.e. there is no function that provides the pdf for the t distribution.

To address these issues, the Real Statistics Resource provides the following functions:

T_DIST(x, df ,cum),  T_DIST_RT(x, df),  T_DIST_2T(x, df)

T_INV(p, df),  T_INV_2T(p, df)

Except for the fact that the df is not rounded, these functions are identical to their standard Excel counterparts.

 

Statistics for Beginners in Excel – Power and Sample Size using Real Statistics

 

Statistics for Beginners in Excel – Basic Concepts of t Distribution

Personal Career & Learning Guide for Data Analyst, Data Engineer and Data Scientist

Applied Machine Learning & Data Science Projects and Coding Recipes for Beginners

A list of FREE programming examples together with eTutorials & eBooks @ SETScholars

95% Discount on “Projects & Recipes, tutorials, ebooks”

Projects and Coding Recipes, eTutorials and eBooks: The best All-in-One resources for Data Analyst, Data Scientist, Machine Learning Engineer and Software Developer

Topics included: Classification, Clustering, Regression, Forecasting, Algorithms, Data Structures, Data Analytics & Data Science, Deep Learning, Machine Learning, Programming Languages and Software Tools & Packages.
(Discount is valid for limited time only)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Please do not waste your valuable time by watching videos, rather use end-to-end (Python and R) recipes from Professional Data Scientists to practice coding, and land the most demandable jobs in the fields of Predictive analytics & AI (Machine Learning and Data Science).

The objective is to guide the developers & analysts to “Learn how to Code” for Applied AI using end-to-end coding solutions, and unlock the world of opportunities!