(Basic Statistics for Citizen Data Scientist)

Poisson Distribution

Basic Concepts

Definition 1: The Poisson distribution has a probability distribution function (pdf) given by

$f(x) = \frac{\mu^x e^{-\mu}}{x!}$

for x = 0, 1, 2, …

The parameter μ is often replaced by λ. A chart of the pdf of the Poisson distribution for λ = 3 is shown in Figure 1.

Figure 1 – Poisson Distribution
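As a rough illustration of Definition 1, the values plotted in a chart like Figure 1 can be computed directly from the formula. This is a minimal sketch in Python rather than Excel; the function name poisson_pmf and the range x = 0 to 10 are choices made here, not part of the original tutorial.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x events for a Poisson distribution with mean mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

lam = 3.0  # the value of lambda used for Figure 1
pmf_values = [poisson_pmf(x, lam) for x in range(11)]

# Print one row per x, similar to the data behind a bar chart of the pdf
for x, p in enumerate(pmf_values):
    print(f"x = {x:2d}  f(x) = {p:.6f}")
```

The values for x = 2 and x = 3 coincide when λ = 3, which is why such a chart shows two equally tall bars at its peak.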

Observation: Some key statistical properties of the Poisson distribution are:

Mean = µ
Variance = µ
Skewness = $1/\sqrt{\mu}$
Kurtosis (excess) = 1/µ
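These properties can be checked numerically by summing over the distribution. The sketch below is an illustration in Python, assuming µ = 3 and truncating the sum at x = 200 (both choices made here); the kurtosis is computed as excess kurtosis, i.e. the fourth standardized moment minus 3.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability, computed in log space to avoid overflow for large x."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def poisson_moments(mu: float, x_max: int = 200):
    """Mean, variance, skewness and excess kurtosis via a truncated sum."""
    xs = range(x_max + 1)
    probs = [poisson_pmf(x, mu) for x in xs]
    mean = sum(x * p for x, p in zip(xs, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    m3 = sum((x - mean) ** 3 * p for x, p in zip(xs, probs))
    m4 = sum((x - mean) ** 4 * p for x, p in zip(xs, probs))
    skewness = m3 / var ** 1.5
    excess_kurtosis = m4 / var ** 2 - 3
    return mean, var, skewness, excess_kurtosis

mean, var, skewness, excess_kurtosis = poisson_moments(3.0)
print(mean, var, skewness, excess_kurtosis)
```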

Excel Function: Excel provides the following function for the Poisson distribution:

POISSON(x, μ, cum) where μ = the mean of the distribution and cum takes the values TRUE and FALSE

POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ.

POISSON(x, μ, TRUE) = cumulative probability distribution function F(x) at the value x for the Poisson distribution with mean μ.

Excel 2010/2013/2016 provide the additional function POISSON.DIST which is equivalent to POISSON.

Real Statistics Function: Excel doesn’t provide a worksheet function for the inverse of the Poisson distribution. Instead you can use the following function provided by the Real Statistics Resource Pack.

POISSON_INV(p, μ) = smallest integer x such that POISSON(x, μ, TRUE) ≥ p

Note that the maximum value of x is 1,024,000,000. A value higher than this indicates an error.

Poisson Process

If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ and the arrival times are random without any tendency to bunch up (i.e. the assumptions for what is called a Poisson process), then the probability of x events occurring in an hour is given by

$f(x) = \frac{\mu^x e^{-\mu}}{x!}$

Example 1: A large department store sells on average 100 MP3 players a week. Assuming that purchases follow a Poisson process as described above, what is the probability that the store will have to turn away potential buyers before the end of the week if it stocks 120 players? How many MP3 players should the store stock in order to have a 99% probability of being able to supply a week’s demand?

The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Thus, the answer to the first problem is 1 – 0.977331 = 0.022669, or about 2.3%. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson probability is 0.998293, which is higher than needed. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is still a little too high, and so we try 123. This yields 0.988756, which is a little too low, and so we finally arrive at 124, which has a cumulative Poisson probability of 0.991226.

Alternatively, you can arrive at the same answer (124) by using the Real Statistics formula =POISSON_INV(0.99,100).
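For readers working outside Excel, Example 1 can be reproduced with a short Python sketch. The functions poisson_cdf and poisson_inv below are hypothetical stand-ins written for this illustration; they mimic POISSON(x, μ, TRUE) and POISSON_INV(p, μ) but are not the Excel or Real Statistics implementations. The inverse assumes 0 < p < 1 and p not extremely close to 1.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability, computed in log space so that mu = 100 is safe."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x), analogous to POISSON(x, mu, TRUE)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

def poisson_inv(p: float, mu: float) -> int:
    """Smallest integer x such that poisson_cdf(x, mu) >= p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    x = 0
    total = poisson_pmf(0, mu)
    while total < p:
        x += 1
        total += poisson_pmf(x, mu)
    return x

p_enough_stock = poisson_cdf(120, 100.0)   # chance that 120 players suffice
p_turn_away = 1 - p_enough_stock           # chance of running out
stock_needed = poisson_inv(0.99, 100.0)    # stock level for 99% coverage

print(round(p_enough_stock, 6), round(p_turn_away, 6), stock_needed)
```

Accumulating the probabilities one term at a time replaces the manual trial-and-error search over x = 130, 125, 123 and 124.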

Confidence Intervals

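The 1–α confidence interval for the mean based on x events occurring (in a unit of time) is given by

$\chi^2_{\alpha/2,\,2x}/2 \le \mu \le \chi^2_{1-\alpha/2,\,2(x+1)}/2$

where $\chi^2_{p,df}$ = CHISQ.INV(p, df), i.e. the value at which the cumulative chi-square distribution with df degrees of freedom equals p.

For Excel 2007, $\chi^2_{p,df}$ = CHIINV(1−p, df).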

Example 2: Suppose the number of radioactive particles that hit a screen per second follows a Poisson process, and suppose that 5 hits occurred in one second. Find the 95% confidence interval for the mean number of hits per second.

Figure 2 shows the confidence intervals for various values of x and α.

Figure 2 – Confidence intervals for the Poisson mean

The requested confidence interval is

1.623486 ≤ μ ≤ 11.66833

as calculated by the formulas in cells C9 and D9:

=CHISQ.INV(B9/2,2*A9)/2

=CHISQ.INV.RT(B9/2,2*(A9+1))/2

Note that CHISQ.INV(p,0) = #NUM! for any value of p, and so we cannot use this formula to calculate the lower bound when x = 0 (cell C4). In any case, this value is zero.
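The same interval can be obtained without a chi-square inverse function. The sketch below is an alternative formulation, not the spreadsheet's method: it uses the standard identity P(X ≤ x | μ) = P(χ² with 2(x+1) degrees of freedom > 2μ), so the lower bound solves P(X ≤ x−1 | μ) = 1 − α/2 and the upper bound solves P(X ≤ x | μ) = α/2. Both equations are solved here by bisection; the function names are hypothetical.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for a Poisson distribution with mean mu."""
    if x < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
               for k in range(x + 1))

def solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(m) = target, where f decreases in m on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_mean_ci(x: int, alpha: float = 0.05):
    """Two-sided 1 - alpha confidence interval for the mean, given x events."""
    if x == 0:
        lower = 0.0  # the lower bound is zero when no events are observed
    else:
        lower = solve_decreasing(lambda m: poisson_cdf(x - 1, m),
                                 1 - alpha / 2, 0.0, float(x))
    hi = max(1.0, float(x))
    while poisson_cdf(x, hi) > alpha / 2:  # widen until the root is bracketed
        hi *= 2
    upper = solve_decreasing(lambda m: poisson_cdf(x, m), alpha / 2, 0.0, hi)
    return lower, upper

lower, upper = poisson_mean_ci(5, 0.05)
print(round(lower, 6), round(upper, 5))
```

Handling x = 0 as a special case mirrors the note above about CHISQ.INV(p, 0) returning an error.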

Relationship with Binomial and Normal Distributions

Theorem 1: If the probability p of success of a single trial approaches 0 while the number of trials n approaches infinity and the value μ = np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ.

Observation: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.

Example 3: A company produces high-precision bolts such that the probability of a defect is .05%. In a sample of 4,000 units, what is the probability of having more than 3 defects?

We can solve this problem using the distribution B(4000, .0005), namely the desired probability is

1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

μ = np = 4000(.0005) = 2

1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877

As you can see the approximation is quite accurate.
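The comparison in Example 3 can be repeated in Python as a cross-check. This is a minimal sketch; binom_cdf and poisson_cdf are helper names introduced here to play the roles of BINOMDIST(x, n, p, TRUE) and POISSON(x, μ, TRUE).

```python
import math

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for a binomial distribution B(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x + 1))

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for a Poisson distribution with mean mu."""
    return sum(mu ** k * math.exp(-mu) / math.factorial(k)
               for k in range(x + 1))

n, p = 4000, 0.0005
mu = n * p                                   # Poisson mean np = 2

exact_tail = 1 - binom_cdf(3, n, p)          # P(more than 3 defects), binomial
approx_tail = 1 - poisson_cdf(3, mu)         # same probability, Poisson estimate

print(round(exact_tail, 6), round(approx_tail, 6))
```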

Observation: The Poisson distribution can be approximated by the normal distribution, as shown in the following theorem.

Theorem 2: For μ sufficiently large (usually μ ≥ 20), if x has a Poisson distribution with mean μ, then x is approximately N(μ, $\sqrt{\mu}$), i.e. normally distributed with mean μ and standard deviation $\sqrt{\mu}$.
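To get a feel for how close the normal approximation is, the sketch below compares the exact cumulative Poisson probability from Example 1 (mean 100, x = 120) with the normal estimate. The continuity correction of 0.5 is an assumption added here, a common refinement when approximating a discrete distribution with a continuous one; it is not part of the theorem as stated.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """Exact P(X <= x) for a Poisson distribution with mean mu."""
    return sum(math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
               for k in range(x + 1))

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu = 100.0
x = 120

exact = poisson_cdf(x, mu)
# Normal approximation with mean mu and standard deviation sqrt(mu);
# adding 0.5 to x is the continuity correction (an assumption of this sketch)
approx = normal_cdf((x + 0.5 - mu) / math.sqrt(mu))

print(round(exact, 6), round(approx, 6))
```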

Test for a Poisson Distribution

The index of dispersion of a data set or distribution is the variance divided by the mean.

Since the mean and variance of a Poisson distribution are equal, data that conform to a Poisson distribution should have an index of dispersion approximately equal to 1. This fact can be used to test whether a data set has a Poisson distribution, as described in Goodness of Fit.

In fact in Goodness of Fit, we also show how to use the chi-square goodness-of-fit test to determine whether a data set follows a Poisson distribution.
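As a quick illustration of the index of dispersion, the sketch below computes it for a small set of counts. The data are hypothetical values made up for this example, and the use of the sample variance (dividing by n − 1) is a choice made here. A value near 1 is consistent with a Poisson distribution but does not by itself establish one.

```python
from statistics import mean, variance

def index_of_dispersion(data) -> float:
    """Sample variance divided by the sample mean."""
    return variance(data) / mean(data)

# Hypothetical counts of events observed in ten equal time intervals
counts = [0, 1, 2, 2, 3, 3, 4, 4, 5, 6]

iod = index_of_dispersion(counts)
print(round(iod, 4))
```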
