Statistics for Beginners

Statistics for Beginners in Excel – Outliers and Robustness

(Basic Statistics for Citizen Data Scientist) One problem we face when analyzing data is the presence of outliers, i.e. data elements that are much bigger or much smaller than the other data elements. For example, the mean of the sample {2, 3, 4, 5, 6} is 4, while the mean of …
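To see why outliers matter, here is a small Python sketch outside Excel. It starts from the sample in the text, then replaces 6 with 60. That outlier is our own hypothetical choice, not the one used in the article, and the sketch compares how the mean and the median respond.

```python
from statistics import mean, median

# The sample from the text: mean 4, median 4
sample = [2, 3, 4, 5, 6]
# Hypothetical outlier (our own choice, not from the source): 6 replaced by 60
with_outlier = [2, 3, 4, 5, 60]

for name, data in [("original", sample), ("with outlier", with_outlier)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

A single extreme value pulls the mean from 4 up to 14.8, while the median stays at 4. This is the sense in which the median is more robust than the mean.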

Statistics for Beginners in Excel – ROC and Classification Table Data Analysis Tool

Real Statistics Data Analysis Tools: The Real Statistics Resource Pack supplies the ROC Curve and Classification Table data analysis tool, which provides an easier way to construct the ROC curve and classification table. We show how this is done for Example 1 of Classification Table and ROC Curve. …

Statistics for Beginners in Excel – AUC Confidence Interval

For large samples, AUC (the area under the ROC curve) is approximately normally distributed, so a 1-α confidence interval for AUC may be calculated as described in Confidence Interval for Sampling Distributions. The confidence interval is AUC ± se · z_crit, where z_crit is the two-tailed critical value …
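The formula AUC ± se · z_crit can be sketched in Python as follows. The standard error se is taken as an input because the excerpt does not show how it is computed. The values AUC = 0.80 and se = 0.05 are hypothetical. In Excel, z_crit corresponds to NORM.S.INV(1-α/2).

```python
from statistics import NormalDist

def auc_confidence_interval(auc, se, alpha=0.05):
    """Return (lower, upper) for the 1-alpha interval AUC ± se * z_crit.

    Assumes the large-sample normal approximation; se must be supplied.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return auc - se * z_crit, auc + se * z_crit

# Hypothetical inputs: AUC = 0.80 with standard error 0.05
lower, upper = auc_confidence_interval(0.80, 0.05)
print(f"95% CI for AUC: ({lower:.4f}, {upper:.4f})")
```

With α = .05, z_crit ≈ 1.96, so the interval is roughly (0.7020, 0.8980). The sketch does not clip the bounds to [0, 1]; for AUC values very close to 0 or 1 the normal approximation may be a poor fit.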

Statistics for Beginners in Excel – ROC Curve

The ROC curve is a plot of the True Positive Rate (TPR) versus the False Positive Rate (FPR), where each point on the curve corresponds to a different cutoff value. Example 1: Create the ROC curve for Example 1 of Classification Table. We begin by creating the ROC table as shown on the left side …
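Building the ROC table amounts to sweeping the cutoff and recording (FPR, TPR) at each step. The Python sketch below makes several assumptions that are not from the source. A case counts as predicted positive when its score is at least the cutoff. The scores and labels are hypothetical rather than the Example 1 data. The AUC is approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct cutoff, starting from (0, 0).

    Assumption: a case is predicted positive when score >= cutoff.
    labels: 1 = actual positive, 0 = actual negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical predicted probabilities and actual outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
for fpr, tpr in pts:
    print(f"FPR = {fpr:.3f}, TPR = {tpr:.3f}")
print("AUC =", round(auc_trapezoid(pts), 4))
```

Lowering the cutoff can only add predicted positives, so both rates are non-decreasing and the curve runs from (0, 0) to (1, 1). For this data the AUC is 8/9 ≈ 0.8889.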

Statistics for Beginners with Excel – Classification Table

The Classification Table compares the predicted number of successes to the number of successes actually observed, and similarly compares the predicted number of failures to the number of failures actually observed. There are four possible outcomes: True Positives (TP) = the number of cases that were correctly classified as positive, i.e. …
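The four counts can be tallied in a few lines of Python. This is a minimal sketch. The rule "predicted success when score >= cutoff", the cutoff of 0.5, and the data are all hypothetical and are not taken from the article's Example 1.

```python
def classification_table(scores, labels, cutoff):
    """Count TP, FP, FN and TN for one cutoff.

    Assumption: a case is predicted positive (success) when score >= cutoff.
    labels: 1 = observed success, 0 = observed failure.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for s, y in zip(scores, labels):
        if s >= cutoff:  # predicted positive
            counts["TP" if y == 1 else "FP"] += 1
        else:            # predicted negative
            counts["FN" if y == 1 else "TN"] += 1
    return counts


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical predicted probabilities
labels = [1, 1, 0, 1, 0, 0]                # hypothetical observed outcomes
table = classification_table(scores, labels, cutoff=0.5)
tpr = table["TP"] / (table["TP"] + table["FN"])  # true positive rate (sensitivity)
fpr = table["FP"] / (table["FP"] + table["TN"])  # false positive rate (1 - specificity)
print(table, "TPR =", tpr, "FPR =", round(fpr, 3))
```

At this cutoff all three observed successes are caught (TP = 3, FN = 0), but two of the three failures are misclassified (FP = 2, TN = 1). This gives TPR = 1.0 and FPR ≈ 0.667.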

Statistics for Beginners with Excel – Dot Plots

A dot plot is another way to view data graphically. It is somewhat similar to a box plot, except that instead of summarizing the data in each group (the brands in Example 1 of Box Plots), it plots the actual data values. Real Statistics Data Analysis Tool: To …

Statistics for Beginners with Excel – Creating Box Plots

Another way to characterize a distribution or a sample is via a box plot (aka a box-and-whiskers plot). Specifically, a box plot provides a pictorial representation of the following statistics: maximum, 75th percentile, median (50th percentile), mean, 25th percentile and minimum. Box plots are especially useful when comparing samples …
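The six statistics a box plot displays can be computed directly, as in the Python sketch below. The sample is hypothetical. The quartiles use Python's "inclusive" method, which interpolates over n - 1 intervals and, as far as we know, matches Excel's QUARTILE.INC. Other quartile conventions give slightly different values for the 25th and 75th percentiles.

```python
from statistics import mean, median, quantiles

def box_plot_stats(data):
    """The six statistics shown in a box plot, as listed in the text.

    Quartiles use the "inclusive" method (assumed to match Excel's QUARTILE.INC).
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "max": max(data),
        "Q3 (75th percentile)": q3,
        "median": median(data),
        "mean": mean(data),
        "Q1 (25th percentile)": q1,
        "min": min(data),
    }

sample = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical sample
for name, value in box_plot_stats(sample).items():
    print(f"{name:<22}{value}")
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.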

Statistics for Beginners with Excel – Histograms

A histogram is a graphical representation of the output of the FREQUENCY function (as described in Frequency Tables). Example 1: Create a histogram for the data and bin selection for Example 1 from Frequency Tables. We start by replicating the data and bin section for Example 1 in Figure 1. …
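Because a histogram just charts the bin counts, the counting step can be sketched in Python. The sketch follows the counting rule of Excel's FREQUENCY: values ≤ the first bin, then values in each interval (previous bin, bin], then one extra count for values above the last bin. The data and bins are hypothetical, and a row of asterisks stands in for each bar.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Bin counts following the rule of Excel's FREQUENCY function.

    Returns len(bins) + 1 counts: x <= bins[0], bins[i-1] < x <= bins[i], x > bins[-1].
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect_left(edges, x)] += 1  # index of the first edge >= x
    return counts

data = [1, 3, 3, 4, 6, 7, 8, 10]  # hypothetical data
bins = [2, 5, 8]                  # hypothetical bin upper limits
counts = frequency(data, bins)
labels = [f"<= {bins[0]}"] + [f"({lo}, {hi}]" for lo, hi in zip(bins, bins[1:])] + [f"> {bins[-1]}"]
for label, count in zip(labels, counts):
    print(f"{label:>7} | {'*' * count}")
```

Note that a value equal to a bin limit, such as 8, falls into that bin rather than the next one, just as with FREQUENCY.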

Statistics for Beginners with Excel – Frequency Tables

Often data is presented in the form of a frequency table. For example, the data in range A4:A11 of Figure 1 can be expressed by the frequency table in range C4:D7.
Figure 1 – Frequency Table
The table in Figure 1 shows that the data element 2 occurs …
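Outside Excel, a frequency table of distinct values is a simple tally. The Python sketch below uses hypothetical data, not the values in Figure 1. It builds the table and then expands it back into the raw data, showing that the two forms carry the same information.

```python
from collections import Counter

data = [6, 1, 9, 4, 6, 4, 6, 9]  # hypothetical data, not the values in Figure 1
table = sorted(Counter(data).items())  # (value, frequency) pairs in ascending order
for value, freq in table:
    print(value, freq)

# Going the other way: expand the frequency table back into the raw data
expanded = [value for value, freq in table for _ in range(freq)]
```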

Statistics for Beginners with Excel – Descriptive Statistics Tools

Excel provides a data analysis tool called Descriptive Statistics, which produces a summary of the key statistics for a data set. Example 1: Provide a table of the most common descriptive statistics for the scores in column A of Figure 1.
Figure 1 – Output from Descriptive Statistics data …
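For comparison, several of the statistics in the Descriptive Statistics output can be reproduced in Python. This is a sketch on hypothetical scores, not the data in Figure 1. It covers only a subset of the tool's output: mode, kurtosis and skewness are omitted. The standard deviation and variance are the sample versions (n - 1 in the denominator).

```python
import math
from statistics import mean, median, stdev, variance

def describe(data):
    """A subset of the statistics reported by Excel's Descriptive Statistics tool."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return {
        "Mean": mean(data),
        "Standard Error": s / math.sqrt(n),  # standard error of the mean
        "Median": median(data),
        "Standard Deviation": s,
        "Sample Variance": variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

scores = [4, 8, 6, 5, 3, 7]  # hypothetical scores, not the data in Figure 1
for name, value in describe(scores).items():
    print(f"{name:<20}{value:g}")
```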