(Basic Statistics for Citizen Data Scientist)
ROC and Classification Table Data Analysis Tool
Real Statistics Data Analysis Tools: The Real Statistics Resource Pack supplies the ROC Curve and Classification Table data analysis tool, which provides an easier way to construct the ROC curve and the classification table. We show how this is done for Example 1 of Classification Table and ROC Curve.
The tool accepts two input formats: one with two columns (e.g. B24:C34 of Figure 1 of Classification Table) and another with three columns, which we illustrate in Figure 1.
Figure 1 – Data input for ROC (three column format)
The format is similar to that in Figure 1 of ROC Curve except that only the upper bounds of the intervals are shown in column A. These correspond to the bins in the histograms, except that we now have two frequency columns (B and C) instead of just one.
To perform the analysis, press Ctrl-m and double-click on the ROC Curve and Classification Table data analysis tool. Fill in the dialog box that appears as shown in Figure 2.
Figure 2 – ROC Curve and Classification Table dialog box
Note that we choose a cutoff at the 5th row of the data by specifying the upper limit of the failure range, namely 10. Also note that in Example 1 of ROC Curve we estimated the area under the ROC curve (AUC) via rectangles. This time we estimate the AUC by using trapezoids instead.
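To make the difference between the two AUC estimates concrete, here is a minimal Python sketch. The frequency counts are invented for illustration (they are not the data of Example 1), the function names are hypothetical, and the rectangle rule shown uses the lower of the two heights on each strip, which is an assumption about how the rectangle estimate is formed.

```python
# Hypothetical three-column input: one failure count and one success count per bin.
# These numbers are made up for illustration; they are NOT the data of Example 1.
fail = [4, 3, 2, 1]
succ = [1, 2, 3, 4]


def roc_points(fail, succ):
    """Return (FPR, TPR) pairs, one per cutoff, starting from (1, 1).

    A value at or below the cutoff is predicted to be a failure, so the
    cumulative counts up to each bin give the cases predicted as failures.
    """
    total_fail, total_succ = sum(fail), sum(succ)
    points = [(1.0, 1.0)]  # cutoff below every bin: everything predicted success
    cum_fail = cum_succ = 0
    for f, s in zip(fail, succ):
        cum_fail += f
        cum_succ += s
        fpr = 1 - cum_fail / total_fail   # failures wrongly predicted as success
        tpr = 1 - cum_succ / total_succ   # successes correctly predicted
        points.append((fpr, tpr))
    return points


def auc_rectangles(points):
    """Sum of strips whose height is the lower (right-to-left later) TPR."""
    return sum((x0 - x1) * y1 for (x0, _), (x1, y1) in zip(points, points[1:]))


def auc_trapezoids(points):
    """Sum of strips whose height is the average of the two TPR values."""
    return sum((x0 - x1) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = roc_points(fail, succ)
print(f"AUC via rectangles: {auc_rectangles(points):.3f}")
print(f"AUC via trapezoids: {auc_trapezoids(points):.3f}")
```

Because the ROC curve is non-decreasing, the trapezoid estimate in this sketch is never smaller than the rectangle estimate; the two agree only where the curve is flat.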
After clicking on the OK button on the dialog box, the output shown in Figure 3 is produced.
Figure 3 – Output from ROC Curve and Classification Table data analysis tool
The output also includes the ROC curve shown in Figure 1 of ROC Curve.
The classification table is identical to that shown in Figure 1 of Classification Table, and the ROC table, with the exception of the AUC values, is the same as that shown in Figure 1 of ROC Curve. The AUC values are slightly different since the area under the ROC curve is estimated via trapezoids instead of rectangles.
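For readers who want to see what a classification table at a given cutoff amounts to, the following Python sketch builds one from three-column data. The bounds and counts are invented for illustration (only the cutoff of 10 at the 5th row mirrors the text), and the function and key names are hypothetical; this is not the tool's own implementation.

```python
# Hypothetical three-column input: upper bound of each bin, failures, successes.
# The counts are made up for illustration; only the cutoff (10, the 5th row)
# mirrors the choice described in the text.
bounds = [2, 4, 6, 8, 10, 12, 14, 16]
fail = [5, 4, 4, 3, 2, 1, 1, 0]
succ = [0, 1, 1, 2, 3, 4, 4, 5]


def classification_table(bounds, fail, succ, cutoff):
    """Count outcomes when bins with upper bound <= cutoff are predicted failures."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for bound, f, s in zip(bounds, fail, succ):
        if bound <= cutoff:
            table["TN"] += f  # predicted failure, actually failure
            table["FN"] += s  # predicted failure, actually success
        else:
            table["FP"] += f  # predicted success, actually failure
            table["TP"] += s  # predicted success, actually success
    return table


def accuracy(table):
    """Share of all cases that were classified correctly."""
    return (table["TP"] + table["TN"]) / sum(table.values())


table = classification_table(bounds, fail, succ, cutoff=10)
print(table)
print(f"accuracy = {accuracy(table):.3f}")
```

Moving the cutoff up or down one row shifts counts between the predicted-failure and predicted-success cells, which is exactly how each row of the ROC table is produced.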
Observation: If we use the data from Figure 1 of Classification Table as input, then we would insert B24:C34 (two-column format) into the Input Range field in the dialog box in Figure 2 and set the Cutoff to 5 (5th row). The output would be the same as that described above.
Observation: The output shown in Figure 3 also includes the 95% confidence interval for the AUC (range M12:M15). Note that the 95% corresponds to an alpha value of .05 in cell M12 since 95% = 1 – .05. You can change the alpha value in M12 of the output and the corresponding confidence interval will change automatically (e.g. inserting .01 in M12 will generate the 99% confidence interval in M14 and M15).
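The relationship between alpha and the width of the interval can be sketched in Python. The text does not state how the tool computes the interval, so the sketch below assumes one common approach: the Hanley and McNeil (1982) standard error combined with a normal approximation. The function names and the sample inputs (an AUC of 0.75 with 10 cases per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil standard error of an AUC estimate.

    ASSUMPTION: this formula is a common choice, not necessarily the one
    used by the Real Statistics tool.
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, alpha=0.05):
    """Return (lower, upper) for a 1 - alpha normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = .05
    se = auc_standard_error(auc, n_pos, n_neg)
    return auc - z * se, auc + z * se


# Hypothetical inputs: AUC of 0.75 estimated from 10 successes and 10 failures.
for alpha in (0.05, 0.01):
    lower, upper = auc_confidence_interval(0.75, 10, 10, alpha)
    print(f"{1 - alpha:.0%} CI: ({lower:.3f}, {upper:.3f})")
```

A smaller alpha gives a larger critical value and therefore a wider interval, which matches the behavior described for cell M12. With small samples the normal approximation can produce limits outside the range 0 to 1, so the limits should be read with that in mind.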