Statistics for Beginners – Basic Statistical Concepts

(Basic Statistics for Citizen Data Scientists)

Basic Statistical Concepts

Statistics plays a central role in research in the social sciences, pure sciences and medicine. A simplified view of experimental research is as follows:

  • You make some observations about the world and then create a theory, consisting of a hypothesis and possible alternative hypotheses, that tries to explain those observations.
  • You then test your theory by conducting experiments. Such experiments include collecting data, analyzing the results and drawing conclusions about how well your theory holds up.
  • You iterate this process, observing more about the world and improving your theory.

Statistics also plays a central role in decision making for business and government, including marketing, strategic planning, manufacturing and finance.

Statistics is a discipline concerned with the collection and analysis of data based on a probabilistic approach. Theories about a general population are tested on a smaller sample, and conclusions are drawn about how well properties of the sample extend to the population at large.

We now briefly define some key terms. These definitions will be further elaborated throughout the rest of the website.

Data and data sets: data are recorded observations from the environment; a data set is a collection of such observations.

Population: a complete set of data which we wish to study or analyze. A key focus of the field of statistics is the study of characteristics of interest about a population.

Sample: a subset of the data from the population which we analyze in order to learn about the population. A major objective in the field of statistics is to make inferences about a population based on properties of the sample.

Random sample: a sample in which each member of the population has an equal chance of being included and in which the selection of one member is independent of the selection of all other members.
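As a rough illustration of drawing a random sample, here is a minimal Python sketch using only the standard library. The population of ID numbers, the sample size and the seed are all made up for this example. Note that drawing without replacement gives every member the same chance of inclusion, while strict independence between draws only holds when drawing with replacement, so both variants are shown.

```python
import random

# Hypothetical population: ID numbers 1..1000 (made up for illustration).
population = list(range(1, 1001))

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(42)

# Simple random sample of 50 members drawn without replacement:
# every member has the same chance of ending up in the sample.
sample_without_replacement = rng.sample(population, k=50)

# Drawing with replacement: each draw is independent of the others,
# so the same member may appear more than once.
sample_with_replacement = rng.choices(population, k=50)

print(sample_without_replacement[:5])
print(sample_with_replacement[:5])
```

For a large population and a small sample, the two approaches behave almost identically in practice.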

Random variable: a variable that represents value(s) from a random sample. We will use letters at the end of the alphabet, especially x, y and z, as random variables.

Independent random variable: a variable that is chosen, and then measured or manipulated, by the researcher in order to study some observed behavior.

Dependent random variable: a variable whose value depends on the value of one or more independent variables.
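The relationship between independent and dependent variables can be sketched with a small simulation. Everything here is hypothetical: the fertilizer amounts, the function `simulated_growth_cm` and its linear-plus-noise formula are invented purely to show which variable the researcher chooses and which one is measured.

```python
import random

rng = random.Random(7)  # arbitrary seed, for reproducibility only

# Independent variable: fertilizer amounts (grams) chosen by the researcher.
fertilizer_g = [0, 10, 20, 30, 40]

def simulated_growth_cm(amount_g):
    """Hypothetical response: a linear trend plus random noise (made up)."""
    return 5.0 + 0.3 * amount_g + rng.gauss(0, 1.0)

# Dependent variable: the outcome measured for each chosen amount.
growth_cm = [simulated_growth_cm(a) for a in fertilizer_g]

for amount, growth in zip(fertilizer_g, growth_cm):
    print(f"{amount:>3} g -> {growth:.1f} cm")
```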

Discrete variable: a variable which can take only a discrete set of separate values (e.g. cards in a deck or scores on an IQ test). The set of values can be finite or countably infinite (e.g. the number of coin tosses needed to get the first head), although for our purposes we usually consider discrete variables which only take a finite set of values.

Continuous variable: a variable that can take all the values in a finite or infinite interval (e.g. weight or temperature). A continuous variable therefore has infinitely many possible values, even when the interval is finite.
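The difference between discrete and continuous variables can be sketched in Python as follows. The die rolls and the weight range of 50 to 100 kg are assumptions chosen for illustration. Floating-point numbers are only an approximation of a truly continuous quantity, but they convey the idea.

```python
import random

rng = random.Random(1)  # arbitrary seed, for reproducibility only

# Discrete: a die roll can only take one of six separate values.
die_rolls = [rng.randint(1, 6) for _ in range(10)]

# Continuous: a weight (kg) can take any value in the interval [50, 100].
weights_kg = [rng.uniform(50.0, 100.0) for _ in range(10)]

print(die_rolls)
print([round(w, 3) for w in weights_kg])
```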

Statistic: a quantity that is calculated from a sample and is used to estimate a corresponding characteristic (i.e. parameter) of the population from which the sample is drawn.
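To make the distinction between a statistic and a parameter concrete, here is a minimal sketch. The population is simulated (10,000 made-up heights in cm), which is an assumption for illustration only; in real studies the population parameter is usually unknown and only the sample statistic can be computed.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Hypothetical population: 10,000 simulated heights in cm (illustrative only).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: computed from a random sample and used to estimate the parameter.
sample = rng.sample(population, k=100)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"sample standard deviation:   {sample_sd:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; a different random sample would give a slightly different estimate.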

Data scales: We consider four types of data measurement (i.e. data scales): nominal, ordinal, interval and ratio.

Figure 1 – Data scales

Nominal data (also called categorical data) can be labeled and counted, but not ordered or used in calculations. For example, we can’t say Female < Male or Male < Female. Ordinal data can be compared (thus we can say one data element is greater than another), but they cannot be added or subtracted or calculated in any other way. Nominal and ordinal data are called non-metric data.

Metric data can be manipulated mathematically. As we will see, unlike non-metric data, it makes sense to take the mean, standard deviation, etc. of metric data. There are two types of metric data: interval and ratio data. Interval data can be added and subtracted, so differences between values are meaningful, but the zero point is arbitrary (e.g. temperature in degrees Celsius). Ratio data has an absolute zero value (e.g. weight), so values can also be multiplied and divided, and it makes sense to say, for example, that one data element is 50% bigger than another or twice as effective as another.
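The following Python sketch shows which summary operations are meaningful on each of the four data scales. All the data values (blood types, ratings, temperatures, weights) are invented for illustration, and the rank-based median for ordinal data is just one simple way to do it, assuming an odd number of observations.

```python
import statistics

# Nominal: labels only; we can count them and find the most common one.
blood_types = ["A", "O", "B", "O", "AB", "O"]
most_common = statistics.mode(blood_types)

# Ordinal: categories have an order, so we can sort and take a median,
# but adding or averaging the labels would not be meaningful.
levels = ["low", "medium", "high"]
rank = {label: i for i, label in enumerate(levels)}
ratings = ["low", "high", "medium", "medium", "high"]
sorted_ratings = sorted(ratings, key=rank.get)
median_rating = sorted_ratings[len(sorted_ratings) // 2]  # middle of an odd-length list

# Interval: differences and means are meaningful, ratios are not
# (20 degrees Celsius is not "twice as hot" as 10 degrees Celsius).
temps_c = [10.0, 20.0, 30.0]
mean_temp = statistics.mean(temps_c)
temp_difference = temps_c[1] - temps_c[0]

# Ratio: there is a true zero, so ratios are meaningful as well.
weights_kg = [40.0, 60.0, 80.0]
weight_ratio = weights_kg[2] / weights_kg[0]  # 80 kg is twice 40 kg

print(most_common, median_rating, mean_temp, temp_difference, weight_ratio)
```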

A random variable is described as metric or non-metric, and as nominal (categorical), ordinal, interval or ratio, according to the type of the underlying data that it represents.

 
