(Basic Statistics for Citizen Data Scientist)
Basic Statistical Concepts
Statistics plays a central role in research in the social sciences, pure sciences and medicine. A simplified view of experimental research is as follows:
- You make some observations about the world and then create a theory consisting of a hypothesis and possible alternative hypotheses that try to explain the observations you have made
- You then test your theory by conducting experiments. Such experiments include collecting data, analyzing the results and coming to some conclusions about how well your theory holds up
- You iterate this process, observing more about the world and improving your theory
Statistics also plays a central role in decision making for business and government, including marketing, strategic planning, manufacturing and finance.
Statistics is a discipline that is concerned with the collection and analysis of data based on a probabilistic approach. Theories about a general population are tested on a smaller sample and conclusions are made about how well properties of the sample extend to the population at large.
We now briefly define some key terms. These definitions will be further elaborated throughout the rest of the website.
Data and data sets: observations from the environment.
Population: a complete set of data which we wish to study or analyze. A key focus of the field of statistics is the study of characteristics of interest about a population.
Sample: a subset of the data from the population which we analyze in order to learn about the population. A major objective in the field of statistics is to make inferences about a population based on properties of the sample.
Random sample: a sample in which each member of the population has an equal chance of being included and in which the selection of one member is independent from the selection of all other members.
Random variable: a variable that represents value(s) from a random sample. We will use letters at the end of the alphabet, especially x, y and z, as random variables.
Independent random variable: a variable that is chosen, and then measured or manipulated, by the researcher in order to study some observed behavior.
Dependent random variable: a variable whose value depends on the value of one or more independent variables.
Discrete variable: a variable which can take a discrete set of values (e.g. cards in a deck or scores on an IQ test). Discrete variables can take either a finite or infinite set of values, although for our purposes we usually consider discrete variables which only take a finite set of values.
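Continuous variable: a variable that can take all the values in a finite or infinite interval (e.g. weight or temperature). A continuous variable therefore always has infinitely many possible values, and these cannot be listed one by one the way the values of a discrete variable can.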
Statistic: a quantity that is calculated from a sample and is used to estimate a corresponding characteristic (i.e. parameter) about the population from which the sample is drawn.
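The relationship between population, random sample, parameter and statistic can be illustrated with a short Python sketch. The population here is simulated (made-up heights in cm), and the names `population`, `sample`, `population_mean` and `sample_mean` are chosen for illustration only. Note that `random.sample` draws without replacement, so the selections are only approximately independent; this is a reasonable approximation when the population is much larger than the sample.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a characteristic of the whole population.
population_mean = statistics.fmean(population)

# Random sample: every member has the same chance of being selected.
sample = rng.sample(population, k=200)

# Statistic: calculated from the sample and used to estimate the parameter.
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two printed values should be close but not identical. That gap is the reason conclusions about a population drawn from a sample are stated in probabilistic terms.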
Data scales: We consider four types of data measurements (i.e. data scales): nominal, ordinal, interval and ratio.
Figure 1 – Data scales
Nominal data (also called categorical data) can be labeled and counted, but not ordered or used in calculations. E.g. we can’t say Female < Male or Male < Female. Ordinal data can be ordered (thus we can say one data element is greater than another), but they cannot be added or subtracted or calculated in any other way. Nominal and ordinal data are called non-metric data.
Metric data can be manipulated mathematically (i.e. they can be added, subtracted, multiplied, divided, etc.). As we will see, unlike non-metric data, it makes sense to take the mean, standard deviation, etc. of metric data. There are two types of metric data: interval and ratio data. The difference is that ratio data has an absolute zero value, and so it makes sense to say, for example, that one data element is 50% bigger than another or twice as effective as another.
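As a rough illustration of the difference between interval and ratio data, consider temperature. Celsius is commonly given as an example of an interval scale (its zero is arbitrary), while Kelvin is a ratio scale (its zero is absolute). The values below are made up for the example.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
celsius_a, celsius_b = 10.0, 20.0
naive_ratio = celsius_b / celsius_a  # 2.0, yet 20 °C is not "twice as hot" as 10 °C

# Ratio scale: Kelvin has an absolute zero, so ratios are meaningful.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
true_ratio = kelvin_b / kelvin_a  # roughly 1.035

# Differences are meaningful on both scales and agree with each other.
diff_celsius = celsius_b - celsius_a
diff_kelvin = kelvin_b - kelvin_a

print(f"ratio computed in Celsius: {naive_ratio:.3f}")
print(f"ratio computed in Kelvin:  {true_ratio:.3f}")
print(f"difference: {diff_celsius:.1f} °C = {diff_kelvin:.1f} K")
```

The same pair of temperatures gives a ratio of 2 in Celsius but only about 1.035 in Kelvin, while the difference is 10 degrees in both. Only the Kelvin ratio describes a real physical relationship.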
A random variable can be considered metric or non-metric, nominal (or categorical), ordinal, interval or ratio, depending on whether the underlying data corresponding to the random variable has this type.
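The following sketch shows, under simple assumptions, which summary measures fit which data scale. The survey data and the names `blood_type`, `satisfaction`, `weight_kg` and `order` are hypothetical, and only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical survey responses (made-up data for illustration).
blood_type = ["A", "O", "B", "O", "AB", "O", "A"]              # nominal
satisfaction = ["low", "high", "medium", "medium",
                "high", "medium", "low"]                       # ordinal
weight_kg = [61.5, 72.0, 58.3, 80.1, 69.4, 75.2, 66.0]         # ratio (metric)

# Nominal: values can only be counted, so the mode is a valid summary.
most_common_type = statistics.mode(blood_type)

# Ordinal: values can be ranked, so the median also makes sense.
# The rank mapping is an assumption about the ordering of the labels.
order = {"low": 0, "medium": 1, "high": 2}
labels = {rank: label for label, rank in order.items()}
median_rank = statistics.median_low([order[s] for s in satisfaction])
median_level = labels[median_rank]

# Metric: arithmetic is meaningful, so mean and standard deviation apply.
mean_weight = statistics.fmean(weight_kg)
sd_weight = statistics.stdev(weight_kg)

print(f"mode of blood type:     {most_common_type}")
print(f"median satisfaction:    {median_level}")
print(f"mean weight (kg):       {mean_weight:.2f}")
print(f"std. dev. weight (kg):  {sd_weight:.2f}")
```

`median_low` is used for the ordinal data because it always returns a value that actually occurs in the data, which avoids averaging two ranks.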