(R Example for Citizen Data Scientist & Business Analyst)

This code reads a dataset file of population estimates published by the US Census Bureau.

tbl <- read.table(file.choose(),header=TRUE,sep=",")
population <- tbl["POPESTIMATE2009"]
print(summary(population[-1:-5,]))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  544300  1734000  4141000  5980000  6613000 36960000

Reading a CSV file

read.table can read a variety of basic data formats into tables or “data frames”.
sep specifies the separator for the data, which is a comma for CSV files.
header indicates whether the first row contains the names of the data columns.

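The first argument is the file name. Here, file.choose() opens a file-selection dialog and returns the path of the chosen file.

(If you pass a relative file name instead, it is resolved against the working directory. In RStudio, the default working directory is the user’s home folder.)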
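As a minimal sketch of the arguments described above, the following example writes a tiny made-up CSV file to a temporary location and reads it back. The file contents and the names csv_path, tbl_demo and tbl_demo2 are assumptions for illustration only, not part of the Census dataset. It also uses read.csv, which is a wrapper around read.table with header = TRUE and sep = "," as defaults.

```r
# Hypothetical sample data, written to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("NAME,POPESTIMATE2009",
             "Alpha,1000",
             "Beta,2500",
             "Gamma,4000"), csv_path)

# header = TRUE: the first row holds the column names.
# sep = ",": the fields are separated by commas.
tbl_demo <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv uses the same two settings by default.
tbl_demo2 <- read.csv(csv_path)

print(names(tbl_demo))   # column names taken from the header row
print(nrow(tbl_demo))    # number of data rows, header not counted
```

Passing an explicit path like this, rather than file.choose(), is usually preferable in scripts that have to run without user interaction.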

Indexing data frames

Getting a specific column

You can use the column name as a string in brackets, tbl["POPESTIMATE2009"], which returns a one-column data frame:

   POPESTIMATE2009
1        307006550
2         55283679
3         66836911
[...]

Using the column number also works: tbl[17].

Getting a column as a vector

You can use the dollar sign for this. tbl$POPESTIMATE2009 returns the values of the column as a plain vector rather than as a data frame:

[1] 307006550  55283679  66836911 113317879  71568081   4708708    698473
[8]   6595778   2889450  36961664   5024748   3518288    885122    599657
[...]
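The difference between the two ways of fetching a column can be checked with class. This is a small sketch on a made-up data frame (demo and its values are hypothetical); it also shows double brackets, which return the same vector as the dollar sign but take the column name as a string.

```r
# Hypothetical stand-in for the Census table.
demo <- data.frame(NAME = c("Alpha", "Beta", "Gamma"),
                   POPESTIMATE2009 = c(1000, 2500, 4000))

as_frame   <- demo["POPESTIMATE2009"]     # single brackets: one-column data frame
as_vector  <- demo$POPESTIMATE2009        # dollar sign: numeric vector
as_vector2 <- demo[["POPESTIMATE2009"]]   # double brackets: the same vector

print(class(as_frame))
print(class(as_vector))
print(identical(as_vector, as_vector2))
```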

Fetching specific rows and columns

Here the table will be treated as a 2-dimensional matrix.
To get the first 5 rows from the population table:

population[1:5,]  #  first the rows, then the columns

[1] 307006550  55283679  66836911 113317879  71568081

The comma after the row information indicates that we want all columns. In this case we could also have written [1:5,1], because population has only one column.
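The output above is a plain vector, not a data frame, because R by default simplifies the result when only one column is left. The sketch below illustrates this on a made-up data frame (demo and its values are hypothetical) and shows the drop = FALSE argument, which keeps the data frame structure.

```r
# Hypothetical stand-in for the Census table.
demo <- data.frame(NAME = c("Alpha", "Beta", "Gamma"),
                   POPESTIMATE2009 = c(1000, 2500, 4000))

# Rows first, then columns. An empty column position means "all columns".
first_two <- demo[1:2, ]                   # data frame with 2 rows, 2 columns
pop_two   <- demo[1:2, "POPESTIMATE2009"]  # one column selected: numeric vector

# With a one-column data frame, row indexing also simplifies to a vector.
one_col <- demo["POPESTIMATE2009"]
dropped <- one_col[1:2, ]                  # numeric vector
kept    <- one_col[1:2, , drop = FALSE]    # still a data frame

print(first_two)
print(dropped)
print(class(kept))
```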

Look at this data from the first 5 rows in the population column:

[1] 307006550  55283679  66836911 113317879  71568081

These values are too large to be the populations of individual US states. The first is the total US population, and the next four are the totals for the US Census Bureau regions: Northeast, Midwest, South and West.
Since we are only interested in the states we can drop them like this:

population[-1:-5,]

Negative numbers in matrix indices can be used to omit specific rows or columns.
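A small sketch of negative indexing follows, using made-up values (x and demo are hypothetical). Note that -1:-5 expands to -1, -2, -3, -4, -5 because the unary minus is applied before the colon operator, so it is equivalent to -(1:5).

```r
x <- c(10, 20, 30, 40, 50, 60, 70)

idx <- -1:-5          # the sequence -1, -2, -3, -4, -5
print(idx)

rest <- x[idx]        # everything except the first five elements
print(rest)

# The same selection written with parentheses.
same <- identical(x[-1:-5], x[-(1:5)])
print(same)

# On a data frame, the negative index goes in the row position.
demo <- data.frame(id = 1:7, value = x)
states_only <- demo[-1:-5, ]
print(states_only)
```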

A shorter equivalent of the code

You can also fetch the population column at the same time as you remove the multi-state rows. Replace

population <- tbl["POPESTIMATE2009"]
print(summary(population[-1:-5,]))

with

print(summary(tbl[-1:-5,"POPESTIMATE2009"]))

The summary function

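summary computes descriptive statistics for the object passed as its first argument. Which statistics it reports depends on the class of the object. For a numeric vector, it returns the minimum, the first quartile, the median, the mean, the third quartile and the maximum: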

summary(1:10)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.50    5.50    7.75   10.00
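To illustrate how the result depends on the class, here is a short sketch that calls summary on a numeric vector, a factor and a data frame. All values and names (demo, REGION, num_summary and so on) are made up for illustration.

```r
# Hypothetical data with one factor column and one numeric column.
demo <- data.frame(REGION = factor(c("West", "South", "West")),
                   POPESTIMATE2009 = c(1000, 2500, 4000))

# Numeric vector: minimum, quartiles, median, mean and maximum.
num_summary <- summary(demo$POPESTIMATE2009)
print(num_summary)

# Factor: the number of observations per level.
fac_summary <- summary(demo$REGION)
print(fac_summary)

# Data frame: one summary per column, formatted as a table.
df_summary <- summary(demo)
print(df_summary)
```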

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.
