Data could exist in various formats. For each format R has a specific function and argument. This tutorial explains how to import data to R.

In this tutorial, you will learn

Read CSV
Read Excel files
readxl_example()
read_excel()
excel_sheets()
Import data from other Statistical software
Read sas
Read STATA
Read SPSS
Best practices for Data Import

Read CSV

One of the most widely data store is the .csv (comma-separated values) file formats. R loads an array of libraries during the start-up, including the utils package. This package is convenient to open csv files combined with the reading.csv() function. Here is the syntax for read.csv

read.csv(file, header = TRUE, sep = ",")

Argument:

file: PATH where the file is stored
header: confirm if the file has a header or not, by default, the header is set to TRUE
sep: the symbol used to split the variable. By default, `,`.

We will read the data file name mtcats. The csv file is stored online. If your .csv file is stored locally, you can replace the PATH inside the code snippet. Don’t forget to wrap it inside ‘ ‘. The PATH needs to be a string value.

For mac user, the path for the download folder is:

 "/Users/USERNAME/Downloads/FILENAME.csv"

For windows user:

"C:\Users\USERNAME\Downloads\FILENAME.csv"

Note that, we should always specify the extension of the file name.

.csv
.xlsx
.txt
…

PATH <- '/datafolder/mtcars.csv'                
df <- read.csv(PATH, header =  TRUE, sep = ',')
length(df)

Output:

## [1] 12

class(df$X)

Output:

## [1] "factor"

R, by default, returns character values as Factor. We can turn off this setting by adding stringsAsFactors = FALSE.

PATH <- '/datafolder/mtcars.csv'
df <-read.csv(PATH, header =TRUE, sep = ',', stringsAsFactors =FALSE)
class(df$X)

Output:

## [1] "character"

The class for the variable X is now a character.

Read Excel files

Excel files are very popular among data analysts. Spreadsheets are easy to work with and flexible. R is equipped with a library readxl to import Excel spreadsheet.

Use this code

require(readxl)

to check if readxl is installed in your machine. If you install r with r-conda-essential, the library is already installed. You should see in the command window:

Output:

Loading required package: readxl.

If the package does not exit, you can install it with the conda library or in the terminal, use conda install -c mittner r-readxl.

Use the following command to load the library to import excel files.

library(readxl)

readxl_example()

We use the examples included in the package readxl during this tutorial.

Use code

readxl_example()

to see all the available spreadsheets in the library.

To check the location of the spreadsheet named clippy.xls, simple use

readxl_example("geometry.xls")

If you install R with conda, the spreadsheets are located in Anaconda3/lib/R/library/readxl/extdata/filename.xls

read_excel()

The function read_excel() is of great use when it comes to opening xls and xlsx extention.

The syntax is:

read_excel(PATH, sheet = NULL, range= NULL, col_names = TRUE)
arguments:
-PATH: Path where the excel is located
-sheet: Select the sheet to import. By default, all
-range: Select the range to import. By default, all non-null cells
-col_names: Select the columns to import. By default, all non-null columns

We can import the spreadsheets from the readxl library and count the number of columns in the first sheet.

# Store the path of `datasets.xlsx`
example <- readxl_example("datasets.xlsx")
# Import the spreadsheet
df <- read_excel(example)
# Count the number of columns
length(df)

Output:

## [1] 5

excel_sheets()

The file datasets.xlsx is composed of 4 sheets. We can find out which sheets are available in the workbook by using excel_sheets() function

example <- readxl_example("datasets.xlsx")

excel_sheets(example)

Output:

[1] "iris"     "mtcars"   "chickwts" "quakes"

If a worksheet includes many sheets, it is easy to select a particular sheet by using the sheet arguments. We can specify the name of the sheet or the sheet index. We can verify if both function returns the same output with identical().

example <- readxl_example("datasets.xlsx")
quake <- read_excel(example, sheet = "quakes")
quake_1 <-read_excel(example, sheet = 4)
identical(quake, quake_1)

Output:

## [1] TRUE

We can control what cells to read in 2 ways

Use n_max argument to return n rows
Use range argument combined with cell_rows or cell_cols

For example, we set n_max equals to 5 to import the first five rows.

# Read the first five row: with header
iris <-read_excel(example, n_max =5, col_names =TRUE)

If we change col_names to FALSE, R creates the headers automatically.

# Read the first five row: without header
iris_no_header <-read_excel(example, n_max =5, col_names =FALSE)

iris_no_header

In the data frame iris_no_header, R created five new variables named X__1, X__2, X__3, X__4 and X__5

We can also use the argument range to select rows and columns in the spreadsheet. In the code below, we use the excel style to select the range A1 to B5.

# Read rows A1 to B5
example_1 <-read_excel(example, range = "A1:B5", col_names =TRUE)
dim(example_1)

Output:

## [1] 4 2

We can see that the example_1 returns 4 rows with 2 columns. The dataset has header, that the reason the dimension is 4×2.

In the second example, we use the function cell_rows() which controls the range of rows to return. If we want to import the rows 1 to 5, we can set cell_rows(1:5). Note that, cell_rows(1:5) returns the same output as cell_rows(5:1).

# Read rows 1 to 5
example_2 <-read_excel(example, range =cell_rows(1:5),col_names =TRUE)			
dim(example_2)

Output:

## [1] 4 5

The example_2 however is a 4×5 matrix. The iris dataset has 5 columns with header. We return the first four rows with header of all columns

In case we want to import rows which do not begin at the first row, we have to include col_names = FALSE. If we use range = cell_rows(2:5), it becomes obvious our data frame does not have header anymore.

iris_row_with_header <-read_excel(example, range =cell_rows(2:3), col_names=TRUE)
iris_row_no_header <-read_excel(example, range =cell_rows(2:3),col_names =FALSE)

We can select the columns with the letter, like in Excel.
# Select columns A and B
col <-read_excel(example, range =cell_cols("A:B"))
dim(col)

Output:

## [1] 150   2

Note : range = cell_cols(“A:B”), returns output all cells with non-null value. The dataset contains 150 rows, therefore, read_excel() returns rows up to 150. This is verified with the dim() function.

read_excel() returns NA when a symbol without numerical value appears in the cell. We can count the number of missing values with the combination of two functions

sum
is.na

Here is the code

iris_na <-read_excel(example, na ="setosa")
sum(is.na(iris_na))

Output:

## [1] 50

We have 50 values missing, which are the rows belonging to the setosa species.

Import data from other Statistical software

We will import different files format with the heaven package. This package support SAS, STATA and SPSS softwares. We can use the following function to open different types of dataset, according to the extension of the file:

SAS: read_sas()
STATA: read_dta() (or read_stata(), which are identical)
SPSS: read_sav() or read_por(). We need to check the extension

Only one argument is required within these function. We need to know the PATH where the file is stored. That’s it, we are ready to open all the files from SAS, STATA and SPSS. These three function accepts an URL as well.

library(haven)

haven comes with conda r-essential otherwise go to the link or in the terminal conda install -c conda-forge r-haven

Read sas

For our example, we are going to use the admission dataset from IDRE.

PATH_sas <- '/datafolder/binary.sas7bdat?raw=true'
df <- read_sas(PATH_sas)
head(df)

Output:

## # A tibble: 6 x 4
##   ADMIT   GRE   GPA  RANK
##   <dbl> <dbl> <dbl> <dbl>
## 1     0   380  3.61     3
## 2     1   660  3.67     3
## 3     1   800  4.00     1
## 4     1   640  3.19     4
## 5     0   520  2.93     4
## 6     1   760  3.00     2

Read STATA

For STATA data files you can use read_dta(). We use exactly the same dataset but store in .dta file.

PATH_stata <- '/datafolder/binary.dta?raw=true'
df <- read_dta(PATH_stata)
head(df)

Output:

## # A tibble: 6 x 4				
##   admit   gre   gpa  rank				
##   <dbl> <dbl> <dbl> <dbl>				
## 1     0   380  3.61     3				
## 2     1   660  3.67     3				
## 3     1   800  4.00     1				
## 4     1   640  3.19     4				
## 5     0   520  2.93     4				
## 6     1   760  3.00     2

Read SPSS

We use the read_sav()function to open a SPSS file. The file extension “.sav”

PATH_spss <- '/datafolder/binary.sav?raw=true'
df <- read_sav(PATH_spss)
head(df)

Output:

## # A tibble: 6 x 4				
##   admit   gre   gpa  rank				
##   <dbl> <dbl> <dbl> <dbl>				
## 1     0   380  3.61     3				
## 2     1   660  3.67     3			
## 3     1   800  4.00     1				
## 4     1   640  3.19     4				
## 5     0   520  2.93     4				
## 6     1   760  3.00     2

Best practices for Data Import

When we want to import data into R, it is useful to implement following checklist. It will make it easy to import data correctly into R:

The typical format for a spreadsheet is to use the first rows as the header (usually variables name).
Avoid to name a dataset with blank spaces; it can lead to interpreting as a separate variable. Alternatively, prefer to use ‘_’ or ‘-.’
Short names are preferred
Do not include symbol in the name: i.e: exchange_rate_$_€ is not correct. Prefer to name it: exchange_rate_dollar_euro
Use NA for missing values otherwise; we need to clean the format later.

Summary

Following table summarizes the function to use in order to import different types of file in R. The column one states the library related to the function. The last column refers to the default argument.

Library	Objective	Function	Default Arguments
utils	Read CSV file	read.csv()	file, header =,TRUE, sep = “,”
readxl	Read EXCEL file	read_excel()	path, range = NULL, col_names = TRUE
haven	Read SAS file	read_sas()	path
haven	Read STATA file	read_stata()	path
haven	Read SPSS fille	read_sav()	path

Following table shows the different ways to import a selection with read_excel() function.

Function	Objectives	Arguments
read_excel()	Read n number of rows	n_max = 10
	Select rows and columns like in excel	range = “A1:D10”
	Select rows with indexes	range= cell_rows(1:3)
	Select columns with letters	range = cell_cols(“A:C”)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Latest end-to-end Learn by Coding Projects (Jupyter Notebooks) in Python and R:

All Notebooks in One Bundle: Data Science Recipes and Examples in Python & R.

End-to-End Python Machine Learning Recipes & Examples.

End-to-End R Machine Learning Recipes & Examples.

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

There are 2000+ End-to-End Python & R Notebooks are available to build Professional Portfolio as a Data Scientist and/or Machine Learning Specialist. All Notebooks are only $29.95. We would like to request you to have a look at the website for FREE the end-to-end notebooks, and then decide whether you would like to purchase or not.

R tutorials for Business Analyst – R Exporting Data to Excel, CSV, SAS, STATA, Text File

R Examples for Beginners – How read data files in R

Beginners tutorial with R – How to load CSV Files

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Towards Advanced Analytics Specialist & Analytics Engineer

R tutorials for Business Analyst – Import Data into R: Read CSV, Excel, SPSS, Stata, SAS Files

(R Tutorials for Business Analyst)

Import Data into R: Read CSV, Excel, SPSS, Stata, SAS Files

Read CSV

Read Excel files

readxl_example()

read_excel()

excel_sheets()

Import data from other Statistical software

Read sas

Read STATA

Read SPSS

Best practices for Data Import

Summary

Personal Career & Learning Guide for Data Analyst, Data Engineer and Data Scientist

Applied Machine Learning & Data Science Projects and Coding Recipes for Beginners

95% Discount on “Projects & Recipes, tutorials, ebooks”

Projects and Coding Recipes, eTutorials and eBooks: The best All-in-One resources for Data Analyst, Data Scientist, Machine Learning Engineer and Software Developer

Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners

Related Posts

Decoding Biomedical Insights: Mastering Analysis of Variance (ANOVA) in Biomedical Science

Revolutionizing Predictive Modeling in R with Boosting and AdaBoost

Optimizing Predictive Analysis: Linear Regression with Gradient Descent in Machine Learning