Beginner's Guide to R – Read and Write CSV Files in R

Read and Write CSV Files in R

One of the easiest and most reliable ways of getting data into R is to use CSV files.

The CSV file (Comma Separated Values file) is a widely supported file format used to store tabular data. It uses commas to separate the different values in a line, where each line is a row of data.

R’s built-in CSV parser makes it easy to read, write, and process data from CSV files.

Read a CSV File

Suppose you have the following CSV file.

mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York

You can read the contents of a file by passing its name to the read.csv() function. It reads the data into a data frame.

# Read entire CSV file into a data frame
mydata <- read.csv("mydata.csv")
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
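If you want to follow along without creating mydata.csv by hand, one option is to write the sample rows to a temporary file first. This is a minimal sketch: the tempfile() path stands in for "mydata.csv", and the variable name path is only illustrative.

```r
# Write the sample rows to a temporary file (stands in for "mydata.csv")
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           path)

# Read the entire file into a data frame
mydata <- read.csv(path)
print(mydata)

# One row per data line, one column per header field
nrow(mydata)    # 2
names(mydata)   # "name" "age" "job" "city"
```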

Specify a File

When you specify only the file name, R assumes that the file is located in the current working directory. If it is somewhere else, you can specify the full path to the file.

On Windows, paths use backslashes, but R treats a backslash inside a string as an escape character, so writing "C:\data\mydata.csv" in your code produces an error. You can work around this by:

  • Changing the backslashes to forward slashes like: "C:/data/mydata.csv"
  • Using double backslashes like: "C:\\data\\mydata.csv"

# Specify absolute path like this
mydata <- read.csv("C:/data/mydata.csv")

# or like this
mydata <- read.csv("C:\\data\\mydata.csv")
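The doubled backslashes work because inside an R string \\ stands for a single backslash character. The sketch below checks that, and also uses two base R functions not mentioned above: file.path(), which joins path parts with forward slashes, and getwd(), which shows the folder a bare file name is resolved against. The C:/data folder is just an example; nothing here reads from disk.

```r
# In an R string, "\\" is ONE backslash character
win_path <- "C:\\data\\mydata.csv"
writeLines(win_path)   # prints C:\data\mydata.csv
nchar(win_path)        # 18

# file.path() joins parts with "/", which R also accepts on Windows
fwd_path <- file.path("C:", "data", "mydata.csv")
fwd_path               # "C:/data/mydata.csv"

# Swapping each backslash for a forward slash gives the same path
identical(gsub("\\", "/", win_path, fixed = TRUE), fwd_path)   # TRUE

# A bare name like "mydata.csv" is looked up relative to this folder
getwd()
```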

If you want to read CSV data from the web, substitute a URL for the file name. The read.csv() function will read directly from the remote server.

# Read CSV file from Web
mydata <- read.csv("http://www.example.com/download/mydata.csv")

R can also read data from FTP servers, not just HTTP servers.

# Read CSV file from FTP server
mydata <- read.csv("ftp://ftp.example.com/download/mydata.csv")

Set Column Names

The read.csv() function assumes that the first line of your file is a header line. It takes the column names from the header line for the data frame.

# By default first row is used to name columns
mydata <- read.csv("mydata.csv")
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

If your file does not contain a header, like the file below, specify header = FALSE so that R creates default column names for you (V1, V2, V3 and V4 in this case).

mydata.csv
Bob,25,Manager,Seattle
Sam,30,Developer,New York
# If your file doesn't contain a header, set header to FALSE
mydata <- read.csv("mydata.csv",
                   header = FALSE)
mydata
   V1 V2        V3       V4
1 Bob 25   Manager  Seattle
2 Sam 30 Developer New York

However, if you want to set the column names manually, specify the col.names argument.

# Manually set the column names
mydata <- read.csv("mydata.csv",
                   header = FALSE,
                   col.names = c("name", "age", "job", "city"))
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
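A small sketch of both options on a header-less copy of the sample data, written to a temporary file as an assumption for illustration. It also shows what happens if header = FALSE is forgotten: the first data row is used as the header, and names that aren't valid R names, such as 25, are adjusted by make.names().

```r
# Header-less copy of the sample data in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           path)

# header = FALSE: R makes up the names V1, V2, ...
auto <- read.csv(path, header = FALSE)
names(auto)    # "V1" "V2" "V3" "V4"

# col.names supplies your own names instead
named <- read.csv(path, header = FALSE,
                  col.names = c("name", "age", "job", "city"))
names(named)   # "name" "age" "job" "city"

# Forgetting header = FALSE: the first data row becomes the header
wrong <- read.csv(path)
nrow(wrong)    # 1
names(wrong)   # "Bob" "X25" "Manager" "Seattle"
```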

Import the Data As Is

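In R versions before 4.0.0, read.csv() converts non-numeric (character) columns into factors (categorical variables) by default, because its stringsAsFactors argument defaults to TRUE. Since R 4.0.0 the default is FALSE, so such columns stay as character strings. You can see which one you got by inspecting the structure of your data frame.

# Ask for factors explicitly (this was the default before R 4.0.0)
mydata <- read.csv("mydata.csv",
                   stringsAsFactors = TRUE)
str(mydata)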
'data.frame':	2 obs. of  4 variables:
 $ name: Factor w/ 2 levels "Bob","Sam": 1 2
 $ age : int  25 30
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1
 $ city: Factor w/ 2 levels "New York","Seattle": 2 1

If you want your data interpreted as character strings rather than factors, set the as.is parameter to TRUE. (In R 4.0.0 and later, this is already the default behavior.)

# Set as.is to TRUE to keep character columns as strings
mydata <- read.csv("mydata.csv",
                   as.is = TRUE)
str(mydata)
'data.frame':	2 obs. of  4 variables:
 $ name: chr  "Bob" "Sam"
 $ age : int  25 30
 $ job : chr  "Manager" "Developer"
 $ city: chr  "Seattle" "New York"
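Because the default changed in R 4.0.0, it can be safer to state the behavior you want explicitly. This sketch, again on a temporary copy of the sample file, contrasts stringsAsFactors = TRUE with as.is = TRUE; the variable names are illustrative.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           path)

# Ask for factors explicitly (the default before R 4.0.0)
as_factors <- read.csv(path, stringsAsFactors = TRUE)
class(as_factors$job)    # "factor"
levels(as_factors$job)   # "Developer" "Manager"

# Keep text columns as plain character vectors
as_strings <- read.csv(path, as.is = TRUE)
class(as_strings$job)    # "character"

# Numeric columns are converted the same way in both cases
class(as_strings$age)    # "integer"
```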

Set the Classes of the Columns

You can set the class of each column manually with the colClasses argument, giving one class per column in column order.

# Set the class of each column explicitly
mydata <- read.csv("mydata.csv",
                   colClasses = c("character", "integer", "factor", "character"))
str(mydata)
'data.frame':	2 obs. of  4 variables:
 $ name: chr  "Bob" "Sam"
 $ age : int  25 30
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1
 $ city: chr  "Seattle" "New York"
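colClasses is matched to the columns by position, and an NA entry means "let R pick the class as usual." The sketch below, on a temporary copy of the sample data, reads age as "numeric" (double) instead of the integer R would otherwise choose; the particular classes are only an example.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           path)

# One entry per column, in order; NA leaves the choice to R
mydata <- read.csv(path,
                   colClasses = c("character", "numeric", "factor", NA))
class(mydata$name)   # "character"
class(mydata$age)    # "numeric" (double), not "integer"
class(mydata$job)    # "factor"
```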

Limit the Number of Rows Read

If you want to limit the number of rows read, specify the nrows argument.

# Read only one record from CSV
mydata <- read.csv("mydata.csv",
                   nrows = 1)
mydata
  name age     job    city
1  Bob  25 Manager Seattle
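nrows counts data rows only; the header line is not included in the count. A minimal check on a temporary copy of the sample file (the file and variable names are illustrative):

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,job,city",
             "Bob,25,Manager,Seattle",
             "Sam,30,Developer,New York"),
           path)

# nrows = 1 reads the header plus one data row
first <- read.csv(path, nrows = 1)
nrow(first)     # 1
first$name      # "Bob"
names(first)    # "name" "age" "job" "city"
```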

Handle Commas Within Data

Sometimes your CSV file contains fields, such as an address, that contain commas. Because commas also separate the fields, this can break the parsing of the file.

To handle a comma within a field, wrap the field in double quotes. R treats a comma inside a quoted string as an ordinary character.

The quote argument specifies the character used for quoting. For read.csv() it already defaults to the double quote, so quote = '"' below only makes the default explicit.

mydata.csv
name,age,address
Bob,25,"113 Cherry St, Seattle, WA 98104, USA"
Sam,30,"150 Greene St, New York, NY 10012, USA"
mydata <- read.csv("mydata.csv",
                   quote = '"')
mydata
  name age                                address
1  Bob  25  113 Cherry St, Seattle, WA 98104, USA
2  Sam  30 150 Greene St, New York, NY 10012, USA
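To reproduce this without an existing file, the sketch below writes the same three lines to a temporary file. Single-quoted R strings are used so the double quotes around each address can be written literally; the path name is an illustrative assumption.

```r
# Single-quoted R strings let us write the double quotes literally
path <- tempfile(fileext = ".csv")
writeLines(c('name,age,address',
             'Bob,25,"113 Cherry St, Seattle, WA 98104, USA"',
             'Sam,30,"150 Greene St, New York, NY 10012, USA"'),
           path)

mydata <- read.csv(path, quote = '"')
ncol(mydata)         # 3: the quoted commas did not split the address
mydata$address[1]    # "113 Cherry St, Seattle, WA 98104, USA"
```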

Write a CSV File

To write a data frame or matrix to a CSV file, use the write.csv() function. It creates the file, or overwrites it if it already exists.

# Write a CSV file from a data frame
df <- data.frame(name = c("Bob", "Sam"),
                 age = c(25, 30),
                 job = c("Manager", "Developer"),
                 city = c("Seattle", "New York"))
df
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

write.csv(df, "mydata.csv")
mydata.csv
"","name","age","job","city"
"1","Bob",25,"Manager","Seattle"
"2","Sam",30,"Developer","New York"
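To see exactly what write.csv() put in the file, you can read it back line by line with readLines(). This sketch writes the same data frame to a temporary file, an assumption so that nothing in your working directory is overwritten. Note that the numeric age column is written without quotes.

```r
df <- data.frame(name = c("Bob", "Sam"),
                 age = c(25, 30),
                 job = c("Manager", "Developer"),
                 city = c("Seattle", "New York"))

# Write to a temporary file, then show the raw text that was written
path <- tempfile(fileext = ".csv")
write.csv(df, path)
lines <- readLines(path)
writeLines(lines)
# "","name","age","job","city"
# "1","Bob",25,"Manager","Seattle"
# "2","Sam",30,"Developer","New York"
```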

Notice that the write.csv() function prepends each row with a row name by default. If you don’t want row labels in your CSV file, set row.names to FALSE.

# Remove row labels while writing a CSV File
write.csv(df, "mydata.csv",
          row.names = FALSE)
mydata.csv
"name","age","job","city"
"Bob",25,"Manager","Seattle"
"Sam",30,"Developer","New York"

Notice that character values and column names are surrounded by double quotes by default, while numeric values are not. Set quote = FALSE to turn quoting off.

# Write a CSV file without quotes
write.csv(df, "mydata.csv",
          row.names = FALSE,
          quote = FALSE)
mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
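One practical reason to set row.names = FALSE: when a file written with the default settings is read back with read.csv(), the row names come back as an extra column. Its header is the empty string "", which R renames to X. A minimal sketch with two temporary files (the names with_rn and without_rn are illustrative):

```r
df <- data.frame(name = c("Bob", "Sam"),
                 age = c(25L, 30L),
                 job = c("Manager", "Developer"),
                 city = c("Seattle", "New York"))

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")
write.csv(df, with_rn)                                       # row names written
write.csv(df, without_rn, row.names = FALSE, quote = FALSE)  # no row names, no quotes

names(read.csv(with_rn))   # "X" "name" "age" "job" "city"

back <- read.csv(without_rn, stringsAsFactors = FALSE)
names(back)                # "name" "age" "job" "city"
back$city                  # "Seattle" "New York"
```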

Append Data to a CSV File

By default, write.csv() overwrites the entire file. It also ignores any attempt to set its append argument (with a warning), so to append data to a CSV file, use the write.table() function instead and set append = TRUE.

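# Append one row; the header is left out because the file already has one
df <- data.frame(name = "Amy", age = 20, job = "Developer", city = "Houston")
df
  name age       job    city
1  Amy  20 Developer Houston

write.table(df, "mydata.csv",
            append = TRUE,
            sep = ",",
            col.names = FALSE,
            row.names = FALSE,
            quote = FALSE)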
mydata.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston
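The whole append workflow can be checked end to end on a temporary file. This sketch writes the first two rows with write.csv(), appends Amy's row with write.table(), and reads the raw lines back. It also shows that write.csv() refuses append = TRUE with a warning; catching that warning with tryCatch() stops the call before anything is written. Variable names are illustrative, and the warning text shown is the English-locale message.

```r
path <- tempfile(fileext = ".csv")

# Start with a header and two rows
first_rows <- data.frame(name = c("Bob", "Sam"), age = c(25, 30),
                         job = c("Manager", "Developer"),
                         city = c("Seattle", "New York"))
write.csv(first_rows, path, row.names = FALSE, quote = FALSE)

# Append one row without repeating the header
new_row <- data.frame(name = "Amy", age = 20,
                      job = "Developer", city = "Houston")
write.table(new_row, path, append = TRUE, sep = ",",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

lines <- readLines(path)
writeLines(lines)   # four lines: the header, Bob, Sam, then Amy

# write.csv() will not append: the warning fires before anything is written
msg <- tryCatch(write.csv(new_row, path, append = TRUE),
                warning = conditionMessage)
msg                 # "attempt to set 'append' ignored"
```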

 

Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning: This is about querying databases, cleaning data, writing scripts to transform data, gluing algorithms and libraries together, and writing custom code to squeeze reliable answers from data to satisfy difficult and ill-defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure that the information was correct at the time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.