Beginners Guide to R – Read and Write Excel Files in R

Read and Write Excel Files in R

Excel is the most popular spreadsheet software used to store tabular data. So, it’s important to be able to efficiently import and export data from these files.

R’s xlsx package makes it easy to read, write, and format excel files.

The xlsx Package

The xlsx package provides necessary tools to interact with both .xls or .xlsx format files from R.

In order to get started you first need to install and load the package.

# Install and load xlsx package
install.packages("xlsx")
library("xlsx")

Read an Excel file

Suppose you have the following Excel file.

rexcel01

You can read the contents of an Excel worksheet using the read.xlsx() or read.xlsx2() function.

The read.xlsx() function reads the data and creates a data frame.

# Read the first excel worksheet
library(xlsx)
mydata <- read.xlsx("mydata.xlsx", sheetIndex=1)
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

read.xlsx() vs read.xlsx2()

Both the functions work exactly the same except, read.xlsx() is slow for large data sets (worksheet with more than 100 000 cells).

On the contrary, read.xlsx2() is faster on big files.

Specify a File Name

When you specify the filename only, it is assumed that the file is located in the current folder. If it is somewhere else, you can specify the exact path that the file is located at.

Remember! While specifying the exact path, characters prefaced by  (like n r t etc.) are interpreted as special characters.

You can escape them using:

  • Changing the backslashes to forward slashes like: "C:/data/myfile.xlsx"
  • Using the double backslashes like: "C:\data\myfile.xlsx"
# Specify absolute path like this
mydata <- read.csv("C:/data/mydata.xlsx")

# or like this
mydata <- read.csv("C:\data\mydata.xlsx")

Specify Worksheet

When you use read.xlsx() function, along with a filename you also need to specify the worksheet that you want to import data from.

To specify the worksheet, you can pass either an integer indicating the position of the worksheet (for example, sheetIndex=1) or the name of the worksheet (for example, sheetName="Sheet1" )

The following two lines do exactly the same thing; they both import the data in the first worksheet (called Sheet1):

mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)

mydata <- read.xlsx("mydata.xlsx", sheetIndex = "Sheet1")

Import the Data as is

The read.xlsx() function automatically coerces character data into a factor (categorical variable). You can see that by inspecting the structure of your data frame.

# By default, character data is coerced into a factor
mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1)
str(mydata)
'data.frame':	3 obs. of  4 variables:
 $ name: Factor w/ 3 levels "Amy","Bob","Sam": 2 3 1
 $ age : num  25 30 20
 $ job : Factor w/ 2 levels "Developer","Manager": 2 1 1
 $ city: Factor w/ 3 levels "Houston","New York",..: 3 2 1

If you want your data interpreted as string rather than a factor, set the stringsAsFactors parameter to FALSE.

# Set stringsAsFactors parameter to TRUE to interpret the data as is
mydata <- read.xlsx("mydata.xlsx",
                    sheetIndex = 1,
                    stringsAsFactors = FALSE)
str(mydata)
'data.frame':	3 obs. of  4 variables:
 $ name: chr  "Bob" "Sam" "Amy"
 $ age : num  25 30 20
 $ job : chr  "Manager" "Developer" "Developer"
 $ city: chr  "Seattle" "New York" "Houston"

Read Specific Range

If you want to read a range of rows, specify the rowIndex argument.

# Read first three lines of a file
mydata <- read.xlsx("mydata.xlsx",
                    sheetIndex = 1,
                    rowIndex = 1:3)
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York

If you want to read a range of columns, specify the colIndex argument.

# Read first two columns of a file
mydata <- read.xlsx("mydata.xlsx",
                    sheetIndex = 1,
                    colIndex = 1:2)
mydata
  name age
1  Bob  25
2  Sam  30
3  Amy  20

Specify Starting Row

Sometimes the excel file (like the file below) may contain notes, comments, headers, etc. at the beginning which you may not want to include.

To start reading data from a specified row in the excel worksheet, pass startRow argument.

rexcel02
# Read excel file from third row
mydata <- read.xlsx("mydata.xlsx",
                    sheetIndex = 1,
                    startRow = 3)
mydata
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

Write Data to an Excel File

To write to an existing file, use write.xlsx() method and pass the data in the form of matrix or data frame.

# Export data from R to an excel workbook
df
  name age       job     city
1  Bob  25   Manager  Seattle
2  Sam  30 Developer New York
3  Amy  20 Developer  Houston

write.xlsx(df, file = "mydata.xlsx")
rexcel03

Notice that the write.xlsx() function prepends each row with a row name by default. If you don’t want row labels in your excel file, set row.names to FALSE.

# Remove row labels while writing an excel File
write.xlsx(df, file="mydata.xlsx",
           row.names = FALSE)
rexcel01

To set the name of the current worksheet, specify sheetName argument.

# Rename current worksheet
write.xlsx(df, file="mydata.xlsx",
           row.names = FALSE,
           sheetName = "Records")
rexcel04

Add Multiple Datasets at once

To add multiple data sets in the same Excel workbook, you have to set the append argument to TRUE.

# Write the first data set
write.xlsx(iris, file = "mydata.xlsx",
           sheetName = "IRIS", append = FALSE)

# Add a second data set
write.xlsx(mtcars, file = "mydata.xlsx",
           sheetName = "CARS", append = TRUE)

# Add a third data set
write.xlsx(Titanic, file = "mydata.xlsx",
           sheetName = "TITANIC", append = TRUE)
rexcel05

Create and Format an Excel Workbook

Sometimes you may wish to create a .xlsx file with some formatting. With the help of xlsx package, you can edit titles, borders, column width, format data table, add plot and much more.

The following example shows how to do so:

Step 1. Create a new excel workbook

You can create a new workbook using the createWorkbook() function.

# create new workbook
wb <- createWorkbook()

Step 2. Define cell styles for formatting the workbook

In R, using the CellStyle() function you can create your own cell styles to change the appearance of, for example:

  • The sheet title
  • The row and column names
  • Text alignment for the columns
  • Cell borders around the columns
# define style for title
title_style <- CellStyle(wb) +
  Font(wb, heightInPoints = 16,
       isBold = TRUE)

# define style for row and column names
rowname_style <- CellStyle(wb) +
  Font(wb, isBold = TRUE)
colname_style <- CellStyle(wb) +
  Font(wb, isBold = TRUE) +
  Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") +
  Border(color = "black",
         position =c("TOP", "BOTTOM"),
         pen =c("BORDER_THIN", "BORDER_THIN"))

Step 3. Create worksheet and add title

Before you add data, you have to create an empty worksheet in the workbook. You can do this by using the creatSheet() function.

# create a worksheet named 'Data'
ws <- createSheet(wb, sheetName = "Data")

Step 4. Add sheet title

Here’s how you can add a title.

# create a new row 
rows <- createRow(ws, rowIndex = 1)

# create a cell in the row to contain the title.
sheetTitle <- createCell(rows, colIndex = 1)

# set the cell value
setCellValue(sheetTitle[[1,1]], "Vapor Pressure of Mercury")

# set the cell style
setCellStyle(sheetTitle[[1,1]], title_style)

Step 5. Add a table into a worksheet

With the addDataframe() function, you can add the data table in the newly created worksheet.

Below example adds built-in pressure dataset on row #3.

# add data table to worksheet
addDataFrame(pressure, sheet = ws, startRow = 3, startColumn = 1,
             colnamesStyle = colname_style,
             rownamesStyle = rowname_style,
             row.names = FALSE)

Step 6. Add a plot into a worksheet

You can add a plot in the worksheet using the addPicture() function.

# create a png plot
png("plot.png", height=900, width=1600, res=250, pointsize=8)
plot(pressure, xlab = "Temperature (deg C)",
     ylab = "Pressure (mm of Hg)",
     main = "pressure data: Vapor Pressure of Mercury",
     col="red", pch=19, type="b")
dev.off()

# Create a new sheet to contain the plot
sheet <-createSheet(wb, sheetName = "plot")

# Add the plot created previously
addPicture("plot.png", sheet, scale = 1, startRow = 2,
           startColumn = 1)

# Remove the plot from the disk
res<-file.remove("plot.png")

Step 7. Change column width

Now change the column width to fit the contents.

# change column width of first 2 columns
setColumnWidth(sheet = ws, colIndex = 1:2, colWidth = 15)

Step 8. Save the workbook

Finally, save the workbook with the saveWorkbook() function.

# save workbook
saveWorkbook(wb, file = "mydata.xlsx")

Step 9. View the result

r excel format1
r excel format2

 

 

Python Example for Beginners

Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning:This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.