Read and Write Excel Files in R
Excel is the most popular spreadsheet software used to store tabular data. So, it’s important to be able to efficiently import and export data from these files.
R’s xlsx package makes it easy to read, write, and format excel files.
The xlsx Package
The xlsx package provides necessary tools to interact with both .xls or .xlsx format files from R.
In order to get started you first need to install and load the package.
# Install and load xlsx package install.packages("xlsx") library("xlsx")
Read an Excel file
Suppose you have the following Excel file.
You can read the contents of an Excel worksheet using the
read.xlsx() function reads the data and creates a data frame.
# Read the first excel worksheet library(xlsx) mydata <- read.xlsx("mydata.xlsx", sheetIndex=1) mydata name age job city 1 Bob 25 Manager Seattle 2 Sam 30 Developer New York 3 Amy 20 Developer Houston
read.xlsx() vs read.xlsx2()
Both the functions work exactly the same except,
read.xlsx() is slow for large data sets (worksheet with more than 100 000 cells).
On the contrary,
read.xlsx2() is faster on big files.
Specify a File Name
When you specify the filename only, it is assumed that the file is located in the current folder. If it is somewhere else, you can specify the exact path that the file is located at.
Remember! While specifying the exact path, characters prefaced by
(like n r t etc.) are interpreted as special characters.
You can escape them using:
- Changing the backslashes to forward slashes like:
- Using the double backslashes like:
# Specify absolute path like this mydata <- read.csv("C:/data/mydata.xlsx") # or like this mydata <- read.csv("C:\data\mydata.xlsx")
When you use
read.xlsx() function, along with a filename you also need to specify the worksheet that you want to import data from.
To specify the worksheet, you can pass either an integer indicating the position of the worksheet (for example,
sheetIndex=1) or the name of the worksheet (for example,
The following two lines do exactly the same thing; they both import the data in the first worksheet (called Sheet1):
mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1) mydata <- read.xlsx("mydata.xlsx", sheetIndex = "Sheet1")
Import the Data as is
read.xlsx() function automatically coerces character data into a factor (categorical variable). You can see that by inspecting the structure of your data frame.
# By default, character data is coerced into a factor mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1) str(mydata) 'data.frame': 3 obs. of 4 variables: $ name: Factor w/ 3 levels "Amy","Bob","Sam": 2 3 1 $ age : num 25 30 20 $ job : Factor w/ 2 levels "Developer","Manager": 2 1 1 $ city: Factor w/ 3 levels "Houston","New York",..: 3 2 1
If you want your data interpreted as string rather than a factor, set the
stringsAsFactors parameter to FALSE.
# Set stringsAsFactors parameter to TRUE to interpret the data as is mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1, stringsAsFactors = FALSE) str(mydata) 'data.frame': 3 obs. of 4 variables: $ name: chr "Bob" "Sam" "Amy" $ age : num 25 30 20 $ job : chr "Manager" "Developer" "Developer" $ city: chr "Seattle" "New York" "Houston"
Read Specific Range
If you want to read a range of rows, specify the rowIndex argument.
# Read first three lines of a file mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1, rowIndex = 1:3) mydata name age job city 1 Bob 25 Manager Seattle 2 Sam 30 Developer New York
If you want to read a range of columns, specify the colIndex argument.
# Read first two columns of a file mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1, colIndex = 1:2) mydata name age 1 Bob 25 2 Sam 30 3 Amy 20
Specify Starting Row
Sometimes the excel file (like the file below) may contain notes, comments, headers, etc. at the beginning which you may not want to include.
To start reading data from a specified row in the excel worksheet, pass startRow argument.
# Read excel file from third row mydata <- read.xlsx("mydata.xlsx", sheetIndex = 1, startRow = 3) mydata name age job city 1 Bob 25 Manager Seattle 2 Sam 30 Developer New York 3 Amy 20 Developer Houston
Write Data to an Excel File
To write to an existing file, use
write.xlsx() method and pass the data in the form of matrix or data frame.
# Export data from R to an excel workbook df name age job city 1 Bob 25 Manager Seattle 2 Sam 30 Developer New York 3 Amy 20 Developer Houston write.xlsx(df, file = "mydata.xlsx")
Notice that the
write.xlsx() function prepends each row with a row name by default. If you don’t want row labels in your excel file, set
row.names to FALSE.
# Remove row labels while writing an excel File write.xlsx(df, file="mydata.xlsx", row.names = FALSE)
To set the name of the current worksheet, specify sheetName argument.
# Rename current worksheet write.xlsx(df, file="mydata.xlsx", row.names = FALSE, sheetName = "Records")
Add Multiple Datasets at once
To add multiple data sets in the same Excel workbook, you have to set the append argument to TRUE.
# Write the first data set write.xlsx(iris, file = "mydata.xlsx", sheetName = "IRIS", append = FALSE) # Add a second data set write.xlsx(mtcars, file = "mydata.xlsx", sheetName = "CARS", append = TRUE) # Add a third data set write.xlsx(Titanic, file = "mydata.xlsx", sheetName = "TITANIC", append = TRUE)
Create and Format an Excel Workbook
Sometimes you may wish to create a .xlsx file with some formatting. With the help of xlsx package, you can edit titles, borders, column width, format data table, add plot and much more.
The following example shows how to do so:
Step 1. Create a new excel workbook
You can create a new workbook using the
# create new workbook wb <- createWorkbook()
Step 2. Define cell styles for formatting the workbook
In R, using the
CellStyle() function you can create your own cell styles to change the appearance of, for example:
- The sheet title
- The row and column names
- Text alignment for the columns
- Cell borders around the columns
# define style for title title_style <- CellStyle(wb) + Font(wb, heightInPoints = 16, isBold = TRUE) # define style for row and column names rowname_style <- CellStyle(wb) + Font(wb, isBold = TRUE) colname_style <- CellStyle(wb) + Font(wb, isBold = TRUE) + Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") + Border(color = "black", position =c("TOP", "BOTTOM"), pen =c("BORDER_THIN", "BORDER_THIN"))
Step 3. Create worksheet and add title
Before you add data, you have to create an empty worksheet in the workbook. You can do this by using the
# create a worksheet named 'Data' ws <- createSheet(wb, sheetName = "Data")
Step 4. Add sheet title
Here’s how you can add a title.
# create a new row rows <- createRow(ws, rowIndex = 1) # create a cell in the row to contain the title. sheetTitle <- createCell(rows, colIndex = 1) # set the cell value setCellValue(sheetTitle[[1,1]], "Vapor Pressure of Mercury") # set the cell style setCellStyle(sheetTitle[[1,1]], title_style)
Step 5. Add a table into a worksheet
addDataframe() function, you can add the data table in the newly created worksheet.
Below example adds built-in pressure dataset on row #3.
# add data table to worksheet addDataFrame(pressure, sheet = ws, startRow = 3, startColumn = 1, colnamesStyle = colname_style, rownamesStyle = rowname_style, row.names = FALSE)
Step 6. Add a plot into a worksheet
You can add a plot in the worksheet using the
# create a png plot png("plot.png", height=900, width=1600, res=250, pointsize=8) plot(pressure, xlab = "Temperature (deg C)", ylab = "Pressure (mm of Hg)", main = "pressure data: Vapor Pressure of Mercury", col="red", pch=19, type="b") dev.off() # Create a new sheet to contain the plot sheet <-createSheet(wb, sheetName = "plot") # Add the plot created previously addPicture("plot.png", sheet, scale = 1, startRow = 2, startColumn = 1) # Remove the plot from the disk res<-file.remove("plot.png")
Step 7. Change column width
Now change the column width to fit the contents.
# change column width of first 2 columns setColumnWidth(sheet = ws, colIndex = 1:2, colWidth = 15)
Step 8. Save the workbook
Finally, save the workbook with the
# save workbook saveWorkbook(wb, file = "mydata.xlsx")
Step 9. View the result
Python Example for Beginners
Two Machine Learning Fields
There are two sides to machine learning:
- Practical Machine Learning:This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.
- Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.
Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes
Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!
Latest end-to-end Learn by Coding Recipes in Project-Based Learning:
Applied Statistics with R for Beginners and Business Professionals
Data Science and Machine Learning Projects in Python: Tabular Data Analytics
Data Science and Machine Learning Projects in R: Tabular Data Analytics
Python Machine Learning & Data Science Recipes: Learn by Coding
R Machine Learning & Data Science Recipes: Learn by Coding
Comparing Different Machine Learning Algorithms in Python for Classification (FREE)
Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.