HOW TO EASILY MANIPULATE FILES AND DIRECTORIES IN R
This article presents the fs R package, which provides a cross-platform, uniform interface to file system operations.
fs functions are divided into four main categories:
- path_: for manipulating and constructing paths
- file_: for files
- dir_: for directories
- link_: for links
Contents:
- Prerequisites
- Some Key R functions
- Basic usage
- Filter files
- Read a collection of files into one data frame
Prerequisites
Install the package from CRAN (install.packages("fs")) or from GitHub (devtools::install_github("r-lib/fs")).
Load required packages:
library("fs") # File manipulations
library(tidyverse) # Data manipulation
Some Key R functions
File manipulation:
- file_copy(), dir_copy(), link_copy(): Copy files, directories, or links
- file_create(), dir_create(), link_create(): Create files, directories, or links
- file_delete(), dir_delete(), link_delete(): Delete files, directories, or links
- file_access(), file_exists(), dir_exists(), link_exists(): Query for existence and access permissions
- file_chmod(): Change file permissions
- file_chown(): Change owner or group of a file
- file_info(): Query file metadata
- file_move(): Move or rename files
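As a rough illustration of how the file_ functions fit together, the sketch below creates, copies, renames, and inspects a file inside a temporary directory. The file names (report.txt, backup.txt, archive.txt) are made up for the example and are not part of the original article.

```r
library(fs)

# Work in a throwaway directory so the project folder is left untouched
tmp <- dir_create(file_temp())

# Create an empty file, copy it, then rename the copy
# (file names here are hypothetical)
original <- file_create(path(tmp, "report.txt"))
backup   <- file_copy(original, path(tmp, "backup.txt"))
archived <- file_move(backup, path(tmp, "archive.txt"))

# Query existence; the result is a logical vector named by path.
# backup.txt no longer exists because it was moved to archive.txt.
present <- file_exists(c(original, backup, archived))
present

# Query metadata (one row per file)
file_info(original)[, c("path", "type", "size")]

# Clean up
dir_delete(tmp)
```

Each of these functions returns the path it acted on, which is what makes it possible to chain the results as above.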
Path manipulation:
- path(), path_wd(): Construct a path to a file or directory
- file_temp(), path_temp(): Create names for temporary files
- path_expand(), path_expand_r(), path_home(), path_home_r(): Find the user's home directory
- path_file(), path_dir(), path_ext(), path_ext_remove(), path_ext_set(): Manipulate file paths
  - path_file() returns the filename portion of the path,
  - path_dir() returns the directory portion,
  - path_ext() returns the last extension (if any) for a path,
  - path_ext_remove() removes the last extension and returns the rest of the path,
  - path_ext_set() replaces the extension with a new extension. If there is no existing extension, the new extension is appended.
- path_filter(): Filter paths
- path_real(), path_split(), path_join(), path_abs(), path_norm(), path_rel(), path_common(), path_has_parent(): Path computations
  - path_real(): returns the canonical path
  - path_split(): splits paths into parts
  - path_abs(): returns a normalized, absolute version of a path
  - path_norm(): eliminates . references and rationalizes up-level .. references, so A/./B and A/foo/../B both become A/B, but ../B is not changed. If one of the paths is a symbolic link, this may change the meaning of the path, so consider using path_real() instead.
  - path_common(): finds the common parts of two (or more) paths.
  - path_has_parent(): determines whether a path has a given parent.
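The path helpers described above work on path strings only, so they can be tried without any files on disk. The following is a small sketch; the path data/raw/survey.csv is a hypothetical example, not a file from the article.

```r
library(fs)

# A hypothetical path; nothing needs to exist on disk
p <- path("data", "raw", "survey.csv")

path_file(p)            # filename portion: survey.csv
path_dir(p)             # directory portion: data/raw
path_ext(p)             # last extension: csv
path_ext_remove(p)      # data/raw/survey
path_ext_set(p, "tsv")  # data/raw/survey.tsv

# Normalization, using the cases from the description of path_norm()
path_norm("A/./B")       # A/B
path_norm("A/foo/../B")  # A/B
path_norm("../B")        # unchanged: ../B

# Relationships between paths
path_common(c("data/raw/a.csv", "data/clean/b.csv"))  # data
path_has_parent("data/raw/a.csv", "data")             # TRUE
path_split("data/raw/a.csv")                          # list with one vector of parts
```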
Helpers:
- is_file(), is_dir(), is_link(): Test for file types
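A quick way to see what the helpers return is to point them at a directory, a file, and a path that does not exist. This is a minimal sketch using a temporary directory; the names a.txt and missing are invented for the example.

```r
library(fs)

tmp <- dir_create(file_temp())
f   <- file_create(path(tmp, "a.txt"))   # hypothetical file name

dir_is_dir      <- is_dir(tmp)                   # TRUE
file_is_file    <- is_file(f)                    # TRUE
dir_is_file     <- is_file(tmp)                  # FALSE: a directory is not a regular file
missing_is_dir  <- is_dir(path(tmp, "missing"))  # FALSE: the path does not exist

dir_delete(tmp)
```

The helpers return logical vectors named by path, so they can also be applied to the output of dir_ls() in one call.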
Basic usage
- List the files in a directory/folder
- Create and delete files and directories
# Construct a path to a file with `path()`
path("foo", "bar", letters[1:3], ext = "txt")
## foo/bar/a.txt foo/bar/b.txt foo/bar/c.txt
# list files in the current directory
dir_ls()
## 002-create-icon.html
## 003-r-histogram-example.html
## _output.yaml
## _settings.R
## _settings.Rmd
## book.bib
## correlation-matrix-analysis-in-r-using-corrr.html
## correlation-network-using-corrr.html
## figures
## file-and-directory-manipulation.Rmd
## gganimate.html
## gghighlight.html
## include
## interactive-data-summary.html
## libs
## mathjax.Rmd
## packages.bib
## plot-all-variables-in-a-dataset.html
## plot-one-variable-against-multiples-others.html
## wp-content
# create a new directory
tmp <- dir_create(file_temp())
tmp
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c
# create new files in that directory
file_create(path(tmp, "my-file.txt"))
dir_ls(tmp)
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c/my-file.txt
# remove files from the directory
file_delete(path(tmp, "my-file.txt"))
dir_ls(tmp)
## character(0)
# remove the directory
dir_delete(tmp)
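dir_ls() can also descend into subdirectories and restrict the listing by type or pattern. The sketch below builds a small, made-up directory tree to show this; it assumes a recent fs release in which the recursion argument is named recurse.

```r
library(fs)

# Build a small hypothetical tree: a.csv, b.txt, sub/c.csv
tmp <- dir_create(file_temp())
dir_create(path(tmp, "sub"))
file_create(path(tmp, c("a.csv", "b.txt")))
file_create(path(tmp, "sub", "c.csv"))

# Top level only: two files plus the "sub" directory
top_level <- dir_ls(tmp)

# Every regular file, including those in subdirectories
all_files <- dir_ls(tmp, recurse = TRUE, type = "file")

# Only paths matching a glob pattern
csv_files <- dir_ls(tmp, recurse = TRUE, glob = "*.csv")
csv_names <- sort(as.character(path_file(csv_files)))
csv_names

dir_delete(tmp)
```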
Filter files
Filter files by type, permission and size
# `recurse` replaces the deprecated `recursive` argument in current fs releases
dir_info(path = ".", recurse = FALSE) %>%
  filter(type == "file", permissions == "u+r", size > "10KB") %>%
  arrange(desc(size)) %>%
  select(path, permissions, size, modification_time)
## # A tibble: 2 x 4
##   path                                              permissions  size
##   <fs::path>                                        <fs::perms> <fs:>
## 1 correlation-matrix-analysis-in-r-using-corrr.html rw-r--r--   20.3K
## 2 gganimate.html                                    rw-r--r--   15.1K
## # … with 1 more variable: modification_time <dttm>
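The same kind of filter can be written without dplyr by subsetting the result of dir_info() directly. The sketch below uses files of known size in a temporary directory so the result is predictable; the file names and sizes are invented for the example.

```r
library(fs)

tmp <- dir_create(file_temp())
writeBin(raw(20000), path(tmp, "large.bin"))  # 20,000 bytes
writeBin(raw(100),   path(tmp, "small.bin"))  # 100 bytes
dir_create(path(tmp, "sub"))                  # a directory, to be filtered out

info <- dir_info(tmp)

# Keep regular files larger than 10KB; the string "10KB" is converted
# to a byte size when it is compared with the size column
big <- info[info$type == "file" & info$size > "10KB", c("path", "size")]
big_names <- as.character(path_file(big$path))
big_names

dir_delete(tmp)
```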
Tabulate and display folder size.
# `recurse` replaces the deprecated `recursive` argument in current fs releases
dir_info(path = ".", recurse = TRUE) %>%
  group_by(directory = path_dir(path)) %>%
  tally(wt = size, sort = TRUE)
## # A tibble: 37 x 2
## directory n
## <fs::path> <fs::bytes>
## 1 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/images 11.76M
## 2 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/figures 2.36M
## 3 libs/bootstrap-3.3.5/css 2.31M
## 4 libs/plotlyjs-1.16.3 1.66M
## 5 libs/bootstrap-3.3.5/css/fonts 953.23K
## 6 libs/font-awesome-4.1.0/fonts 611.77K
## # … with 31 more rows
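To check how the per-directory totals are computed, it can help to run the same idea on a tiny tree with known file sizes. The following is a sketch in base R rather than dplyr; the directory names (a, b) and file sizes are hypothetical, and only regular files are counted, which differs slightly from the pipeline above.

```r
library(fs)

# Hypothetical tree: a/ holds 100 + 200 bytes, b/ holds 50 bytes
tmp <- dir_create(file_temp())
dir_create(path(tmp, c("a", "b")))
writeBin(raw(100), path(tmp, "a", "x.bin"))
writeBin(raw(200), path(tmp, "a", "y.bin"))
writeBin(raw(50),  path(tmp, "b", "z.bin"))

# Regular files only, so directory entries do not add to the totals
info <- dir_info(tmp, recurse = TRUE, type = "file")

# Sum sizes per parent directory, labelled relative to tmp
parent <- as.character(path_rel(path_dir(info$path), start = tmp))
totals <- vapply(split(as.numeric(info$size), parent), sum, numeric(1))
totals <- totals[order(totals, decreasing = TRUE)]

# Display the totals as human-readable byte sizes
fs_bytes(totals)

dir_delete(tmp)
```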
Read a collection of files into one data frame
dir_ls() returns a named vector, so it can be used directly with purrr::map_df(.id).
# Create separate files for each species
iris %>%
  split(.$Species) %>%
  map(select, -Species) %>%
  iwalk(~ write_tsv(.x, paste0(.y, ".tsv")))
# Show the files
iris_files <- dir_ls(glob = "*.tsv")
iris_files
## setosa.tsv versicolor.tsv virginica.tsv
# Read the data into a single table, including the filenames
iris_files %>%
  map_df(read_tsv, .id = "file", col_types = cols(), n_max = 2)
## # A tibble: 6 x 5
##   file           Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <chr>                 <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa.tsv              5.1         3.5          1.4         0.2
## 2 setosa.tsv              4.9         3            1.4         0.2
## 3 versicolor.tsv          7           3.2          4.7         1.4
## 4 versicolor.tsv          6.4         3.2          4.5         1.5
## 5 virginica.tsv           6.3         3.3          6           2.5
## 6 virginica.tsv           5.8         2.7          5.1         1.9
file_delete(iris_files)
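For readers who prefer not to load the tidyverse, the same workflow can be approximated with fs and base R. This is a sketch under that assumption, not the article's method; it writes the files to a temporary directory instead of the working directory, and the helper read_one() is a name introduced here for illustration.

```r
library(fs)

tmp <- dir_create(file_temp())

# Write one tab-separated file per species
parts <- split(iris[, names(iris) != "Species"], iris$Species)
for (sp in names(parts)) {
  write.table(parts[[sp]], path(tmp, sp, ext = "tsv"),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

files <- dir_ls(tmp, glob = "*.tsv")

# Hypothetical helper: read the first two rows of a file
# and tag them with the file name
read_one <- function(f) {
  d <- read.delim(f, nrows = 2)
  cbind(file = as.character(path_file(f)), d)
}

# Stack the pieces into a single data frame
combined <- do.call(rbind, lapply(files, read_one))
rownames(combined) <- NULL
combined

dir_delete(tmp)
```

Writing to a temporary directory avoids leaving .tsv files behind if the script stops before the cleanup step.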