Data Analytics – HOW TO EASILY MANIPULATE FILES AND DIRECTORIES IN R


This article presents the fs R package, which provides a cross-platform, uniform interface to file system operations.

fs functions are divided into four main categories:

  • path_ for manipulating and constructing paths
  • file_ for files
  • dir_ for directories
  • link_ for links
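
Because each family shares a prefix, you can list its members from the package's exported names. The sketch below uses only base R (getNamespaceExports() and grep()). The variable names are illustrative, and the exact set of functions returned depends on the installed fs version.

```r
# Group the exported fs functions by name prefix to see the four families.
# Assumes the fs package is installed; attaching it is not required.
fs_exports <- getNamespaceExports("fs")

families <- c("path_", "file_", "dir_", "link_")
by_family <- lapply(families, function(prefix) {
  sort(grep(paste0("^", prefix), fs_exports, value = TRUE))
})
names(by_family) <- families

# How many functions each family has, and the members of the smallest one
print(lengths(by_family))
print(by_family[["link_"]])
```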

 

Contents:

  • Prerequisites
  • Some Key R functions
  • Basic usage
  • Filter files
  • Read a collection of files into one data frame

 

Prerequisites

Install the package from CRAN with install.packages("fs"), or install the development version from GitHub with devtools::install_github("r-lib/fs").

Load required packages:

library("fs")  # File manipulations
library(tidyverse)  # Data manipulation

Some Key R functions

File manipulation:

  • file_copy(), dir_copy(), link_copy(): Copy files, directories or links
  • file_create(), dir_create(), link_create(): Create files, directories, or links
  • file_delete(), dir_delete(), link_delete(): Delete files, directories, or links
  • file_access(), file_exists(), dir_exists(), link_exists(): Query for existence and access permissions
  • file_chmod(): Change file permissions
  • file_chown(): Change owner or group of a file
  • file_info(): Query file metadata
  • file_move(): Move or rename files
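
Here is a minimal sketch of several of these functions working together. It runs inside a temporary directory so nothing else is touched. The file names are hypothetical, and the octal mode "600" assumes a POSIX system, because permission changes behave differently on Windows.

```r
library(fs)

# Work inside a throwaway directory so nothing else is touched
tmp <- dir_create(file_temp())
src <- path(tmp, "data.csv")
writeLines(c("x,y", "1,2"), src)

# Copy the file, then move (rename) the copy
backup  <- file_copy(src, path(tmp, "data-backup.csv"))
renamed <- file_move(backup, path(tmp, "data-old.csv"))

# After the move, the copy no longer exists under its old name
present <- file_exists(c(src, backup, renamed))
print(present)

# file_info() returns one row of metadata per path
info <- file_info(renamed)
print(info[, c("path", "size", "permissions")])

# Restrict the renamed file to owner read/write (octal 600)
file_chmod(renamed, "600")
new_perms <- file_info(renamed)$permissions
print(new_perms)

# Remove the directory and everything in it
dir_delete(tmp)
```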

 

Path manipulation:

  • path(), path_wd(): Construct a path to a file or directory (path_wd() starts from the working directory)
  • file_temp(), path_temp(): Create names for temporary files
  • path_expand(), path_expand_r(), path_home(), path_home_r(): Find the user's home directory and expand ~ in paths
  • path_file(), path_dir(), path_ext(), path_ext_remove(), path_ext_set(): Manipulate file paths
    • path_file() returns the filename portion of the path,
    • path_dir() returns the directory portion,
    • path_ext() returns the last extension (if any) for a path,
    • path_ext_remove() removes the last extension and returns the rest of the path,
    • path_ext_set() replaces the extension with a new extension. If there is no existing extension the new extension is appended.
  • path_filter(): Filter paths
  • path_real(), path_split(), path_join(), path_abs(), path_norm(), path_rel(), path_common(), path_has_parent(): Path computations
    • path_real() returns the canonical path, with symbolic links resolved.
    • path_split() splits paths into their components.
    • path_join() joins components back into a path.
    • path_abs() returns a normalized, absolute version of a path.
    • path_norm() eliminates . references and rationalizes up-level .. references, so A/./B and A/foo/../B both become A/B, but ../B is not changed. Because it works on the path text alone, it can change the meaning of a path that passes through a symbolic link. Use path_real() in that case.
    • path_rel() returns a path relative to a starting directory.
    • path_common() finds the common parts of two or more paths.
    • path_has_parent() determines whether a path has a given parent.
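
Most of these path functions work on the path text, so you can try them on a made-up path. In the sketch below, project/data/raw/survey.2023.csv is hypothetical and the file does not need to exist. The comments show the values these calls should return.

```r
library(fs)

# A hypothetical path; none of these files needs to exist
p <- path("project", "data", "raw", "survey.2023.csv")

path_file(p)            # survey.2023.csv
path_dir(p)             # project/data/raw
path_ext(p)             # csv (only the last extension)
path_ext_remove(p)      # project/data/raw/survey.2023
path_ext_set(p, "tsv")  # project/data/raw/survey.2023.tsv

# Lexical clean-up of "." and ".." components
path_norm("project/./data/tmp/../raw")  # project/data/raw

# Split into components, then join them back together
parts <- path_split(p)[[1]]
print(parts)            # "project" "data" "raw" "survey.2023.csv"
path_join(parts)        # project/data/raw/survey.2023.csv

# Relationships between paths
path_rel(p, start = "project/data")
path_has_parent(p, "project/data")
```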

 

Helpers:

  • is_file(), is_dir(), is_link(): Functions to test for file types
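
The next sketch shows the type tests. It assumes a system that allows symbolic links; on Windows, creating them may require extra privileges. It also uses link_create() and link_path() from the link_ family. All file names are hypothetical.

```r
library(fs)

tmp <- dir_create(file_temp())
f   <- file_create(path(tmp, "notes.txt"))
d   <- dir_create(path(tmp, "sub"))
# Symbolic link pointing at notes.txt
lnk <- link_create(f, path(tmp, "notes-link.txt"))

# unname() drops any names so each result is labelled only by its check
result <- c(
  file_is_file = unname(is_file(f)),
  dir_is_dir   = unname(is_dir(d)),
  file_is_dir  = unname(is_dir(f)),
  link_is_link = unname(is_link(lnk)),
  file_is_link = unname(is_link(f))
)
print(result)

# link_path() reads the target a symbolic link points to
target <- link_path(lnk)
print(target)

dir_delete(tmp)
```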

 

Basic usage

  • Construct a file path
  • List the files in a directory (folder)
  • Create and delete files and directories
# Construct a path to a file with `path()`
path("foo", "bar", letters[1:3], ext = "txt")
## foo/bar/a.txt foo/bar/b.txt foo/bar/c.txt
# list files in the current directory
dir_ls()
## 002-create-icon.html
## 003-r-histogram-example.html
## _output.yaml
## _settings.R
## _settings.Rmd
## book.bib
## correlation-matrix-analysis-in-r-using-corrr.html
## correlation-network-using-corrr.html
## figures
## file-and-directory-manipulation.Rmd
## gganimate.html
## gghighlight.html
## include
## interactive-data-summary.html
## libs
## mathjax.Rmd
## packages.bib
## plot-all-variables-in-a-dataset.html
## plot-one-variable-against-multiples-others.html
## wp-content
# create a new directory
tmp <- dir_create(file_temp())
tmp
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c
# create a new file in that directory
file_create(path(tmp, "my-file.txt"))
dir_ls(tmp)
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c/my-file.txt
# remove files from the directory
file_delete(path(tmp, "my-file.txt"))
dir_ls(tmp)
## character(0)
# remove the directory
dir_delete(tmp)
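
dir_create() builds any missing parent directories, and dir_ls() can walk a whole tree. The sketch below uses the same temporary-directory approach with hypothetical folder and file names. Recent fs releases call the recursion argument recurse; older releases used recursive.

```r
library(fs)

tmp <- dir_create(file_temp())

# dir_create() also creates any missing intermediate directories
dir_create(path(tmp, "reports", "2024", "q1"))
file_create(path(tmp, "reports", c("summary.md", "2024/q1/sales.csv")))

# Without recursion, only the immediate children of "reports" are listed
top <- dir_ls(path(tmp, "reports"))
print(top)

# recurse = TRUE walks every level; type = "file" drops the directories
all_files <- dir_ls(tmp, recurse = TRUE, type = "file")
print(all_files)

# Draw the directory layout as a tree
dir_tree(tmp)

dir_delete(tmp)
```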

Filter files

Filter the files in the current directory by type, permissions and size, then sort them from largest to smallest:

dir_info(path = ".", recurse = FALSE) %>%
  filter(type == "file", permissions == "u+r", size > "10KB") %>%
  arrange(desc(size)) %>%
  select(path, permissions, size, modification_time)
## # A tibble: 2 x 4
##   path                                              permissions  size
##   <fs::path>                                        <fs::perms> <fs:>
## 1 correlation-matrix-analysis-in-r-using-corrr.html rw-r--r--   20.3K
## 2 gganimate.html                                    rw-r--r--   15.1K
## # … with 1 more variable: modification_time <dttm>
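
The filter above depends on whatever happens to be in the current directory. The following self-contained sketch reproduces the idea in a temporary directory, using base R subsetting in place of dplyr verbs. The files are hypothetical, and the 10 KB threshold is written as 10 * 1024 bytes, assuming binary units.

```r
library(fs)

tmp <- dir_create(file_temp())
writeBin(raw(20000), path(tmp, "big.bin"))  # 20,000 bytes
writeLines("tiny", path(tmp, "small.txt"))  # a few bytes
dir_create(path(tmp, "folder"))

info <- dir_info(tmp)

# Keep regular files larger than 10 * 1024 bytes, biggest first
keep <- info$type == "file" & as.numeric(info$size) > 10 * 1024
big  <- info[keep, c("path", "size")]
big  <- big[order(-as.numeric(big$size)), ]
print(big)

dir_delete(tmp)
```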

Tabulate the total size of the files in each folder, including subfolders, largest first:

dir_info(path = ".", recurse = TRUE) %>%
  group_by(directory = path_dir(path)) %>%
  tally(wt = size, sort = TRUE)
## # A tibble: 37 x 2
##   directory                                                    n
##   <fs::path>                                         <fs::bytes>
## 1 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/images       11.76M
## 2 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/figures       2.36M
## 3 libs/bootstrap-3.3.5/css                                 2.31M
## 4 libs/plotlyjs-1.16.3                                     1.66M
## 5 libs/bootstrap-3.3.5/css/fonts                         953.23K
## 6 libs/font-awesome-4.1.0/fonts                          611.77K
## # … with 31 more rows
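
You can compute the same per-folder tally without dplyr by splitting the file sizes on their parent directory. The directories and sizes below are made up so the result is predictable. basename() shortens each directory to its last component for display. With these inputs, folder a should total 5000 bytes and folder b 500 bytes.

```r
library(fs)

tmp <- dir_create(file_temp())
dir_create(path(tmp, c("a", "b")))
writeBin(raw(3000), path(tmp, "a", "one.bin"))   # 3000 bytes
writeBin(raw(2000), path(tmp, "a", "two.bin"))   # 2000 bytes
writeBin(raw(500),  path(tmp, "b", "three.bin")) # 500 bytes

info <- dir_info(tmp, recurse = TRUE, type = "file")

# Sum file sizes per parent directory, largest first
sizes  <- as.numeric(info$size)
dirs   <- basename(path_dir(info$path))
totals <- sort(vapply(split(sizes, dirs), sum, numeric(1)), decreasing = TRUE)
print(totals)

dir_delete(tmp)
```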

Read a collection of files into one data frame

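dir_ls() returns a named vector whose names are the file paths, so it can be passed directly to purrr::map_df(). The .id argument then adds a column recording which file each row came from.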

# Create separate files for each species
iris %>%
  split(.$Species) %>%
  map(select, -Species) %>%
  iwalk(~ write_tsv(.x, paste0(.y, ".tsv")))
  
# Show the files
iris_files <- dir_ls(glob = "*.tsv")
iris_files
## setosa.tsv     versicolor.tsv virginica.tsv
# Read the data into a single table, including the filenames
iris_files %>%
  map_df(read_tsv, .id = "file", col_types = cols(), n_max = 2)
## # A tibble: 6 x 5
##   file           Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <chr>                 <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa.tsv              5.1         3.5          1.4         0.2
## 2 setosa.tsv              4.9         3            1.4         0.2
## 3 versicolor.tsv          7           3.2          4.7         1.4
## 4 versicolor.tsv          6.4         3.2          4.5         1.5
## 5 virginica.tsv           6.3         3.3          6           2.5
## 6 virginica.tsv           5.8         2.7          5.1         1.9
file_delete(iris_files)
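
If you would rather avoid the purrr and readr dependencies, the same pattern can be written in base R. This version assumes CSV rather than TSV files, written to a temporary directory. It relies on the property described above: dir_ls() names each element by its path, so lapply() carries those names through.

```r
library(fs)

tmp <- dir_create(file_temp())

# Write the first two rows for each species to its own CSV (hypothetical layout)
for (sp in levels(iris$Species)) {
  rows <- iris[iris$Species == sp, c("Sepal.Length", "Petal.Length")]
  write.csv(head(rows, 2), path(tmp, paste0(sp, ".csv")), row.names = FALSE)
}

files <- dir_ls(tmp, glob = "*.csv")

# lapply() keeps the path names that dir_ls() attached
tables <- lapply(files, read.csv)

# Add the source file name to each table, then stack them
combined <- do.call(rbind, Map(function(df, name) {
  cbind(file = basename(name), df)
}, tables, names(tables)))
rownames(combined) <- NULL
print(combined)

dir_delete(tmp)
```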


Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!


Disclaimer: The information and code presented within this recipe/tutorial are only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure that the information was correct at the time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.

Google –> SETScholars