Tag Archives: Data Science in R

Data Analytics – HOW TO EASILY MANIPULATE FILES AND DIRECTORIES IN R

This article presents the fs R package, which provides a cross-platform, uniform interface to file system operations. fs functions are divided into four main categories: path_ for manipulating and constructing paths, file_ for files, dir_ for directories, and link_ for links. Contents: Prerequisites, Some key R functions, Basic usage, Filter files, Read …

Data Analytics – GGHIGHLIGHT: EASY WAY TO HIGHLIGHT A GGPLOT IN R

This article shows how to easily highlight a ggplot using the gghighlight package. Contents: Prerequisites, Line plot, Histogram, Scatter plot, Bar plot. Prerequisites: load the required packages and set the default ggplot2 theme to theme_bw():

    library(tidyverse)
    library(gghighlight)
    theme_set(theme_bw())

Line plot. Basic line plot:

    p <- ggplot(airquality, …

Data Analytics – GGPLOT THEME BACKGROUND COLOR AND GRIDS

This article shows how to change a ggplot theme's background color and grid lines. The default ggplot2 theme has a grey background. You can quickly change it to white by using one of the built-in theme functions, such as theme_bw(), theme_classic(), theme_minimal() or theme_light() (see the ggplot2 themes gallery). Another alternative is to modify directly …

Data Analytics – GGPLOT AXIS LIMITS AND SCALES

This article describes R functions for changing ggplot axis limits (or scales). We'll describe how to specify the minimum and maximum values of the axes. Among the functions available in ggplot2 for setting the axis range, coord_cartesian() is preferred, because it zooms the plot without clipping the data. In …

Data Analytics – CLUSTER ANALYSIS IN R SIMPLIFIED AND ENHANCED

In R, standard clustering methods (partitioning and hierarchical clustering) can be computed using the R packages stats and cluster. However, the workflow generally requires multiple steps and multiple lines of R code. This article describes some easy-to-use wrapper functions in the factoextra R package for simplifying and improving cluster analysis in R. These …
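For comparison, the multi-step workflow mentioned above can be sketched with base R alone. This is a minimal sketch, not code from the article: it uses only the stats functions kmeans(), dist(), hclust() and cutree() on the built-in USArrests data, and the choice of k = 4 clusters and Ward linkage is an assumption made for illustration. Neither the cluster package nor the factoextra wrappers are used here.

```r
# Minimal sketch of a "standard" multi-step clustering workflow using only
# base R (stats package). k = 4 and Ward linkage are illustrative assumptions.
data("USArrests")

# 1. Standardize the variables (mean 0, sd 1) so no variable dominates
scaled <- scale(USArrests)

# 2a. Partitioning: k-means with several random starts for stability
set.seed(123)
km <- kmeans(scaled, centers = 4, nstart = 25)
print(table(km$cluster))            # cluster sizes

# 2b. Hierarchical: distance matrix -> linkage -> cut the tree into k groups
d <- dist(scaled, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 4)
print(table(groups))

# 3. Cross-tabulate the two partitions to see how well they agree
print(table(kmeans = km$cluster, hclust = groups))
```

Each step is a separate call with its own parameters, which is the overhead that wrapper functions aim to reduce.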

Statistics with R for Business Analysts – Nonlinear Least Square

(R Tutorials for Citizen Data Scientist) When modeling real-world data for regression analysis, the equation of the model is rarely linear, that is, it rarely gives a straight-line graph. Most of the time, the equation of the model …
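As a minimal sketch of the idea (not the article's code), base R's nls() fits a model that is nonlinear in its parameters by least squares. The data below are simulated, and the exponential form y = a * exp(b * x) with true values a = 2 and b = 0.3 is an assumption chosen purely for illustration.

```r
# Minimal sketch: nonlinear least squares with stats::nls().
# Simulated data; the model form and true values (a = 2, b = 0.3) are assumptions.
set.seed(42)
x <- seq(0, 10, by = 0.5)
y <- 2 * exp(0.3 * x) + rnorm(length(x), sd = 0.5)
df <- data.frame(x = x, y = y)

# nls() needs starting values. Take them from a log-linear fit,
# log(y) = log(a) + b * x, which requires all y > 0.
init <- coef(lm(log(y) ~ x, data = df))
start <- list(a = exp(init[[1]]), b = init[[2]])

# Fit y = a * exp(b * x) by minimizing the residual sum of squares
fit <- nls(y ~ a * exp(b * x), data = df, start = start)

print(summary(fit))   # estimates, standard errors, residual standard error
print(coef(fit))      # named vector with the estimates of a and b
```

Deriving starting values from a log-linear fit is a common trick when the response is positive: it places the iterative solver close to the solution, whereas poor starting values can make nls() fail to converge.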

Data Science and Machine Learning for Beginners in R – Tensorflow and Keras with Dropout layers using Mushroom Dataset

TensorFlow and Keras are two popular open-source tools used for machine learning and deep learning. They are often used together to build and train neural networks, which are a type of model that can be used for tasks such as image recognition, natural language processing, and more. One important technique used in training neural …

Data Science and Machine Learning for Beginners in R – Tensorflow and Keras using Mushroom Dataset

TensorFlow is an open-source software library developed by Google for machine learning. It is a powerful tool that can be used to build and train neural networks. Keras is a high-level library that runs on top of TensorFlow and simplifies building and training neural networks. Together, TensorFlow and Keras …

Data Science and Machine Learning for Beginners in R – Boosting Ensembles with Grid Search using Mushroom Dataset

Boosting is another ensemble learning method that is used to improve the performance of machine learning models. Like bagging, boosting combines the predictions of multiple models, but it does so in a different way. Instead of generating multiple subsets of the data and training a model on each subset, boosting trains a model on …

Data Science and Machine Learning for Beginners in R – Random Forest with Grid Search using Mushroom Dataset

Random Forest is a type of ensemble learning algorithm that can be used for both classification and regression tasks. It works by building multiple decision trees and combining their predictions to make a final prediction. One of the advantages of Random Forest is that it can help to reduce overfitting, which is a common …