Data Science

Visualization of Text Data Using Word Cloud in R

Visualization plays an important role in exploratory data analysis and feature engineering. However, visualizing text data can be tricky because it is unstructured. A word cloud is an excellent option for visualizing text data in the form of tags, or words, where …

Time Series Forecasting Using R

In this guide, you will learn how to implement the following time series forecasting techniques using the statistical programming language R: (1) Naive Method, (2) Simple Exponential Smoothing, (3) Holt’s Trend Method, (4) ARIMA, and (5) TBATS. We will begin by exploring the data. Problem Statement: Unemployment …

Data Science in R: Interpreting Data Using Descriptive Statistics with R

Descriptive statistics are the foundation for summarizing data. They are divided into measures of central tendency and measures of dispersion. Measures of central tendency include the mean, median, and mode, while measures of dispersion (variability) include standard deviation, variance, and the …
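As a rough illustration of the measures named in this excerpt, the sketch below computes them with Python's standard `statistics` module. The linked article itself works in R; the sample values here are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample values, used only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequently occurring value

# Measures of dispersion (sample versions, which divide by n - 1)
variance_value = statistics.variance(values)
std_dev_value = statistics.stdev(values)  # square root of the sample variance

print(mean_value, median_value, mode_value, variance_value, std_dev_value)
```

The sample variance and standard deviation are used here; `statistics.pvariance` and `statistics.pstdev` give the population versions instead.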

Understanding ROC Curves with Python

In the current age where Data Science / AI is booming, it is important to understand how Machine Learning is used in the industry to solve complex business problems. In order to select which Machine Learning model should be used in production, a selection metric is chosen upon …

Hierarchical Clustering with Python and Scikit-Learn

Hierarchical clustering is a type of unsupervised machine learning algorithm used to cluster unlabeled data points. Like K-means clustering, hierarchical clustering groups together data points with similar characteristics. In some cases the results of hierarchical and K-means clustering can be similar. Before implementing hierarchical clustering using Scikit-Learn, let’s first …

The Naive Bayes Algorithm in Python with Scikit-Learn

When studying Probability & Statistics, one of the first and most important theorems students learn is Bayes’ Theorem. This theorem is the foundation of deductive reasoning, which focuses on determining the probability of an event occurring based on prior knowledge of conditions that might be …

How to implement Random Forest Algorithm with Python and Scikit-Learn

Random forest is a type of supervised machine learning algorithm based on ensemble learning. Ensemble learning is a type of learning where you combine different types of algorithms, or the same algorithm multiple times, to form a more powerful prediction model. The random forest algorithm combines multiple …

How to Implement LDA in Python with Scikit-Learn

In our previous article, Implementing PCA in Python with Scikit-Learn, we studied how to reduce the dimensionality of the feature set using PCA. In this article we will study another very important dimensionality reduction technique: linear discriminant analysis (or LDA). But first let’s briefly discuss how PCA and …

How to Implement PCA in Python with Scikit-Learn

With the availability of high-performance CPUs and GPUs, it is possible to solve almost every regression, classification, clustering, and other related problem using machine learning and deep learning models. However, there are still various factors that cause performance bottlenecks while developing such models. …

How to implement Decision Trees in Python with Scikit-Learn

A decision tree is one of the most frequently and widely used supervised machine learning algorithms, and it can perform both regression and classification tasks. The intuition behind the decision tree algorithm is simple, yet also very powerful. For each attribute in the dataset, the decision …

How to implement K-Nearest Neighbors Algorithm in Python and Scikit-Learn

The K-nearest neighbors (KNN) algorithm is a type of supervised machine learning algorithm. KNN is extremely easy to implement in its most basic form, and yet performs quite complex classification tasks. It is a lazy learning algorithm since it doesn’t have a specialized training phase. …

Introduction to Neural Networks with Scikit-Learn in Python

What is a Neural Network? Humans have an ability to identify patterns within the accessible information with an astonishingly high degree of accuracy. Whenever you see a car or a bicycle you can immediately recognize what they are. This is because we have learned over …

How to do K-Means Clustering with Scikit-Learn in Python

K-means clustering is one of the most widely used unsupervised machine learning algorithms. It forms clusters of data based on the similarity between data instances. For this particular algorithm to work, the number of clusters has to be defined beforehand. The K in …

Python tutorial on Append vs Extend in Python Lists

Adding Elements to a List: Lists are one of the most useful data structures available in Python, or really any programming language, since they’re used in so many different algorithms and solutions. Once we have created a list, we may often need to …
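The distinction named in the title can be sketched with a minimal example. The list contents below are hypothetical and are not taken from the linked tutorial.

```python
# append() adds its argument as a single new element,
# even when that argument is itself a list.
appended = [1, 2, 3]
appended.append([4, 5])
print(appended)   # [1, 2, 3, [4, 5]]

# extend() iterates over its argument and adds each item individually.
extended = [1, 2, 3]
extended.extend([4, 5])
print(extended)   # [1, 2, 3, 4, 5]
```

Both methods modify the list in place and return `None`, so their result should not be assigned back to the variable.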

Data Analytics – DISPLAY A BEAUTIFUL SUMMARY STATISTICS IN R USING SKIMR PACKAGE

This article describes how to quickly display summary statistics using the R package skimr. skimr handles different data types and returns a skim_df object which can be included in a tidyverse pipeline or displayed nicely for the human reader. Key features of skimr: Provides a larger set of …

Data Analytics – HOW TO EASILY MANIPULATE FILES AND DIRECTORIES IN R

This article presents the fs R package, which provides a cross-platform, uniform interface to file system operations. fs functions are divided into four main categories: path_ for manipulating and constructing paths; file_ for files; dir_ for directories; link_ for links. Contents: Prerequisites, Some Key R functions, Basic usage, Filter …