Reshaping Data with R

In predictive modeling, it is often necessary to reshape the data to make it ready for analysis or model building. Transforming the data into a clear, simple, and desirable form is an integral component of data science. The most common reshaping process is converting the …

Coping with Missing, Invalid and Duplicate Data in R

A vital component of data science is cleaning the data and getting it ready for predictive modeling. The most common data cleaning problems are missing data, invalid records, and duplicate values. In this guide, you will learn about techniques for …

Visualization of Text Data Using Word Cloud in R

Visualization plays an important role in exploratory data analysis and feature engineering. However, visualizing text data can be tricky because it is unstructured. A word cloud provides an excellent option for visualizing text data in the form of tags, or words, where the importance …

Machine Learning with Text Data Using R

The domain of analytics that addresses how computers understand text is called Natural Language Processing (NLP). NLP has many applications, such as sentiment analysis, chatbots, AI agents, social media analytics, and text classification. In this guide, you will learn how to build a supervised machine …

Hypothesis Testing – Interpreting Data with Statistical Models

Building predictive models, or carrying out data science research, depends on formulating a hypothesis and drawing conclusions using statistical tests. In this guide, you will learn how to perform these tests using the statistical programming language R. The most widely used inferential statistical techniques …

Time Series Forecasting Using R

In this guide, you will learn how to implement the following time series forecasting techniques using the statistical programming language R:

1. Naive Method
2. Simple Exponential Smoothing
3. Holt’s Trend Method
4. ARIMA
5. TBATS

We will begin by exploring the data.

Problem Statement: Unemployment is a …

Interpreting Data Using Statistical Models with R

Statistical models are useful not only in machine learning, but also in interpreting data and understanding the relationships between variables. In this guide, you will learn how to fit and analyze statistical models for quantitative (linear regression) and qualitative (logistic regression) target variables. …

Data Science in R: Interpreting Data Using Descriptive Statistics with R

Descriptive statistics are the foundation of summarizing data. They are divided into measures of central tendency and measures of dispersion. Measures of central tendency include the mean, median, and mode, while measures of dispersion include the standard deviation, variance, and interquartile range. …

Understanding ROC Curves with Python

In the current age where data science and AI are booming, it is important to understand how machine learning is used in industry to solve complex business problems. To select which machine learning model should be used in production, a selection metric is chosen upon which …

How to do Cross Validation and Grid Search for Model Selection in Python

A typical machine learning process involves training different models on the dataset and selecting the one with the best performance. However, evaluating the performance of an algorithm is not always a straightforward task. There are several factors that can help you determine …