KNN Regression in R: Analyzing the Boston Housing Dataset

Introduction

The K-Nearest Neighbors (KNN) algorithm, primarily known for classification, can also be adapted for regression tasks. By averaging the values of the ‘k’ nearest neighbors, KNN Regression offers an intuitive way to predict continuous variables. This article delves deep into KNN Regression in R, leveraging the `caret` package and the renowned Boston Housing dataset. By the end, readers will gain a comprehensive understanding of the method and its application in predicting housing prices.

The Boston Housing Dataset: A Quick Overview

The Boston Housing dataset is a classic in the domain of regression analysis. Consisting of 506 observations across 14 attributes, it captures various characteristics of houses in the Boston suburbs. The usual modeling task is to predict the median home value (`medv`, in thousands of dollars) from attributes like crime rate, age of the home, property tax rate, and more.
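Before modeling, it can help to confirm the dataset's shape and target variable. A quick look (assuming the `mlbench` package is installed):

```r
# A first look at the Boston Housing data
library(mlbench)
data(BostonHousing)

dim(BostonHousing)       # 506 rows, 14 columns
str(BostonHousing$medv)  # the target: median home value in $1000s
```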

KNN Regression: A Brief Primer

While the fundamental principle of KNN remains the same for regression—finding the ‘k’ nearest neighbors—the prediction is the average of the dependent variable for these neighbors instead of a majority vote.
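To make the idea concrete, here is a minimal from-scratch sketch of KNN regression for a single query point. The function name and the toy data are illustrative, not part of any library:

```r
# Minimal sketch of the KNN regression idea: predict a point by
# averaging the response of its k nearest training points.
knn_reg_predict <- function(x_train, y_train, x_new, k = 3) {
  # Euclidean distance from the query point to every training point
  d <- sqrt(rowSums(sweep(x_train, 2, x_new)^2))
  # Average the response over the k closest neighbors
  mean(y_train[order(d)[1:k]])
}

# Toy usage: one predictor, response roughly 2 * x
x_train <- matrix(c(1, 2, 3, 4, 5), ncol = 1)
y_train <- c(2.1, 3.9, 6.2, 8.0, 9.8)
knn_reg_predict(x_train, y_train, x_new = c(3.1), k = 3)
# about 6.03: the mean of y for the 3 nearest x values (3, 4, 2)
```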

Implementing KNN Regression in R using `caret`

1. Preparing the Environment

Begin by loading the essential libraries and the dataset:

```R
# Load the required libraries
library(caret)
library(mlbench)

# Load the Boston Housing dataset
data(BostonHousing)
```

The `chas` variable in the dataset is a factor. For modeling purposes, convert it to numeric:

```R
BostonHousing$chas <- as.numeric(as.character(BostonHousing$chas))
```

Then, separate the predictors from the response variable:

```R
x <- as.matrix(BostonHousing[,1:13])
y <- BostonHousing[,14]  # medv as a numeric vector, as knnreg() expects
```

2. Training the KNN Regression Model

Utilize the `knnreg()` function from the `caret` package to train the KNN regression model:

```R
# Fit the KNN regression model
fit <- knnreg(x, y, k=3)

# Display the model summary
print(fit)
```

3. Predicting House Prices

Once the model is trained, predict the housing prices:

```R
# Generate predictions using the model
predictions <- predict(fit, x)
```

4. Evaluating the Model’s Performance

Assess the regression model using the Mean Squared Error (MSE). Note that this computes the error on the training data itself, so it gives an optimistic picture of performance on unseen data:

```R
# Compute and display the Mean Squared Error (MSE)
mse <- mean((BostonHousing$medv - predictions)^2)
print(mse)
```
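A natural next step is to tune `k` rather than fixing it at 3. The sketch below uses `caret::train()` with 10-fold cross-validation; the seed, the `k` grid, and the centering/scaling choice are illustrative assumptions, not part of the original walkthrough:

```r
# Sketch: choose k by 10-fold cross-validation (illustrative settings)
library(caret)
library(mlbench)

data(BostonHousing)
BostonHousing$chas <- as.numeric(as.character(BostonHousing$chas))

set.seed(42)  # arbitrary seed for reproducible folds
cv_fit <- train(
  medv ~ ., data = BostonHousing,
  method = "knn",
  preProcess = c("center", "scale"),  # KNN is distance-based, so scaling matters
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid = data.frame(k = 1:15)     # illustrative search range
)
print(cv_fit$bestTune)  # the k with the lowest cross-validated RMSE
```

Because the predictors are on very different scales (e.g. tax rate vs. proportion values), centering and scaling typically improves distance-based methods like KNN.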

Conclusion

KNN Regression stands as a testament to the versatility of the K-Nearest Neighbors algorithm. This article provided a deep dive into KNN Regression in R, from understanding its nuances to implementing it on the Boston Housing dataset. The process highlighted the simplicity and efficiency of the method, making it a valuable tool for predicting continuous variables.

End-to-End Coding Example:

For a comprehensive hands-on experience, here’s the combined code:

```R
# KNN Regression with the Boston Housing Dataset in R

# Load the necessary libraries
library(caret)
library(mlbench)

# Import the Boston Housing dataset
data(BostonHousing)

# Convert 'chas' variable to numeric
BostonHousing$chas <- as.numeric(as.character(BostonHousing$chas))

# Prepare the data
x <- as.matrix(BostonHousing[,1:13])
y <- BostonHousing[,14]  # medv as a numeric vector, as knnreg() expects

# Train the KNN regression model
fit <- knnreg(x, y, k=3)

# Display the model details
print(fit)

# Predict housing prices
predictions <- predict(fit, x)

# Evaluate the model's performance
mse <- mean((BostonHousing$medv - predictions)^2)
print(mse)
```

Running this unified code reproduces the full workflow on the Boston Housing dataset: loading the data, fitting the KNN regression model, generating predictions, and computing the MSE.

 
