Select Important Features In Random Forest Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier from sklearn import datasets from sklearn.feature_selection import SelectFromModel Load Iris Flower Data /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Create Random Forest Classifier /* Create random forest classifier */ clf = RandomForestClassifier(random_state=0, n_jobs=-1) Select …
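The excerpt above is truncated, but the pieces it loads suggest the standard `SelectFromModel` workflow. A minimal runnable sketch, assuming scikit-learn's current API; the `0.15` importance threshold is an illustrative choice, not taken from the excerpt:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel

# Load the iris data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Fit a selector that keeps only features whose random forest
# importance exceeds the threshold
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
sfm = SelectFromModel(clf, threshold=0.15)
sfm.fit(X, y)

# Reduce the data to the selected (most important) features
X_important = sfm.transform(X)
```

On iris this typically keeps the two petal measurements, which dominate the forest's importances.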
Random Forest Classifier Example This tutorial is based on Yhat’s 2013 tutorial on Random Forests in Python. If you want a good summary of the theory and uses of random forests, I suggest you check out their guide. In the tutorial below, I annotate, correct, and expand on a short code example of random forests they …
Random Forest Classifier Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier from sklearn import datasets Load Iris Data /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Create Random Forest Classifier /* Create random forest classifier object that uses entropy */ clf = RandomForestClassifier(criterion='entropy', random_state=0, n_jobs=-1) Train Random Forest Classifier …
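Put together, the excerpt's steps fit in a few lines. A sketch that trains the entropy-based forest and classifies one new observation; the example measurements are an assumption chosen to look like a setosa flower:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Use entropy (information gain) instead of the default Gini impurity
clf = RandomForestClassifier(criterion='entropy', random_state=0, n_jobs=-1)
model = clf.fit(X, y)

# Classify a new observation: sepal length/width, petal length/width
observation = [[5.0, 3.6, 1.3, 0.25]]
prediction = model.predict(observation)
```

The short petal measurements make this observation land in class 0 (setosa).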
Handle Imbalanced Classes In Random Forest Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier import numpy as np from sklearn import datasets Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Adjust Iris Dataset To Make Classes Imbalanced /* Make class highly imbalanced by removing first …
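The excerpt cuts off before showing the imbalance trick, but the usual approach is `class_weight='balanced'`, which reweights classes inversely to their frequency. A sketch under that assumption, dropping the first 40 rows so class 0 keeps only 10 samples:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()

# Remove the first 40 observations so class 0 is highly imbalanced
X, y = iris.data[40:], iris.target[40:]

# Collapse to a binary problem: class 0 vs. everything else
y = np.where(y == 0, 0, 1)

# 'balanced' weights each class inversely to its frequency in y
clf = RandomForestClassifier(random_state=0, n_jobs=-1,
                             class_weight='balanced')
model = clf.fit(X, y)
```

Without the weighting, the minority class's errors barely move the training objective; with it, each of the 10 remaining class-0 samples counts roughly elevenfold.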
Feature Selection Using Random Forest Often in data science we have hundreds or even millions of features and we want a way to create a model that only includes the most important features. This has three benefits. First, we make our model simpler to interpret. Second, we can reduce the variance of the model, …
Feature Importance Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier from sklearn import datasets import numpy as np import matplotlib.pyplot as plt Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Train A Random Forest Model /* Create random forest classifier object */ clf = RandomForestClassifier(random_state=0, …
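The core of the excerpt, minus the plotting, is reading `feature_importances_` off the fitted forest. A minimal sketch:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Train a random forest classifier
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
model = clf.fit(X, y)

# One importance per feature; they sum to 1, and a higher value means
# the feature reduced impurity more across the forest's trees
importances = model.feature_importances_
```

On iris, the two petal features (indices 2 and 3) carry most of the importance.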
Decision Tree Regression Preliminaries /* Load libraries */ from sklearn.tree import DecisionTreeRegressor from sklearn import datasets Load Boston Housing Dataset /* Load data with only two features */ boston = datasets.load_boston() X = boston.data[:,0:2] y = boston.target Create Decision Tree Decision tree regression works similarly to decision tree classification, however instead of reducing Gini impurity …
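Note that `load_boston` was removed from scikit-learn in version 1.2, so the excerpt's loading code no longer runs on current releases. A sketch of the same two-feature regression using the bundled diabetes dataset as a stand-in (that substitution is mine, not the excerpt's):

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn import datasets

# load_boston was removed in scikit-learn 1.2; the diabetes dataset
# stands in here, keeping the excerpt's two-feature slice
diabetes = datasets.load_diabetes()
X = diabetes.data[:, 0:2]
y = diabetes.target

# For regression the split criterion is mean squared error, playing
# the role Gini impurity plays in classification
regr = DecisionTreeRegressor(random_state=0)
model = regr.fit(X, y)

# Predict the target for the first observation
prediction = model.predict(X[:1])
```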
Adaboost Classifier Preliminaries /* Load libraries */ from sklearn.ensemble import AdaBoostClassifier from sklearn import datasets Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Create Adaboost Classifier The most important parameters are base_estimator, n_estimators, and learning_rate. base_estimator is the learning algorithm to use to train the weak models. This will almost …
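A runnable sketch of the classifier the excerpt describes. It leaves the weak learner at its default (a depth-1 decision tree, i.e. a stump), since the `base_estimator` parameter named above was renamed to `estimator` in scikit-learn 1.2 and removed in 1.4; the `n_estimators` and `learning_rate` values are illustrative:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 50 sequential weak learners; learning_rate shrinks each one's
# contribution to the ensemble
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0,
                         random_state=0)
model = clf.fit(X, y)
score = model.score(X, y)
```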
One Vs. Rest Logistic Regression On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, there are clever extensions to logistic regression to do just that. In one-vs-rest logistic regression (OVR) a separate model is trained for each class, predicting whether an observation is …
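The scheme described above can be sketched with scikit-learn's explicit `OneVsRestClassifier` wrapper, which fits one binary logistic regression per class ("this class" vs. "the rest") and picks the class whose model is most confident. The standardization step is a common companion, assumed here rather than quoted from the excerpt:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load and standardize the iris data
iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# One binary logistic regression per class
clf = OneVsRestClassifier(LogisticRegression(random_state=0,
                                             max_iter=1000))
model = clf.fit(X, y)

# Per-class probabilities for the first observation (normalized to 1)
probabilities = model.predict_proba(X[:1])
```

With three iris classes, the wrapper fits three underlying binary models.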
Logistic Regression With L1 Regularization L1 regularization (the penalty used in lasso regression) is a powerful tool in data science. There are many tutorials out there explaining L1 regularization and I will not try to do that here. Instead, this tutorial shows the effect of the regularization parameter C on the coefficients and model accuracy. Preliminaries import …
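The effect the excerpt studies can be seen directly by counting zeroed coefficients at two values of C. A sketch, assuming the `liblinear` solver (one of the scikit-learn solvers that supports the L1 penalty) and a binary slice of iris; the C values are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Keep two classes so plain binary logistic regression applies
iris = datasets.load_iris()
mask = iris.target != 2
X = StandardScaler().fit_transform(iris.data[mask])
y = iris.target[mask]

# Smaller C means a stronger L1 penalty, which drives more
# coefficients exactly to zero
zero_counts = {}
for C in [10.0, 0.1]:
    clf = LogisticRegression(penalty='l1', C=C, solver='liblinear')
    clf.fit(X, y)
    zero_counts[C] = int(np.sum(clf.coef_ == 0))
```

This built-in sparsity is why L1 doubles as a feature-selection device: weakly informative features drop out first as C shrinks.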