Tag Archives: data science from scratch

Machine Learning for Beginners in Python: How to Select Important Features In Random Forest

Select Important Features In Random Forest

Preliminaries

# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel

Load Iris Flower Data

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Create Random Forest Classifier

# Create random forest classifier
clf = RandomForestClassifier(random_state=0, n_jobs=-1)

Select …
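The excerpt breaks off at the selection step. A minimal sketch of how it might continue with the classifier defined above, using SelectFromModel; the 0.15 importance threshold is an illustrative value, not taken from the original post:

# Train the classifier so feature importances are available
clf.fit(X, y)

# Keep only features whose importance exceeds the threshold
# (0.15 is an assumed, illustrative cutoff)
sfm = SelectFromModel(clf, threshold=0.15)
sfm.fit(X, y)

# Transform the data down to the selected features
X_important = sfm.transform(X)
print(X_important.shape)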

Machine Learning for Beginners in Python: Random Forest Classifier Example

Random Forest Classifier Example This tutorial is based on Yhat’s 2013 tutorial on Random Forests in Python. If you want a good summary of the theory and uses of random forests, I suggest you check out their guide. In the tutorial below, I annotate, correct, and expand on a short code example of random forests they …

Machine Learning for Beginners in Python: Random Forest Classifier

Random Forest Classifier

Preliminaries

# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

Load Iris Data

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Create Random Forest Classifier

# Create random forest classifier object that uses entropy
clf = RandomForestClassifier(criterion='entropy', random_state=0, n_jobs=-1)

Train Random Forest Classifier …
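The excerpt stops at the training step. A minimal sketch of how training and prediction might continue with the objects defined above; the observation's feature values are made up for illustration:

# Train the classifier on the iris data
model = clf.fit(X, y)

# Predict the class of a new, unseen observation
# (these four feature values are illustrative, not from the original post)
observation = [[5.0, 3.6, 1.3, 0.25]]
model.predict(observation)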

Machine Learning for Beginners in Python: How to Handle Imbalanced Classes In Random Forest

Handle Imbalanced Classes In Random Forest

Preliminaries

# Load libraries
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn import datasets

Load Iris Flower Dataset

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Adjust Iris Dataset To Make Classes Imbalanced

# Make class highly imbalanced by removing first …
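The excerpt is cut off while making the classes imbalanced. A minimal sketch of how the rest of the workflow might look, continuing from the arrays above; dropping the first 40 observations and the class_weight="balanced" setting are assumptions about how the post proceeds:

# Remove the first 40 observations so class 0 becomes rare
X = X[40:, :]
y = y[40:]

# Collapse to a binary problem: class 0 versus everything else
y = np.where((y == 0), 0, 1)

# Weight classes inversely proportional to their frequency in the data
clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
model = clf.fit(X, y)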

Machine Learning for Beginners in Python: Feature Selection Using Random Forest

Feature Selection Using Random Forest

Often in data science we have hundreds or even millions of features, and we want a way to create a model that includes only the most important ones. This has three benefits. First, we make our model simpler to interpret. Second, we can reduce the variance of the model, …
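A minimal, self-contained sketch of the kind of workflow this post describes: train a forest on all features, keep only the important ones, and compare accuracy. The 0.15 importance threshold and the train/test split settings are illustrative assumptions:

# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets

# Load the iris data and split it into training and test sets
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Train a random forest on all features
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
clf.fit(X_train, y_train)

# Keep only features whose importance exceeds the (assumed) threshold
sfm = SelectFromModel(clf, threshold=0.15)
sfm.fit(X_train, y_train)
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)

# Train a second forest on the reduced feature set and compare accuracy
clf_important = RandomForestClassifier(random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)

print(accuracy_score(y_test, clf.predict(X_test)))
print(accuracy_score(y_test, clf_important.predict(X_important_test)))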

Machine Learning for Beginners in Python: Feature Importance

Feature Importance

Preliminaries

# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

Load Iris Flower Dataset

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Train A Random Forest Classifier Model

# Create random forest classifier object
clf = RandomForestClassifier(random_state=0, …
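The excerpt ends while constructing the classifier. A minimal sketch of how computing and plotting the importances might continue, using the arrays above; the plotting details are assumptions rather than the post's exact figure code:

# Finish creating and training the classifier
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
clf.fit(X, y)

# Calculate feature importances and sort them in descending order
importances = clf.feature_importances_
indices = np.argsort(importances)[::-1]

# Plot the importances with the corresponding feature names
names = [iris.feature_names[i] for i in indices]
plt.figure()
plt.title("Feature Importance")
plt.bar(range(X.shape[1]), importances[indices])
plt.xticks(range(X.shape[1]), names, rotation=90)
plt.show()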

Machine Learning for Beginners in Python: Decision Tree Regression

Decision Tree Regression

Preliminaries

# Load libraries
from sklearn.tree import DecisionTreeRegressor
from sklearn import datasets

Load Boston Housing Dataset

# Load data with only two features
boston = datasets.load_boston()
X = boston.data[:, 0:2]
y = boston.target

Create Decision Tree

Decision tree regression works similarly to decision tree classification; however, instead of reducing Gini impurity …
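The excerpt breaks off while describing the splitting criterion. A minimal sketch of how creating, training, and using the regressor might continue with the data above; note that load_boston has been removed from recent scikit-learn releases, and the prediction's feature values are made up for illustration:

# Create a decision tree regressor; split quality is measured with
# mean squared error rather than Gini impurity
regr = DecisionTreeRegressor(random_state=0)

# Train the model
model = regr.fit(X, y)

# Predict the target value of a new observation
# (the two feature values are illustrative)
model.predict([[0.02, 16]])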

Machine Learning for Beginners in Python: Adaboost Classifier

Adaboost Classifier

Preliminaries

# Load libraries
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets

Load Iris Flower Dataset

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Create Adaboost Classifier

The most important parameters are base_estimator, n_estimators, and learning_rate. base_estimator is the learning algorithm to use to train the weak models. This will almost …
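The excerpt is cut off mid-explanation of the parameters. A minimal sketch of constructing and training the classifier with the data above; the parameter values are illustrative defaults, not taken from the original post:

# Create an AdaBoost classifier; when no base estimator is given,
# the weak learners default to shallow decision trees
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Train the model
model = clf.fit(X, y)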

Machine Learning for Beginners in Python: One Vs. Rest Logistic Regression

One Vs. Rest Logistic Regression

On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, there are clever extensions to logistic regression to do just that. In one-vs-rest logistic regression (OVR), a separate model is trained for each class, predicting whether an observation is …
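A minimal, self-contained sketch of one-vs-rest logistic regression on the iris data; the standardization step and the multi_class="ovr" argument are assumptions about how the post sets this up:

# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Standardize the features
X_std = StandardScaler().fit_transform(X)

# Create a one-vs-rest logistic regression: one binary model per class
clf = LogisticRegression(multi_class="ovr")
model = clf.fit(X_std, y)

# Predicted class and per-class probabilities for a new observation
# (the feature values are made up for illustration)
model.predict([[0.5, 0.5, 0.5, 0.5]])
model.predict_proba([[0.5, 0.5, 0.5, 0.5]])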

Machine Learning for Beginners in Python: Logistic Regression With L1 Regularization

Logistic Regression With L1 Regularization

L1 regularization (also known as the lasso penalty) is a powerful tool in data science. There are many tutorials out there explaining L1 regularization, and I will not try to do that here. Instead, this tutorial shows the effect of the regularization parameter C on the coefficients and model accuracy.

Preliminaries

import …
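A minimal sketch of the kind of loop the post builds toward: fit an L1-penalized logistic regression for several values of C and inspect the coefficients and training accuracy. The specific C values and the liblinear solver are assumptions, not taken from the original post:

# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Load and standardize the iris data
iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# Smaller C means stronger regularization and more coefficients pushed to zero
for C in [10, 1, 0.1]:
    clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    clf.fit(X, y)
    print("C:", C)
    print("Coefficients:", clf.coef_)
    print("Training accuracy:", clf.score(X, y))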