Python for Citizen Data Scientist

Machine Learning for Beginners in Python: How to Handle Imbalanced Classes In Random Forest

Handle Imbalanced Classes In Random Forest Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier import numpy as np from sklearn import datasets Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Adjust Iris Dataset To Make Classes Imbalanced /* Make class highly imbalanced by removing first …
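As a rough illustration of the idea in this excerpt, the sketch below makes the iris data imbalanced and asks the forest to reweight classes. The number of removed observations (40) and the use of `class_weight="balanced"` are assumptions made for this example, since the excerpt is cut off before those details appear.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Assumption for this sketch: drop the first 40 observations,
# which all belong to class 0, so that class becomes rare
X = X[40:, :]
y = y[40:]

# Collapse to a binary target: class 0 versus everything else
y = np.where(y == 0, 0, 1)
class_counts = np.bincount(y)

# class_weight="balanced" weights each class inversely to its frequency
clf = RandomForestClassifier(
    n_estimators=100, random_state=0, class_weight="balanced"
)
model = clf.fit(X, y)
```

Without reweighting, a forest trained on data like this can lean toward the majority class, because the rare class contributes little to the split criterion.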

Machine Learning for Beginners in Python: Feature Selection Using Random Forest

Feature Selection Using Random Forest Often in data science we have hundreds or even millions of features and we want a way to create a model that only includes the most important features. This has three benefits. First, we make our model simpler to interpret. Second, we can reduce the variance of the model, …
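One possible way to keep only the most important features is scikit-learn's `SelectFromModel` wrapped around a random forest. This is a minimal sketch, not necessarily the approach the full post takes; the `threshold="mean"` setting is an assumption chosen for illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Random forest used to score feature importance
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Keep features whose importance is at least the mean importance
# (the threshold is an illustrative choice)
selector = SelectFromModel(forest, threshold="mean")
X_important = selector.fit_transform(X, y)

# Boolean mask marking which of the original features were kept
selected_mask = selector.get_support()
```

A second model can then be trained on `X_important` alone and compared against the model trained on all features.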

Machine Learning for Beginners in Python: Feature Importance

Feature Importance Preliminaries /* Load libraries */ from sklearn.ensemble import RandomForestClassifier from sklearn import datasets import numpy as np import matplotlib.pyplot as plt Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Train A Random Forest Model /* Create random forest classifier object */ clf = RandomForestClassifier(random_state=0, …

Machine Learning for Beginners in Python: Decision Tree Regression

Decision Tree Regression Preliminaries /* Load libraries */ from sklearn.tree import DecisionTreeRegressor from sklearn import datasets Load Boston Housing Dataset /* Load data with only two features */ boston = datasets.load_boston() X = boston.data[:,0:2] y = boston.target Create Decision Tree Decision tree regression works similarly to decision tree classification; however, instead of reducing Gini impurity …
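A minimal sketch of decision tree regression follows. Because `load_boston` has been removed from recent scikit-learn releases, this example substitutes synthetic data from `make_regression`; the dataset, the two-feature setup, and `max_depth=3` are assumptions for illustration only. By default, `DecisionTreeRegressor` chooses splits that reduce squared error.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with two features (an assumption of this sketch)
X, y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=0)

# Create decision tree regressor; the default split criterion
# reduces squared error rather than an impurity measure
regr = DecisionTreeRegressor(max_depth=3, random_state=0)

# Train model
model = regr.fit(X, y)

# Predict the target for the first five observations
predictions = model.predict(X[:5])
```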

Machine Learning for Beginners in Python: Adaboost Classifier

Adaboost Classifier Preliminaries /* Load libraries */ from sklearn.ensemble import AdaBoostClassifier from sklearn import datasets Load Iris Flower Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Create Adaboost Classifier The most important parameters are base_estimator, n_estimators, and learning_rate. base_estimator is the learning algorithm to use to train the weak models. This will almost …
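The parameters named above can be tried out as in the sketch below. The values of `n_estimators` and `learning_rate` are illustrative assumptions. The weak learner is left at its default (a depth-one decision tree) because recent scikit-learn releases rename `base_estimator` to `estimator`, so leaving it unset keeps the example portable across versions.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# n_estimators: maximum number of weak models trained in sequence
# learning_rate: how much each weak model contributes to the ensemble
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Train model
model = clf.fit(X, y)

# Number of weak models actually fitted (boosting may stop early)
n_fitted = len(model.estimators_)
```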

Machine Learning for Beginners in Python: One Vs. Rest Logistic Regression

One Vs. Rest Logistic Regression On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, there are clever extensions to logistic regression to do just that. In one-vs-rest logistic regression (OVR) a separate model is trained for each class, predicting whether an observation is …
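The one-model-per-class idea can be sketched with scikit-learn's `OneVsRestClassifier` wrapper. This is one way to express OVR and may differ from the code in the full post; the standardization step is an assumption added to help the solver converge.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Load data with three classes
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Standardize features
X_std = StandardScaler().fit_transform(X)

# Wrap a binary logistic regression so that one model is fitted per class
ovr = OneVsRestClassifier(LogisticRegression(random_state=0))
model = ovr.fit(X_std, y)

# One fitted binary model for each of the three classes
n_models = len(model.estimators_)

# Per-class probabilities for every observation
probabilities = model.predict_proba(X_std)
```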

Machine Learning for Beginners in Python: Logistic Regression With L1 Regularization

Logistic Regression With L1 Regularization L1 regularization (also called least absolute deviations) is a powerful tool in data science. There are many tutorials out there explaining L1 regularization and I will not try to do that here. Instead, this tutorial shows the effect of the regularization parameter C on the coefficients and model accuracy. Preliminaries import …
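The effect of C on the coefficients can be seen by counting how many of them remain nonzero as C shrinks. The sketch below is an assumption-laden illustration: the breast cancer dataset, the `liblinear` solver, and the particular C values are choices made for this example, not taken from the excerpt.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Load a binary classification dataset (an assumption of this sketch)
data = datasets.load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
y = data.target

# Smaller C means stronger regularization
nonzero_by_C = {}
for C in [10, 1, 0.1, 0.001]:
    clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    clf.fit(X, y)
    # Count coefficients that the L1 penalty has not driven to zero
    nonzero_by_C[C] = int(np.count_nonzero(clf.coef_))
```

Inspecting `nonzero_by_C` shows how stronger regularization pushes more coefficients to exactly zero.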

Machine Learning for Beginners in Python: Logistic Regression On Very Large Data

Logistic Regression On Very Large Data scikit-learn’s LogisticRegression offers a number of techniques for training a logistic regression, called solvers. Most of the time scikit-learn will select the best solver automatically for us or warn us that we cannot do something with that solver. However, there is one particular case we should be aware of. While …

Machine Learning for Beginners in Python: Logistic Regression

Logistic Regression Despite having “regression” in its name, a logistic regression is actually a widely used binary classifier (i.e. the target vector can only take two values). Preliminaries /* Load libraries */ from sklearn.linear_model import LogisticRegression from sklearn import datasets from sklearn.preprocessing import StandardScaler Load Iris Flower Dataset /* Load data with only two classes …

Machine Learning for Beginners in Python: Fast C Hyperparameter Tuning

Fast C Hyperparameter Tuning Sometimes the characteristics of a learning algorithm allow us to search for the best hyperparameters significantly faster than either brute force or randomized model search methods. scikit-learn’s LogisticRegressionCV class includes a parameter Cs. If supplied a list, Cs is the candidate hyperparameter values to select from. If supplied an integer, a list of that many candidate values …
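A minimal sketch of passing an integer to `Cs` follows. The two-class subset of iris, the standardization step, `cv=5`, and `max_iter=1000` are assumptions made for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Load data and keep only two classes (an assumption of this sketch)
iris = datasets.load_iris()
X = iris.data[:100, :]
y = iris.target[:100]

# Standardize features
X_std = StandardScaler().fit_transform(X)

# Cs=10 asks for ten candidate values of C, evaluated with cross-validation
clf = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000, random_state=0)
model = clf.fit(X_std, y)

# Candidate values that were searched, and the value that was selected
candidate_Cs = model.Cs_
best_C = float(np.ravel(model.C_)[0])
```

With an integer `Cs`, the candidates stored in `Cs_` are spaced on a logarithmic scale from 1e-4 to 1e4.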