Effect Of Alpha On Lasso Regression Often we want to conduct a process called regularization, wherein we penalize the number of features in a model in order to keep only the most important features. This can be particularly important when you have a dataset with 100,000+ features. Lasso regression is a common modeling technique for regularization. The …
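To make the role of alpha concrete, here is a minimal sketch, not the post's own code. It assumes a synthetic dataset from make_regression (20 features, 5 informative) and an arbitrary list of alpha values, both chosen only for illustration. Lasso's penalty is applied to the absolute size of the coefficients and scaled by alpha, so larger values should push more coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 20 features, 5 informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Standardize features so the penalty treats every coefficient on the same scale
X_std = StandardScaler().fit_transform(X)

def count_nonzero_coefs(alpha):
    """Fit a lasso with the given alpha and count coefficients that are not zero."""
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(X_std, y)
    return int(np.sum(model.coef_ != 0))

# Hypothetical alpha values, from a very weak to a very strong penalty
alphas = [0.01, 1.0, 10.0, 1000.0]
nonzero_counts = [count_nonzero_coefs(a) for a in alphas]

for a, n in zip(alphas, nonzero_counts):
    print(f"alpha={a}: {n} non-zero coefficients")
```

With a penalty as large as the last value, every coefficient is driven to zero and the model predicts only the intercept.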
Adding Interaction Terms Preliminaries /* Load libraries */ from sklearn.linear_model import LinearRegression from sklearn.datasets import load_boston from sklearn.preprocessing import PolynomialFeatures import warnings /* Suppress Warning */ warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd") Load Boston Housing Dataset /* Load the data with only two features */ boston = load_boston() X = boston.data[:,0:2] y = boston.target Add Interaction Term …
Hyperparameter Tuning Using Random Search Preliminaries /* Load libraries */ from scipy.stats import uniform from sklearn import linear_model, datasets from sklearn.model_selection import RandomizedSearchCV Load Iris Dataset /* Load data */ iris = datasets.load_iris() X = iris.data y = iris.target Create Logistic Regression /* Create logistic regression */ logistic = linear_model.LogisticRegression() Create Hyperparameter Search Space /* …
Find Best Preprocessing Steps During Model Selection We have to be careful to properly handle preprocessing when conducting model selection. First, GridSearchCV uses cross-validation to determine which model has the highest performance. However, in cross-validation we are in effect pretending that the fold held out as the test set is not seen, and thus not part of …
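One common way to keep preprocessing inside the cross-validation loop is to wrap it in a Pipeline and hand the whole pipeline to GridSearchCV, so the preprocessing steps are refit on each training fold only. The sketch below is an illustration under assumptions, not the post's own code: the iris data, the StandardScaler and PCA steps, the step names, and the candidate values are all chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example dataset (an assumption for illustration)
X, y = load_iris(return_X_y=True)

# Preprocessing and model live in one pipeline, so each CV split
# fits the scaler and PCA on the training folds only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Search over a preprocessing setting and a model setting together;
# "<step name>__<parameter>" addresses a parameter inside the pipeline
search_space = {
    "pca__n_components": [1, 2, 3],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, search_space, cv=5)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

Because the search treats the number of PCA components as just another hyperparameter, the preprocessing choice is selected by the same cross-validation that selects the model.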
Recall Preliminaries /* Load libraries */ from sklearn.model_selection import cross_val_score from sklearn.linear_model import LogisticRegression from sklearn.datasets import make_classification Generate Features And Target Data /* Generate features matrix and target vector */ X, y = make_classification(n_samples = 10000, n_features = 3, n_informative = 3, n_redundant = 0, n_classes = 2, random_state = 1) Create Logistic Regression …
Plot The Validation Curve Preliminaries /* Load libraries */ import matplotlib.pyplot as plt import numpy as np from sklearn.datasets import load_digits from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import validation_curve Load Digits Dataset /* Load data */ digits = load_digits() /* Create feature matrix and target vector */ X, y = digits.data, digits.target Plot Validation …
Plot The Receiver Operating Characteristic Curve Preliminaries /* Load libraries */ from sklearn.datasets import make_classification from sklearn.linear_model import LogisticRegression from sklearn.metrics import roc_curve, roc_auc_score from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt Generate Features And Target /* Create feature matrix and target vector */ X, y = make_classification(n_samples=10000, n_features=10, n_classes=2, n_informative=3, random_state=3) Split Data Into …
Plot The Learning Curve Preliminaries /* Load libraries */ import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import RandomForestClassifier from sklearn.datasets import load_digits from sklearn.model_selection import learning_curve Load Digits Dataset /* Load data */ digits = load_digits() /* Create feature matrix and target vector */ X, y = digits.data, digits.target Plot Learning …
Nested Cross Validation Often we want to tune the parameters of a model (for example, C in a support vector machine). That is, we want to find the value of a parameter that minimizes our loss function. The best way to do this is cross validation: Set the parameter you want to tune to some value. Split …
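As a rough sketch of the nested idea, an inner cross-validation can tune the parameter while an outer cross-validation scores the tuned model on folds the tuning never saw. The code below is an assumption-laden illustration, not the post's own code: the iris data, the standardizing step, the candidate values for C, and the fold counts are all picked for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small example dataset (an assumption for illustration)
X, y = load_iris(return_X_y=True)

# Standardize, then fit a support vector classifier
pipe = make_pipeline(StandardScaler(), SVC())

# Inner loop: 3-fold grid search over hypothetical candidate values of C;
# make_pipeline names the SVC step "svc", hence "svc__C"
inner_search = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10, 100]}, cv=3)

# Outer loop: 5-fold cross-validation that reruns the whole inner search
# on each outer training set and scores it on the held-out outer fold
outer_scores = cross_val_score(inner_search, X, y, cv=5)

print(outer_scores.mean())
```

The mean of the outer scores estimates how well the full procedure (tuning included) generalizes, rather than how well one particular value of C happened to do on data used to choose it.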
Generate Text Reports On Performance Preliminaries /* Load libraries */ from sklearn import datasets from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split from sklearn.metrics import classification_report Load Iris Flower Data /* Load data */ iris = datasets.load_iris() /* Create feature matrix */ X = iris.data /* Create target vector */ y = iris.target /* Create …