Tag Archives: Python for Citizen Data Scientist

Machine Learning for Beginners in Python: What is the Effect of Alpha on Lasso Regression

Effect Of Alpha On Lasso Regression

Often we want to conduct a process called regularization, wherein we penalize the number of features in a model in order to keep only the most important features. This can be particularly important when you have a dataset with 100,000+ features. Lasso regression is a common modeling technique for regularization. The …
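
The excerpt stops before any code, so the following is only a minimal sketch of the idea rather than the post's own example. It assumes a synthetic dataset from `make_regression` and an arbitrary list of alpha values (both chosen here for illustration) and counts how many coefficients remain non-zero as alpha grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 10 features, 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Standardize features so the penalty treats them on the same scale
X = StandardScaler().fit_transform(X)

# Fit one lasso model per alpha and count the non-zero coefficients
alphas = [0.01, 1.0, 10.0, 1000.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))
    print(f"alpha={alpha}: {nonzero_counts[-1]} non-zero coefficients")
```

With a large enough alpha every coefficient is driven to zero, which is the sense in which lasso performs feature selection.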

Machine Learning for Beginners in Python: How to Add Interaction Terms in Linear Regression

Adding Interaction Terms

Preliminaries

    # Load libraries
    from sklearn.linear_model import LinearRegression
    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
    from sklearn.preprocessing import PolynomialFeatures
    import warnings

    # Suppress warning
    warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

Load Boston Housing Dataset

    # Load the data with only two features
    boston = load_boston()
    X = boston.data[:,0:2]
    y = boston.target

Add Interaction Term …
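
The excerpt is cut off before the interaction step. As a rough, self-contained sketch of that step (not the post's own code), the example below swaps the Boston data for synthetic two-feature data, an assumption made so that it runs on current scikit-learn releases. It uses `PolynomialFeatures` with `interaction_only=True` to append the product of the two features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic two-feature data (an assumption; the post uses Boston housing)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Target built with a known interaction effect of 3.0 and no noise
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 3.0 * X[:, 0] * X[:, 1]

# Append the x1*x2 column; skip the bias column and the squared terms
interaction = PolynomialFeatures(degree=2, include_bias=False,
                                 interaction_only=True)
X_inter = interaction.fit_transform(X)

# Fit a linear regression on [x1, x2, x1*x2]
model = LinearRegression().fit(X_inter, y)
print(X_inter.shape)
print(model.coef_)
```

Because the target is noise-free, the fitted coefficients recover the values used to build it, which makes it easy to check that the third column really is the interaction term.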

Machine Learning for Beginners in Python: Hyperparameter Tuning Using Random Search

Hyperparameter Tuning Using Random Search

Preliminaries

    # Load libraries
    from scipy.stats import uniform
    from sklearn import linear_model, datasets
    from sklearn.model_selection import RandomizedSearchCV

Load Iris Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

Create Logistic Regression

    # Create logistic regression
    logistic = linear_model.LogisticRegression()

Create Hyperparameter Search Space

    # …
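
The search space itself is elided in the excerpt. The sketch below shows one way a random search could be set up. The choice to tune only `C`, its uniform range, `n_iter=10`, and `max_iter=1000` are all assumptions made for illustration, not values taken from the post.

```python
from scipy.stats import uniform
from sklearn import datasets, linear_model
from sklearn.model_selection import RandomizedSearchCV

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create logistic regression (max_iter raised so the solver converges)
logistic = linear_model.LogisticRegression(max_iter=1000)

# Hypothetical search space: C drawn uniformly from [0.01, 10.01]
param_distributions = {"C": uniform(loc=0.01, scale=10)}

# Sample 10 candidate settings and score each with 5-fold cross-validation
search = RandomizedSearchCV(logistic, param_distributions,
                            n_iter=10, cv=5, random_state=1)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Unlike a grid search, the number of candidates is fixed by `n_iter` regardless of how many hyperparameters are in the search space.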

Machine Learning for Beginners in Python: How to Find Best Preprocessing Steps During Model Selection

Find Best Preprocessing Steps During Model Selection

We have to be careful to properly handle preprocessing when conducting model selection. First, GridSearchCV uses cross-validation to determine which model has the highest performance. However, in cross-validation we are in effect pretending that the fold held out as the test set is not seen, and thus not part of …
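
One common way to respect that constraint is to put the preprocessing inside a `Pipeline`, so each step is refit on the training folds only. The sketch below is an illustration under assumptions: the iris data, the scaler plus PCA steps, and the parameter grid are chosen here and are not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing lives inside the pipeline, so it is fit per training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: search preprocessing and model settings together
param_grid = {
    "pca__n_components": [1, 2, 3],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The `step__parameter` naming lets the grid address any step, so the number of PCA components is selected with the same cross-validation as the classifier's `C`.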

Machine Learning for Beginners in Python: How to Calculate Recall

Recall

Preliminaries

    # Load libraries
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification

Generate Features And Target Data

    # Generate features matrix and target vector
    X, y = make_classification(n_samples = 10000,
                               n_features = 3,
                               n_informative = 3,
                               n_redundant = 0,
                               n_classes = 2,
                               random_state = 1)

Create Logistic Regression …
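
The excerpt ends before the scoring step. A minimal sketch of how the imports it lists could be combined is given below. The 5-fold setting is an assumption, and recall here means the share of actual positives that the model labels as positive.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Generate features matrix and target vector (same settings as the excerpt)
X, y = make_classification(n_samples=10000, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=1)

# Create logistic regression
logit = LogisticRegression()

# Cross-validated recall: true positives / (true positives + false negatives)
recall_scores = cross_val_score(logit, X, y, scoring="recall", cv=5)

print(recall_scores)
print(recall_scores.mean())
```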

Machine Learning for Beginners in Python: How to Plot The Validation Curve

Plot The Validation Curve

Preliminaries

    # Load libraries
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import validation_curve

Load Digits Dataset

    # Load data
    digits = load_digits()

    # Create feature matrix and target vector
    X, y = digits.data, digits.target

Plot Validation …
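
The plotting code is truncated in the excerpt. The sketch below covers only the computation behind such a plot and leaves the matplotlib calls out. The tuned parameter (`n_estimators`), its small range, and `cv=3` are assumptions picked to keep the example quick.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Create feature matrix and target vector
X, y = load_digits(return_X_y=True)

# Hypothetical range of values for the hyperparameter being varied
param_range = np.array([1, 5, 20])

# Score the model on training and validation folds for each value
train_scores, test_scores = validation_curve(
    RandomForestClassifier(random_state=0), X, y,
    param_name="n_estimators", param_range=param_range,
    cv=3, scoring="accuracy")

# Average over folds; these are the two lines a validation curve plots
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

for n, tr, te in zip(param_range, train_mean, test_mean):
    print(f"n_estimators={n}: train={tr:.3f}, validation={te:.3f}")
```

Plotting `train_mean` and `test_mean` against `param_range` gives the curve; a widening gap between the two suggests overfitting at that setting.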

Machine Learning for Beginners in Python: How to Plot The Receiver Operating Characteristic Curve

Plot The Receiver Operating Characteristic Curve

Preliminaries

    # Load libraries
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

Generate Features And Target

    # Create feature matrix and target vector
    X, y = make_classification(n_samples=10000,
                               n_features=10,
                               n_classes=2,
                               n_informative=3,
                               random_state=3)

Split Data Intro …
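
The remaining steps are cut off in the excerpt. A minimal sketch of how the listed imports might fit together follows, with plotting omitted. The 10% test split, its `random_state`, and `max_iter=1000` are assumptions, not values from the post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Create feature matrix and target vector (same settings as the excerpt)
X, y = make_classification(n_samples=10000, n_features=10, n_classes=2,
                           n_informative=3, random_state=3)

# Hold out a test set (split size is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=1)

# Train the classifier and get predicted probabilities for the positive class
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each probability threshold
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# Area under the ROC curve as a single summary number
auc = roc_auc_score(y_test, y_score)
print(auc)
```

Plotting `tpr` against `fpr` gives the ROC curve; the diagonal from (0, 0) to (1, 1) is the reference line for a classifier that guesses at random.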

Machine Learning for Beginners in Python: How to Plot The Learning Curve

Plot The Learning Curve

Preliminaries

    # Load libraries
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import learning_curve

Load Digits Dataset

    # Load data
    digits = load_digits()

    # Create feature matrix and target vector
    X, y = digits.data, digits.target

Plot Learning …
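
As with the other plotting posts, the excerpt ends before the curve is computed. The sketch below shows only that computation, without matplotlib. The forest size, `cv=3`, and the five training-set fractions are assumptions chosen to keep it fast.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Create feature matrix and target vector
X, y = load_digits(return_X_y=True)

# Score the model at five training-set sizes, from 10% to 100% of each fold
train_sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(n_estimators=20, random_state=0), X, y,
    cv=3, scoring="accuracy", train_sizes=np.linspace(0.1, 1.0, 5))

# Average over folds; these are the two lines a learning curve plots
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

for n, tr, te in zip(train_sizes, train_mean, test_mean):
    print(f"{n} samples: train={tr:.3f}, validation={te:.3f}")
```

If the validation score is still climbing at the largest size, more training data is likely to help.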

Machine Learning for Beginners in Python: How to do Nested Cross Validation

Nested Cross Validation

Often we want to tune the parameters of a model (for example, C in a support vector machine). That is, we want to find the value of a parameter that minimizes our loss function. The best way to do this is cross validation: Set the parameter you want to tune to some value. Split …
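
The procedure is truncated in the excerpt, so the following is a sketch of one common way to nest the two loops in scikit-learn, not the post's own code. The iris data, the grid of `C` values, and the fold counts are assumptions. An inner `GridSearchCV` tunes `C`, and an outer `cross_val_score` evaluates the whole tuning procedure on folds the inner search never sees.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scale inside the pipeline so scaling is refit on each training fold
pipe = make_pipeline(StandardScaler(), SVC())

# Hypothetical candidate values for C (step name "svc" comes from make_pipeline)
param_grid = {"svc__C": [0.1, 1, 10, 100]}

# Inner loop: choose C by 3-fold cross-validation
inner_search = GridSearchCV(pipe, param_grid, cv=3)

# Outer loop: 5-fold estimate of how well the tuned model generalizes
outer_scores = cross_val_score(inner_search, X, y, cv=5)

print(outer_scores)
print(outer_scores.mean())
```

The outer scores are the ones to report, since the best score from the inner search alone was used to pick `C` and is therefore optimistic.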

Machine Learning for Beginners in Python: How to Generate Text Reports On Performance

Generate Text Reports On Performance

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

Load Iris Flower Data

    # Load data
    iris = datasets.load_iris()

    # Create feature matrix
    X = iris.data

    # Create target vector
    y = iris.target

    # Create …
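
The excerpt stops partway through the setup. A minimal sketch of how the listed imports could produce a report is given below. The default train/test split, its `random_state`, and `max_iter=1000` are assumptions made for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load data and create feature matrix and target vector
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hold out a test set (split settings are an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Train the classifier and predict the held-out labels
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Text table of precision, recall, F1-score, and support for each class
report = classification_report(y_test, y_pred, labels=[0, 1, 2],
                               target_names=iris.target_names)
print(report)
```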