k-Means Clustering

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

Conduct k-Means Clustering

    # Create k-means object
    clt = …
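The clustering step could look roughly like the sketch below. The number of clusters (3), `n_init=10`, and `random_state=0` are assumptions chosen for illustration; none of them come from the excerpt above.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load and standardize the iris features
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Assumed settings: 3 clusters, 10 restarts, fixed seed for repeatability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit the model and get one cluster label per observation
labels = kmeans.fit_predict(X_std)

# Cluster centers in the standardized feature space
centers = kmeans.cluster_centers_
```

`labels` holds one integer per flower and `centers` holds one row per cluster, so both can be inspected or plotted directly.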
Agglomerative Clustering

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering

Load Iris Flower Data

    # Load data
    iris = datasets.load_iris()
    X = iris.data

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

Conduct Agglomerative Clustering

In scikit-learn, AgglomerativeClustering uses the linkage parameter to determine the merging …
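To see the effect of the linkage parameter, one option is to fit the same data with several linkage strategies and compare the resulting labels. This is only a sketch: the choice of three clusters and of these three linkage values is an assumption, and the distance metric is left at its default (Euclidean).

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# Load and standardize the iris features
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Fit one model per linkage strategy (assumed: 3 clusters)
labels_by_linkage = {}
for linkage in ("ward", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels_by_linkage[linkage] = model.fit_predict(X_std)
```

Each entry of `labels_by_linkage` is an array of cluster labels, so the strategies can be compared observation by observation.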
Naive Bayes Classifier From Scratch

Naive Bayes is a simple classifier known for doing well when only a small number of observations is available. In this tutorial we will create a Gaussian naive Bayes classifier from scratch and use it to predict the class of a previously unseen data point. This tutorial is based on an …
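As a rough idea of what "from scratch" can mean here, the sketch below estimates class priors plus a per-class mean and variance for each feature, then picks the class with the highest log posterior. The function names and the toy data are hypothetical and are not taken from the tutorial.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate priors, means and variances for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0, ddof=1) for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(x, classes, priors, means, variances):
    """Return the class with the largest log posterior for one observation."""
    # Log of the Gaussian density, one value per class and feature
    log_likelihood = -0.5 * (np.log(2 * np.pi * variances)
                             + (x - means) ** 2 / variances)
    # Naive assumption: features are independent, so log densities add up
    log_posterior = np.log(priors) + log_likelihood.sum(axis=1)
    return classes[np.argmax(log_posterior)]

# Hypothetical toy data: two features, two well-separated classes
X_toy = np.array([[1.0, 2.0], [1.2, 1.9], [0.8, 2.2],
                  [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_toy, y_toy)
prediction = predict_gaussian_nb(np.array([1.1, 2.0]), *params)
```

Working in log space avoids multiplying many small probabilities together, which can underflow.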
Gaussian Naive Bayes Classifier

Because of the assumption of the normal distribution, Gaussian Naive Bayes is best used in cases when all our features are continuous.

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.naive_bayes import GaussianNB

Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = …
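A minimal end-to-end sketch with scikit-learn's `GaussianNB`, assuming the iris target is used as the class label. The new observation is a hypothetical flower chosen to resemble a setosa.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load features and class labels
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Train a Gaussian naive Bayes classifier with default settings
model = GaussianNB().fit(X, y)

# Hypothetical new observation (sepal length, sepal width, petal length, petal width)
new_observation = [[5.1, 3.5, 1.4, 0.2]]

# Predicted class and the class membership probabilities
predicted_class = model.predict(new_observation)
predicted_proba = model.predict_proba(new_observation)
```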
Calibrate Predicted Probabilities

Class probabilities are a common and useful part of machine learning models. In scikit-learn, most learning algorithms allow us to see the predicted probabilities of class membership using predict_proba. This can be extremely useful if, for instance, we only want to predict a certain class if the model predicts the probability that they …
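One way to calibrate probabilities in scikit-learn is to wrap a classifier in `CalibratedClassifierCV`. The sketch below assumes a Gaussian naive Bayes base model, two cross-validation folds, and sigmoid (Platt) calibration; these choices and the new observation are illustrative assumptions.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Load features and class labels
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Wrap the base classifier; calibration is learned on held-out folds
calibrated = CalibratedClassifierCV(GaussianNB(), cv=2, method="sigmoid")
calibrated.fit(X, y)

# Hypothetical new observation
new_observation = [[2.6, 2.6, 2.6, 0.4]]

# Calibrated class membership probabilities
calibrated_proba = calibrated.predict_proba(new_observation)
```

Comparing `calibrated_proba` with the output of an uncalibrated `GaussianNB().fit(X, y).predict_proba(new_observation)` shows how much the calibration step changes the estimates.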
Bernoulli Naive Bayes Classifier

The Bernoulli naive Bayes classifier assumes that all our features are binary such that they take only two values (e.g. a nominal categorical feature that has been one-hot encoded).

Preliminaries

    # Load libraries
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

Create Binary Feature And Target Data

    # Create three …
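A small sketch of fitting `BernoulliNB` on binary features. The data here is randomly generated for illustration only (100 observations, three binary features, a binary target, fixed seed); it is an assumption and not the dataset from the excerpt.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical data: random 0/1 features and a random 0/1 target
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 3))
y = rng.integers(0, 2, size=100)

# Train a Bernoulli naive Bayes classifier with default settings
model = BernoulliNB().fit(X, y)

# Class membership probabilities for the first observation
proba = model.predict_proba(X[:1])
```

Because the features and target are unrelated random draws, the predicted probabilities carry no real signal; the point is only the shape of the workflow.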
Support Vector Classifier

There is a balance between SVC maximizing the margin of the hyperplane and minimizing the misclassification. In SVC, the latter is controlled with the hyperparameter C, the penalty imposed for misclassifying a data point. When C is small, the …
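The role of C can be explored by fitting the same model with a small and a large value and counting the support vectors. The sketch assumes a linear kernel, the first two iris classes, and the values 0.01 and 100 for C; all of these are illustrative choices. A smaller C typically tolerates more margin violations, so more points tend to end up as support vectors.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris, standardized
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data[:100, :])
y = iris.target[:100]

# Weak penalty on misclassification (assumed value)
svc_small_c = SVC(kernel="linear", C=0.01).fit(X_std, y)

# Strong penalty on misclassification (assumed value)
svc_large_c = SVC(kernel="linear", C=100).fit(X_std, y)

# Total number of support vectors under each setting
n_support_small_c = int(svc_small_c.n_support_.sum())
n_support_large_c = int(svc_large_c.n_support_.sum())
```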
Find Support Vectors

Preliminaries

    # Load libraries
    from sklearn.svm import SVC
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

Load Iris Flower Dataset

    # Load data with only two classes
    iris = datasets.load_iris()
    X = iris.data[:100,:]
    y = iris.target[:100]

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std …
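Once an `SVC` is fitted, the support vectors are available as attributes of the model. The sketch below assumes a linear kernel and a fixed `random_state`, which are illustrative choices rather than settings from the excerpt.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris, standardized
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data[:100, :])
y = iris.target[:100]

# Fit a support vector classifier (assumed: linear kernel)
svc = SVC(kernel="linear", random_state=0).fit(X_std, y)

# The support vectors themselves, one row per vector
support_vectors = svc.support_vectors_

# Row indices of the support vectors in the training data
support_indices = svc.support_

# Number of support vectors belonging to each class
support_counts = svc.n_support_
```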
Find Nearest Neighbors

Preliminaries

    # Load libraries
    from sklearn.neighbors import NearestNeighbors
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

Load Iris Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

Standardize Iris Data

It is important to standardize our data before we calculate any distances. …
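A sketch of the lookup itself, assuming we want the two closest observations under the default Euclidean distance. The new observation is hypothetical and is expressed directly in standardized units.

```python
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load and standardize the iris features so no feature dominates the distance
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Index the observations (assumed: 2 neighbors)
nn = NearestNeighbors(n_neighbors=2).fit(X_std)

# Hypothetical new observation, already on the standardized scale
new_observation = [[1.0, 1.0, 1.0, 1.0]]

# Distances to, and row indices of, the closest observations
distances, indices = nn.kneighbors(new_observation)

# The neighboring observations themselves
nearest = X_std[indices[0]]
```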
K-Nearest Neighbors Classification

Preliminaries

    import pandas as pd
    from sklearn import neighbors
    import numpy as np
    %matplotlib inline
    import seaborn

Create Dataset

Here we create three variables: test_1 and test_2 are our independent variables, and outcome is our dependent variable. We will use this data to train our learner.

    training_data = pd.DataFrame()
    training_data['test_1'] = [0.3051,0.4949,0.6974,0.3769,0.2231,0.341,0.4436,0.5897,0.6308,0.5]
    training_data['test_2'] = [0.5846,0.2654,0.2615,0.4538,0.4615,0.8308,0.4962,0.3269,0.5346,0.6731]
    training_data['outcome'] = …
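Training and using the classifier might look like the sketch below. Because the outcome column is cut off in the excerpt, this example uses a separate, hypothetical dataset (`toy_data`) with made-up labels `"low"` and `"high"`; the choice of three neighbors with uniform weights is also an assumption.

```python
import pandas as pd
from sklearn import neighbors

# Hypothetical training data, not the values from the excerpt above
toy_data = pd.DataFrame({
    "test_1": [0.20, 0.30, 0.25, 0.70, 0.80, 0.75],
    "test_2": [0.20, 0.25, 0.30, 0.80, 0.70, 0.75],
    "outcome": ["low", "low", "low", "high", "high", "high"],
})

features = toy_data[["test_1", "test_2"]]
target = toy_data["outcome"]

# Classify by majority vote among the 3 nearest training points
clf = neighbors.KNeighborsClassifier(n_neighbors=3, weights="uniform")
clf.fit(features, target)

# Hypothetical new observation with the same column names
new_observation = pd.DataFrame({"test_1": [0.72], "test_2": [0.78]})
prediction = clf.predict(new_observation)
```

Passing the new observation as a DataFrame with matching column names keeps the feature order explicit.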