Tag Archives: data science from scratch

Machine Learning for Beginners in Python: k-Means Clustering

k-Means Clustering

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

Conduct k-Means Clustering

    # Create k-means object
    clt = …
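The excerpt stops before the clustering step. As a rough sketch of how standardized iris features could be clustered with scikit-learn's KMeans: the choice of three clusters, `n_init=10`, `random_state=0`, and the variable names `kmeans` and `labels` are assumptions for illustration, not the original post's code.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load and standardize the iris features
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Assumed settings: 3 clusters, 10 random restarts, fixed seed for repeatability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit the model and get one cluster label per observation
labels = kmeans.fit_predict(X_std)

print(labels[:10])
print(kmeans.cluster_centers_.shape)  # one centroid per cluster, one column per feature
```

The cluster numbers themselves are arbitrary; only the grouping is meaningful.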

Machine Learning for Beginners in Python: Agglomerative Clustering

Agglomerative Clustering

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering

Load Iris Flower Data

    # Load data
    iris = datasets.load_iris()
    X = iris.data

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

Conduct Agglomerative Clustering

In scikit-learn, AgglomerativeClustering uses the linkage parameter to determine the merging …
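To make the role of the linkage parameter concrete, here is a small sketch that fits AgglomerativeClustering once per linkage option and counts the observations in each cluster. The three-cluster setting and the names `cluster_sizes` and `labels` are assumptions made for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# Load and standardize the iris features
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

cluster_sizes = {}
for linkage in ("ward", "complete", "average", "single"):
    # n_clusters=3 is an assumption chosen to match the three iris species
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X_std)
    # Count how many observations fall in each cluster
    cluster_sizes[linkage] = np.bincount(labels).tolist()

for linkage, sizes in cluster_sizes.items():
    print(linkage, sizes)
```

Comparing the cluster sizes shows how much the merging strategy alone can change the result on the same data.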

Machine Learning for Beginners in Python: Naive Bayes Classifier From Scratch

Naive Bayes Classifier From Scratch

Naive Bayes is a simple classifier known for doing well when only a small number of observations is available. In this tutorial we will create a Gaussian naive Bayes classifier from scratch and use it to predict the class of a previously unseen data point. This tutorial is based on an …
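The tutorial's own implementation is not shown in the excerpt. The following is one possible from-scratch sketch using NumPy: it estimates a prior, mean, and variance per class, then picks the class with the highest log posterior. The function names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical, and the iris data is used only as a convenient example.

```python
import numpy as np
from sklearn import datasets


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Return the class with the highest log posterior for each row of X."""
    classes, priors, means, variances = model
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Sum of per-feature Gaussian log densities (the "naive" independence assumption)
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]


iris = datasets.load_iris()
X, y = iris.data, iris.target

model = fit_gaussian_nb(X, y)
predictions = predict_gaussian_nb(model, X)
accuracy = np.mean(predictions == y)
print(accuracy)
```

Working in log space avoids numerical underflow when many small densities are multiplied together.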

Machine Learning for Beginners in Python: Gaussian Naive Bayes Classifier

Gaussian Naive Bayes Classifier

Because of the assumption of the normal distribution, Gaussian Naive Bayes is best used in cases when all our features are continuous.

Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.naive_bayes import GaussianNB

Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = …
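A minimal sketch of training GaussianNB on the iris data and classifying a new observation might look like the following. The target assignment, the variable names, and the feature values in `new_observation` are assumptions for illustration and do not come from the original post.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load data (assumes the target vector is iris.target)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train the Gaussian naive Bayes classifier
clf = GaussianNB()
model = clf.fit(X, y)

# Hypothetical new observation with four continuous features
new_observation = np.array([[4.0, 4.0, 4.0, 0.4]])

predicted_class = model.predict(new_observation)
predicted_proba = model.predict_proba(new_observation)
print(predicted_class, predicted_proba)
```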

Machine Learning for Beginners in Python: How to Calibrate Predicted Probabilities

Calibrate Predicted Probabilities

Class probabilities are a common and useful part of machine learning models. In scikit-learn, most learning algorithms allow us to see the predicted probabilities of class membership using predict_proba. This can be extremely useful if, for instance, we want to only predict a certain class if the model predicts the probability that they …
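One way to calibrate probabilities in scikit-learn is to wrap a classifier in CalibratedClassifierCV. The sketch below assumes a GaussianNB base classifier, two cross-validation folds, sigmoid calibration, and a made-up observation; none of these choices are confirmed by the excerpt.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Uncalibrated classifier, fitted directly for comparison
raw_model = GaussianNB().fit(X, y)

# Calibrated wrapper (assumed settings: 2 folds, sigmoid calibration)
calibrated_model = CalibratedClassifierCV(GaussianNB(), cv=2, method="sigmoid")
calibrated_model.fit(X, y)

# Hypothetical new observation
new_observation = np.array([[2.6, 2.6, 2.6, 0.4]])

raw_proba = raw_model.predict_proba(new_observation)
calibrated_proba = calibrated_model.predict_proba(new_observation)
print(raw_proba)
print(calibrated_proba)
```

Naive Bayes often produces probabilities very close to 0 or 1, so the calibrated values are usually less extreme.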

Machine Learning for Beginners in Python: Bernoulli Naive Bayes Classifier

Bernoulli Naive Bayes Classifier

The Bernoulli naive Bayes classifier assumes that all our features are binary such that they take only two values (e.g. a nominal categorical feature that has been one-hot encoded).

Preliminaries

    # Load libraries
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

Create Binary Feature And Target Data

    # Create three …
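The data-creation step is cut off in the excerpt, so the sketch below generates its own random binary data. The sample size, the seed, the three-feature layout, and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Randomly generated binary data (illustrative only, not the post's data)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 3))  # three binary features
y = rng.integers(0, 2, size=100)       # binary target

# Train the Bernoulli naive Bayes classifier
clf = BernoulliNB()
model = clf.fit(X, y)

# Classify a hypothetical observation with binary feature values
new_observation = np.array([[1, 0, 1]])
predicted_class = model.predict(new_observation)
predicted_proba = model.predict_proba(new_observation)
print(predicted_class, predicted_proba)
```

Because the features and target here are random, the predictions carry no real signal; the point is only the shape of the workflow.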

Machine Learning for Beginners in Python: Support Vector Classifier

Support Vector Classifier

SVC must balance maximizing the margin of the hyperplane against minimizing misclassification. In SVC, the latter is controlled with the hyperparameter C, the penalty imposed for misclassifying a data point. When C is small, the …
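To see the effect of C in practice, one can fit the same classifier with several values and compare how many support vectors each model keeps. This sketch assumes a linear kernel, a two-class subset of the iris data, and three arbitrary values of C; the dictionary name `support_vector_counts` is made up for the example.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris (an assumption for this illustration)
iris = datasets.load_iris()
X = iris.data[:100, :]
y = iris.target[:100]
X_std = StandardScaler().fit_transform(X)

support_vector_counts = {}
for C in (0.01, 1.0, 100.0):
    # Small C tolerates misclassification; large C penalizes it heavily
    svc = SVC(kernel="linear", C=C, random_state=0).fit(X_std, y)
    support_vector_counts[C] = int(svc.n_support_.sum())

print(support_vector_counts)
```

A smaller C typically gives a wider margin, so more points tend to fall on or inside it and become support vectors.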

Machine Learning for Beginners in Python: How to Find Support Vectors

Find Support Vectors

Preliminaries

    # Load libraries
    from sklearn.svm import SVC
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

Load Iris Flower Dataset

    # Load data with only two classes
    iris = datasets.load_iris()
    X = iris.data[:100,:]
    y = iris.target[:100]

Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std …
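Once an SVC is fitted, scikit-learn exposes the support vectors through the `support_vectors_`, `support_`, and `n_support_` attributes. The sketch below assumes a linear kernel and default C, which the excerpt does not confirm.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load data with only two classes and standardize the features
iris = datasets.load_iris()
X = iris.data[:100, :]
y = iris.target[:100]
X_std = StandardScaler().fit_transform(X)

# Assumed settings: linear kernel, default C
svc = SVC(kernel="linear", random_state=0)
model = svc.fit(X_std, y)

print(model.support_vectors_)  # feature values of the support vectors
print(model.support_)          # row indices of the support vectors in X_std
print(model.n_support_)        # number of support vectors per class
```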

Machine Learning for Beginners in Python: How to Find Nearest Neighbors

Find Nearest Neighbors

Preliminaries

    # Load libraries
    from sklearn.neighbors import NearestNeighbors
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

Load Iris Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

Standardize Iris Data

It is important to standardize our data before we calculate any distances. …
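The lookup itself can be sketched as follows. The number of neighbors, the Euclidean metric (scikit-learn's default), and the values in `new_observation` are assumptions; the observation is treated as already expressed in standardized units.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load and standardize the iris features
iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)

# Index the observations; n_neighbors=2 is an assumed setting
nn = NearestNeighbors(n_neighbors=2).fit(X_std)

# Hypothetical new observation, in standardized units
new_observation = np.array([[1.0, 1.0, 1.0, 1.0]])

# Distances to, and row indices of, the two closest observations
distances, indices = nn.kneighbors(new_observation)
print(distances)
print(X_std[indices[0]])
```

The neighbors come back ordered from nearest to farthest.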

Machine Learning for Beginners in Python: K-Nearest Neighbors Classification

K-Nearest Neighbors Classification

Preliminaries

    import pandas as pd
    from sklearn import neighbors
    import numpy as np
    %matplotlib inline
    import seaborn

Create Dataset

Here we create three variables: test_1 and test_2 are our independent variables, and 'outcome' is our dependent variable. We will use this data to train our learner.

    training_data = pd.DataFrame()
    training_data['test_1'] = [0.3051,0.4949,0.6974,0.3769,0.2231,0.341,0.4436,0.5897,0.6308,0.5]
    training_data['test_2'] = [0.5846,0.2654,0.2615,0.4538,0.4615,0.8308,0.4962,0.3269,0.5346,0.6731]
    training_data['outcome'] = …
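Because the outcome column is cut off in the excerpt, the sketch below illustrates k-nearest neighbors classification on the iris data instead of the post's dataset. The choice of k=3, the 75/25 split, and all variable names are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Iris stands in for the post's dataset, whose outcome column is not shown
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

# Standardize using statistics from the training data only
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Assumed setting: each prediction is a majority vote among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3, weights="uniform")
knn.fit(X_train_std, y_train)

predictions = knn.predict(X_test_std)
accuracy = knn.score(X_test_std, y_test)
print(accuracy)
```

Holding out a test set gives a rough check on how the chosen k generalizes to unseen observations.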