Machine Learning for Beginners in Python: How to Use ANOVA F-value For Feature Selection

ANOVA F-value For Feature Selection

If the features are categorical, calculate a chi-square statistic between each feature and the target vector. However, if the features are quantitative, compute the ANOVA F-value between each feature and the target vector. The F-value scores examine if, when we group the numerical feature by the target vector, the means …
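
A minimal sketch of ANOVA F-value selection using scikit-learn's `SelectKBest` with `f_classif`. It assumes the iris dataset as a stand-in for a quantitative feature matrix and keeps `k=2` features; both choices are illustrative, not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Feature matrix with four quantitative features and a class target vector
iris = load_iris()
X, y = iris.data, iris.target

# Score each feature with its ANOVA F-value against the target; keep the top 2
fvalue_selector = SelectKBest(f_classif, k=2)
X_kbest = fvalue_selector.fit_transform(X, y)

# Show each feature's F-value and whether the selector kept it
for name, score, kept in zip(iris.feature_names,
                             fvalue_selector.scores_,
                             fvalue_selector.get_support()):
    print(f"{name}: F = {score:.1f}, kept = {kept}")

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_kbest.shape[1])
```

For the categorical case described above, `chi2` from the same `sklearn.feature_selection` module can be passed to `SelectKBest` in place of `f_classif`; note that `chi2` requires non-negative feature values (for example, counts).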

Machine Learning for Beginners in Python: How to Select The Best Number Of Components For TSVD

Selecting The Best Number Of Components For TSVD

Preliminaries

# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np

Load Digits Data And Make Sparse

# Load the data
digits = datasets.load_digits()

# Standardize the feature matrix
X = …
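
One way to pick the number of components, sketched under assumptions: fit a TSVD with one fewer component than there are features, then count how many leading components are needed before their explained variance ratios reach a target. The 95% goal and the helper name `select_n_components` are hypothetical, not from the post.

```python
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Load the digits data, standardize it, and store it as a sparse matrix
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
X_sparse = csr_matrix(X)

# Fit a TSVD with one fewer component than the number of features
tsvd = TruncatedSVD(n_components=X_sparse.shape[1] - 1)
tsvd.fit(X_sparse)

# Fraction of the total variance explained by each component
tsvd_var_ratios = tsvd.explained_variance_ratio_


def select_n_components(var_ratio, goal_var):
    """Hypothetical helper: smallest number of leading components whose
    explained variance ratios sum to at least goal_var (or all of them
    if the goal is never reached)."""
    total_variance = 0.0
    n_components = 0
    for explained_variance in var_ratio:
        total_variance += explained_variance
        n_components += 1
        if total_variance >= goal_var:
            break
    return n_components


n_best = select_n_components(tsvd_var_ratios, 0.95)
print("Components needed for 95% of the variance:", n_best)
```

The chosen count can then be used to refit, for example `TruncatedSVD(n_components=n_best)`.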

Machine Learning for Beginners in Python: How to Group Observations Using K-Means Clustering

Group Observations Using K-Means Clustering

Preliminaries

# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd

Create Data

# Make simulated feature matrix
X, _ = make_blobs(n_samples=50, n_features=2, centers=3, random_state=1)

# Create DataFrame
df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])

Train Clusterer …
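
A minimal sketch of training the clusterer and attaching the cluster labels, assuming `k=3` (matching the three simulated centers) and a hypothetical `group` column name. `n_init` is set explicitly because its default has changed across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulated feature matrix with three true clusters
X, _ = make_blobs(n_samples=50, n_features=2, centers=3, random_state=1)
df = pd.DataFrame(X, columns=["feature_1", "feature_2"])

# Train a k-means clusterer with k=3 and 10 random initializations
clusterer = KMeans(n_clusters=3, n_init=10, random_state=1)
clusterer.fit(X)

# Store each observation's predicted cluster in the DataFrame
df["group"] = clusterer.predict(X)
print(df.head())

# Mean feature values within each cluster
print(df.groupby("group").mean())
```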

Machine Learning for Beginners in Python: Dimensionality Reduction With PCA

Dimensionality Reduction With PCA

Preliminaries

# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

Load Data

# Load the data
digits = datasets.load_digits()

Standardize Feature Values

# Standardize the feature matrix
X = StandardScaler().fit_transform(digits.data)

Conduct Principal Component Analysis

# Create a PCA that will retain 99% …
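
A sketch of the PCA step, assuming the 99% refers to explained variance. In scikit-learn, passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction; `svd_solver="full"` is set here as a conservative choice for that mode.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load and standardize the digits feature matrix
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# Keep as many components as needed to explain at least 99% of the variance
pca = PCA(n_components=0.99, svd_solver="full")
X_pca = pca.fit_transform(X)

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
print("Variance retained:", pca.explained_variance_ratio_.sum())
```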

Machine Learning for Beginners in Python: Dimensionality Reduction With Kernel PCA

Dimensionality Reduction With Kernel PCA

Preliminaries

# Load libraries
from sklearn.decomposition import PCA, KernelPCA
from sklearn.datasets import make_circles

Create Linearly Inseparable Data

# Create linearly inseparable data
X, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

Conduct Kernel PCA

# Apply kernel PCA with a radial basis function (RBF) kernel
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1) …
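
A sketch that completes the kernel PCA step with the same parameters and compares it against ordinary linear PCA. The comparison, which keeps the circle labels `y` and measures the gap between the two circles' means on the single retained component, is an illustrative addition rather than part of the post.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# Kernel PCA with an RBF kernel, reduced to a single component
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
X_kpca = kpca.fit_transform(X)

# Linear PCA with a single component, for comparison
X_pca = PCA(n_components=1).fit_transform(X)

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_kpca.shape[1])

# Rough separation check: distance between the circles' means on the
# retained component, scaled by that component's standard deviation
for name, Z in [("linear PCA", X_pca), ("RBF kernel PCA", X_kpca)]:
    gap = abs(Z[y == 0].mean() - Z[y == 1].mean()) / Z.std()
    print(f"{name}: standardized gap between circle means = {gap:.2f}")
```

Because both circles are centered on the origin, any single linear projection mixes them, while the RBF kernel can map the inner and outer circle to different values of one component.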

Machine Learning for Beginners in Python: Dimensionality Reduction On Sparse Feature Matrix

Dimensionality Reduction On Sparse Feature Matrix

Preliminaries

# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np

Load Digits Data And Make Sparse

# Load the data
digits = datasets.load_digits()

# Standardize the feature matrix
X = StandardScaler().fit_transform(digits.data)

# …
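
A minimal sketch of reducing a SciPy sparse matrix with `TruncatedSVD`, assuming 10 output components (an arbitrary illustrative choice).

```python
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Load and standardize the digits data, then convert to a sparse CSR matrix
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
X_sparse = csr_matrix(X)

# TruncatedSVD accepts the sparse matrix directly
tsvd = TruncatedSVD(n_components=10)
X_sparse_tsvd = tsvd.fit_transform(X_sparse)

print("Original number of features:", X_sparse.shape[1])
print("Reduced number of features:", X_sparse_tsvd.shape[1])
print("Variance explained by 10 components:", tsvd.explained_variance_ratio_.sum())
```

Unlike PCA, TruncatedSVD does not center the data, which is what lets it work on sparse input without densifying it. Standardizing makes most entries of this particular matrix non-zero, so here the sparse format only demonstrates the API rather than saving memory.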

Machine Learning for Beginners in Python: How to Select Date And Time Ranges

Select Date And Time Ranges

Preliminaries

# Load library
import pandas as pd

Create pandas Series Time Data

# Create data frame
df = pd.DataFrame()

# Create datetimes
df['date'] = pd.date_range('1/1/2001', periods=100000, freq='H')

Select Time Range (Method 1)

Use this method if your data frame is not indexed by time.

# Select …
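
A sketch of two ways to select a time range, with hypothetical boundary timestamps. The first uses a boolean mask on a datetime column; the second, label slicing on a `DatetimeIndex`, is an assumed approach for time-indexed frames, not a reconstruction of the post's own Method 2. It uses the lowercase hourly alias `"h"`, which recent pandas prefers (`"H"` is deprecated as of pandas 2.2).

```python
import pandas as pd

# Hourly datetimes, as in the excerpt
df = pd.DataFrame()
df["date"] = pd.date_range("1/1/2001", periods=100000, freq="h")

# Boolean mask on the datetime column (no time index needed);
# the lower bound is exclusive, the upper bound inclusive
mask = (df["date"] > "2002-01-01 01:00:00") & (df["date"] <= "2002-01-01 04:00:00")
method_1 = df.loc[mask]
print(method_1)

# Label slicing on a DatetimeIndex; both ends are inclusive
df_indexed = df.set_index("date", drop=False)
method_2 = df_indexed.loc["2002-01-01 01:00:00":"2002-01-01 04:00:00"]
print(method_2)
```

The two results differ by one row because the mask excludes the 01:00 timestamp while `.loc` slicing includes both endpoints.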

Machine Learning for Beginners in Python: How to Create A Rolling Time Window

Rolling Time Window

Preliminaries

import pandas as pd

Create Date Data

time_index = pd.date_range('01/01/2010', periods=5, freq='M')
df = pd.DataFrame(index=time_index)
df['Stock_Price'] = [1,2,3,4,5]

Create A Rolling Time Window Of Two Rows

df.rolling(window=2).mean()

            Stock_Price
2010-01-31          NaN
2010-02-28          1.5
2010-03-31          2.5
2010-04-30          3.5
2010-05-31          4.5

# Identify max value in rolling time window
df.rolling(window=2).max()

            Stock_Price
2010-01-31          NaN …
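
A runnable sketch of the rolling window shown above. The month-end dates are written out explicitly, an assumption made to avoid the version-specific frequency aliases `"M"` and `"ME"`. The `min_periods=1` variant is an illustrative addition that lets the first row use the single value available instead of returning NaN.

```python
import pandas as pd

# Month-end dates written out explicitly
time_index = pd.to_datetime(
    ["2010-01-31", "2010-02-28", "2010-03-31", "2010-04-30", "2010-05-31"]
)
df = pd.DataFrame({"Stock_Price": [1, 2, 3, 4, 5]}, index=time_index)

# Two-row rolling mean and max; the first row has no full window, so it is NaN
rolling_mean = df.rolling(window=2).mean()
rolling_max = df.rolling(window=2).max()

# Allow windows with a single observation
rolling_mean_partial = df.rolling(window=2, min_periods=1).mean()

print(pd.concat(
    {"mean": rolling_mean, "max": rolling_max, "mean_min1": rolling_mean_partial},
    axis=1,
))
```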

Machine Learning for Beginners in Python: How to Lag A Time Feature

Lag A Time Feature

Preliminaries

import pandas as pd

Create Date Data

df = pd.DataFrame()
df['dates'] = pd.date_range('1/1/2001', periods=5, freq='D')
df['stock_price'] = [1.1,2.2,3.3,4.4,5.5]

Lag Time Data By One Row

df['previous_days_stock_price'] = df['stock_price'].shift(1)
df

       dates  stock_price  previous_days_stock_price
0 2001-01-01          1.1                        NaN
1 2001-01-02          2.2                        1.1
2 2001-01-03          3.3                        2.2
3 2001-01-04          4.4                        3.3
4 2001-01-05          5.5 …
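
A sketch extending the lag example. A negative shift produces a lead (next day's value), and subtracting the lagged column gives a day-over-day change. The column names `next_days_stock_price` and `daily_change` are hypothetical. Shifting by rows assumes the data is sorted by date with exactly one row per day.

```python
import pandas as pd

df = pd.DataFrame()
df["dates"] = pd.date_range("1/1/2001", periods=5, freq="D")
df["stock_price"] = [1.1, 2.2, 3.3, 4.4, 5.5]

# Lag by one row: each row sees the previous day's price (first row has none)
df["previous_days_stock_price"] = df["stock_price"].shift(1)

# Lead by one row: each row sees the next day's price (last row has none)
df["next_days_stock_price"] = df["stock_price"].shift(-1)

# Day-over-day change computed from the lagged column
df["daily_change"] = df["stock_price"] - df["previous_days_stock_price"]
print(df)
```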

Machine Learning for Beginners in Python: How to Handle Missing Values In Time Series

Handling Missing Values In Time Series

Preliminaries

import pandas as pd
import numpy as np

Create Date Data With Gap In Values

time_index = pd.date_range('01/01/2010', periods=5, freq='M')
df = pd.DataFrame(index=time_index)
df['Sales'] = [1.0,2.0,np.nan,np.nan,5.0]

Interpolate Missing Values

df.interpolate()

            Sales
2010-01-31    1.0
2010-02-28    2.0
2010-03-31    3.0
2010-04-30    4.0
2010-05-31    5.0

Forward-fill Missing Values

df.ffill()

            Sales
2010-01-31    1.0 …
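
A sketch comparing several pandas strategies for the same gap: linear interpolation, forward fill, backfill, and interpolation capped at one consecutive missing value. The explicit month-end dates and the capped variant are illustrative assumptions, not taken from the post.

```python
import numpy as np
import pandas as pd

# Month-end dates written out explicitly, with a two-value gap in Sales
time_index = pd.to_datetime(
    ["2010-01-31", "2010-02-28", "2010-03-31", "2010-04-30", "2010-05-31"]
)
df = pd.DataFrame({"Sales": [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Linear interpolation treats the rows as equally spaced
interpolated = df.interpolate()

# Forward fill carries the last observed value forward
forward_filled = df.ffill()

# Backfill carries the next observed value backward
back_filled = df.bfill()

# Fill at most one consecutive missing value, moving forward
interpolated_limited = df.interpolate(limit=1, limit_direction="forward")

print(pd.concat(
    {"linear": interpolated, "ffill": forward_filled,
     "bfill": back_filled, "linear_limit1": interpolated_limited},
    axis=1,
))
```

If the spacing between timestamps should matter (months have different lengths), `df.interpolate(method="time")` weights by the actual time gaps instead of treating rows as equally spaced.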