SETScholars Team, Author at Towards Advanced Analytics Specialist & Analytics Engineer

Machine Learning for Beginners in Python: How to Use ANOVA F-value For Feature Selection

By SETScholars Team on Monday, May 24, 2021

ANOVA F-value For Feature Selection
If the features are categorical, calculate a chi-square statistic between each feature and the target vector. If the features are quantitative, compute the ANOVA F-value between each feature and the target vector instead. The F-value scores examine whether, when we group the numerical feature by the target vector, the means …
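The excerpt is cut off before any code appears. As a minimal sketch of the idea, assuming the iris dataset and an arbitrary choice of keeping the two highest-scoring features (neither is confirmed by the excerpt), scikit-learn's `SelectKBest` with `f_classif` could be used like this:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: iris is used purely for illustration (4 quantitative features)
X, y = load_iris(return_X_y=True)

# Score every feature with the ANOVA F-value and keep the 2 best (k=2 is arbitrary)
selector = SelectKBest(score_func=f_classif, k=2)
X_kbest = selector.fit_transform(X, y)

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_kbest.shape[1])
print("F-value per feature:", selector.scores_)
```

A larger F-value suggests the feature's mean differs more across the target classes, so `k` controls how many of the top-scoring features survive.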

Data Science, Python Example for Beginners, Python for Business Analyst, Python for Citizen Data Scientist, Python Machine Learning

Machine Learning for Beginners in Python: How to Select The Best Number Of Components For TSVD

By SETScholars Team on Monday, May 24, 2021

Selecting The Best Number Of Components For TSVD
Preliminaries
    # Load libraries
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import TruncatedSVD
    from scipy.sparse import csr_matrix
    from sklearn import datasets
    import numpy as np
Load Digits Data And Make Sparse
    # Load the data
    digits = datasets.load_digits()
    # Standardize the feature matrix
    X = …
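The excerpt stops before the selection step. One common approach, sketched here under the assumption that "best" means the smallest number of components reaching a target share of explained variance (the 95% goal and the helper name `select_n_components` are illustrative choices, not taken from the post):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Standardize the digits features and store them as a sparse matrix
X = StandardScaler().fit_transform(datasets.load_digits().data)
X_sparse = csr_matrix(X)

# Fit TSVD with one fewer component than there are features
tsvd = TruncatedSVD(n_components=X_sparse.shape[1] - 1, random_state=0)
tsvd.fit(X_sparse)
var_ratio = tsvd.explained_variance_ratio_


def select_n_components(var_ratio, goal_var):
    """Hypothetical helper: smallest count whose cumulative ratio reaches goal_var."""
    cumulative = np.cumsum(var_ratio)
    n = int(np.searchsorted(cumulative, goal_var)) + 1
    # If the goal is never reached, fall back to all fitted components
    return min(n, len(var_ratio))


n_components = select_n_components(var_ratio, 0.95)
print("Components needed for 95% of the variance:", n_components)
```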

Clustering, Data Analytics, Data Science, Python Example for Beginners, Python for Citizen Data Scientist

Machine Learning for Beginners in Python: How to Group Observations Using K-Means Clustering

By SETScholars Team on Monday, May 24, 2021

Group Observations Using K-Means Clustering
Preliminaries
    # Load libraries
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    import pandas as pd
Create Data
    # Make simulated feature matrix
    X, _ = make_blobs(n_samples = 50, n_features = 2, centers = 3, random_state = 1)
    # Create DataFrame
    df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])
Train Clusterer …
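The excerpt ends at the training step. A minimal sketch of how that step might look, assuming three clusters to match the three simulated centers (the `group` column name, `n_init=10`, and the `random_state` are illustrative assumptions):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd

# Simulated feature matrix, as in the excerpt
X, _ = make_blobs(n_samples=50, n_features=2, centers=3, random_state=1)
df = pd.DataFrame(X, columns=["feature_1", "feature_2"])

# Assumption: k=3 matches the number of simulated centers
clusterer = KMeans(n_clusters=3, n_init=10, random_state=1)

# Fit the model and store each observation's cluster label
df["group"] = clusterer.fit_predict(X)

print(df.head())
print(clusterer.cluster_centers_)
```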

Python Machine Learning, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: Dimensionality Reduction With PCA

By SETScholars Team on Monday, May 24, 2021

Dimensionality Reduction With PCA
Preliminaries
    # Load libraries
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn import datasets
Load Data
    # Load the data
    digits = datasets.load_digits()
Standardize Feature Values
    # Standardize the feature matrix
    X = StandardScaler().fit_transform(digits.data)
Conduct Principal Component Analysis
    # Create a PCA that will retain 99% …
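The excerpt breaks off mid-comment. Assuming the "99%" refers to the share of variance to retain (the excerpt does not say so explicitly), the PCA step could be sketched by passing a float to `n_components`:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

# Load and standardize the digits feature matrix
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# Assumption: keep just enough components to explain 99% of the variance
pca = PCA(n_components=0.99)
X_pca = pca.fit_transform(X)

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```

When `n_components` is a float between 0 and 1, scikit-learn chooses the smallest number of components whose cumulative explained variance exceeds that fraction.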

Python Machine Learning, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: Dimensionality Reduction With Kernel PCA

By SETScholars Team on Monday, May 24, 2021

Dimensionality Reduction With Kernel PCA
Preliminaries
    # Load libraries
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.datasets import make_circles
Create Linearly Inseparable Data
    # Create linearly inseparable data
    X, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)
Conduct Kernel PCA
    # Apply kernel PCA with radial basis function (RBF) kernel
    kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
    …
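The excerpt stops right after the `KernelPCA` object is created. A minimal sketch of the remaining step, assuming the transform is simply applied to the same simulated data (the variable name `X_kpca` is an illustrative choice):

```python
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles

# Two concentric circles: not separable by a straight line
X, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# RBF-kernel PCA projecting onto a single component, as in the excerpt
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
X_kpca = kpca.fit_transform(X)

print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_kpca.shape[1])
```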

Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: Dimensionality Reduction On Sparse Feature Matrix

By SETScholars Team on Monday, May 24, 2021

Dimensionality Reduction On Sparse Feature Matrix
Preliminaries
    # Load libraries
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import TruncatedSVD
    from scipy.sparse import csr_matrix
    from sklearn import datasets
    import numpy as np
Load Digits Data And Make Sparse
    # Load the data
    digits = datasets.load_digits()
    # Standardize the feature matrix
    X = StandardScaler().fit_transform(digits.data)
    # …
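The excerpt ends before the reduction itself. As a sketch under stated assumptions (ten components and `random_state=0` are arbitrary illustrative values), `TruncatedSVD` can be fitted directly on a SciPy sparse matrix without densifying it:

```python
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Standardize the digits features, then store them in sparse CSR form
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
X_sparse = csr_matrix(X)

# Assumption: 10 components, chosen only for illustration
tsvd = TruncatedSVD(n_components=10, random_state=0)
X_sparse_tsvd = tsvd.fit_transform(X_sparse)

print("Original number of features:", X_sparse.shape[1])
print("Reduced number of features:", X_sparse_tsvd.shape[1])
print("Variance explained:", tsvd.explained_variance_ratio_.sum())
```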

Python Example for Beginners, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: How to Select Date And Time Ranges

By SETScholars Team on Monday, May 24, 2021

Select Date And Time Ranges
Preliminaries
    # Load library
    import pandas as pd
Create pandas Series Time Data
    # Create data frame
    df = pd.DataFrame()
    # Create datetimes
    df['date'] = pd.date_range('1/1/2001', periods=100000, freq='H')
Select Time Range (Method 1)
Use this method if your data frame is not indexed by time.
    # Select …
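The excerpt is truncated at the first selection. A hedged sketch of two ways to pick a time range follows; the specific timestamps, the `value` column, and the second (index-based) method are assumptions for illustration. The hourly frequency is given as a `pd.Timedelta` so the example does not depend on which frequency alias a particular pandas version accepts.

```python
import pandas as pd

# Hourly timestamps starting 2001-01-01, plus a hypothetical value column
df = pd.DataFrame()
df["date"] = pd.date_range("2001-01-01", periods=100000, freq=pd.Timedelta(hours=1))
df["value"] = range(len(df))

# Method 1: boolean mask on a regular column (frame not indexed by time)
mask = (df["date"] > "2002-01-01 01:00:00") & (df["date"] <= "2002-01-01 04:00:00")
selected = df[mask]

# Method 2 (assumed): index by time, then slice with .loc (both ends inclusive)
indexed = df.set_index("date")
sliced = indexed.loc["2002-01-01 01:00:00":"2002-01-01 04:00:00"]

print(selected)
print(sliced)
```

Note the difference in boundaries: the mask above excludes 01:00 because of the strict `>`, while label slicing with `.loc` includes both endpoints.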

Python for Business Analyst, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: How to Find Rolling Time Window

By SETScholars Team on Monday, May 24, 2021

Rolling Time Window
Preliminaries
    import pandas as pd
Create Date Data
    time_index = pd.date_range('01/01/2010', periods=5, freq='M')
    df = pd.DataFrame(index=time_index)
    df['Stock_Price'] = [1,2,3,4,5]
Create A Rolling Time Window Of Two Rows
    df.rolling(window=2).mean()
                Stock_Price
    2010-01-31          NaN
    2010-02-28          1.5
    2010-03-31          2.5
    2010-04-30          3.5
    2010-05-31          4.5
    # Identify max value in rolling time window
    df.rolling(window=2).max()
                Stock_Price
    2010-01-31          NaN
    …
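The rolling statistics in the excerpt can be reproduced with the sketch below. Two things here are assumptions rather than part of the post: the month-end frequency is passed as `pd.offsets.MonthEnd()` to avoid depending on a version-specific alias, and the `min_periods=1` variant is added to show how the leading `NaN` can be avoided.

```python
import pandas as pd

# Five month-end dates starting January 2010
time_index = pd.date_range("2010-01-01", periods=5, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"Stock_Price": [1, 2, 3, 4, 5]}, index=time_index)

# Mean and max over a window of two rows; the first row has no full window
rolling_mean = df.rolling(window=2).mean()
rolling_max = df.rolling(window=2).max()

# Assumed variant: allow partial windows so the first row is not NaN
rolling_mean_partial = df.rolling(window=2, min_periods=1).mean()

print(rolling_mean)
print(rolling_max)
print(rolling_mean_partial)
```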

Python, Python Example for Beginners, Python Machine Learning, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: How to Use Lag A Time Feature

By SETScholars Team on Sunday, May 23, 2021

Lag A Time Feature
Preliminaries
    import pandas as pd
Create Date Data
    df = pd.DataFrame()
    df['dates'] = pd.date_range('1/1/2001', periods=5, freq='D')
    df['stock_price'] = [1.1,2.2,3.3,4.4,5.5]
Lag Time Data By One Row
    df['previous_days_stock_price'] = df['stock_price'].shift(1)
    df
           dates  stock_price  previous_days_stock_price
    0 2001-01-01          1.1                        NaN
    1 2001-01-02          2.2                        1.1
    2 2001-01-03          3.3                        2.2
    3 2001-01-04          4.4                        3.3
    4 2001-01-05          5.5                        …
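The same `shift` call generalizes to longer lags. A minimal sketch, where the two-day lag and the column name `stock_price_lag_2` are illustrative additions not found in the excerpt:

```python
import pandas as pd

# Daily prices, as in the excerpt
df = pd.DataFrame({
    "dates": pd.date_range("2001-01-01", periods=5, freq="D"),
    "stock_price": [1.1, 2.2, 3.3, 4.4, 5.5],
})

# Lag by one row: each row sees the previous day's price
df["previous_days_stock_price"] = df["stock_price"].shift(1)

# Assumed extension: lag by two rows
df["stock_price_lag_2"] = df["stock_price"].shift(2)

print(df)
```

Rows that have no earlier observation to draw from are filled with `NaN`, so a lag of `n` leaves the first `n` rows missing.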

Python, Python Time Series Forecasting, Time Series Forecasting

Machine Learning for Beginners in Python: How to Handle Missing Values In Time Series

By SETScholars Team on Sunday, May 23, 2021

Handling Missing Values In Time Series
Preliminaries
    import pandas as pd
    import numpy as np
Create Date Data With Gap In Values
    time_index = pd.date_range('01/01/2010', periods=5, freq='M')
    df = pd.DataFrame(index=time_index)
    df['Sales'] = [1.0,2.0,np.nan,np.nan,5.0]
Interpolate Missing Values
    df.interpolate()
                Sales
    2010-01-31    1.0
    2010-02-28    2.0
    2010-03-31    3.0
    2010-04-30    4.0
    2010-05-31    5.0
Forward-fill Missing Values
    df.ffill()
                Sales
    2010-01-31    1.0
    …
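The excerpt is cut off partway through the fill strategies. The sketch below compares interpolation and forward-fill with two further options that are assumptions here, not confirmed content of the post: back-fill, and interpolation limited to one consecutive gap. The month-end frequency is passed as `pd.offsets.MonthEnd()` to stay independent of version-specific aliases.

```python
import numpy as np
import pandas as pd

# Monthly sales with a two-month gap
time_index = pd.date_range("2010-01-01", periods=5, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"Sales": [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Linear interpolation between the known neighbours
interpolated = df.interpolate()

# Carry the last known value forward
forward_filled = df.ffill()

# Assumed alternative: pull the next known value backward
back_filled = df.bfill()

# Assumed alternative: interpolate at most one consecutive missing value
limited = df.interpolate(limit=1, limit_direction="forward")

print(interpolated)
print(forward_filled)
print(back_filled)
print(limited)
```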
