Applied Data Science Notebook in Python for Beginners to Professionals

Data Science Project – How to Tune Algorithm Parameters with Scikit-Learn

Machine Learning for Beginners - How to Tune Algorithm Parameters with Scikit-Learn

For more projects visit: https://setscholars.net

  • There are 5000+ free end-to-end applied machine learning and data science projects available to download at SETScholars. SETScholars is a Science, Engineering and Technology Scholars community.
In [ ]:
# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")

Machine learning models are parameterized so that their behavior can be tuned for a given problem.

Models can have many parameters, and finding the best combination of parameters can be treated as a search problem.

In this project, you will discover how to tune the parameters of machine learning algorithms in Python using the scikit-learn library.

Python Codes

Machine Learning Algorithm Parameters

Algorithm tuning is a final step in the process of applied machine learning before presenting results.

It is sometimes called hyperparameter optimization: the settings supplied to the algorithm are referred to as hyperparameters, whereas the coefficients found by the machine learning algorithm itself are referred to as parameters. The word optimization suggests the search nature of the problem.

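As a quick illustration of the distinction, the sketch below (not part of the original recipes; it uses Ridge regression with an arbitrary alpha=1.0) shows that alpha is a hyperparameter set before fitting, while coef_ and intercept_ hold the parameters the algorithm learns from the data.

In [ ]:
# Minimal sketch: hyperparameters vs. parameters, illustrated with Ridge regression
from sklearn import datasets
from sklearn.linear_model import Ridge

# load the diabetes dataset as (features, target)
X, y = datasets.load_diabetes(return_X_y=True)

# alpha is a hyperparameter: it is chosen by the practitioner before fitting
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are parameters: they are estimated from the data during fitting
print(model.coef_)
print(model.intercept_)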

Grid Search Parameter Tuning

Grid search is an approach to parameter tuning that methodically builds and evaluates a model for each combination of algorithm parameters specified in a grid. The recipe below evaluates different alpha values for the Ridge Regression algorithm on the standard diabetes dataset. This is a one-dimensional grid search.

In [1]:
# Grid Search for Algorithm Tuning
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# load the diabetes datasets
dataset = datasets.load_diabetes()

# prepare a range of alpha values to test
alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])

# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(dataset.data, dataset.target)
print(grid)

# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)
GridSearchCV(estimator=Ridge(),
             param_grid={'alpha': array([1.e+00, 1.e-01, 1.e-02, 1.e-03, 1.e-04, 0.e+00])})
0.48232313841634833
0.0001
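
The same recipe extends naturally to more than one hyperparameter. The sketch below (an extension not shown in the original recipe) searches a two-dimensional grid over alpha and solver, with the number of cross-validation folds and the scoring metric made explicit; the specific grid values are illustrative assumptions.

In [ ]:
# Sketch: a two-dimensional grid search over alpha and solver (illustrative values)
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# load the diabetes datasets
dataset = datasets.load_diabetes()

# every combination of alpha and solver in this grid is built and evaluated
param_grid = {
    'alpha': np.array([1, 0.1, 0.01, 0.001, 0.0001]),
    'solver': ['auto', 'svd', 'cholesky'],
}

grid = GridSearchCV(estimator=Ridge(), param_grid=param_grid, cv=5, scoring='r2')
grid.fit(dataset.data, dataset.target)

# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_params_)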

Random Search Parameter Tuning

Random search is an approach to parameter tuning that samples algorithm parameters from a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed and evaluated for each combination of parameters sampled.

The recipe below evaluates random alpha values between 0 and 1 for the Ridge Regression algorithm on the standard diabetes dataset.

In [2]:
# Randomized Search for Algorithm Tuning
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# load the diabetes datasets
dataset = datasets.load_diabetes()

# prepare a uniform distribution to sample for the alpha parameter
param_grid = {'alpha': sp_rand()}

# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(dataset.data, dataset.target)
print(rsearch)

# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)
RandomizedSearchCV(estimator=Ridge(), n_iter=100,
                   param_distributions={'alpha': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f7db847bef0>})
0.48210580951342263
0.003051399943641231
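
Because alpha is usually searched across several orders of magnitude, a log-uniform distribution is often a better sampling choice than a plain uniform one. The sketch below (a variation on the recipe above, not part of the original) swaps in scipy.stats.loguniform, which requires SciPy 1.4 or newer; the bounds 1e-4 and 1e2 and the random_state are illustrative assumptions.

In [ ]:
# Sketch: randomized search sampling alpha log-uniformly (requires SciPy 1.4+)
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# load the diabetes datasets
dataset = datasets.load_diabetes()

# sample alpha log-uniformly between 1e-4 and 1e2
param_dist = {'alpha': loguniform(1e-4, 1e2)}

rsearch = RandomizedSearchCV(estimator=Ridge(), param_distributions=param_dist,
                             n_iter=100, random_state=7)
rsearch.fit(dataset.data, dataset.target)

# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)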

Summary

Algorithm parameter tuning is an important step for improving algorithm performance right before presenting results or preparing a system for production.

In this project, you discovered algorithm parameter tuning and two methods that you can use right now in Python with the scikit-learn library to improve your results: grid search and random search.
