For more projects visit: https://setscholars.net
# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from pycaret.regression import *
# provide the dataset name as shown in pycaret
whichDataset = 'concrete'
from pycaret.datasets import get_data
dataset = get_data(whichDataset)
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | strength |
|---|---|---|---|---|---|---|---|---|---|
0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |
dataset.shape
(1030, 9)
dataset.columns.to_list()
['Cement (component 1)(kg in a m^3 mixture)', 'Blast Furnace Slag (component 2)(kg in a m^3 mixture)', 'Fly Ash (component 3)(kg in a m^3 mixture)', 'Water (component 4)(kg in a m^3 mixture)', 'Superplasticizer (component 5)(kg in a m^3 mixture)', 'Coarse Aggregate (component 6)(kg in a m^3 mixture)', 'Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)', 'strength']
data = dataset.sample(frac=0.75, random_state=1234)
data_unseen = dataset.drop(data.index)
data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
Data for Modeling: (772, 9)
Unseen Data For Predictions: (258, 9)
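The 75/25 hold-out split performed by `sample(frac=0.75)` plus `drop(data.index)` can be sketched with the standard library alone; `holdout_split` below is a hypothetical helper that mirrors the bookkeeping, not pandas itself:

```python
import random

def holdout_split(n_rows, frac=0.75, seed=1234):
    """Pick round(frac * n_rows) row indices for modeling; the rest stay unseen."""
    rng = random.Random(seed)
    n_model = round(frac * n_rows)
    model_idx = set(rng.sample(range(n_rows), n_model))
    unseen_idx = [i for i in range(n_rows) if i not in model_idx]
    return sorted(model_idx), unseen_idx

model_idx, unseen_idx = holdout_split(1030)  # concrete has 1030 rows
print(len(model_idx), len(unseen_idx))  # 772 258
```

The two index sets are disjoint and together cover every row, which is exactly why `data_unseen` never leaks into model selection.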
env_setup = setup(data = data, target = 'strength', session_id=1234)
| | Description | Value |
|---|---|---|
0 | session_id | 1234 |
1 | Target | strength |
2 | Original Data | (772, 9) |
3 | Missing Values | False |
4 | Numeric Features | 7 |
5 | Categorical Features | 1 |
6 | Ordinal Features | False |
7 | High Cardinality Features | False |
8 | High Cardinality Method | None |
9 | Transformed Train Set | (540, 20) |
10 | Transformed Test Set | (232, 20) |
11 | Shuffle Train-Test | True |
12 | Stratify Train-Test | False |
13 | Fold Generator | KFold |
14 | Fold Number | 10 |
15 | CPU Jobs | -1 |
16 | Use GPU | False |
17 | Log Experiment | False |
18 | Experiment Name | reg-default-name |
19 | USI | 5970 |
20 | Imputation Type | simple |
21 | Iterative Imputation Iteration | None |
22 | Numeric Imputer | mean |
23 | Iterative Imputation Numeric Model | None |
24 | Categorical Imputer | constant |
25 | Iterative Imputation Categorical Model | None |
26 | Unknown Categoricals Handling | least_frequent |
27 | Normalize | False |
28 | Normalize Method | None |
29 | Transformation | False |
30 | Transformation Method | None |
31 | PCA | False |
32 | PCA Method | None |
33 | PCA Components | None |
34 | Ignore Low Variance | False |
35 | Combine Rare Levels | False |
36 | Rare Level Threshold | None |
37 | Numeric Binning | False |
38 | Remove Outliers | False |
39 | Outliers Threshold | None |
40 | Remove Multicollinearity | False |
41 | Multicollinearity Threshold | None |
42 | Remove Perfect Collinearity | True |
43 | Clustering | False |
44 | Clustering Iteration | None |
45 | Polynomial Features | False |
46 | Polynomial Degree | None |
47 | Trignometry Features | False |
48 | Polynomial Threshold | None |
49 | Group Features | False |
50 | Feature Selection | False |
51 | Feature Selection Method | classic |
52 | Features Selection Threshold | None |
53 | Feature Interaction | False |
54 | Feature Ratio | False |
55 | Interaction Threshold | None |
56 | Transform Target | False |
57 | Transform Target Method | box-cox |
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter('ignore')
# --------------------------------------
best_model = compare_models()
# --------------------------------------
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
catboost | CatBoost Regressor | 3.3081 | 23.5976 | 4.7792 | 0.9123 | 0.1490 | 0.1142 | 0.9050 |
lightgbm | Light Gradient Boosting Machine | 3.7035 | 27.2496 | 5.1060 | 0.9008 | 0.1595 | 0.1275 | 0.0200 |
xgboost | Extreme Gradient Boosting | 3.5166 | 27.4312 | 5.1828 | 0.8987 | 0.1589 | 0.1206 | 0.1860 |
et | Extra Trees Regressor | 3.7982 | 30.7787 | 5.4529 | 0.8875 | 0.1692 | 0.1298 | 0.2370 |
rf | Random Forest Regressor | 4.2099 | 34.0448 | 5.7267 | 0.8765 | 0.1798 | 0.1459 | 0.2790 |
gbr | Gradient Boosting Regressor | 4.6607 | 38.7036 | 6.1679 | 0.8589 | 0.1966 | 0.1594 | 0.0430 |
lr | Linear Regression | 5.5368 | 51.5198 | 7.1534 | 0.8106 | 0.2378 | 0.1863 | 0.2260 |
br | Bayesian Ridge | 5.5244 | 51.4663 | 7.1504 | 0.8105 | 0.2352 | 0.1853 | 0.0080 |
ridge | Ridge Regression | 5.5228 | 51.5066 | 7.1533 | 0.8102 | 0.2341 | 0.1849 | 0.0080 |
dt | Decision Tree Regressor | 5.7210 | 71.7409 | 8.2959 | 0.7409 | 0.2442 | 0.1934 | 0.0080 |
ada | AdaBoost Regressor | 7.3015 | 78.3505 | 8.8219 | 0.7139 | 0.3286 | 0.3235 | 0.0430 |
lasso | Lasso Regression | 7.9346 | 98.4803 | 9.8864 | 0.6374 | 0.3062 | 0.2880 | 0.0070 |
huber | Huber Regressor | 7.8529 | 100.7622 | 9.9746 | 0.6225 | 0.3160 | 0.2940 | 0.0220 |
en | Elastic Net | 9.5990 | 141.9073 | 11.8764 | 0.4782 | 0.3915 | 0.3899 | 0.0080 |
knn | K Neighbors Regressor | 10.3212 | 165.0389 | 12.8005 | 0.3992 | 0.4207 | 0.4207 | 0.0360 |
omp | Orthogonal Matching Pursuit | 10.1064 | 166.7765 | 12.8551 | 0.3917 | 0.3823 | 0.3699 | 0.0070 |
par | Passive Aggressive Regressor | 13.4382 | 272.7271 | 16.2721 | -0.0092 | 0.5326 | 0.5826 | 0.0080 |
llar | Lasso Least Angle Regression | 13.6220 | 289.1778 | 16.9009 | -0.0305 | 0.5456 | 0.6238 | 0.0070 |
dummy | Dummy Regressor | 13.6220 | 289.1778 | 16.9009 | -0.0305 | 0.5456 | 0.6238 | 0.0050 |
lar | Least Angle Regression | 17133.5708 | 4008150932.3915 | 20028.0485 | -20750010.4767 | 1.1110 | 724.0069 | 0.0080 |
model_1 = create_model('xgboost')
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 4.1253 | 31.5405 | 5.6161 | 0.8649 | 0.1544 | 0.1206 |
1 | 3.8673 | 44.0848 | 6.6396 | 0.8736 | 0.1845 | 0.1306 |
2 | 3.6856 | 35.9839 | 5.9987 | 0.8823 | 0.1581 | 0.1143 |
3 | 3.5632 | 25.4533 | 5.0451 | 0.9212 | 0.1641 | 0.1320 |
4 | 2.9107 | 19.7551 | 4.4447 | 0.9469 | 0.1619 | 0.1131 |
5 | 3.2926 | 24.3741 | 4.9370 | 0.8738 | 0.1524 | 0.1155 |
6 | 3.8147 | 33.9729 | 5.8286 | 0.8741 | 0.1594 | 0.1215 |
7 | 3.4773 | 19.2427 | 4.3867 | 0.9388 | 0.1741 | 0.1462 |
8 | 3.2783 | 20.7091 | 4.5507 | 0.9304 | 0.1503 | 0.1143 |
9 | 3.1512 | 19.1955 | 4.3813 | 0.8813 | 0.1297 | 0.0980 |
Mean | 3.5166 | 27.4312 | 5.1828 | 0.8987 | 0.1589 | 0.1206 |
SD | 0.3496 | 8.1389 | 0.7545 | 0.0300 | 0.0138 | 0.0125 |
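The Mean and SD rows are plain per-fold aggregates. Recomputing them from the R2 column above (population standard deviation, matching `numpy.std`'s default) reproduces the table:

```python
import math

# Per-fold R2 values from the 10-fold cross-validation table above
r2_folds = [0.8649, 0.8736, 0.8823, 0.9212, 0.9469,
            0.8738, 0.8741, 0.9388, 0.9304, 0.8813]

mean = sum(r2_folds) / len(r2_folds)
sd = math.sqrt(sum((x - mean) ** 2 for x in r2_folds) / len(r2_folds))
print(round(mean, 4), round(sd, 4))  # 0.8987 0.03
```

Both agree with the Mean (0.8987) and SD (0.0300) rows reported by `create_model`.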
tuned_model_1 = tune_model(model_1)
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 4.5145 | 39.4382 | 6.2800 | 0.8310 | 0.1686 | 0.1306 |
1 | 4.6122 | 50.4473 | 7.1026 | 0.8554 | 0.2586 | 0.1633 |
2 | 3.6108 | 27.1546 | 5.2110 | 0.9112 | 0.1469 | 0.1158 |
3 | 3.6696 | 24.7697 | 4.9769 | 0.9234 | 0.1589 | 0.1266 |
4 | 3.8490 | 24.9892 | 4.9989 | 0.9328 | 0.1976 | 0.1440 |
5 | 3.9740 | 30.0771 | 5.4843 | 0.8443 | 0.1692 | 0.1353 |
6 | 3.8442 | 32.4002 | 5.6921 | 0.8799 | 0.2450 | 0.1355 |
7 | 3.5480 | 18.7706 | 4.3325 | 0.9403 | 0.1648 | 0.1395 |
8 | 3.6130 | 24.5488 | 4.9547 | 0.9175 | 0.1890 | 0.1346 |
9 | 3.8271 | 27.7308 | 5.2660 | 0.8285 | 0.1650 | 0.1253 |
Mean | 3.9063 | 30.0326 | 5.4299 | 0.8864 | 0.1864 | 0.1351 |
SD | 0.3527 | 8.5700 | 0.7408 | 0.0415 | 0.0355 | 0.0121 |
print(tuned_model_1)
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.4, max_delta_step=0, max_depth=3, min_child_weight=3, missing=nan, monotone_constraints='()', n_estimators=170, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0.0005, reg_lambda=1e-07, scale_pos_weight=45.400000000000006, subsample=0.9, tree_method='auto', validate_parameters=1, verbosity=0)
plot_model(tuned_model_1, plot = 'residuals')
plot_model(tuned_model_1, plot = 'error')
plot_model(tuned_model_1, plot='feature')
plot_model(tuned_model_1, plot = 'learning')
plot_model(tuned_model_1, plot = 'vc')
#plot_model(tuned_model_1, plot = 'rfe')
model_2 = create_model('rf')
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 4.7534 | 37.6110 | 6.1328 | 0.8389 | 0.1672 | 0.1400 |
1 | 5.2548 | 59.4816 | 7.7124 | 0.8295 | 0.2027 | 0.1632 |
2 | 4.7735 | 48.7642 | 6.9831 | 0.8405 | 0.1894 | 0.1483 |
3 | 3.6794 | 26.2264 | 5.1212 | 0.9189 | 0.1978 | 0.1524 |
4 | 3.7123 | 22.0722 | 4.6981 | 0.9406 | 0.1964 | 0.1530 |
5 | 3.2670 | 18.9080 | 4.3483 | 0.9021 | 0.1337 | 0.1089 |
6 | 4.8822 | 50.7829 | 7.1262 | 0.8118 | 0.1875 | 0.1542 |
7 | 3.9527 | 25.4898 | 5.0487 | 0.9189 | 0.2034 | 0.1698 |
8 | 4.1592 | 28.2877 | 5.3186 | 0.9049 | 0.1753 | 0.1495 |
9 | 3.6642 | 22.8242 | 4.7775 | 0.8589 | 0.1444 | 0.1193 |
Mean | 4.2099 | 34.0448 | 5.7267 | 0.8765 | 0.1798 | 0.1459 |
SD | 0.6278 | 13.4996 | 1.1179 | 0.0431 | 0.0232 | 0.0178 |
tuned_model_2 = tune_model(model_2)
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 5.6317 | 45.4093 | 6.7386 | 0.8055 | 0.1816 | 0.1648 |
1 | 6.8788 | 76.0995 | 8.7235 | 0.7819 | 0.2354 | 0.2170 |
2 | 6.3207 | 72.1459 | 8.4939 | 0.7640 | 0.2485 | 0.2115 |
3 | 6.0385 | 57.5136 | 7.5838 | 0.8221 | 0.2980 | 0.2735 |
4 | 6.0995 | 56.8440 | 7.5395 | 0.8471 | 0.2785 | 0.2480 |
5 | 4.6515 | 33.6530 | 5.8011 | 0.8258 | 0.1874 | 0.1710 |
6 | 6.4010 | 69.7509 | 8.3517 | 0.7415 | 0.2698 | 0.2452 |
7 | 6.2975 | 57.4959 | 7.5826 | 0.8171 | 0.3385 | 0.3356 |
8 | 5.1165 | 40.8426 | 6.3908 | 0.8627 | 0.2325 | 0.2075 |
9 | 5.4205 | 48.2074 | 6.9432 | 0.7019 | 0.2269 | 0.2003 |
Mean | 5.8856 | 55.7962 | 7.4149 | 0.7970 | 0.2497 | 0.2274 |
SD | 0.6384 | 13.2688 | 0.9033 | 0.0471 | 0.0458 | 0.0481 |
print(tuned_model_2)
RandomForestRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=10, max_features='sqrt', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=7, min_weight_fraction_leaf=0.0, n_estimators=160, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False)
plot_model(tuned_model_2, plot = 'residuals')
plot_model(tuned_model_2, plot = 'error')
plot_model(tuned_model_2, plot = 'feature')
plot_model(tuned_model_2, plot = 'learning')
plot_model(tuned_model_2, plot = 'vc')
#plot_model(tuned_model_2, plot = 'rfe')
predict_model(tuned_model_1);
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
0 | Extreme Gradient Boosting | 3.6522 | 28.2654 | 5.3165 | 0.8996 | 0.2019 | 0.1426 |
predict_model(tuned_model_2);
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
0 | Random Forest Regressor | 5.7623 | 53.5552 | 7.3181 | 0.8098 | 0.2777 | 0.2544 |
final_model = finalize_model(tuned_model_1);
# Final model parameters for deployment
print(final_model)
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.4, max_delta_step=0, max_depth=3, min_child_weight=3, missing=nan, monotone_constraints='()', n_estimators=170, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0.0005, reg_lambda=1e-07, scale_pos_weight=45.400000000000006, subsample=0.9, tree_method='auto', validate_parameters=1, verbosity=0)
# Note: finalize_model() refits the estimator on the entire dataset (train + hold-out),
# so this hold-out evaluation is optimistic; rely on data_unseen for an honest estimate.
predict_model(final_model);
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
0 | Extreme Gradient Boosting | 1.4211 | 4.0471 | 2.0117 | 0.9856 | 0.0872 | 0.0575 |
unseen_predictions = predict_model(final_model, data=data_unseen)
unseen_predictions.head()
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | strength | Label |
|---|---|---|---|---|---|---|---|---|---|---|
0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 | 75.399544 |
1 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 | 41.823566 |
2 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 | 37.464207 |
3 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 90 | 38.07 | 36.305885 |
4 | 427.5 | 47.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 43.01 | 43.081570 |
from pycaret.utils import check_metric
check_metric(unseen_predictions['strength'], unseen_predictions['Label'], metric = 'R2')
0.8783
check_metric(unseen_predictions['strength'], unseen_predictions['Label'], metric = 'MAE')
3.5879
check_metric(unseen_predictions['strength'], unseen_predictions['Label'], metric = 'MSE')
30.8907
check_metric(unseen_predictions['strength'], unseen_predictions['Label'], metric = 'RMSE')
5.5579
check_metric(unseen_predictions['strength'], unseen_predictions['Label'], metric = 'MAPE')
0.1267
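`check_metric` evaluates standard formulas; the sketch below, on toy values rather than the unseen set, shows the definitions it is assumed to implement (`regression_metrics` is an illustrative helper, not part of PyCaret):

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - sum(e ** 2 for e in errors) / ss_tot       # 1 - SS_res / SS_tot
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0])
print(round(mae, 2), round(mse, 3), round(r2, 3))  # 0.75 0.875 0.724
```

Applied to `unseen_predictions['strength']` and `unseen_predictions['Label']`, these formulas should agree with the `check_metric` values above.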
save_model(final_model,'Final_Model')
Transformation Pipeline and Model Successfully Saved
(Pipeline(memory=None, steps=[('dtypes', DataTypes_Auto_infer(categorical_features=[], display_types=True, features_todrop=[], id_columns=[], ml_usecase='regression', numerical_features=[], target='strength', time_features=[])), ('imputer', Simple_Imputer(categorical_strategy='not_available', fill_value_categorical=None, fill_value_numerical=None, numeric_strateg... interaction_constraints='', learning_rate=0.4, max_delta_step=0, max_depth=3, min_child_weight=3, missing=nan, monotone_constraints='()', n_estimators=170, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0.0005, reg_lambda=1e-07, scale_pos_weight=45.400000000000006, subsample=0.9, tree_method='auto', validate_parameters=1, verbosity=0)]], verbose=False), 'Final_Model.pkl')
load_saved_model = load_model('Final_Model')
Transformation Pipeline and Model Successfully Loaded
new_prediction = predict_model(load_saved_model, data=data_unseen)
#new_prediction.head(10)
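`save_model`/`load_model` persist the entire preprocessing pipeline plus the estimator to a `.pkl` file. Under the hood this is ordinary Python pickling; a minimal stand-in (a plain dict, not a real pipeline) shows the round trip:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted pipeline + model object
model_stub = {"name": "xgboost", "params": {"max_depth": 3, "n_estimators": 170}}

path = os.path.join(tempfile.mkdtemp(), "Final_Model.pkl")
with open(path, "wb") as f:   # roughly what save_model does
    pickle.dump(model_stub, f)
with open(path, "rb") as f:   # roughly what load_model does
    restored = pickle.load(f)

print(restored == model_stub)  # True
```

Because the pickle includes the transformation pipeline, `predict_model` on the reloaded object can take raw, untransformed data such as `data_unseen` directly.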
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from pycaret.regression import *
# provide the dataset name as shown in pycaret
whichDataset = 'concrete'
from pycaret.datasets import get_data
dataset = get_data(whichDataset)
data = dataset.sample(frac=0.70, random_state=421)
data_unseen = dataset.drop(data.index)
data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | strength |
|---|---|---|---|---|---|---|---|---|---|
0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |
Data for Modeling: (721, 9)
Unseen Data For Predictions: (309, 9)
env_setup = setup(data = data, target = 'strength', session_id=1234)
| | Description | Value |
|---|---|---|
0 | session_id | 1234 |
1 | Target | strength |
2 | Original Data | (721, 9) |
3 | Missing Values | False |
4 | Numeric Features | 7 |
5 | Categorical Features | 1 |
6 | Ordinal Features | False |
7 | High Cardinality Features | False |
8 | High Cardinality Method | None |
9 | Transformed Train Set | (504, 20) |
10 | Transformed Test Set | (217, 20) |
11 | Shuffle Train-Test | True |
12 | Stratify Train-Test | False |
13 | Fold Generator | KFold |
14 | Fold Number | 10 |
15 | CPU Jobs | -1 |
16 | Use GPU | False |
17 | Log Experiment | False |
18 | Experiment Name | reg-default-name |
19 | USI | 7395 |
20 | Imputation Type | simple |
21 | Iterative Imputation Iteration | None |
22 | Numeric Imputer | mean |
23 | Iterative Imputation Numeric Model | None |
24 | Categorical Imputer | constant |
25 | Iterative Imputation Categorical Model | None |
26 | Unknown Categoricals Handling | least_frequent |
27 | Normalize | False |
28 | Normalize Method | None |
29 | Transformation | False |
30 | Transformation Method | None |
31 | PCA | False |
32 | PCA Method | None |
33 | PCA Components | None |
34 | Ignore Low Variance | False |
35 | Combine Rare Levels | False |
36 | Rare Level Threshold | None |
37 | Numeric Binning | False |
38 | Remove Outliers | False |
39 | Outliers Threshold | None |
40 | Remove Multicollinearity | False |
41 | Multicollinearity Threshold | None |
42 | Remove Perfect Collinearity | True |
43 | Clustering | False |
44 | Clustering Iteration | None |
45 | Polynomial Features | False |
46 | Polynomial Degree | None |
47 | Trignometry Features | False |
48 | Polynomial Threshold | None |
49 | Group Features | False |
50 | Feature Selection | False |
51 | Feature Selection Method | classic |
52 | Features Selection Threshold | None |
53 | Feature Interaction | False |
54 | Feature Ratio | False |
55 | Interaction Threshold | None |
56 | Transform Target | False |
57 | Transform Target Method | box-cox |
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter('ignore')
# compare all baseline models and select top 5
top_models = compare_models(n_select = 5)
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
catboost | CatBoost Regressor | 3.4790 | 22.9442 | 4.7592 | 0.9119 | 0.1533 | 0.1219 | 1.0090 |
et | Extra Trees Regressor | 3.7646 | 27.9232 | 5.2142 | 0.8923 | 0.1677 | 0.1289 | 0.2350 |
lightgbm | Light Gradient Boosting Machine | 3.9649 | 27.9865 | 5.2424 | 0.8915 | 0.1685 | 0.1365 | 0.0180 |
xgboost | Extreme Gradient Boosting | 3.9348 | 30.6135 | 5.4538 | 0.8842 | 0.1799 | 0.1378 | 0.1850 |
rf | Random Forest Regressor | 4.2409 | 31.7762 | 5.5845 | 0.8772 | 0.1785 | 0.1474 | 0.2640 |
gbr | Gradient Boosting Regressor | 4.4358 | 32.9589 | 5.7035 | 0.8747 | 0.1790 | 0.1535 | 0.0420 |
ridge | Ridge Regression | 5.4886 | 49.4217 | 7.0052 | 0.8143 | 0.2371 | 0.1899 | 0.0070 |
br | Bayesian Ridge | 5.4999 | 49.5873 | 7.0156 | 0.8138 | 0.2395 | 0.1904 | 0.0070 |
lr | Linear Regression | 5.5213 | 49.9405 | 7.0376 | 0.8128 | 0.2436 | 0.1913 | 0.2170 |
dt | Decision Tree Regressor | 5.1600 | 58.4876 | 7.5304 | 0.7726 | 0.2372 | 0.1730 | 0.0070 |
ada | AdaBoost Regressor | 6.9720 | 70.2291 | 8.3597 | 0.7296 | 0.3121 | 0.2984 | 0.0430 |
lasso | Lasso Regression | 7.8978 | 98.2398 | 9.8787 | 0.6244 | 0.3049 | 0.2842 | 0.0080 |
huber | Huber Regressor | 8.1987 | 109.4875 | 10.4337 | 0.5743 | 0.3258 | 0.3009 | 0.0220 |
en | Elastic Net | 9.3672 | 136.7431 | 11.6594 | 0.4727 | 0.3764 | 0.3645 | 0.0080 |
knn | K Neighbors Regressor | 9.9259 | 151.4503 | 12.2761 | 0.4163 | 0.3919 | 0.3822 | 0.0360 |
omp | Orthogonal Matching Pursuit | 9.8286 | 154.1181 | 12.3810 | 0.4154 | 0.3791 | 0.3658 | 0.0070 |
par | Passive Aggressive Regressor | 11.0046 | 183.3445 | 13.4902 | 0.2906 | 0.4315 | 0.4402 | 0.0070 |
llar | Lasso Least Angle Regression | 13.2987 | 274.6899 | 16.5129 | -0.0264 | 0.5278 | 0.5758 | 0.0070 |
dummy | Dummy Regressor | 13.2987 | 274.6899 | 16.5129 | -0.0264 | 0.5278 | 0.5758 | 0.0060 |
lar | Least Angle Regression | 14.4864 | 764.2362 | 19.5198 | -1.4590 | 0.5006 | 0.5359 | 0.0080 |
top_models
[<catboost.core.CatBoostRegressor at 0x7ff57d191810>, ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False), LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31, objective=None, random_state=1234, reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0), XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='auto', validate_parameters=1, verbosity=0), RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False)]
# tune top base models
tuned_top_models = [tune_model(i) for i in top_models]
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 5.3340 | 44.4179 | 6.6647 | 0.8467 | 0.2192 | 0.1883 |
1 | 6.0770 | 56.8739 | 7.5415 | 0.8142 | 0.3636 | 0.3491 |
2 | 5.7597 | 51.1121 | 7.1493 | 0.7801 | 0.2741 | 0.2568 |
3 | 5.5458 | 49.8498 | 7.0604 | 0.7327 | 0.2374 | 0.2143 |
4 | 6.3176 | 62.2563 | 7.8903 | 0.8155 | 0.2251 | 0.2064 |
5 | 6.0436 | 71.1780 | 8.4367 | 0.7457 | 0.2075 | 0.1828 |
6 | 5.3819 | 45.6436 | 6.7560 | 0.7971 | 0.2307 | 0.2099 |
7 | 6.1910 | 64.8687 | 8.0541 | 0.7640 | 0.2601 | 0.2220 |
8 | 5.8985 | 55.8135 | 7.4708 | 0.8244 | 0.2261 | 0.1934 |
9 | 5.7458 | 50.9356 | 7.1369 | 0.7766 | 0.2773 | 0.2496 |
Mean | 5.8295 | 55.2949 | 7.4161 | 0.7897 | 0.2521 | 0.2273 |
SD | 0.3192 | 8.2022 | 0.5448 | 0.0345 | 0.0433 | 0.0467 |
tuned_top_models
[<catboost.core.CatBoostRegressor at 0x7ff5a562f1d0>, ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=10, max_features='sqrt', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=7, min_weight_fraction_leaf=0.0, n_estimators=160, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False), LGBMRegressor(bagging_fraction=0.9, bagging_freq=0, boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, feature_fraction=0.8, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=36, min_child_weight=0.001, min_split_gain=0.1, n_estimators=30, n_jobs=-1, num_leaves=100, objective=None, random_state=1234, reg_alpha=0.005, reg_lambda=0.05, silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0), XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.4, max_delta_step=0, max_depth=3, min_child_weight=3, missing=nan, monotone_constraints='()', n_estimators=170, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0.0005, reg_lambda=1e-07, scale_pos_weight=45.400000000000006, subsample=0.9, tree_method='auto', validate_parameters=1, verbosity=0), RandomForestRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=10, max_features='sqrt', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=7, min_weight_fraction_leaf=0.0, n_estimators=160, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False)]
# ensemble top tuned models
bagged_top_models = [ensemble_model(i) for i in tuned_top_models]
| | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
0 | 6.5350 | 63.8660 | 7.9916 | 0.7796 | 0.2673 | 0.2396 |
1 | 7.3736 | 79.2473 | 8.9021 | 0.7411 | 0.4108 | 0.4187 |
2 | 6.5319 | 67.1506 | 8.1945 | 0.7110 | 0.3123 | 0.2968 |
3 | 6.5592 | 63.2349 | 7.9520 | 0.6610 | 0.2780 | 0.2638 |
4 | 7.5712 | 86.0936 | 9.2787 | 0.7449 | 0.2678 | 0.2520 |
5 | 6.9182 | 86.1921 | 9.2840 | 0.6920 | 0.2399 | 0.2189 |
6 | 6.1638 | 57.3084 | 7.5702 | 0.7453 | 0.2551 | 0.2415 |
7 | 6.9748 | 77.8492 | 8.8232 | 0.7168 | 0.2789 | 0.2460 |
8 | 6.7490 | 73.9250 | 8.5980 | 0.7674 | 0.2593 | 0.2254 |
9 | 6.6436 | 69.9978 | 8.3665 | 0.6930 | 0.3211 | 0.2978 |
Mean | 6.8020 | 72.4865 | 8.4961 | 0.7252 | 0.2890 | 0.2700 |
SD | 0.4000 | 9.3514 | 0.5505 | 0.0351 | 0.0469 | 0.0556 |
bagged_top_models
[BaggingRegressor(base_estimator=<catboost.core.CatBoostRegressor object at 0x7ff57d1685d0>, bootstrap=True, bootstrap_features=False, max_features=1.0, max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=1234, verbose=0, warm_start=False), BaggingRegressor(base_estimator=ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=10, max_features='sqrt', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=7, min_weight_fraction_leaf=0.0, n_estimators=160, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False), bootstrap=True, bootstrap_features=False, max_features=1.0, max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=1234, verbose=0, warm_start=False), BaggingRegressor(base_estimator=LGBMRegressor(bagging_fraction=0.9, bagging_freq=0, boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, feature_fraction=0.8, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=36, min_child_weight=0.001, min_split_gain=0.1, n_estimators=30, n_jobs=-1, num_leaves=100, objective=None, random_state=1234, reg_alpha=0.005, reg_lambda=0.05, silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0), bootstrap=True, bootstrap_features=False, max_features=1.0, max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=1234, verbose=0, warm_start=False), BaggingRegressor(base_estimator=XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.4, max_delta_step=0, max_depth=3, min_child_weight=3, missing=nan, monotone_constraints='()', n_estimators=170, n_jobs=-1, num_parallel_tree=1, objective='reg:squarederror', random_state=1234, reg_alpha=0.0005, reg_lambda=1e-07, scale_pos_weight=45.400000000000006, subsample=0.9, tree_method='auto', validate_parameters=1, verbosity=0), bootstrap=True, bootstrap_features=False, max_features=1.0, max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=1234, verbose=0, warm_start=False), BaggingRegressor(base_estimator=RandomForestRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse', max_depth=10, max_features='sqrt', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=7, min_weight_fraction_leaf=0.0, n_estimators=160, n_jobs=-1, oob_score=False, random_state=1234, verbose=0, warm_start=False), bootstrap=True, bootstrap_features=False, max_features=1.0, max_samples=1.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=1234, verbose=0, warm_start=False)]
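As the output above shows, `ensemble_model` wraps each tuned estimator in a `BaggingRegressor`: fit `n_estimators` copies on bootstrap resamples of the training data and average their predictions. A stdlib-only sketch of the idea, using "predict the sample mean" as a deliberately trivial base model:

```python
import random

def bagged_prediction(train_y, n_estimators=10, seed=1234):
    """Average the predictions of base models fit on bootstrap resamples."""
    rng = random.Random(seed)
    per_model = []
    for _ in range(n_estimators):
        resample = [rng.choice(train_y) for _ in train_y]  # sample with replacement
        per_model.append(sum(resample) / len(resample))    # base model: sample mean
    return sum(per_model) / len(per_model)

# Toy target values (strengths from the dataset head above)
strengths = [79.99, 61.89, 40.27, 41.05, 44.30]
pred = bagged_prediction(strengths)
print(min(strengths) <= pred <= max(strengths))  # True
```

Averaging over resamples reduces the variance of an unstable base learner, which is the motivation for bagging the tuned tree ensembles here.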
# select the best model in the session for each regression metric
best1 = automl(optimize = 'R2')
best2 = automl(optimize = 'MAE')
best3 = automl(optimize = 'MSE')
best4 = automl(optimize = 'RMSE')
best5 = automl(optimize = 'MAPE')
print(); print("Best model based on R2: "); print(best1)
print(); print("Best model based on MAE: "); print(best2)
print(); print("Best model based on MSE: "); print(best3)
print(); print("Best model based on RMSE: "); print(best4)
print(); print("Best model based on MAPE: "); print(best5)
Best model based on R2: <catboost.core.CatBoostRegressor object at 0x7ff56f6b6d50>
Best model based on MAE: <catboost.core.CatBoostRegressor object at 0x7ff56f731290>
Best model based on MSE: <catboost.core.CatBoostRegressor object at 0x7ff57c0469d0>
Best model based on RMSE: <catboost.core.CatBoostRegressor object at 0x7ff57c09b350>
Best model based on MAPE: <catboost.core.CatBoostRegressor object at 0x7ff56f6943d0>
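`automl()` rescans every model trained in the session and returns the best one by the chosen metric, maximised for R2 and minimised for the error metrics. Conceptually it is an argmax/argmin over the comparison grid; the scores below are a subset copied from the table above:

```python
# Cross-validated scores for a few models (from the comparison grid above)
scores = {
    "catboost": {"R2": 0.9119, "MAE": 3.4790},
    "et":       {"R2": 0.8923, "MAE": 3.7646},
    "lightgbm": {"R2": 0.8915, "MAE": 3.9649},
}

def pick_best(scores, metric):
    """Return the model name with the best value of `metric`."""
    higher_is_better = metric == "R2"
    key = lambda name: scores[name][metric]
    return max(scores, key=key) if higher_is_better else min(scores, key=key)

print(pick_best(scores, "R2"), pick_best(scores, "MAE"))  # catboost catboost
```

This explains why all five `automl` calls return a CatBoost regressor here: it leads every metric in the grid.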
plot_model(best2, plot = 'residuals')
plot_model(best2, plot = 'error')
plot_model(best2, plot = 'learning')
save_model(best2,'Final_Model')
Transformation Pipeline and Model Successfully Saved
(Pipeline(memory=None, steps=[('dtypes', DataTypes_Auto_infer(categorical_features=[], display_types=True, features_todrop=[], id_columns=[], ml_usecase='regression', numerical_features=[], target='strength', time_features=[])), ('imputer', Simple_Imputer(categorical_strategy='not_available', fill_value_categorical=None, fill_value_numerical=None, numeric_strateg... ('binn', 'passthrough'), ('rem_outliers', 'passthrough'), ('cluster_all', 'passthrough'), ('dummy', Dummify(target='strength')), ('fix_perfect', Remove_100(target='strength')), ('clean_names', Clean_Colum_Names()), ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'), ('dfs', 'passthrough'), ('pca', 'passthrough'), ['trained_model', <catboost.core.CatBoostRegressor object at 0x7ff56f731290>]], verbose=False), 'Final_Model.pkl')
load_saved_model = load_model('Final_Model')
new_prediction = predict_model(load_saved_model, data=data_unseen)
new_prediction.head()
Transformation Pipeline and Model Successfully Loaded
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | strength | Label |
|---|---|---|---|---|---|---|---|---|---|---|
0 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 | 41.803890 |
1 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 | 35.595249 |
2 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 90 | 47.03 | 49.079617 |
3 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 28 | 28.02 | 29.331018 |
4 | 427.5 | 47.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 43.01 | 43.433259 |
In this coding recipe, we demonstrated how to build, tune, ensemble, evaluate, and deploy regression models in Python using PyCaret, applied to the concrete compressive strength dataset.