Lecture 27 - Parallelising Data Analysis with Dask and AutoML
joblib library, which is a simple and effective way to parallelise your code
dask, which is currently the de facto standard for parallel computing in Python
A Dask Cluster is a collection of Dask workers that can be used to parallelise your computations
In plain English, a Dask Cluster is a group of computational engines (cores, GPUs, servers, etc) that work together to solve a problem
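Workers provide two functions:
Compute tasks as directed by the scheduler
Store and serve computed results to other workers or clients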
Workers are the reason why lazy evaluation speeds up computations
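To make the idea of lazy evaluation concrete, here is a minimal pure-Python sketch. The `Lazy` class is a hypothetical stand-in written for this illustration and is not part of Dask; it only records which function to call and runs nothing until `compute()` is requested, which is roughly the behaviour Dask collections expose.

```python
class Lazy:
    """Hypothetical lazy value: stores a function and its arguments, runs on demand."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve any lazy arguments first, then apply the function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*resolved)


calls = []  # records which functions actually ran, and in what order


def multiply(x, y):
    calls.append("multiply")
    return x * y


def add(x, y):
    calls.append("add")
    return x + y


a = Lazy(multiply, 3, 5)      # nothing is computed here
b = Lazy(add, a, 7)           # still nothing, only the graph grows
calls_before = list(calls)    # snapshot taken before any work is requested
result = b.compute()          # the whole graph runs now
print(calls_before, result, calls)
```

Because the full graph is known before any work starts, a scheduler is free to decide where and in what order each task runs.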
A simple example of workers interacting with a scheduler can help explain how lazy evaluation works:
Scheduler -> Eve: Compute a <- multiply(3, 5)!
Eve -> Scheduler: I've computed a and am holding on to it!
Scheduler -> Frank: Compute b <- add(a, 7)!
Scheduler -> Frank: You will need a. Eve has a.
Frank -> Eve: Please send me a.
Eve -> Frank: Sure. a is 15!
Frank -> Scheduler: I've computed b and am holding on to it!
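The exchange above can be imitated with the standard-library `concurrent.futures` module. This is only an analogy, under the assumption that two single-threaded pools stand in for the workers Eve and Frank and that the script itself plays the scheduler; real Dask workers are separate processes or machines that pass data over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply(x, y):
    return x * y


def add(x, y):
    return x + y


# Two one-thread pools play the workers "Eve" and "Frank" (illustrative names).
with ThreadPoolExecutor(max_workers=1) as eve, ThreadPoolExecutor(max_workers=1) as frank:
    # Scheduler -> Eve: compute a. The future is a handle to a result Eve holds.
    a_future = eve.submit(multiply, 3, 5)
    # Scheduler -> Frank: compute b. Frank asks Eve's future for a before adding.
    b_future = frank.submit(lambda: add(a_future.result(), 7))
    a = a_future.result()
    b = b_future.result()

print(a, b)
```

Note that the scheduler never handles the value of a itself when it sets up b; it only passes along a reference to where a lives.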
concurrent.futures library
conda create -n venv-dask -c conda-forge python=3.10 dask-sql=2024.5.0 dask=2024.4.1 dask-ml=2024.4.4 ipykernel=6.29.3 joblib=1.3.2 numpy=1.26.4 pandas=2.2.1 scikit-learn=1.4.2 tpot=0.12.2 prophet -y
# or if you are using pip
python -m venv venv-dask
pip install dask-sql==2024.5.0 dask==2024.4.1 ipykernel==6.29.3 joblib==1.3.2 numpy==1.26.4 pandas==2.2.1 scikit-learn==1.4.2 prophet tpot==0.12.2 dask-ml==2024.4.4
If you are using venv, you can activate the environment with:
source venv-dask/bin/activate  # Linux/macOS; on Windows use venv-dask\Scripts\activate
The venv-dask environment should have all the necessary packages installed!
dask.distributed requires that you set up a Client before you use it in your analysis
The Client object provides a way to interact with the cluster, submit tasks, and monitor the progress of computations
import dask.array as da
# Create a random array
x = da.random.RandomState(42).random((10000, 10000), chunks=(1000, 1000))
x
99987830.48502485
The RandomState object is used to set a seed number
The load_digits dataset is a well-known dataset in machine learning, containing 1797 8x8 pixel images of handwritten digits
param_space: A dictionary of settings to try out for the model
C: Controls how much to punish mistakes (regularisation, smaller values = more regularisation) to prevent overfitting
np.logspace(-6, 6, 13) will create 13 logarithmically spaced values between \(10^{-6}\) and \(10^6\) (!)
gamma: Defines how far the influence of a single example reaches (small values = model is less sensitive to individual data points)
np.logspace(-8, 8, 17) will create 17 logarithmically spaced values between \(10^{-8}\) and \(10^8\) (!!)
tol: Tells the model when to stop trying to improve
class_weight: Options for handling imbalanced data
joblib.parallel_backend('dask') is used to parallelise the search
The RandomizedSearchCV object will try out 50 different combinations of hyperparameters and return the best one
import time
import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# Load the digits dataset
digits = load_digits()
# Define the parameter space to search through
param_space = {
    'C': np.logspace(-6, 6, 13),
    'gamma': np.logspace(-8, 8, 17),
    'tol': np.logspace(-4, -1, 4),
    'class_weight': [None, 'balanced'],
}
# Create the model
model = SVC()
search = RandomizedSearchCV(
    model,
    param_space,
    cv=3,
    n_iter=50,
    verbose=10
)
# Perform the search using Dask (requires an active dask.distributed Client)
start_time = time.time()
with joblib.parallel_backend('dask'):
    search.fit(digits.data, digits.target)
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
# Print the best parameters
print("Best parameters found: ", search.best_params_)
print("Best score: ", search.best_score_)
print("Best estimator: ", search.best_estimator_)
print("Time taken: {:.2f} seconds".format(elapsed_time))
Fitting 3 folds for each of 50 candidates, totalling 150 fits
[CV 3/3; 2/50] START C=0.001, class_weight=None, gamma=100000000.0, tol=0.01....
[CV 1/3; 6/50] START C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001.
[CV 1/3; 5/50] START C=1e-05, class_weight=None, gamma=1e-08, tol=0.1...........
[CV 1/3; 1/50] START C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1....
[CV 2/3; 1/50] START C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1....
[CV 3/3; 4/50] START C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001
[CV 3/3; 5/50] START C=1e-05, class_weight=None, gamma=1e-08, tol=0.1...........
[CV 1/3; 4/50] START C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001
[CV 3/3; 2/50] END C=0.001, class_weight=None, gamma=100000000.0, tol=0.01;, score=0.102 total time= 0.1s
[CV 2/3; 4/50] START C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001
[CV 1/3; 5/50] END C=1e-05, class_weight=None, gamma=1e-08, tol=0.1;, score=0.292 total time= 0.1s
[CV 2/3; 3/50] START C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 1/3; 1/50] END C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1;, score=0.100 total time= 0.1s
[CV 1/3; 6/50] END C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001;, score=0.098 total time= 0.2s
[CV 2/3; 5/50] START C=1e-05, class_weight=None, gamma=1e-08, tol=0.1...........
[CV 1/3; 3/50] START C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 2/3; 1/50] END C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 1/3; 2/50] START C=0.001, class_weight=None, gamma=100000000.0, tol=0.01....
[CV 3/3; 4/50] END C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001;, score=0.102 total time= 0.2s
[CV 1/3; 4/50] END C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001;, score=0.100 total time= 0.2s
[CV 2/3; 2/50] START C=0.001, class_weight=None, gamma=100000000.0, tol=0.01....
[CV 3/3; 6/50] START C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001.
[CV 3/3; 5/50] END C=1e-05, class_weight=None, gamma=1e-08, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 1/50] START C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1....
[CV 2/3; 3/50] END C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.100 total time= 0.2s
[CV 1/3; 8/50] START C=1000000.0, class_weight=None, gamma=10.0, tol=0.01.......
[CV 2/3; 4/50] END C=1000000.0, class_weight=balanced, gamma=100000000.0, tol=0.0001;, score=0.102 total time= 0.2s
[CV 2/3; 8/50] START C=1000000.0, class_weight=None, gamma=10.0, tol=0.01.......
[CV 1/3; 2/50] END C=0.001, class_weight=None, gamma=100000000.0, tol=0.01;, score=0.100 total time= 0.2s
[CV 1/3; 3/50] END C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.098 total time= 0.2s
[CV 2/3; 6/50] START C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001.
[CV 3/3; 7/50] START C=1.0, class_weight=None, gamma=0.1, tol=0.0001............
[CV 2/3; 5/50] END C=1e-05, class_weight=None, gamma=1e-08, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 3/50] START C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 3/3; 1/50] END C=1000.0, class_weight=balanced, gamma=10000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 6/50] END C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 2/3; 7/50] START C=1.0, class_weight=None, gamma=0.1, tol=0.0001............
[CV 2/3; 2/50] END C=0.001, class_weight=None, gamma=100000000.0, tol=0.01;, score=0.102 total time= 0.3s
[CV 1/3; 7/50] START C=1.0, class_weight=None, gamma=0.1, tol=0.0001............
[CV 3/3; 8/50] START C=1000000.0, class_weight=None, gamma=10.0, tol=0.01.......
[CV 2/3; 8/50] END C=1000000.0, class_weight=None, gamma=10.0, tol=0.01;, score=0.102 total time= 0.3s
[CV 2/3; 9/50] START C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001.......
[CV 3/3; 7/50] END C=1.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.3s
[CV 3/3; 9/50] START C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001.......
[CV 2/3; 6/50] END C=0.0001, class_weight=balanced, gamma=100000.0, tol=0.001;, score=0.100 total time= 0.3s
[CV 1/3; 10/50] START C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001....
[CV 3/3; 3/50] END C=0.0001, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.100 total time= 0.3s
[CV 1/3; 11/50] START C=100000.0, class_weight=None, gamma=1.0, tol=0.1.........
[CV 1/3; 7/50] END C=1.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.2s
[CV 3/3; 10/50] START C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001....
[CV 2/3; 7/50] END C=1.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.3s
[CV 3/3; 8/50] END C=1000000.0, class_weight=None, gamma=10.0, tol=0.01;, score=0.102 total time= 0.3s
[CV 2/3; 10/50] START C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001....
[CV 2/3; 11/50] START C=100000.0, class_weight=None, gamma=1.0, tol=0.1.........
[CV 1/3; 8/50] END C=1000000.0, class_weight=None, gamma=10.0, tol=0.01;, score=0.100 total time= 0.4s
[CV 1/3; 9/50] START C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001.......
[CV 2/3; 9/50] END C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001;, score=0.098 total time= 0.2s
[CV 2/3; 13/50] START C=0.001, class_weight=None, gamma=1000.0, tol=0.001.......
[CV 1/3; 11/50] END C=100000.0, class_weight=None, gamma=1.0, tol=0.1;, score=0.202 total time= 0.2s
[CV 3/3; 10/50] END C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001;, score=0.098 total time= 0.1s
[CV 3/3; 13/50] START C=0.001, class_weight=None, gamma=1000.0, tol=0.001.......
[CV 1/3; 13/50] START C=0.001, class_weight=None, gamma=1000.0, tol=0.001.......
[CV 1/3; 10/50] END C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001;, score=0.098 total time= 0.2s
[CV 1/3; 14/50] START C=0.01, class_weight=None, gamma=0.1, tol=0.001...........
[CV 2/3; 11/50] END C=100000.0, class_weight=None, gamma=1.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 9/50] END C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001;, score=0.098 total time= 0.2s
[CV 2/3; 12/50] START C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1.....
[CV 3/3; 11/50] START C=100000.0, class_weight=None, gamma=1.0, tol=0.1.........
[CV 1/3; 9/50] END C=1.0, class_weight=balanced, gamma=1e-07, tol=0.001;, score=0.197 total time= 0.2s
[CV 1/3; 12/50] START C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1.....
[CV 2/3; 10/50] END C=0.0001, class_weight=balanced, gamma=0.1, tol=0.0001;, score=0.100 total time= 0.2s
[CV 3/3; 12/50] START C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1.....
[CV 3/3; 13/50] END C=0.001, class_weight=None, gamma=1000.0, tol=0.001;, score=0.102 total time= 0.1s
[CV 3/3; 16/50] START C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001......
[CV 2/3; 13/50] END C=0.001, class_weight=None, gamma=1000.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 16/50] START C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001......
[CV 1/3; 13/50] END C=0.001, class_weight=None, gamma=1000.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 3/3; 15/50] START C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001.....
[CV 1/3; 12/50] END C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1;, score=0.100 total time= 0.2s
[CV 1/3; 16/50] START C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001......
[CV 3/3; 11/50] END C=100000.0, class_weight=None, gamma=1.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 14/50] START C=0.01, class_weight=None, gamma=0.1, tol=0.001...........
[CV 1/3; 14/50] END C=0.01, class_weight=None, gamma=0.1, tol=0.001;, score=0.100 total time= 0.2s
[CV 2/3; 12/50] END C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 1/3; 15/50] START C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001.....
[CV 2/3; 14/50] START C=0.01, class_weight=None, gamma=0.1, tol=0.001...........
[CV 3/3; 12/50] END C=1000.0, class_weight=None, gamma=1000000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 2/3; 15/50] START C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001.....
[CV 3/3; 16/50] END C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001;, score=0.102 total time= 0.1s
[CV 1/3; 17/50] START C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001...
[CV 3/3; 15/50] END C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001;, score=0.098 total time= 0.1s
[CV 3/3; 18/50] START C=10.0, class_weight=None, gamma=0.1, tol=0.0001..........
[CV 2/3; 16/50] END C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001;, score=0.102 total time= 0.2s
[CV 2/3; 18/50] START C=10.0, class_weight=None, gamma=0.1, tol=0.0001..........
[CV 1/3; 15/50] END C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001;, score=0.098 total time= 0.2s
[CV 1/3; 19/50] START C=0.01, class_weight=balanced, gamma=100.0, tol=0.1.......
[CV 3/3; 14/50] END C=0.01, class_weight=None, gamma=0.1, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 17/50] END C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001;, score=0.199 total time= 0.1s
[CV 3/3; 19/50] START C=0.01, class_weight=balanced, gamma=100.0, tol=0.1.......
[CV 3/3; 17/50] START C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001...
[CV 2/3; 15/50] END C=0.1, class_weight=balanced, gamma=1000.0, tol=0.001;, score=0.098 total time= 0.2s
[CV 2/3; 19/50] START C=0.01, class_weight=balanced, gamma=100.0, tol=0.1.......
[CV 2/3; 14/50] END C=0.01, class_weight=None, gamma=0.1, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 17/50] START C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001...
[CV 3/3; 18/50] END C=10.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.2s
[CV 1/3; 20/50] START C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001...
[CV 2/3; 18/50] END C=10.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.2s
[CV 2/3; 20/50] START C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001...
[CV 1/3; 16/50] END C=0.0001, class_weight=None, gamma=1e-06, tol=0.0001;, score=0.292 total time= 0.3s
[CV 1/3; 18/50] START C=10.0, class_weight=None, gamma=0.1, tol=0.0001..........
[CV 3/3; 19/50] END C=0.01, class_weight=balanced, gamma=100.0, tol=0.1;, score=0.098 total time= 0.1s
[CV 2/3; 21/50] START C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001....
[CV 2/3; 19/50] END C=0.01, class_weight=balanced, gamma=100.0, tol=0.1;, score=0.098 total time= 0.1s
[CV 1/3; 19/50] END C=0.01, class_weight=balanced, gamma=100.0, tol=0.1;, score=0.098 total time= 0.2s
[CV 3/3; 20/50] START C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001...
[CV 1/3; 21/50] START C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001....
[CV 3/3; 17/50] END C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001;, score=0.098 total time= 0.2s
[CV 2/3; 22/50] START C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1....
[CV 2/3; 17/50] END C=0.01, class_weight=balanced, gamma=0.0001, tol=0.0001;, score=0.098 total time= 0.2s
[CV 1/3; 20/50] END C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001;, score=0.197 total time= 0.2s
[CV 3/3; 21/50] START C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001....
[CV 1/3; 22/50] START C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1....
[CV 2/3; 20/50] END C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001;, score=0.098 total time= 0.1s
[CV 1/3; 25/50] START C=10000.0, class_weight=None, gamma=1e-08, tol=0.001......
[CV 2/3; 21/50] END C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 1/3; 24/50] START C=100.0, class_weight=None, gamma=1e-05, tol=0.1..........
[CV 1/3; 25/50] END C=10000.0, class_weight=None, gamma=1e-08, tol=0.001;, score=0.942 total time= 0.1s
[CV 1/3; 23/50] START C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01
[CV 3/3; 20/50] END C=0.001, class_weight=balanced, gamma=1e-06, tol=0.0001;, score=0.098 total time= 0.2s
[CV 2/3; 23/50] START C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01
[CV 1/3; 24/50] END C=100.0, class_weight=None, gamma=1e-05, tol=0.1;, score=0.937 total time= 0.0s
[CV 3/3; 25/50] START C=10000.0, class_weight=None, gamma=1e-08, tol=0.001......
[CV 1/3; 21/50] END C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001;, score=0.098 total time= 0.2s
[CV 2/3; 22/50] END C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 23/50] START C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01
[CV 3/3; 24/50] START C=100.0, class_weight=None, gamma=1e-05, tol=0.1..........
[CV 3/3; 21/50] END C=0.0001, class_weight=balanced, gamma=10.0, tol=0.001;, score=0.100 total time= 0.1s
[CV 2/3; 24/50] START C=100.0, class_weight=None, gamma=1e-05, tol=0.1..........
[CV 1/3; 18/50] END C=10.0, class_weight=None, gamma=0.1, tol=0.0001;, score=0.102 total time= 0.3s
[CV 1/3; 22/50] END C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1;, score=0.102 total time= 0.2s
[CV 1/3; 26/50] START C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001......
[CV 3/3; 22/50] START C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1....
[CV 3/3; 24/50] END C=100.0, class_weight=None, gamma=1e-05, tol=0.1;, score=0.942 total time= 0.0s
[CV 1/3; 27/50] START C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001
[CV 2/3; 24/50] END C=100.0, class_weight=None, gamma=1e-05, tol=0.1;, score=0.967 total time= 0.0s
[CV 2/3; 28/50] START C=10.0, class_weight=None, gamma=1.0, tol=0.001...........
[CV 3/3; 25/50] END C=10000.0, class_weight=None, gamma=1e-08, tol=0.001;, score=0.927 total time= 0.1s
[CV 2/3; 26/50] START C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001......
[CV 1/3; 23/50] END C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01;, score=0.100 total time= 0.1s
[CV 2/3; 25/50] START C=10000.0, class_weight=None, gamma=1e-08, tol=0.001......
[CV 2/3; 23/50] END C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01;, score=0.102 total time= 0.1s
[CV 1/3; 29/50] START C=0.001, class_weight=None, gamma=1e-08, tol=0.001........
[CV 3/3; 23/50] END C=100.0, class_weight=balanced, gamma=100000000.0, tol=0.01;, score=0.102 total time= 0.2s
[CV 3/3; 26/50] START C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001......
[CV 1/3; 26/50] END C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001;, score=0.890 total time= 0.2s
[CV 2/3; 27/50] START C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001
[CV 2/3; 25/50] END C=10000.0, class_weight=None, gamma=1e-08, tol=0.001;, score=0.953 total time= 0.1s
[CV 3/3; 29/50] START C=0.001, class_weight=None, gamma=1e-08, tol=0.001........
[CV 2/3; 26/50] END C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001;, score=0.885 total time= 0.1s
[CV 1/3; 28/50] START C=10.0, class_weight=None, gamma=1.0, tol=0.001...........
[CV 1/3; 27/50] END C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 2/3; 29/50] START C=0.001, class_weight=None, gamma=1e-08, tol=0.001........
[CV 2/3; 28/50] END C=10.0, class_weight=None, gamma=1.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 3/3; 28/50] START C=10.0, class_weight=None, gamma=1.0, tol=0.001...........
[CV 3/3; 22/50] END C=1000000.0, class_weight=balanced, gamma=0.1, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 27/50] START C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001
[CV 1/3; 29/50] END C=0.001, class_weight=None, gamma=1e-08, tol=0.001;, score=0.292 total time= 0.2s
[CV 2/3; 31/50] START C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1..
[CV 3/3; 26/50] END C=1000.0, class_weight=None, gamma=1e-08, tol=0.0001;, score=0.865 total time= 0.2s
[CV 3/3; 30/50] START C=100000.0, class_weight=None, gamma=100.0, tol=0.001.....
[CV 2/3; 27/50] END C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 31/50] START C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1..
[CV 3/3; 29/50] END C=0.001, class_weight=None, gamma=1e-08, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 30/50] START C=100000.0, class_weight=None, gamma=100.0, tol=0.001.....
[CV 2/3; 29/50] END C=0.001, class_weight=None, gamma=1e-08, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 28/50] END C=10.0, class_weight=None, gamma=1.0, tol=0.001;, score=0.202 total time= 0.2s
[CV 1/3; 30/50] START C=100000.0, class_weight=None, gamma=100.0, tol=0.001.....
[CV 3/3; 31/50] START C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1..
[CV 3/3; 28/50] END C=10.0, class_weight=None, gamma=1.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 32/50] START C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001.....
[CV 3/3; 27/50] END C=100000.0, class_weight=balanced, gamma=100000000.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 33/50] START C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001.....
[CV 2/3; 31/50] END C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 3/3; 32/50] START C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001.....
[CV 3/3; 30/50] END C=100000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 32/50] START C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001.....
[CV 1/3; 31/50] END C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1;, score=0.100 total time= 0.2s
[CV 1/3; 34/50] START C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 1/3; 32/50] END C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001;, score=0.895 total time= 0.1s
[CV 2/3; 30/50] END C=100000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 35/50] START C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1......
[CV 3/3; 33/50] START C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001.....
[CV 3/3; 31/50] END C=100000.0, class_weight=None, gamma=10000000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 2/3; 33/50] START C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001.....
[CV 1/3; 35/50] END C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1;, score=0.938 total time= 0.0s
[CV 3/3; 36/50] START C=100.0, class_weight=None, gamma=0.0001, tol=0.01........
[CV 1/3; 30/50] END C=100000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 2/3; 35/50] START C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1......
[CV 2/3; 35/50] END C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1;, score=0.957 total time= 0.0s
[CV 2/3; 34/50] START C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 3/3; 36/50] END C=100.0, class_weight=None, gamma=0.0001, tol=0.01;, score=0.952 total time= 0.0s
[CV 3/3; 37/50] START C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1..
[CV 3/3; 32/50] END C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001;, score=0.873 total time= 0.2s
[CV 1/3; 38/50] START C=0.1, class_weight=balanced, gamma=10.0, tol=0.1.........
[CV 3/3; 37/50] END C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1;, score=0.937 total time= 0.0s
[CV 1/3; 33/50] END C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001;, score=0.098 total time= 0.2s
[CV 2/3; 36/50] START C=100.0, class_weight=None, gamma=0.0001, tol=0.01........
[CV 3/3; 34/50] START C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01....
[CV 2/3; 36/50] END C=100.0, class_weight=None, gamma=0.0001, tol=0.01;, score=0.963 total time= 0.0s
[CV 2/3; 39/50] START C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001.
[CV 1/3; 34/50] END C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.098 total time= 0.2s
[CV 1/3; 37/50] START C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1..
[CV 1/3; 37/50] END C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1;, score=0.938 total time= 0.0s
[CV 1/3; 36/50] START C=100.0, class_weight=None, gamma=0.0001, tol=0.01........
[CV 2/3; 32/50] END C=1.0, class_weight=balanced, gamma=1e-05, tol=0.0001;, score=0.885 total time= 0.2s
[CV 3/3; 35/50] START C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1......
[CV 2/3; 33/50] END C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001;, score=0.098 total time= 0.2s
[CV 3/3; 33/50] END C=0.01, class_weight=balanced, gamma=10.0, tol=0.0001;, score=0.098 total time= 0.2s
[CV 3/3; 39/50] START C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001.
[CV 1/3; 40/50] START C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001...
[CV 3/3; 35/50] END C=1000000.0, class_weight=None, gamma=1e-08, tol=0.1;, score=0.937 total time= 0.0s
[CV 1/3; 39/50] START C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001.
[CV 1/3; 36/50] END C=100.0, class_weight=None, gamma=0.0001, tol=0.01;, score=0.952 total time= 0.0s
[CV 3/3; 40/50] START C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001...
[CV 2/3; 34/50] END C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.098 total time= 0.2s
[CV 2/3; 37/50] START C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1..
[CV 1/3; 40/50] END C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001;, score=0.952 total time= 0.0s
[CV 3/3; 41/50] START C=0.01, class_weight=None, gamma=0.001, tol=0.001.........
[CV 1/3; 38/50] END C=0.1, class_weight=balanced, gamma=10.0, tol=0.1;, score=0.098 total time= 0.2s
[CV 3/3; 38/50] START C=0.1, class_weight=balanced, gamma=10.0, tol=0.1.........
[CV 3/3; 34/50] END C=1e-06, class_weight=balanced, gamma=1000.0, tol=0.01;, score=0.098 total time= 0.2s
[CV 2/3; 41/50] START C=0.01, class_weight=None, gamma=0.001, tol=0.001.........
[CV 3/3; 40/50] END C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001;, score=0.952 total time= 0.0s
[CV 3/3; 42/50] START C=1.0, class_weight=None, gamma=1e-06, tol=0.001..........
[CV 2/3; 37/50] END C=1000000.0, class_weight=balanced, gamma=1e-07, tol=0.1;, score=0.957 total time= 0.0s
[CV 1/3; 43/50] START C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01....
[CV 2/3; 39/50] END C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 38/50] START C=0.1, class_weight=balanced, gamma=10.0, tol=0.1.........
[CV 1/3; 39/50] END C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 3/3; 43/50] START C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01....
[CV 3/3; 39/50] END C=10000.0, class_weight=None, gamma=10000000.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 44/50] START C=100000.0, class_weight=None, gamma=100000.0, tol=0.1....
[CV 3/3; 41/50] END C=0.01, class_weight=None, gamma=0.001, tol=0.001;, score=0.104 total time= 0.2s
[CV 1/3; 44/50] START C=100000.0, class_weight=None, gamma=100000.0, tol=0.1....
[CV 3/3; 38/50] END C=0.1, class_weight=balanced, gamma=10.0, tol=0.1;, score=0.098 total time= 0.2s
[CV 2/3; 43/50] START C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01....
[CV 1/3; 43/50] END C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01;, score=0.199 total time= 0.2s
[CV 1/3; 42/50] START C=1.0, class_weight=None, gamma=1e-06, tol=0.001..........
[CV 2/3; 41/50] END C=0.01, class_weight=None, gamma=0.001, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 40/50] START C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001...
[CV 3/3; 42/50] END C=1.0, class_weight=None, gamma=1e-06, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 42/50] START C=1.0, class_weight=None, gamma=1e-06, tol=0.001..........
[CV 2/3; 38/50] END C=0.1, class_weight=balanced, gamma=10.0, tol=0.1;, score=0.098 total time= 0.2s
[CV 1/3; 41/50] START C=0.01, class_weight=None, gamma=0.001, tol=0.001.........
[CV 2/3; 40/50] END C=100.0, class_weight=balanced, gamma=0.0001, tol=0.001;, score=0.963 total time= 0.1s
[CV 2/3; 46/50] START C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001....
[CV 3/3; 43/50] END C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01;, score=0.098 total time= 0.2s
[CV 3/3; 44/50] START C=100000.0, class_weight=None, gamma=100000.0, tol=0.1....
[CV 2/3; 44/50] END C=100000.0, class_weight=None, gamma=100000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 1/3; 47/50] START C=10000.0, class_weight=None, gamma=0.001, tol=0.0001.....
[CV 1/3; 44/50] END C=100000.0, class_weight=None, gamma=100000.0, tol=0.1;, score=0.100 total time= 0.2s
[CV 3/3; 46/50] START C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001....
[CV 2/3; 42/50] END C=1.0, class_weight=None, gamma=1e-06, tol=0.001;, score=0.102 total time= 0.2s
[CV 1/3; 42/50] END C=1.0, class_weight=None, gamma=1e-06, tol=0.001;, score=0.292 total time= 0.2s
[CV 3/3; 45/50] START C=1.0, class_weight=None, gamma=1000000.0, tol=0.01.......
[CV 1/3; 46/50] START C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001....
[CV 2/3; 43/50] END C=0.001, class_weight=balanced, gamma=0.0001, tol=0.01;, score=0.098 total time= 0.2s
[CV 2/3; 45/50] START C=1.0, class_weight=None, gamma=1000000.0, tol=0.01.......
[CV 1/3; 41/50] END C=0.01, class_weight=None, gamma=0.001, tol=0.001;, score=0.299 total time= 0.2s
[CV 1/3; 47/50] END C=10000.0, class_weight=None, gamma=0.001, tol=0.0001;, score=0.972 total time= 0.1s
[CV 1/3; 45/50] START C=1.0, class_weight=None, gamma=1000000.0, tol=0.01.......
[CV 1/3; 48/50] START C=1.0, class_weight=None, gamma=10000.0, tol=0.1..........
[CV 2/3; 46/50] END C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001;, score=0.102 total time= 0.2s
[CV 3/3; 47/50] START C=10000.0, class_weight=None, gamma=0.001, tol=0.0001.....
[CV 3/3; 44/50] END C=100000.0, class_weight=None, gamma=100000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 2/3; 47/50] START C=10000.0, class_weight=None, gamma=0.001, tol=0.0001.....
[CV 3/3; 46/50] END C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001;, score=0.102 total time= 0.2s
[CV 1/3; 49/50] START C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001...
[CV 3/3; 47/50] END C=10000.0, class_weight=None, gamma=0.001, tol=0.0001;, score=0.975 total time= 0.1s
[CV 3/3; 50/50] START C=1000000.0, class_weight=None, gamma=100.0, tol=0.001....
[CV 2/3; 47/50] END C=10000.0, class_weight=None, gamma=0.001, tol=0.0001;, score=0.982 total time= 0.1s
[CV 2/3; 45/50] END C=1.0, class_weight=None, gamma=1000000.0, tol=0.01;, score=0.102 total time= 0.2s
[CV 3/3; 49/50] START C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001...
[CV 2/3; 50/50] START C=1000000.0, class_weight=None, gamma=100.0, tol=0.001....
[CV 3/3; 45/50] END C=1.0, class_weight=None, gamma=1000000.0, tol=0.01;, score=0.102 total time= 0.2s
[CV 2/3; 48/50] START C=1.0, class_weight=None, gamma=10000.0, tol=0.1..........
[CV 1/3; 48/50] END C=1.0, class_weight=None, gamma=10000.0, tol=0.1;, score=0.100 total time= 0.2s
[CV 1/3; 45/50] END C=1.0, class_weight=None, gamma=1000000.0, tol=0.01;, score=0.100 total time= 0.2s
[CV 1/3; 50/50] START C=1000000.0, class_weight=None, gamma=100.0, tol=0.001....
[CV 2/3; 49/50] START C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001...
[CV 1/3; 46/50] END C=1000.0, class_weight=balanced, gamma=1.0, tol=0.0001;, score=0.202 total time= 0.2s
[CV 3/3; 48/50] START C=1.0, class_weight=None, gamma=10000.0, tol=0.1..........
[CV 1/3; 49/50] END C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001;, score=0.472 total time= 0.2s
[CV 3/3; 50/50] END C=1000000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 2/3; 48/50] END C=1.0, class_weight=None, gamma=10000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 2/3; 50/50] END C=1000000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.102 total time= 0.2s
[CV 3/3; 49/50] END C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001;, score=0.182 total time= 0.2s
[CV 1/3; 50/50] END C=1000000.0, class_weight=None, gamma=100.0, tol=0.001;, score=0.100 total time= 0.2s
[CV 3/3; 48/50] END C=1.0, class_weight=None, gamma=10000.0, tol=0.1;, score=0.102 total time= 0.2s
[CV 2/3; 49/50] END C=100.0, class_weight=balanced, gamma=1e-08, tol=0.0001;, score=0.197 total time= 0.2s
Best parameters found: {'tol': 0.0001, 'gamma': 0.001, 'class_weight': None, 'C': 10000.0}
Best score: 0.9760712298274902
Best estimator: SVC(C=10000.0, gamma=0.001, tol=0.0001)
Time taken: 4.22 seconds
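To see why a randomised search is used here, it helps to count the candidates. The sketch below rebuilds the same parameter space without NumPy; the `logspace` helper is a hypothetical pure-Python stand-in for `np.logspace` that returns a list rather than an array.

```python
import math


def logspace(start, stop, num):
    """Stand-in for np.logspace: num values from 10**start to 10**stop."""
    step = (stop - start) / (num - 1)
    return [10.0 ** (start + i * step) for i in range(num)]


param_space = {
    "C": logspace(-6, 6, 13),
    "gamma": logspace(-8, 8, 17),
    "tol": logspace(-4, -1, 4),
    "class_weight": [None, "balanced"],
}

# Size of the full grid: the product of the number of options per parameter.
grid_size = math.prod(len(values) for values in param_space.values())

n_iter = 50   # candidates sampled by RandomizedSearchCV
cv = 3        # folds per candidate
n_fits = n_iter * cv
full_grid_fits = grid_size * cv

print(grid_size, n_fits, full_grid_fits, round(100 * n_iter / grid_size, 1))
```

The full grid has 13 × 17 × 4 × 2 = 1768 combinations, so sampling 50 of them covers roughly 2.8% of the grid and needs 150 fits (matching the log above) instead of 5304.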
dask_ml provides the Incremental class, which can train models on chunks of data
dask_ml.model_selection.IncrementalSearchCV() searches hyperparameters for models that implement the partial_fit method
X and y will take up about 16 GB of memory
The make_classification function from dask_ml.datasets is used to create the dataset
import time
import numpy as np
from dask_ml.datasets import make_classification
X, y = make_classification(n_samples=100000000, n_features=20,
                           chunks=100000, random_state=0)
# Create the model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(tol=1e-3, penalty='elasticnet', random_state=0)
# Parameters we want to search through
params = {'alpha': np.logspace(-2, 1, num=1000),
          'l1_ratio': np.linspace(0, 1, num=1000),
          'average': [True, False]}
# Perform the search
from dask_ml.model_selection import IncrementalSearchCV
search = IncrementalSearchCV(model, params, random_state=0)
start_time = time.time()
search.fit(X, y, classes=[0, 1])
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
# Print the best parameters, best score, and the time taken
print("Best parameters found: ", search.best_params_)
print("Best score: ", search.best_score_)
print("Best estimator: ", search.best_estimator_)
print(f"Time taken: {elapsed_time:.2f} seconds")
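The search above only works with estimators that can learn from one chunk at a time through partial_fit. The sketch below illustrates that idea in plain Python. TinyOnlineLogistic and iter_chunks are hypothetical names invented for this illustration (they are not part of dask_ml or scikit-learn), and the dataset is a small synthetic one.

```python
import math
import random


class TinyOnlineLogistic:
    """Hypothetical stand-in for an estimator that supports partial_fit."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def _probability(self, row):
        z = self.bias + sum(w * x for w, x in zip(self.weights, row))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X_chunk, y_chunk):
        """Update the weights from one chunk without revisiting older chunks."""
        for row, label in zip(X_chunk, y_chunk):
            error = self._probability(row) - label
            self.weights = [w - self.learning_rate * error * x
                            for w, x in zip(self.weights, row)]
            self.bias -= self.learning_rate * error
        return self

    def score(self, X, y):
        """Fraction of rows classified correctly."""
        hits = sum((self._probability(row) >= 0.5) == (label == 1)
                   for row, label in zip(X, y))
        return hits / len(y)


def iter_chunks(X, y, chunk_size):
    """Yield consecutive (X_chunk, y_chunk) pairs, like blocks of a Dask array."""
    for start in range(0, len(y), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


# Small synthetic dataset: the label is 1 when the two features sum to more than 0
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2000)]
y = [1 if a + b > 0 else 0 for a, b in X]

model = TinyOnlineLogistic(n_features=2)
n_chunks = 0
for X_chunk, y_chunk in iter_chunks(X, y, chunk_size=100):
    model.partial_fit(X_chunk, y_chunk)  # only one chunk is processed at a time
    n_chunks += 1

accuracy = model.score(X, y)
print(f"chunks seen: {n_chunks}, training accuracy: {accuracy:.3f}")
```

Because each call to partial_fit touches a single chunk, the full dataset never has to fit in memory at once, which is what makes this style of training practical for arrays as large as the one above.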
dask_ml.model_selection also offers HyperbandSearchCV, a hyperparameter search based on the Hyperband algorithm
import time
import numpy as np
from dask_ml.model_selection import HyperbandSearchCV
from dask_ml.datasets import make_classification
from sklearn.linear_model import SGDClassifier
X, y = make_classification(chunks=20)
est = SGDClassifier(tol=1e-3)
param_dist = {'alpha': np.logspace(-4, 0, num=1000),
              'loss': ['hinge', 'log_loss', 'modified_huber', 'squared_hinge'],
              'average': [True, False]}
start_time = time.time()
search = HyperbandSearchCV(est, param_dist)
search.fit(X, y, classes=np.unique(y))
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print("Best parameters found: ", search.best_params_)
print("Best score: ", search.best_score_)
print("Best estimator: ", search.best_estimator_)
print(f"Time taken: {elapsed_time:.2f} seconds")
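Hyperband is built on successive halving: train many configurations on a small budget, keep the best fraction, and give the survivors a larger budget. The sketch below shows a single round of that idea in plain Python. It is not the dask_ml implementation, which (as far as I know) runs several such brackets and measures budget in partial_fit calls. The toy_score function and the eta value are assumptions made for this illustration.

```python
import math


def toy_score(alpha, budget):
    """Hypothetical validation score: best near alpha=0.01, improves with budget."""
    quality = 1.0 / (1.0 + abs(math.log10(alpha) + 2.0))
    return quality * (1.0 - 0.5 ** budget)


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta of the configurations each round, growing the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        history.append((budget, len(ranked)))
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


# 27 candidate values of alpha, evenly spaced on a log scale between 1e-4 and 1
alphas = [10 ** (-4 + 4 * i / 26) for i in range(27)]
best_alpha, history = successive_halving(alphas, toy_score)

print(f"best alpha: {best_alpha:.4f}")
for budget, n_configs in history:
    print(f"budget {budget}: evaluated {n_configs} configurations")
```

Most of the budget goes to the few configurations that survive the early rounds, which is why this family of methods tends to be cheaper than training every candidate to completion.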
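The code above uses the HyperbandSearchCV class to search for the best hyperparameters
TPOT is included in the venv-dask environment; if you did not create that environment, install the tpot package separately
The example below uses the load_digits dataset
import time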
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Load the digits dataset
digits = load_digits()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)
# Create the TPOTClassifier object
start_time = time.time()
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
# Fit the model
tpot.fit(X_train, y_train)
end_time = time.time()
elapsed_time = end_time - start_time
# Print the score
print(tpot.score(X_test, y_test))
print(f"Time taken: {elapsed_time:.2f} seconds")
Generation 1 - Current best internal CV score: 0.9821836706595072
Generation 2 - Current best internal CV score: 0.9821836706595072
Generation 3 - Current best internal CV score: 0.9821836706595072
Generation 4 - Current best internal CV score: 0.9821836706595072
Generation 5 - Current best internal CV score: 0.9829299187663499
Best pipeline: KNeighborsClassifier(Normalizer(input_matrix, norm=l1), n_neighbors=2, p=2, weights=distance)
0.9911111111111112
Time taken: 240.45 seconds
Add use_dask=True to the TPOTClassifier object and you are good to go 😊
import time
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Load the digits dataset
digits = load_digits()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)
# Create the TPOTClassifier object
start_time = time.time()
tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42, use_dask=True)
# Fit the model
tpot.fit(X_train, y_train)
end_time = time.time()
elapsed_time = end_time - start_time
# Print the score
print(tpot.score(X_test, y_test))
print(f"Time taken: {elapsed_time:.2f} seconds")
Prophet library
Prophet is a forecasting tool that is open source and maintained by Facebook
Prophet fits its models with Stan, a probabilistic programming language, which it reaches through a Python interface such as pystan
If you created the venv-dask environment, you should already have Prophet installed
Prophet library:
The prophet.diagnostics.cross_validation function uses simulated historical forecasts to provide some idea of a model’s quality
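Simulated historical forecasts can be pictured as a rolling-origin evaluation: pick a cutoff, fit on everything before it, forecast a fixed horizon, compare with what actually happened, then move the cutoff forward. The sketch below illustrates the idea in plain Python with a naive "repeat the last value" forecaster. It is not Prophet's implementation. The argument names initial, period and horizon mirror those of cross_validation, but here they are integer counts of observations rather than time spans, and all function names are hypothetical.

```python
def rolling_origin_cutoffs(n_points, initial, period, horizon):
    """Indices at which the history is cut; each leaves a full horizon to test on."""
    cutoffs = []
    cutoff = initial
    while cutoff + horizon <= n_points:
        cutoffs.append(cutoff)
        cutoff += period
    return cutoffs


def naive_forecast(history, horizon):
    """Stand-in model: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def simulated_historical_forecasts(series, initial, period, horizon):
    """Return (cutoff, mean absolute error) for each simulated forecast."""
    results = []
    for cutoff in rolling_origin_cutoffs(len(series), initial, period, horizon):
        history = series[:cutoff]                 # data the model may see
        actual = series[cutoff:cutoff + horizon]  # data held back for scoring
        predicted = naive_forecast(history, horizon)
        mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon
        results.append((cutoff, mae))
    return results


# A simple upward trend: 0, 1, 2, ..., 19
series = list(range(20))
results = simulated_historical_forecasts(series, initial=10, period=5, horizon=3)
for cutoff, mae in results:
    print(f"cutoff at index {cutoff}: MAE = {mae:.1f}")
```

Each cutoff is an independent fit-and-forecast job, so the jobs can be distributed across workers, which is the part that benefits from parallel execution.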