Study Notes
A hyperparameter is a parameter whose value is used to control the learning process. It sets HOW the model is trained.
Choosing optimal hyperparameter values for model training is difficult; they are typically tuned through an automated search process rather than set manually.
Hyperparameter tuning is accomplished by training multiple models, using the same algorithm and training data but different hyperparameter values.
Search space = the set of hyperparameter values tried during the tuning experiment.
Hyperparameter types
- Discrete
Hyperparameter values are selected from a particular set of possibilities.
Ex:
Python list
choice([10,20,30])
choice(range(1,100))
Discrete distributions
qnormal
quniform
qlognormal
qloguniform
- Continuous
Can take any value along a scale.
Continuous distributions
normal
uniform
lognormal
loguniform
- Discrete hyperparameters (discrete values can also be selected from continuous distributions; see the sketch after this list)
  - qNormal distribution
  - qUniform distribution
  - qLognormal distribution
  - qLogUniform distribution
- Continuous hyperparameters
  - Normal distribution
  - Uniform distribution
  - Lognormal distribution
  - LogUniform distribution
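These names correspond to the parameter expression functions exposed by the azureml.train.hyperdrive package (the same place choice and normal are imported from below). A minimal sketch of each, where all numeric arguments are arbitrary illustration values rather than recommendations:
from azureml.train.hyperdrive import (
    choice, normal, uniform, lognormal, loguniform,
    qnormal, quniform, qlognormal, qloguniform
)

# Discrete: an explicit set of values
batch_size = choice(16, 32, 64)

# Continuous distributions
lr_normal = normal(10, 3)            # mean 10, standard deviation 3
lr_uniform = uniform(0.05, 0.1)      # any value between 0.05 and 0.1
lr_lognormal = lognormal(0, 1)       # exp of a normal(0, 1) value
lr_loguniform = loguniform(-6, -1)   # exp of a uniform(-6, -1) value

# q* variants: the same distributions, but values are rounded to multiples of q,
# which is how discrete values are drawn from a continuous distribution
units_qnormal = qnormal(64, 16, 8)        # mu, sigma, q
units_quniform = quniform(32, 256, 16)    # min, max, q
units_qlognormal = qlognormal(4, 1, 2)    # mu, sigma, q
units_qloguniform = qloguniform(1, 6, 2)  # min, max, q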
Normal distribution
The normal distribution (or Gaussian distribution) is a type of continuous probability distribution for a real-valued random variable.
Uniform distribution
Continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions. The distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds.
Lognormal distribution
Continuous probability distribution that models right-skewed data.
The lognormal distribution is related to logs and the normal distribution.
LogUniform distribution
Continuous probability distribution. It is characterised by its probability density function, within the support of the distribution, being proportional to the reciprocal of the variable.
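To build intuition for how these distributions differ, here is a small standalone NumPy sketch (not part of the Azure ML SDK) that draws a few samples from each; the parameters are arbitrary. NumPy has no built-in log-uniform sampler, so it is derived by exponentiating a uniform sample of the logarithm:
import numpy as np

rng = np.random.default_rng(0)

normal_vals = rng.normal(loc=10, scale=3, size=5)        # symmetric around the mean
uniform_vals = rng.uniform(low=0.05, high=0.1, size=5)   # evenly spread between the bounds
lognormal_vals = rng.lognormal(mean=0, sigma=1, size=5)  # positive, right-skewed values
loguniform_vals = np.exp(rng.uniform(np.log(1e-4), np.log(1e-1), size=5))  # even spread across orders of magnitude

print(normal_vals)
print(uniform_vals)
print(lognormal_vals)
print(loguniform_vals)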
Defining a search space
Create a dictionary with the appropriate parameter expression for each named hyperparameter.
For example, the following search space indicates that the batch_size hyperparameter can have the value 16, 32, or 64, and the learning_rate hyperparameter can have any value from a normal distribution with a mean of 10 and a standard deviation of 3.
from azureml.train.hyperdrive import choice, normal
param_space = {
'--batch_size': choice(16, 32, 64),
'--learning_rate': normal(10, 3)
}
Sampling types
Sampling determines how hyperparameter values are selected.
- Grid sampling
Can only be used when all hyperparameters are discrete.
Tries every possible combination of parameters in the search space.
from azureml.train.hyperdrive import GridParameterSampling, choice
param_space = {
'--batch_size': choice(16, 32, 64),
'--learning_rate': choice(0.01, 0.1, 1.0)
}
param_sampling = GridParameterSampling(param_space)
- Random sampling
Randomly selects a value for each hyperparameter, which can be a mix of discrete and continuous values.
from azureml.train.hyperdrive import RandomParameterSampling, choice, normal
param_space = {
'--batch_size': choice(16, 32, 64),
'--learning_rate': normal(10, 3)
}
param_sampling = RandomParameterSampling(param_space)
- Bayesian sampling
Chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection.
from azureml.train.hyperdrive import BayesianParameterSampling, choice, uniform
param_space = {
'--batch_size': choice(16, 32, 64),
'--learning_rate': uniform(0.05, 0.1)
}
param_sampling = BayesianParameterSampling(param_space)
Early termination
Early termination is particularly useful for deep learning scenarios where a deep neural network (DNN) is trained iteratively over a number of epochs.
To help prevent wasting time, you can set an early termination policy that abandons runs that are unlikely to produce a better result than previously completed runs.
The policy is evaluated at an evaluation_interval you specify, based on each time the target performance metric is logged.
You can also set a delay_evaluation parameter to avoid evaluating the policy until a minimum number of iterations have been completed.
Bandit policy
Stops a run if the target performance metric underperforms the best run so far by a specified margin.
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_amount=0.2,
evaluation_interval=1,
delay_evaluation=5)
This example applies the policy for every iteration after the first five, and abandons runs where the reported target metric is 0.2 or more worse than the best performing run after the same number of intervals.
Median stopping policy
Abandons runs where the target performance metric is worse than the median of the running averages for all runs.
from azureml.train.hyperdrive import MedianStoppingPolicy
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
delay_evaluation=5)
Truncation selection policy
Cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X.
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(truncation_percentage=10,
evaluation_interval=1,
delay_evaluation=5)
Running a hyperparameter tuning experiment
The training script must:
- Have an argument for each hyperparameter you want to vary.
- Log the target performance metric.
For example, the script below uses a --regularization argument to set the regularization rate hyperparameter, and logs the accuracy metric with the name Accuracy.
import argparse
import numpy as np
..
# Get regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01)
args = parser.parse_args()
reg = args.reg_rate
..
..
# calculate and log accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
run.log('Accuracy', float(acc))
...
run.complete()
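The elided parts of the script are assumed to obtain the run context and train the model. In the Azure ML SDK v1, the run context used for logging is typically retrieved with Run.get_context(), for example:
from azureml.core import Run

# Get the context of the current run so that metrics can be logged against it
run = Run.get_context()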
To prepare the hyperdrive experiment, you must use a HyperDriveConfig object to configure the experiment run.
from azureml.core import Experiment
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal
# Assumes ws, script_config and param_sampling are already defined
hyperdrive = HyperDriveConfig(run_config=script_config,
hyperparameter_sampling=param_sampling,
policy=None,
primary_metric_name='Accuracy',
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=6,
max_concurrent_runs=4)
experiment = Experiment(workspace = ws, name = 'hyperdrive_training')
hyperdrive_run = experiment.submit(config=hyperdrive)
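Optionally, you can block until the tuning run (and its child runs) finishes and stream status messages to the output; wait_for_completion is a standard method on submitted runs:
# Wait for the HyperDrive parent run to complete, showing progress output
hyperdrive_run.wait_for_completion(show_output=True)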
The experiment will initiate a child run for each hyperparameter combination to be tried, and you can retrieve the logged metrics for these runs:
for child_run in hyperdrive_run.get_children():
print(child_run.id, child_run.get_metrics())
# list all runs in descending order of performance
for child_run in hyperdrive_run.get_children_sorted_by_primary_metric():
print(child_run)
# retrieve the best performing run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
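As a follow-up, you would typically inspect the best run and register the model it produced. A minimal sketch, assuming the training script saved its model as outputs/model.pkl (the model path and model name here are illustrative assumptions, not part of the notes above):
# Examine the metrics and script arguments of the best run
best_run_metrics = best_run.get_metrics()
script_arguments = best_run.get_details()['runDefinition']['arguments']
print('Best run metrics:', best_run_metrics)
print('Best run arguments:', script_arguments)

# Register the model produced by the best run
best_run.register_model(model_name='tuned_model',
                        model_path='outputs/model.pkl')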
References:
Tune hyperparameters with Azure Machine Learning - Training | Microsoft Learn
Lognormal Distribution: Uses, Parameters & Examples - Statistics By Jim
Normal Distribution | Examples, Formulas, & Uses (scribbr.com)