Batch Shipyard and Pool Autoscaling
This article describes the Azure Batch Pool autoscaling feature and how Batch Shipyard exposes it.
Azure Batch Pool Autoscale
Azure Batch can automatically scale your pool up or down depending on criteria that you specify. Remember that Azure Batch charges only for the compute resources that are used (e.g., VM hours, disk and network egress), so by scaling your resources automatically on demand, you can use Azure resources more efficiently and potentially reduce your costs.
Azure Batch exposes metrics that can be queried as historical sample data. These metrics include usage information such as CPU, memory and disk as well as node counts and task counts. These metrics can then be used to determine what the proper target node counts should be. The logic for what the target node counts should be is expressed as an autoscale formula.
Azure Batch periodically evaluates an autoscale formula as specified by the evaluation interval timespan which can be as frequent as every 5 minutes. If the evaluated target node counts are different after evaluation, the pool automatically resizes to these target node counts without any intervention needed by the user.
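The evaluate-then-resize cycle described above can be sketched in Python. This is only an illustration of the idea, not how the Azure Batch service is implemented: `PoolState`, `evaluate_target`, and `autoscale_step` are hypothetical names, and the "one node per task on average" rule stands in for whatever a real autoscale formula computes.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class PoolState:
    """Hypothetical stand-in for a pool; tracks only the current node count."""
    current_nodes: int


def evaluate_target(task_samples, max_nodes):
    """Assumed formula: average task count over the samples, capped at max_nodes."""
    if not task_samples:
        return 0
    return min(max_nodes, math.ceil(mean(task_samples)))


def autoscale_step(pool, task_samples, max_nodes):
    """Run one evaluation; resize only if the target differs from the current count."""
    target = evaluate_target(task_samples, max_nodes)
    resized = target != pool.current_nodes
    pool.current_nodes = target
    return resized
```

Calling `autoscale_step` twice with the same samples resizes the pool on the first call and leaves it untouched on the second, which mirrors the "resize only when the target changes" behavior.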
For more information about Azure Batch Autoscale, please visit this document.
Batch Shipyard and Pool Autoscale
Batch Shipyard exposes pool autoscale functionality in the pool configuration
file under the property
autoscale. There are two approaches to specifying
how to apply autoscale to a compute pool. Scenario-based autoscaling is
a simple way to specify autoscale for a pool without needing to be an expert
in creating an autoscale formula. There are named autoscale scenarios that
you can select from. Formula-based autoscaling is for users that want
to specify their own custom autoscale formula.
Scenario-based autoscaling lets you pick from a set of common autoscale scenarios; Batch Shipyard then automatically applies the appropriate transformations to the formula that is applied to your pool. These scenarios are:
active_tasks will autoscale the pool using metrics for the number of active tasks for the pool. Tasks categorized under this metric are tasks in active state with satisfied dependencies, in other words, "tasks with satisfied dependencies awaiting node assignment".
pending_tasks will autoscale the pool using metrics for the number of pending tasks for the pool. Tasks categorized under this metric are tasks in active state with satisfied dependencies and running tasks, in other words, "tasks pending completion".
workday will autoscale the pool according to the workdays specified.
workday_with_offpeak_max_low_priority will autoscale the pool according to workdays and, during off-work time, use the maximum number of low priority nodes.
weekday will autoscale the pool if it is a weekday.
weekend will autoscale the pool if it is a weekend.
You can specify a scenario-based autoscale on a pool by populating the
scenario name property with one of the scenarios above.
You will also need to specify a maximum_vm_count
property which can contain both dedicated and
low_priority counts to
ensure that the formulas cannot evaluate to a target node count higher
than some threshold that you specify. Specifying a negative value for
the count will effectively set the maximum to no limit. Note that the
vm_count values specified at the
pool_specification level are automatically
inferred as minimum VM counts. These counts can be set to 0 to allow the
pool to resize down to zero nodes.
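The bounding rules in this paragraph can be illustrated with a small Python sketch. The function name `clamp_target` is hypothetical and the code is not taken from Batch Shipyard; it only assumes the behavior stated above: vm_count acts as the minimum, maximum_vm_count as the maximum, and a negative maximum means no upper limit.

```python
def clamp_target(target, vm_count, maximum_vm_count):
    """Clamp a formula's target node count to the configured bounds.

    vm_count is treated as the minimum (0 allows scaling to zero nodes);
    a negative maximum_vm_count is treated as "no limit".
    """
    lower = max(0, vm_count)
    if maximum_vm_count < 0:
        # No upper bound: only enforce the minimum.
        return max(lower, target)
    return max(lower, min(target, maximum_vm_count))
```

For instance, with a minimum of 0 and a maximum of 16, a formula result of 20 would be held at 16, while the same result with a maximum of -1 would pass through unchanged.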
Additionally, there are options that can modify and fine-tune these scenarios as needed:
maximum_vm_increment_per_evaluation sets limits on the maximum number of dedicated or low priority VMs to add after an evaluation.
node_deallocation_option specifies the action applied to a running task when its node is targeted for deallocation, for example
retaineddata. Please see this doc for more information about these options. This option applies to all scenarios.
sample_lookback_interval is the time interval to look back for past history. This option applies only to
required_sample_percentage is the required percentage of samples that must be present during the
sample_lookback_interval. This option applies only to
bias_last_sample will bias the autoscale scenario to use the last sample during history computation and metric weighting. This can be enabled to respond more quickly to changes in history with respect to averages. This option applies only to
bias_node_type will bias the autoscale scenario to favor one type of node over the other when deciding how many of each node to allocate. By default, allocation is equal-weighted but can be set to favor either dedicated or
low_priority. This applies to all scenarios.
rebalance_preemption_percentage will rebalance the compute nodes to bias for dedicated nodes when the pre-empted node count reaches the indicated threshold percentage of the total current dedicated and low priority nodes. This applies only to
time_ranges allows specification of which days of the week should be considered weekdays and which hours should be considered as part of working hours. These options only apply to the day-of-the-week based scenarios.
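Two of the options above, maximum_vm_increment_per_evaluation and required_sample_percentage, lend themselves to a short Python sketch. The helper names are hypothetical, and the logic is an assumption based only on the descriptions in this list. In particular, how Batch Shipyard reacts when too few samples are present is not covered here.

```python
def limit_increment(current, target, max_increment):
    """Cap how many nodes can be added in one evaluation.

    Assumption: only growth is limited; shrinking goes straight to the target.
    """
    if target > current:
        return min(target, current + max_increment)
    return target


def enough_samples(samples_present, samples_expected, required_percentage):
    """Check whether the lookback window holds the required share of samples."""
    if samples_expected <= 0:
        return False
    return 100.0 * samples_present / samples_expected >= required_percentage
```

With an increment limit of 4, a pool at 2 nodes whose formula asks for 20 would grow to 6 in that evaluation and continue growing in later ones.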
An example autoscale specification in the pool configuration may be:
autoscale:
  evaluation_interval: 00:10:00
  scenario:
    name: active_tasks
    maximum_vm_count:
      dedicated: 16
      low_priority: 8
This example would apply the
active_tasks scenario to the associated
pool with an evaluation interval of 10 minutes. This means that the
autoscale formula is evaluated by the service and can have updates applied
every 10 minutes. Note that a small evaluation interval may cause the
pool to be resized constantly. This can result
in pool stability issues, including resize failures if the prior resize is
still ongoing when the next autoscale evaluation produces
a different target node count. For this example, the scenario
also includes a
maximum_vm_count to ensure that the autoscale formula
does not result in target node counts that exceed 16 dedicated and 8 low
priority nodes.
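As a rough aid, the following Python sketch parses an evaluation interval such as the one in the example and checks it against the 5-minute minimum mentioned earlier in this article. The helper names are hypothetical, and the sketch assumes a plain HH:MM:SS string; any other timespan format is not handled.

```python
from datetime import timedelta

# Most frequent evaluation interval mentioned in this article.
MIN_INTERVAL = timedelta(minutes=5)


def parse_interval(text):
    """Parse an HH:MM:SS string into a timedelta (assumed format)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_valid_interval(text):
    """Return True if the interval is at least the 5-minute minimum."""
    return parse_interval(text) >= MIN_INTERVAL
```

An interval that passes this check can still be too short in practice if resizes routinely take longer than the interval, so the stability note above still applies.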
Formula-based autoscaling allows users with expertise in creating autoscale
formulas to create their own formula and apply it to a Batch Shipyard pool.
These formulas should be specified on the
formula member within the autoscale property.
For more information about how to create your own custom autoscale formula, please visit this document.
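To give a sense of what a custom formula looks like, the Python sketch below builds a configuration fragment that holds one. The formula text is a simplified adaptation of the pending-tasks pattern in the Azure Batch autoscale documentation. The function name and the placement of formula next to evaluation_interval under autoscale are assumptions made for illustration, and the 16-node cap is an arbitrary value.

```python
import json

# Autoscale formula in the Azure Batch formula language, held as a string.
# It uses the average of recent $PendingTasks samples when at least 70% of
# samples are available, otherwise falls back to a single node.
FORMULA = (
    "maxNodes = 16;"
    "samplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);"
    "pending = samplePercent < 70 ? 1 : "
    "avg($PendingTasks.GetSample(180 * TimeInterval_Second));"
    "$TargetDedicatedNodes = min(maxNodes, pending);"
)


def build_autoscale_config(formula, evaluation_interval="00:10:00"):
    """Return a dict shaped like the autoscale section (assumed layout)."""
    return {
        "autoscale": {
            "evaluation_interval": evaluation_interval,
            "formula": formula,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_autoscale_config(FORMULA), indent=2))
```

Test any custom formula against a non-production pool first, since a mistake in the formula directly changes how many nodes are allocated.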