# Batch Shipyard Pool Configuration
This page contains in-depth details on how to configure the pool configuration file for Batch Shipyard.

## Schema
The pool schema is as follows:
```yaml
pool_specification:
  id: batch-shipyard-pool
  vm_configuration:
    platform_image:
      publisher: Canonical
      offer: UbuntuServer
      sku: 16.04-LTS
      version: latest
      native: false
    custom_image:
      arm_image_id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/images/<image_name>
      node_agent: <node agent sku id>
      native: false
  vm_size: STANDARD_D2_V2
  vm_count:
    dedicated: 4
    low_priority: 8
  max_tasks_per_node: 1
  resize_timeout: 00:20:00
  node_fill_type: pack
  autoscale:
    evaluation_interval: 00:05:00
    scenario:
      name: active_tasks
      maximum_vm_count:
        dedicated: 16
        low_priority: 8
      node_deallocation_option: taskcompletion
      sample_lookback_interval: 00:10:00
      required_sample_percentage: 70
      bias_last_sample: true
      bias_node_type: low_priority
      rebalance_preemption_percentage: 50
    formula: ''
  inter_node_communication_enabled: true
  reboot_on_start_task_failed: false
  attempt_recovery_on_unusable: false
  block_until_all_global_resources_loaded: true
  transfer_files_on_pool_creation: false
  input_data:
    azure_batch:
    - destination: $AZ_BATCH_NODE_SHARED_DIR/jobonanotherpool
      exclude:
      - '*.txt'
      include:
      - wd/*.dat
      job_id: jobonanotherpool
      task_id: mytask
    azure_storage:
    - storage_account_settings: mystorageaccount
      remote_path: poolcontainer/dir
      local_path: $AZ_BATCH_NODE_SHARED_DIR/pooldata
      is_file_share: false
      exclude:
      - '*.tmp'
      include:
      - pooldata*.bin
      blobxfer_extra_options: null
  resource_files:
  - blob_source: https://some.url
    file_mode: '0750'
    file_path: path/in/wd/file.bin
  ssh:
    username: shipyard
    expiry_days: 30
    ssh_public_key: /path/to/rsa/publickey.pub
    ssh_public_key_data: ssh-rsa ...
    ssh_private_key: /path/to/rsa/privatekey
    generate_docker_tunnel_script: true
    generated_file_export_path: null
    hpn_server_swap: false
  rdp:
    username: shipyard
    password: null
    expiry_days: 30
  virtual_network:
    arm_subnet_id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<virtual_network_name>/subnets/<subnet_name>
    name: myvnet
    resource_group: vnet-in-another-rg
    create_nonexistant: false
    address_space: 10.0.0.0/16
    subnet:
      name: subnet-for-batch-vms
      address_prefix: 10.0.0.0/20
  gpu:
    nvidia_driver:
      source: https://some.url
  additional_node_prep_commands: []
```
The `pool_specification` property has the following members:
* (required) `id` is the compute pool ID.
* (required) `vm_configuration` specifies the image configuration for the VM. Either `platform_image` or `custom_image` must be specified; you cannot specify both. If using a custom image, please see the Custom Image Guide first.
    * (required for platform image) `platform_image` defines the Marketplace platform image to use:
        * (required for platform image) `publisher` is the publisher name of the Marketplace VM image.
        * (required for platform image) `offer` is the offer name of the Marketplace VM image.
        * (required for platform image) `sku` is the sku name of the Marketplace VM image.
        * (optional) `version` is the image version to use. The default is `latest`.
        * (optional) `native` will convert the platform image to use native Docker container support for Azure Batch, if possible. This can potentially lead to faster compute node provisioning and better task management (such as job and task termination while tasks are running), in exchange for some features that are not available in this mode, such as task-level data ingress or task-level data egress that is not bound for Azure Storage Blobs. If there is no `native` conversion equivalent for the specified `publisher`, `offer`, and `sku`, then no conversion is performed. Note that `native` mode is not compatible with Singularity containers. The default is `false`.
    * (required for custom image) `custom_image` defines the custom image to use. AAD `batch` credentials are required to use custom images for both Batch service and User Subscription modes.
        * (required for custom image) `arm_image_id` defines the ARM image id to use as the OS image for the pool. The ARM image must be in the same subscription and region as the Batch account.
        * (required for custom image) `node_agent` is the node agent sku id to use with this custom image. You can view supported base images and their node agent sku ids with the `pool listskus` command.
        * (optional) `native` will opt to use native Docker container support if possible. This provides better task management (such as job and task termination while tasks are running), in exchange for some features that are not available in this mode, such as task-level data ingress or task-level data egress that is not bound for Azure Storage Blobs. The default is `false`.
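As an illustration, a `vm_configuration` selecting a Marketplace platform image with `native` container support opted in might look like the following (all values are examples; specify either `platform_image` or `custom_image`, never both):

```yaml
vm_configuration:
  platform_image:
    publisher: Canonical
    offer: UbuntuServer
    sku: 16.04-LTS
    version: latest
    native: true   # converted only if a native equivalent exists for this image
```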
* (required) `vm_size` is the Azure Virtual Machine Instance Size. Please note that not all regions have every VM size available.
* (required) `vm_count` is the number of compute nodes to allocate. You may specify a mixed number of compute nodes in the following properties:
    * (optional) `dedicated` is the number of dedicated compute nodes to allocate. These nodes cannot be pre-empted. The default value is `0`.
    * (optional) `low_priority` is the number of low-priority compute nodes to allocate. These nodes may be pre-empted at any time. Workloads that are amenable to `low_priority` nodes are those that do not have strict deadlines for pickup and completion. Optimally, these types of jobs would checkpoint their progress and be able to recover when re-scheduled. The default value is `0`.
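For example, a mixed allocation of both node types (counts are illustrative):

```yaml
vm_count:
  dedicated: 2       # cannot be pre-empted
  low_priority: 10   # may be pre-empted at any time
```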
* (optional) `resize_timeout` is the amount of time allowed for resize operations (note that creating a pool resizes from 0 to the specified number of nodes). The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 15 minutes. This should not be specified (and is ignored) for `autoscale` enabled pools.
* (optional) `max_tasks_per_node` is the maximum number of concurrent tasks that can be running at any one time on a compute node. This defaults to a value of 1 if not specified. The maximum value for this property that Azure Batch will accept is `4 x <# cores per compute node>`. For instance, for a `STANDARD_F2` instance, because the virtual machine has 2 cores, the maximum allowable value for this property would be `8`.
* (optional) `node_fill_type` is the task scheduling compute node fill type policy to apply. `pack`, which is the default, attempts to pack the maximum number of tasks on a node (controlled through `max_tasks_per_node`) before scheduling tasks to another node. `spread` will schedule tasks evenly across compute nodes before packing.
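For example, the timedelta format accepts an optional day component before the time (values are illustrative):

```yaml
resize_timeout: 00:30:00      # 30 minutes
# with the optional day component:
# resize_timeout: 1.02:00:00  # 1 day, 2 hours
```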
* (optional) `autoscale` designates the autoscale settings for the pool. If specified, the `vm_count` becomes the minimum number of virtual machines for each node type for `scenario` based autoscale.
    * (optional) `evaluation_interval` is the time interval between autoscale evaluations performed by the service. The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 15 minutes. The smallest value that can be specified is 5 minutes.
    * (optional) `scenario` is a pre-set autoscale scenario where a formula will be generated with the parameters specified within this property.
        * (required) `name` is the autoscale scenario name to apply. Valid values are `active_tasks`, `pending_tasks`, `workday`, `workday_with_offpeak_max_low_priority`, `weekday`, and `weekend`. Please see the autoscale guide for more information about these scenarios.
        * (required) `maximum_vm_count` is the maximum number of compute nodes that can be allocated from an autoscale evaluation. It is useful to have these limits in place to control the top-end scale of the autoscale scenario. Specifying a negative value for either of the following properties results in effectively no maximum limit.
            * (optional) `dedicated` is the maximum number of dedicated compute nodes that can be allocated.
            * (optional) `low_priority` is the maximum number of low priority compute nodes that can be allocated.
        * (optional) `node_deallocation_option` is the node deallocation option to apply. When a pool is resized down and a node is selected for removal, the action performed for the running task is specified with this option. The valid values are: `requeue`, `terminate`, `taskcompletion`, and `retaineddata`. The default is `taskcompletion`. Please see this doc for more information.
        * (optional) `sample_lookback_interval` is the time interval to look back in past history for certain scenarios, such as autoscale based on active and pending tasks. The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 10 minutes.
        * (optional) `required_sample_percentage` is the required percentage of samples that must be present during the `sample_lookback_interval`. If not specified, the default is 70.
        * (optional) `bias_last_sample` will bias the autoscale scenario, if applicable, to use the last sample during history computation. This can be enabled to more quickly respond to changes in history with respect to averages. The default is `true`.
        * (optional) `bias_node_type` will bias the autoscale scenario, if applicable, to favor one type of node over the other when making a decision on how many of each node to allocate. The default is `auto`, which gives equal weight to both `dedicated` and `low_priority` nodes. Valid values are `null` (or omitting the property), `dedicated`, or `low_priority`.
        * (optional) `rebalance_preemption_percentage` will rebalance the compute nodes to bias for dedicated nodes when the pre-empted node count reaches the indicated threshold percentage of the total current dedicated and low priority nodes. The default is `null`, meaning no rebalancing is performed.
    * (optional) `formula` is a custom autoscale formula to apply to the pool. If both `formula` and `scenario` are specified, then `formula` is used.
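As a sketch of the two approaches (all values illustrative): a `scenario`-based configuration generates the formula for you, while a non-empty `formula` takes precedence if both are present. The commented formula below is an example in standard Azure Batch autoscale syntax, not a Batch Shipyard-generated one:

```yaml
autoscale:
  evaluation_interval: 00:05:00
  scenario:
    name: active_tasks
    maximum_vm_count:
      dedicated: 16
      low_priority: 32
  # if non-empty, a custom formula overrides the scenario, e.g.:
  # formula: '$TargetDedicatedNodes = min(16, avg($ActiveTasks.GetSample(TimeInterval_Minute * 10)));'
```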
* (optional) `inter_node_communication_enabled` designates if this pool is set up for inter-node communication. This must be set to `true` for any containers that must communicate with each other, such as MPI applications. This property cannot be enabled if there are positive values for both `dedicated` and `low_priority` compute nodes specified above. This property will be force enabled if peer-to-peer replication is enabled.
* (optional) `reboot_on_start_task_failed` allows Batch Shipyard to reboot the compute node in case there is a transient failure in node preparation (e.g., network timeout, resolution failure, or download problem). This defaults to `false`.
* (optional) `attempt_recovery_on_unusable` allows Batch Shipyard to automatically attempt to recover nodes that enter `unusable` state. Note that enabling this option can lead to infinite wait on `pool add` or `pool resize` with `--wait`. This defaults to `false` and is ignored for `custom_image`, where the behavior is always `false`.
* (optional) `block_until_all_global_resources_loaded` will block the node from entering ready state until all Docker images are loaded. This defaults to `true`. This option has no effect on `native` container support pools (the behavior effectively reflects `true` for this property on such pools).
* (optional) `transfer_files_on_pool_creation` will ingress all `files` specified in the `global_resources` section of the global configuration file when the pool is created. If files are to be ingressed to Azure Blob or File Storage, then data movement operations are overlapped with the creation of the pool. If files are to be ingressed to a shared file system on the compute nodes, then the files are ingressed after the pool is created and the shared file system is ready. Files can be ingressed to both Azure Blob Storage and a shared file system during the same pool creation invocation. If this property is set to `true`, then `block_until_all_global_resources_loaded` will be force disabled. If omitted, this property defaults to `false`.
* (optional) `input_data` is an object containing data that should be ingressed to all compute nodes as part of node preparation. It is important to note that if you are combining this action with `files` and are ingressing data to Azure Blob or File storage as part of pool creation, the blob containers or file shares defined here will be downloaded as soon as the compute node is ready to do so. This may result in the blob container/blobs or file share/files not being ready in time for the `input_data` transfer. It is up to you to ensure that these two operations do not overlap. If there is a possibility of overlap, then you should ingress data defined in `files` prior to pool creation and disable the `transfer_files_on_pool_creation` option above. This object currently supports `azure_batch` and `azure_storage` as members.
    * `azure_batch` contains the following members:
        * (required) `job_id` is the job id of the task.
        * (required) `task_id` is the id of the task to fetch files from.
        * (optional) `include` is an array of include filters.
        * (optional) `exclude` is an array of exclude filters.
        * (required) `destination` is the destination path to place the files.
    * `azure_storage` contains the following members:
        * (required) `storage_account_settings` contains a storage account link as defined in the credentials config.
        * (required) `remote_path` is required when downloading from Azure Storage. This path on Azure includes either the container or file share path along with all virtual directories.
        * (required) `local_path` is required when downloading from Azure Storage. This specifies where the files should be downloaded to on the compute node. Please note that you should not specify a destination that is on a shared file system. If you require ingressing to a shared file system location such as a GlusterFS volume, then use the global configuration `files` property and the `data ingress` command instead.
        * (optional) `is_file_share` denotes if the `remote_path` is on a file share. This defaults to `false`.
        * (optional) `include` defines optional include filters.
        * (optional) `exclude` defines optional exclude filters.
        * (optional) `blobxfer_extra_options` are any extra options to pass to `blobxfer`.
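For instance, pool-level `input_data` that fetches a prior task's output from another pool's job along with a storage container might be specified as follows (all names are placeholders):

```yaml
input_data:
  azure_batch:
  - job_id: jobonanotherpool
    task_id: mytask
    include:
    - wd/*.dat
    destination: $AZ_BATCH_NODE_SHARED_DIR/jobonanotherpool
  azure_storage:
  - storage_account_settings: mystorageaccount
    remote_path: poolcontainer/dir
    local_path: $AZ_BATCH_NODE_SHARED_DIR/pooldata
    is_file_share: false
```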
* (optional) `resource_files` is an array of resource files that should be downloaded as part of the compute node's preparation. Each array entry contains the following information:
    * `file_path` is the path within the node prep task working directory to place the file on the compute node. This directory can be referenced by the `$AZ_BATCH_NODE_STARTUP_DIR/wd` path.
    * `blob_source` is an accessible HTTP/HTTPS URL. This need not be an Azure Blob Storage URL.
    * `file_mode` is the file mode to set for the file on the compute node. This is optional.
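A single `resource_files` entry might look like the following (URL and paths are placeholders):

```yaml
resource_files:
- blob_source: https://some.url
  file_path: path/in/wd/file.bin  # placed under $AZ_BATCH_NODE_STARTUP_DIR/wd
  file_mode: '0750'               # optional
```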
* (optional) `virtual_network` is the property for specifying an ARM-based virtual network resource for the pool. AAD `batch` credentials are required for both Batch service and User Subscription modes. Please see the Virtual Network guide for more information.
    * (required/optional) `arm_subnet_id` is the full ARM resource id of the subnet on the virtual network. This virtual network must already exist and must exist within the same region and subscription as the Batch account. If this value is specified, the other properties of `virtual_network` are ignored. AAD `management` credentials are not strictly required for this case but are recommended in order to allow address space validation checks.
    * (required/optional) `name` is the name of the virtual network. If `arm_subnet_id` is not specified, this value is required. Note that this requires AAD `management` credentials.
    * (optional) `resource_group` is the resource group containing the virtual network. If the resource group name is not specified here, the `resource_group` specified in the `batch` credentials will be used instead.
    * (optional) `create_nonexistant` specifies if the virtual network and subnet should be created if not found. If not specified, this defaults to `false`.
    * (required if creating, optional otherwise) `address_space` is the allowed address space for the virtual network.
    * (required/optional) `subnet` specifies the subnet properties. This is required if `arm_subnet_id` is not specified, i.e., the virtual network `name` is specified instead.
        * (required) `name` is the subnet name.
        * (required) `address_prefix` is the subnet address prefix to use for allocating Batch compute nodes. The maximum number of compute nodes a subnet can support is 4096, which maps roughly to a CIDR mask of 20 bits.
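The two mutually exclusive ways of specifying the network can be sketched as follows (ids and names are placeholders):

```yaml
# either: reference an existing subnet directly by ARM resource id
virtual_network:
  arm_subnet_id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<virtual_network_name>/subnets/<subnet_name>

# or: name the virtual network and subnet, optionally creating them
virtual_network:
  name: myvnet
  resource_group: vnet-in-another-rg
  create_nonexistant: true
  address_space: 10.0.0.0/16
  subnet:
    name: subnet-for-batch-vms
    address_prefix: 10.0.0.0/20   # a /20 supports the 4096-node maximum
```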
* (optional) `ssh` is the property for creating a user to accommodate SSH sessions to compute nodes. If this property is absent, then an SSH user is not created with pool creation. If you are running Batch Shipyard on Windows, please refer to these instructions on how to generate an SSH keypair for use with Batch Shipyard. This property is ignored for Windows-based pools.
    * (required) `username` is the user to create on the compute nodes.
    * (optional) `expiry_days` is the number of days from now for the account on the compute nodes to expire. The default is 30 days from invocation time.
    * (optional) `ssh_public_key` is the path to an existing SSH public key to use. If not specified, an RSA public/private keypair will be automatically generated if `ssh-keygen` or `ssh-keygen.exe` can be found on the `PATH`. This option cannot be specified with `ssh_public_key_data`.
    * (optional) `ssh_public_key_data` is the raw RSA public key data in OpenSSH format, e.g., a string starting with `ssh-rsa ...`. Only one key may be specified. This option cannot be specified with `ssh_public_key`.
    * (optional) `ssh_private_key` is the path to an existing SSH private key to use against either `ssh_public_key` or `ssh_public_key_data` for connecting to compute nodes. This option should only be specified if either `ssh_public_key` or `ssh_public_key_data` is specified.
    * (optional) `generate_docker_tunnel_script` directs Batch Shipyard to generate an SSH tunnel script that can be used to connect to the remote Docker engine running on a compute node. This script can only be used on non-Windows systems.
    * (optional) `generated_file_export_path` is the path to export the generated RSA keypair and docker tunnel script to. If omitted, the current directory is used.
    * (experimental) `hpn_server_swap` enables an OpenSSH server with HPN patches to be swapped in place of the standard distribution OpenSSH server. This is not supported on all Linux distributions and may be force disabled.
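A minimal `ssh` section using inline public key data might look like the following (key data elided, values illustrative):

```yaml
ssh:
  username: shipyard
  expiry_days: 30
  # specify at most one of ssh_public_key or ssh_public_key_data
  ssh_public_key_data: ssh-rsa ...
  generate_docker_tunnel_script: true
```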
* (optional) `rdp` is the property for creating a user to accommodate RDP login sessions to compute nodes. If this property is absent, then an RDP user is not created with pool creation. This property is ignored for Linux-based pools.
    * (required) `username` is the user to create on the compute nodes.
    * (optional) `expiry_days` is the number of days from now for the account on the compute nodes to expire. The default is 30 days from invocation time.
    * (optional) `password` is the password to associate with the user. Passwords must meet the minimum complexity requirements as required by Azure Batch. If omitted, unspecified, or set to `null`, then a random password is generated and logged during any `pool add` call with this section defined, or during `pool user add`.
* (optional) `gpu` defines additional information for NVIDIA GPU-enabled VMs. If not specified, Batch Shipyard will automatically download the driver for the `vm_size` specified. The `nvidia_driver` property contains the following required member:
    * `source` is the source url to download the driver. This should be the silent-installable driver package.
* (optional) `additional_node_prep_commands` is an array of additional commands to execute on the compute node host as part of node preparation. This can be empty or omitted.
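Putting the required properties together, a minimal pool configuration could be sketched as follows (values are illustrative; all omitted optional properties take their defaults):

```yaml
pool_specification:
  id: my-minimal-pool
  vm_configuration:
    platform_image:
      publisher: Canonical
      offer: UbuntuServer
      sku: 16.04-LTS
  vm_size: STANDARD_D2_V2
  vm_count:
    dedicated: 2
```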
## Full template
A full template of a pool configuration file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.