Batch Shipyard Pool Configuration
This page contains in-depth details on how to configure the pool json file for Batch Shipyard.
Schema
The pool schema is as follows:
{ "pool_specification": { "id": "dockerpool", "vm_configuration": { "platform_image": { "publisher": "Canonical", "offer": "UbuntuServer", "sku": "16.04-LTS", "version": "latest" }, "custom_image": { "image_uris": [ "https://mystorageaccount.blob.core.windows.net/myvhds/mycustomimg.vhd" ], "node_agent": "batch.node.ubuntu 16.04" } }, "vm_size": "STANDARD_H16R", "vm_count": { "dedicated": 8, "low_priority": 0 }, "resize_timeout": "00:20:00", "max_tasks_per_node": 1, "node_fill_type": "pack", "autoscale": { "evaluation_interval": "00:05:00", "scenario": { "name": "active_tasks", "maximum_vm_count": { "dedicated": 16, "low_priority": 8 }, "node_deallocation_option": "taskcompletion", "sample_lookback_interval": "00:10:00", "required_sample_percentage": 70, "bias_last_sample": true, "bias_node_type": "low_priority", "rebalance_preemption_percentage": 50 }, "formula": "" }, "inter_node_communication_enabled": true, "reboot_on_start_task_failed": true, "block_until_all_global_resources_loaded": true, "transfer_files_on_pool_creation": false, "input_data": { "azure_batch": [ { "job_id": "jobonanotherpool", "task_id": "mytask", "include": ["wd/*.dat"], "exclude": ["*.txt"], "destination": "$AZ_BATCH_NODE_SHARED_DIR/jobonanotherpool" } ], "azure_storage": [ { "storage_account_settings": "mystorageaccount", "container": "poolcontainer", "include": ["pooldata*.bin"], "destination": "$AZ_BATCH_NODE_SHARED_DIR/pooldata", "blobxfer_extra_options": null } ] }, "resource_files": [ { "file_path": "", "blob_source": "", "file_mode": "" } ], "virtual_network": { "name": "myvnet", "resource_group": "vnet-in-another-rg", "create_nonexistant": false, "address_space": "10.0.0.0/16", "subnet": { "name": "subnet-for-batch-vms", "address_prefix": "10.0.0.0/20" } }, "ssh": { "username": "docker", "expiry_days": 30, "ssh_public_key": "/path/to/rsa/publickey.pub", "ssh_public_key_data": "ssh-rsa ...", "ssh_private_key": "/path/to/rsa/privatekey", "generate_docker_tunnel_script": true, "generated_file_export_path": null, "hpn_server_swap": false }, "gpu": { "nvidia_driver": { "source": "https://some.url" } }, "additional_node_prep_commands": [ ] } }
The pool_specification
property has the following members:
(required) id
is the compute pool ID.
(required) vm_configuration
specifies the image configuration for the
VM. Either platform_image
or custom_image
must be specified. You cannot
specify both. If using a custom image, please see the
Custom Image Guide first.
* (required for platform image) platform_image
defines the Marketplace
platform image to use:
* (required for platform image) publisher
is the publisher name of the
Marketplace VM image.
* (required for platform image) offer
is the offer name of the
Marketplace VM image.
* (required for platform image) sku
is the sku name of the Marketplace
VM image.
* (optional) version
is the image version to use. The default is
latest
.
* (required for custom image) custom_image
defines the custom image to
use:
* (required for custom image) image_uris
defines a list of page blob
VHDs to use for the pool. These should be bare URLs without SAS keys.
* (required for custom image) node_agent
is the node agent sku id to
use with this custom image. You can view supported base images and
their node agent sku ids with the pool listskus
command.
(required) vm_size
is the
Azure Virtual Machine Instance Size.
Please note that not all regions have every VM size available.
(required) vm_count
is the number of compute nodes to allocate. You may
specify a mixed number of compute nodes in the following properties:
* (optional) dedicated
is the number of dedicated compute nodes to
allocate. These nodes cannot be pre-empted. The default value is 0
.
* (optional) low_priority
is the number of low-priority compute nodes to
allocate. These nodes may be pre-empted at any time. Workloads that
are amenable to low_priority
nodes are those that do not have strict
deadlines for pickup and completion. Optimally, these types of jobs would
checkpoint their progress and be able to recover when re-scheduled.
The default value is 0
.
(optional) resize_timeout
is the amount of time allowed for resize
operations (note that creating a pool resizes from 0 to the specified number
of nodes). The format for this property is a timedelta with a string
representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional,
if specified. If not specified, the default is 15 minutes.
(optional) max_tasks_per_node
is the maximum number of concurrent tasks
that can be running at any one time on a compute node. This defaults to a
value of 1 if not specified. The maximum value for the property that Azure
Batch will accept is 4 x <# cores per compute node>
. For instance, for a
STANDARD_F2
instance, because the virtual machine has 2 cores, the maximum
allowable value for this property would be 8
.
(optional) node_fill_type
is the task scheduling compute node fill type
policy to apply. pack
, which is the default, attempts to pack the
maximum number of tasks on a node (controlled through max_tasks_per_node
before scheduling tasks to another node). spread
will schedule tasks
evenly across compute nodes before packing.
(optional) autoscale
designates the autoscale settings for the pool. If
specified, the vm_count
becomes the minimum number of virtual machines for
each node type for scenario
based autoscale.
* (optional) evaluation_interval
is the time interval between autoscale
evaluations performed by the service. The format for this property is a
timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is
required, but "d" is optional, if specified. If not specified, the default
is 15 minutes. The smallest value that can be specified is 5 minutes.
* (optional) scenario
is a pre-set autoscale scenario where a formula
will be generated with the parameters specified within this property.
* (required) name
is the autoscale scenario name to apply. Valid
values are active_tasks
, pending_tasks
, workday
,
workday_with_offpeak_max_low_priority
, weekday
, weekend
.
Please see the autoscale guide for
more information about these scenarios.
* (required) maximum_vm_count
is the maximum number of compute nodes
that can be allocated from an autoscale evaluation. It is useful to
have these limits in place as to control the top-end scale of the
autoscale scenario. Specifying a negative value for either of the
following properties will result in effectively no maximum limit.
* (optional) dedicated
is the maximum number of dedicated compute
nodes that can be allocated.
* (optional) low_priority
is the maximum number of low priority
compute nodes that can be allocated.
* (optional) node_deallocation_option
is the node deallocation option
to apply. When a pool is resized down and a node is selected for
removal, what action is performed for the running task is specified
with this option. The valid values are: requeue
, terminate
,
taskcompletion
, and retaineddata
. The default is taskcompletion
.
Please see this doc for more information.
* (optional) sample_lookback_interval
is the time interval to lookback
for past history for certain scenarios such as autoscale based on
active and pending tasks. The format for this property is a timedelta
with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required,
but "d" is optional, if specified. If not specified, the default is
10 minutes.
* (optional) required_sample_percentage
is the required percentage of
samples that must be present during the sample_lookback_interval
.
If not specified, the default is 70.
* (optional) bias_last_sample
will bias the autoscale scenario, if
applicable, to use the last sample during history computation. This can
be enabled to more quickly respond to changes in history with respect
to averages. The default is true
.
* (optional) bias_node_type
will bias the the autoscale scenario, if
applicable, to favor one type of node over the other when making a
decision on how many of each node to allocate. The default is auto
or equal weight to both dedicated
and low_priority
nodes. Valid
values are null
(or omitting the property), dedicated
, or
low_priority
.
* (optional) rebalance_preemption_percentage
will rebalance the compute
nodes to bias for dedicated nodes when the pre-empted node count reaches
the indicated threshold percentage of the total current dedicated and
low priority nodes. The default is null
or no rebalancing is performed.
* (optional) formula
is a custom autoscale formula to apply to the pool.
If both formula
and scenario
are specified, then formula
is used.
* (optional) inter_node_communication_enabled
designates if this pool is set
up for inter-node communication. This must be set to true
for any containers
that must communicate with each other such as MPI applications. This
property cannot be enabled if there are positive values for both
dedicated and
low_prioritycompute nodes specified above. This property
will be force enabled if peer-to-peer replication is enabled.
* (optional)
reboot_on_start_task_failedallows Batch Shipyard to reboot the
compute node in case there is a transient failure in node preparation (e.g.,
network timeout, resolution failure or download problem). This defaults to
false.
* (optional)
block_until_all_global_resources_loadedwill block the node
from entering ready state until all Docker images are loaded. This defaults
to
true.
* (optional)
transfer_files_on_pool_creationwill ingress all
filesspecified in the
global_resourcessection of the configuration json when
the pool is created. If files are to be ingressed to Azure Blob or File
Storage, then data movement operations are overlapped with the creation of the
pool. If files are to be ingressed to a shared file system on the compute
nodes, then the files are ingressed after the pool is created and the shared
file system is ready. Files can be ingressed to both Azure Blob Storage and a
shared file system during the same pool creation invocation. If this property
is set to
truethen
block_until_all_global_resources_loadedwill be force
disabled. If omitted, this property defaults to
false.
* (optional)
input_datais an object containing data that should be
ingressed to all compute nodes as part of node preparation. It is
important to note that if you are combining this action with
filesand
are ingressing data to Azure Blob or File storage as part of pool creation,
that the blob containers or file shares defined here will be downloaded as
soon as the compute node is ready to do so. This may result in the blob
container/blobs or file share/files not being ready in time for the
input_datatransfer. It is up to you to ensure that these two operations do
not overlap. If there is a possibility of overlap, then you should ingress
data defined in
filesprior to pool creation and disable the option above
transfer_files_on_pool_creation. This object currently supports
azure_batchand
azure_storageas members.
*
azure_batchcontains the following members:
* (required)
job_idthe job id of the task
* (required)
task_idthe id of the task to fetch files from
* (optional)
includeis an array of include filters
* (optional)
excludeis an array of exclude filters
* (required)
destinationis the destination path to place the files
*
azure_storagecontains the following members:
* (required)
storage_account_settingscontains a storage account link
as defined in the credentials json.
* (required)
containeror
file_shareis required when downloading
from Azure Blob Storage or Azure File Storage, respectively.
containerspecifies which container to download from for Azure Blob
Storage while
file_sharespecifies which file share to download from
for Azure File Storage. Only one of these properties can be specified
per
data_transferobject.
* (optional)
includeproperty defines an optional include filter.
Although this property is an array, it is only allowed to have 1
maximum filter.
* (required)
destinationproperty defines where to place the
downloaded files on the host file system. Please note that you should
not specify a destination that is on a shared file system. If you
require ingressing to a shared file system location like a GlusterFS
volume, then use the global configuration
filesproperty and the
data ingresscommand.
* (optional)
blobxfer_extra_optionsare any extra options to pass to
blobxfer.
* (optional)
resource_filesis an array of resource files that should be
downloaded as part of the compute node's preparation. Each array entry
contains the following information:
*
file_pathis the path within the node prep task working directory to
place the file on the compute node. This directory can be referenced
by the
$AZ_BATCH_NODE_STARTUP_DIR/wdpath.
*
blob_sourceis an accessible HTTP/HTTPS URL. This need not be an Azure
Blob Storage URL.
*
file_modeif the file mode to set for the file on the compute node.
This is optional.
* (optional)
virtual_networkis the property for specifying an ARM-based
virtual network resource for the pool. This is only available for
UserSubscription Batch accounts.
* (required)
nameis the name of the virtual network
* (optional)
resource_groupcontaining the virtual network. If
the resource group name is not specified here, the
resource_groupspecified in the
batchcredentials will be used instead.
* (optional)
create_nonexistantspecifies if the virtual network and
subnet should be created if not found. If not specified, this defaults
to
false.
* (required if creating, optional otherwise)
address_spaceis the
allowed address space for the virtual network.
* (required)
subnetspecifies the subnet properties.
* (required)
nameis the subnet name.
* (required)
address_prefixis the subnet address prefix to use for
allocation Batch compute nodes to. The maximum number of compute nodes
a subnet can support is 4096 which maps roughly to a CIDR mask of
20-bits.
* (optional)
sshis the property for creating a user to accomodate SSH
sessions to compute nodes. If this property is absent, then an SSH user is not
created with pool creation. If you are running Batch Shipyard on Windows,
please refer to [these instructions](85-batch-shipyard-ssh-docker-tunnel.md#ssh-keygen)
on how to generate an SSH keypair for use with Batch Shipyard.
* (required)
usernameis the user to create on the compute nodes.
* (optional)
expiry_daysis the number of days from now for the account on
the compute nodes to expire. The default is 30 days from invocation time.
* (optional)
ssh_public_keyis the path to an existing SSH public key to
use. If not specified, an RSA public/private keypair will be automatically
generated if
ssh-keygenor
ssh-keygen.execan be found on the
PATH.
This option cannot be specified with
ssh_public_key_data.
* (optional)
ssh_public_key_datais the raw RSA public key data in OpenSSH
format, e.g., a string starting with
ssh-rsa .... Only one key may be
specified. This option cannot be specified with
ssh_public_key.
* (optional)
ssh_private_keyis the path to an existing SSH private key
to use against either
ssh_public_keyor
ssh_public_key_datafor
connecting to compute nodes. This option should only be specified
if either
ssh_public_keyor
ssh_public_key_dataare specified.
* (optional)
generate_docker_tunnel_scriptproperty directs script to
generate an SSH tunnel script that can be used to connect to the remote
Docker engine running on a compute node. This script can only be used on
non-Windows systems.
* (optional)
generated_file_export_pathis the path to export the
generated RSA keypair and docker tunnel script to. If omitted, the
current directory is used.
* (experimental)
hpn_server_swapproperty enables an OpenSSH server with
[HPN patches](https://www.psc.edu/index.php/using-joomla/extensions/templates/atomic/636-hpn-ssh)
to be swapped with the standard distribution OpenSSH server. This is not
supported on all Linux distributions and may be force disabled.
* (optional)
gpuproperty defines additional information for NVIDIA
GPU-enabled VMs. If not specified, Batch Shipyard will automatically download
the driver for the
vm_sizespecified.
*
nvidia_driverproperty contains the following required members:
*
sourceis the source url to download the driver.
* (optional)
additional_node_prep_commands` is an array of additional commands
to execute on the compute node host as part of node preparation. This can
be empty or omitted.
Full template
A full template of a credentials file can be found here. Note that this template cannot be used as-is and must be modified to fit your scenario.