Batch Shipyard Global Configuration
This page contains in-depth details on how to configure the global configuration file for Batch Shipyard.
Schema
The global config schema is as follows:
batch_shipyard:
  storage_account_settings: mystorageaccount
  storage_entity_prefix: shipyard
  generated_sas_expiry_days: null
  autogenerated_task_id:
    prefix: task-
    zfill_width: 5
  encryption:
    enabled: true
    pfx:
      filename: encrypt.pfx
      passphrase: mysupersecretpassword
      sha1_thumbprint: 123456789...
    public_key_pem: encrypt.pem
  fallback_registry: myregistry.azurecr.io
  delay_docker_image_preload: false
data_replication:
  concurrent_source_downloads: null
global_resources:
  additional_registries:
    docker:
    - myruntimeserver.azurecr.io
  docker_images:
  - busybox
  singularity_images:
    unsigned:
    - image: shub://singularityhub/busybox
    - image: docker://busybox
    - image: oras://myazurecr.azurecr.io/repo/myunsignedimage:1.0.0
    - image: library://user/repo/image:1.0.0
    - image: library://user/repo/encryptedimage:1.0.0
      encryption:
        certificate:
          sha1_thumbprint: 123456789...
    signed:
    - image: library://sylabs/tests/signed:1.0.0
      signing_key:
        fingerprint: 8883491F4268F173C6E5DC49EDECE4F3F38D871E
    - image: oras://myazurecr.azurecr.io/repo/mysignedimage:1.0.0
      signing_key:
        fingerprint: 000123000123000123000123000123000123ABCD
        file: /path/to/key/file
    - image: library://user/repo/encryptedimage:1.0.0
      signing_key:
        fingerprint: 000123000123000123000123000123000123ABCD
        file: /path/to/key/file
      encryption:
        certificate:
          sha1_thumbprint: 123456789...
  volumes:
    data_volumes:
      contdatavol:
        container_path: /abc
        host_path: null
        bind_options: ro
      hosttempvol:
        container_path: /hosttmp
        host_path: /tmp
        bind_options: rw
    shared_data_volumes:
      azurefile_vol:
        volume_driver: azurefile
        storage_account_settings: mystorageaccount
        azure_file_share_name: myfileshare
        container_path: $AZ_BATCH_NODE_SHARED_DIR/azfile
        mount_options:
        - file_mode=0777
        - dir_mode=0777
        bind_options: rw
      azureblob_vol:
        volume_driver: azureblob
        storage_account_settings: mystorageaccount
        azure_blob_container_name: mycontainer
        container_path: $AZ_BATCH_NODE_SHARED_DIR/azblob
        mount_options:
        - --use-https=true
        bind_options: rw
      nfs_server:
        volume_driver: storage_cluster
        container_path: $AZ_BATCH_NODE_SHARED_DIR/nfs_server
        mount_options: []
        bind_options: ro
      glusterfs_cluster:
        volume_driver: storage_cluster
        container_path: $AZ_BATCH_NODE_SHARED_DIR/glusterfs_cluster
        mount_options: []
        bind_options: null
      glusterfs_on_compute_vol:
        volume_driver: glusterfs_on_compute
        container_path: $AZ_BATCH_NODE_SHARED_DIR/glusterfs_on_compute
        volume_type: replica
        volume_options: []
        bind_options: rw
      custom_vol:
        volume_driver: custom_linux_mount
        container_path: $AZ_BATCH_NODE_SHARED_DIR/lustre
        fstab_entry:
          fs_spec: 10.1.0.4@tcp0:10.1.0.5@tcp0:/lustre
          fs_vfstype: lustre
          fs_mntops: defaults,_netdev
          fs_freq: 0
          fs_passno: 0
        bind_options: null
  files:
  - destination:
      data_transfer:
        method: multinode_scp
        ssh_private_key: id_rsa_shipyard
        scp_ssh_extra_options: -c aes256-gcm@openssh.com
        rsync_extra_options: ''
        split_files_megabytes: 500
        max_parallel_transfers_per_node: 2
      relative_destination_path: myfiles
      shared_data_volume: glustervol
    source:
      exclude:
      - '*.bak'
      include:
      - '*.dat'
      path: /some/local/path/dir
  - destination:
      storage_account_settings: mystorageaccount
      data_transfer:
        remote_path: container/dir
        is_file_share: false
        blobxfer_extra_options: null
    source:
      exclude:
      - '*.tmp'
      include:
      - '*.bin'
      path: /some/local/path/bound/for/storage
  - destination:
      data_transfer:
        method: rsync+ssh
        ssh_private_key: id_rsa_shipyard
        scp_ssh_extra_options: -c aes256-gcm@openssh.com
        rsync_extra_options: -v
      relative_destination_path: relpath/on/host/test2
    source:
      exclude:
      - '*.tmp'
      include:
      - '*.bin'
      path: /another/local/path/dir
The `batch_shipyard` property is used to set settings for the tool:
- (required) `storage_account_settings` is a link to the alias of the storage account specified, in this case, `mystorageaccount`. Batch Shipyard requires a general purpose storage account for storing metadata in order to execute across a distributed environment. The general purpose restriction applies only to this account for Batch Shipyard metadata. Additional storage accounts (of varying types) can be specified in the credentials configuration file and referenced where appropriate.
- (optional) `storage_entity_prefix` property is a generic qualifier used to prefix storage containers (blob containers, tables, queues). If not specified, defaults to `shipyard`.
- (optional) `generated_sas_expiry_days` property sets the number of days that any non-resource file SAS key generated by Batch Shipyard is valid for. The default is effectively unlimited. This is useful if you want SAS keys to be valid only for a preferred period of time.
- (optional) `autogenerated_task_id` controls how autogenerated task ids are named. Note that the total length of an autogenerated task id must not exceed 64 characters.
  - (optional) `prefix` is the task prefix to use with the task id. This can be any combination of alphanumeric characters including hyphens and underscores. An empty string is permitted for the `prefix`. The default is `task-`.
  - (optional) `zfill_width` is the width to which the integral task number is left-padded with zeros. This can be set to zero, which may be useful for task dependency range scenarios in combination with an empty string `prefix` above. The default is `5`.
- (optional) `encryption` object is used to define credential encryption and contains the following members:
  - (required) `enabled` property enables or disables this feature.
  - (required) `pfx` object defines the PFX certificate.
    - (required) `filename` property is the full path and name to the PFX certificate.
    - (required) `passphrase` property is the passphrase for the PFX certificate. This cannot be empty.
    - (optional) `sha1_thumbprint` is the SHA1 thumbprint of the certificate. If the PFX file is created using the `cert create` command, then the SHA1 thumbprint is output. It is recommended to populate this property so that it does not have to be generated when needed for encryption.
  - (optional) `public_key_pem` property is the full path and name to the RSA public key in PEM format. If the PFX file is created using the `cert create` command, then this file is generated along with the PFX file. It is recommended to populate this property with the PEM file path so that it does not have to be generated when needed for encryption.
- (optional) `fallback_registry` is a property that designates a Docker registry to use as a fallback for retrieving Batch Shipyard system images required to bootstrap each compute node. This helps minimize disruption when Docker Hub is experiencing an outage or degradation. If this property is populated, then the associated login information for this registry server must be specified in the credentials configuration under `docker_registry`. Note that this registry must follow naming conventions exactly as if on Docker Hub, except images are naturally prefixed with the server name. To easily replicate/mirror the requisite Batch Shipyard images, please see the command `misc mirror-images`. This command should be run for every Batch Shipyard version that you intend to use in conjunction with this option.
- (optional) `delay_docker_image_preload` controls when to perform Docker image preloading for `native` Linux pools only. If this property is set to `true` for `native` Linux pools, then Docker images are loaded during the node prep phase (i.e., the Azure Batch start task). An advantage of delaying preloading to this phase is that it decouples potential image preload failures from other problems that can cause a node to enter an unusable state. Additionally, enabling this feature allows configuration of `data_replication` options (see below) for `native` Linux pools, including `concurrent_source_downloads` tuning and other peer-to-peer options. This option has no effect on non-`native` pools as images are always "delay" preloaded. Similarly, this option has no effect on Windows pools.
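As a rough illustration of how `prefix` and `zfill_width` combine, the following Python sketch builds a task id by concatenating the prefix with the zero-padded task number. The function name `make_task_id` and the constant `MAX_TASK_ID_LENGTH` are hypothetical, and this is not Batch Shipyard's actual implementation; it only assumes the padding behaves like Python's `str.zfill` and checks the 64-character limit noted above.

```python
MAX_TASK_ID_LENGTH = 64  # limit stated for autogenerated task ids


def make_task_id(task_number: int, prefix: str = "task-", zfill_width: int = 5) -> str:
    """Hypothetical helper: combine a prefix with a zero-padded task number."""
    # str.zfill pads on the left with zeros up to the requested total width;
    # a width of 0 leaves the number unpadded.
    task_id = f"{prefix}{str(task_number).zfill(zfill_width)}"
    if len(task_id) > MAX_TASK_ID_LENGTH:
        raise ValueError(f"task id exceeds {MAX_TASK_ID_LENGTH} characters: {task_id}")
    return task_id


if __name__ == "__main__":
    print(make_task_id(7))                            # task-00007
    print(make_task_id(7, prefix="", zfill_width=0))  # 7
```

With an empty `prefix` and a `zfill_width` of zero, the ids reduce to bare integers, which is the form that the task dependency range scenario mentioned above may benefit from.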
`data_replication` is an entirely optional section to exert fine-grained
control of the download and data replication behavior for container images.
- (optional) `data_replication` property is used to configure the internal image replication mechanism between compute nodes within a compute pool. The `concurrent_source_downloads` property specifies the number of nodes that can download the source images concurrently. The default, if not specified, is 10.
`global_resources` contains properties for populating each compute node
with required container images and for data movement directives.
- (required) `global_resources` property contains information regarding required container images, volume configuration and data ingress information. This property is required.
  - (optional) `additional_registries` specifies any additional registry login information to load on to the pool, as specified in the credentials configuration. Do not specify any registries here that are already part of either `docker_images` or `singularity_images` below. This option is mainly for accessing container registries that do not have associated images to preload on to the pool.
    - (optional) `docker` specifies a list of Docker registries to load. If these require login credentials, they must be specified in the credentials configuration file.
    - (optional) `singularity` specifies a list of Singularity registries to load.
  - (required if using Docker) `docker_images` is an array of Docker images that should be installed on every compute node when this configuration file is supplied while creating a compute pool. Image tags are supported. Image names should be fully qualified including any registry server name prefix (which can be omitted if the image exists in Docker Hub). If you are referencing a private registry that requires a login, then you must add the credential for the registry in the `docker_registry` property in the credentials file. If this property is empty or is not specified, no Docker images will be pre-loaded on to compute nodes, which will lead to increased task startup latency. It is highly recommended not to leave this property empty if possible. Note that if you do not specify Docker images to preload, you must specify `allow_run_on_missing_image` as `true` in your job specification for any tasks that reference images that aren't specified in this property.
  - (required if using Singularity) `singularity_images` property contains all the Singularity images that should be installed on every compute node when this configuration file is supplied while creating a compute pool. Image tags are supported. Image names should be fully qualified including any registry server name prefix. If you are referencing a private registry that requires a login, then you must add the credential for the registry in the `singularity_registry` property in the credentials file. If this property is empty or is not specified, no Singularity images will be pre-loaded on to compute nodes, which will lead to increased latency to begin task execution. It is highly recommended not to leave this property empty if possible. Due to Singularity limitations, if the image specified at a certain URI changes, the image will automatically be pulled again from the registry the next time that the image is used in a task. This can lead to increased latency to begin task execution if the image differs from a previous pull, and to potential inconsistencies between task executions. Note that `singularity_images` is incompatible with `native` container support enabled pools. For encrypted container support, please see the Singularity Encrypted Containers documentation for more details.
    - (optional) `unsigned` is a list of Singularity images that will not be verified when installing on every compute node. `shub://`, `docker://`, `library://`, and `oras://` URI prefixes are supported.
      - (required) `image` is the unsigned Singularity image.
      - (optional) `encryption` is the image encryption properties. Only images encrypted with an asymmetric RSA key pair are currently supported in Batch Shipyard.
        - (required) `certificate` is the PFX decryption certificate with the appropriate private key that has been bound to the Batch account. This cannot be a CER certificate as a private key is required for image decryption.
          - (required) `sha1_thumbprint` is the associated SHA-1 thumbprint of the certificate. This must be associated with the PFX with the private key.
    - (optional) `signed` is a list of objects containing the Singularity image that will be verified when installing on every compute node as well as the information to verify the image. `library://` and `oras://` URI prefixes are supported.
      - (required) `image` is the signed Singularity image.
      - (required) `signing_key` is the signing key properties.
        - (required) `fingerprint` is the key fingerprint of the Singularity image to verify. If no `file` is specified, this key fingerprint is used to pull the key from the default key server "https://keys.sylabs.io".
        - (optional) `file` is a local path to a public key file. The key fingerprint of the key in `file` must match the `fingerprint`.
      - (optional) `encryption` is the image encryption properties. Only images encrypted with an asymmetric RSA key pair are currently supported in Batch Shipyard.
        - (required) `certificate` is the PFX decryption certificate with the appropriate private key that has been bound to the Batch account. This cannot be a CER certificate as a private key is required for image decryption.
          - (required) `sha1_thumbprint` is the associated SHA-1 thumbprint of the certificate. This must be associated with the PFX with the private key.
  - (optional) `files` property specifies data that should be ingressed from a location accessible by the local machine (i.e., the machine invoking `shipyard.py`) to a shared file system location accessible by compute nodes in the pool or to Azure Blob or File Storage. `files` is a list of objects, which allows multiple source-to-destination pairs to be ingressed during the same invocation. Note that no Azure Batch environment variables (i.e., `$AZ_BATCH_`-style environment variables) are available as path arguments since ingress actions performed within `files` are done locally on the machine invoking `shipyard.py`. Each object within the `files` list contains the following members:
    - (required) `source` property contains the following members:
      - (required) `path` is a local path. A single file or a directory can be specified. Filters below will be ignored if `path` is a file and not a directory.
      - (optional) `include` is an array of Unix shell-style wildcard filters where only files matching a filter are included in the data transfer.
      - (optional) `exclude` is an array of Unix shell-style wildcard filters where files matching a filter are excluded from the data transfer. Filters specified in `exclude` have precedence over filters specified in `include`.
    - (required) `destination` property contains the following members:
      - (required or optional) `shared_data_volume` or `storage_account_settings` selects data ingress to a GlusterFS volume or to Azure Blob or File Storage, respectively. If you are ingressing to a pool with only one compute node, you may omit `shared_data_volume`. Otherwise, you may specify one or the other, but not both in the same object. Please see the `shared_data_volumes` section below for information on how to set up a GlusterFS share.
      - (required or optional) `relative_destination_path` specifies a relative destination path to place the files, with respect to the target root. If transferring to a `shared_data_volume`, then this is relative to the GlusterFS volume root. If transferring to a pool with a single node in it, and thus no `shared_data_volume` is specified in the prior property, then this is relative to `$AZ_BATCH_NODE_ROOT_DIR`. To place files directly in `$AZ_BATCH_NODE_ROOT_DIR` (not recommended), you can specify this property as an empty string when not ingressing to a `shared_data_volume`. Note that if `scp` is selected while attempting to transfer directly to this aforementioned path, then `scp` will fail with an exit code of 1 but the transfer will have succeeded (this is due to some of the permission options). If this property is not specified for a `shared_data_volume`, then files will be placed directly in the GlusterFS volume root. This property cannot be specified for an Azure Storage destination (i.e., `storage_account_settings`).
      - (required) `data_transfer` specifies how the transfer should take place. The following list contains members for GlusterFS ingress when a GlusterFS volume is provided for `shared_data_volume` (see below for ingressing to Azure Blob or File Storage):
        - (required) `method` specifies which method should be used to ingress data, which should be one of: `scp`, `multinode_scp`, `rsync+ssh` or `multinode_rsync+ssh`. `scp` will use secure copy to copy a file or a directory (recursively) to the remote share path. `multinode_scp` will attempt to simultaneously transfer files to many compute nodes using `scp` at the same time to speed up data transfer. `rsync+ssh` will perform an rsync of files through SSH. `multinode_rsync+ssh` will attempt to simultaneously transfer files using `rsync` to many compute nodes at the same time to speed up data transfer. Note that you may specify the `multinode_*` methods even with only 1 compute node in a pool, which will allow you to take advantage of `max_parallel_transfers_per_node` below.
        - (optional) `ssh_private_key` is the location of the SSH private key for the username specified in the `pool_specification`:`ssh` section when connecting to compute nodes. The default is `id_rsa_shipyard`, if omitted, which is automatically generated if no SSH key is specified when an SSH user is added to a pool.
        - (optional) `scp_ssh_extra_options` are any extra options to pass to `scp` or `ssh` for the `scp`/`multinode_scp` or `rsync+ssh`/`multinode_rsync+ssh` methods, respectively. For example, `-C` enables compression, and `-c aes256-gcm@openssh.com` (as in the example above) selects the `aes256-gcm@openssh.com` cipher, which can potentially increase the transfer speed by exploiting Intel AES-NI.
        - (optional) `rsync_extra_options` are any extra options to pass to `rsync` for the `rsync+ssh`/`multinode_rsync+ssh` transfer methods. This property is ignored for non-rsync transfer methods.
        - (optional) `split_files_megabytes` splits files into chunks with the specified size in MiB. This can potentially help with very large files. This option forces the transfer `method` to `multinode_scp`. Note that the destination file system must be able to accommodate up to 2x the size of files which are split. Additionally, transfers involving files which are split will incur reconstruction costs after the transfer is complete, which will increase the total end-to-end ingress time. However, in certain scenarios, splitting files and transferring chunks in parallel, even with reconstruction, may end up being faster than transferring a large file without chunking.
        - (optional) `max_parallel_transfers_per_node` is the maximum number of parallel transfers to invoke per node with the `multinode_scp`/`multinode_rsync+ssh` methods. For example, if there are 3 compute nodes in the pool, and `2` is given for this option, then there will be up to 2 scp sessions in parallel per compute node for a maximum of 6 concurrent scp sessions to the pool. The default is 1 if not specified or omitted.
      - (required) `data_transfer` specifies how the transfer should take place. When Azure Blob or File Storage is selected as the destination for data ingress, blobxfer is invoked. The following list contains members for Azure Blob or File Storage ingress when a storage account link is provided for `storage_account_settings`:
        - (required) `remote_path` is required when uploading to Azure Storage. This property is the full path to the storage resource, including the container or file share name and all virtual directories. The container or file share need not be created beforehand.
        - (optional) `is_file_share` specifies if the destination is an Azure File Share rather than an Azure Blob Container. The default is `false`.
        - (optional) `blobxfer_extra_options` are any extra options to pass to `blobxfer`. Please run `blobxfer -h` to see available extra options that may be pertinent to your scenario.
  - (optional) `volumes` property can consist of two different types of volumes: `data_volumes` and `shared_data_volumes`. `data_volumes` can be of two flavors depending on whether `host_path` is set to null or not. When `host_path` is null, the volume is typically used with the `VOLUME` keyword in Dockerfiles to initialize a data volume with existing data inside the image. If `host_path` is set, then the path on the host is mounted in the container at the path specified with `container_path`.
    - (required) `host_path` is the host path to bind.
    - (optional) `container_path` is the container path to map to the host path. If not specified, the same `host_path` is used in the container.
    - (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
  - (optional) `shared_data_volumes` property defines persistent shared storage volumes. In the first shared volume, `azurefile_vol` is the alias of this volume (please see the following sections for information regarding other supported `shared_data_volumes` types):
    - `volume_driver` property specifies the Docker Volume Driver to use. Currently Batch Shipyard supports `azureblob`, `azurefile`, `glusterfs_on_compute`, `storage_cluster`, or `custom_linux_mount` as the `volume_driver`. For this volume (`azurefile_vol`), as this is an Azure File shared volume, the `volume_driver` should be set as `azurefile`.
    - `storage_account_settings` is a link to the alias of the storage account specified that holds this Azure File Share. Note that when using `azurefile` for a shared data volume, the storage account that holds the file share must reside within the same Azure region as the Azure Batch compute pool for certain Linux host operating systems. Attempting to mount a cross-region Azure File share on operating systems that do not support such functionality will result in failure, as those Linux Samba clients do not support share level encryption at this time.
    - `azure_file_share_name` is the name of the share on Azure Files. Note that the Azure File share must be created beforehand; the toolkit does not create Azure File shares, it only mounts them to the compute nodes.
    - `container_path` is the path in the container to mount.
    - `mount_options` are the mount options to pass to the mount command. This option is ignored on Windows pools. It is recommended to use `0777` for both `file_mode` and `dir_mode` on Linux pools as the `uid` and `gid` cannot be reliably determined before the compute pool is allocated and this volume will be mounted as the root user.
    - (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
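The `include` and `exclude` filters described for `files` sources above can be pictured with the short Python sketch below. It is only an illustration under stated assumptions, not the tool's implementation: the helper name `is_selected` is hypothetical, matching is done against the file name with Python's `fnmatch` module, and an empty or missing `include` list is assumed to mean that every non-excluded file is kept.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def is_selected(
    name: str,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper: decide whether a file name passes the filters."""
    # exclude filters take precedence over include filters
    if exclude and any(fnmatchcase(name, pattern) for pattern in exclude):
        return False
    # assumption: with no include filters, every non-excluded file is kept
    if not include:
        return True
    return any(fnmatchcase(name, pattern) for pattern in include)


if __name__ == "__main__":
    names = ["a.dat", "b.dat.bak", "c.txt"]
    kept = [n for n in names if is_selected(n, include=["*.dat*"], exclude=["*.bak"])]
    print(kept)  # ['a.dat']
```

Here `b.dat.bak` matches the `include` pattern but is still dropped because the `exclude` pattern wins, while `c.txt` is dropped because it matches no `include` pattern.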
Important note: Specifying a shared_data_volumes property and any
number of shared data volumes does not automatically bind these specified
mounts to the container when a task is run. Binding of the mount to the
container when the task is run is specified in the
jobs configuration on a per job
or per task basis.
The second shared volume, `azureblob_vol`, is an Azure Blob storage container
mounted via blobfuse. Please carefully review the limitations of blobfuse, as
it may not necessarily be the best fit for your workload. If it is not,
consider ingressing and/or egressing your data from/to blobs using the data
movement capabilities of Batch Shipyard. These volumes have the following
properties:
- (required) `volume_driver` property should be set as `azureblob`.
- (required) `storage_account_settings` is a link to the alias of the storage account specified that holds this Azure Blob container.
- (required) `azure_blob_container_name` is the name of the container on Azure Blob storage. If the Azure Blob container does not exist, it is created.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` are the mount and FUSE options to pass to the blobfuse mount command. Please see the blobfuse documentation for available options.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
The third shared volume, `nfs_server`, is an NFS server that is to be
mounted on to compute node hosts. The name `nfs_server` should match the
`remote_fs`:`storage_cluster`:`id` specified as your NFS server. These NFS
servers can be configured using the `fs` command in Batch Shipyard. These
volumes have the following properties:
- (required) `volume_driver` property should be set as `storage_cluster`.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` property defines additional mount options to pass when mounting this file system to the compute node.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
The fourth shared volume, `glusterfs_cluster`, is a GlusterFS cluster that is
mounted on to compute node hosts. The name `glusterfs_cluster` should match
the `remote_fs`:`storage_cluster`:`id` specified as your GlusterFS cluster.
These GlusterFS clusters can be configured using the `fs` command in Batch
Shipyard. These volumes have the following properties:
- (required) `volume_driver` property should be set as `storage_cluster`.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` property defines additional mount options to pass when mounting this file system to the compute node.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
The fifth shared volume, `glusterfs_on_compute_vol`, is a
GlusterFS network file system. Please note that
`glusterfs_on_compute` volumes are GlusterFS volumes co-located on the VM's
temporary local disk space, which is a shared resource. Sizes of the local
temp disk for each VM size can be found
here.
If specifying a `glusterfs_on_compute` volume, you must enable internode
communication in the pool configuration file. These volumes have the following
properties:
- (required) `volume_driver` property should be set as `glusterfs_on_compute`.
- (required) `container_path` is the path in the container to mount.
- (optional) `volume_type` property defines the GlusterFS volume type. Currently, `replica` is the only supported type.
- (optional) `volume_options` property defines additional GlusterFS volume options to set.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
glusterfs_on_compute volumes are mounted on the host at
$AZ_BATCH_NODE_ROOT_DIR/mounts/gluster_on_compute/gv0. Batch Shipyard will
automatically replace container path references in direct and storage-based
data ingress/egress with their host path equivalents.
Note that when resizing a pool with a `glusterfs_on_compute` shared file
system, you must resize with the `pool resize` command in `shipyard.py`
and not with Azure Portal, Batch Labs or any other tool.
The sixth shared volume, `custom_vol`, is a custom Linux mount volume. This can
be used to specify a custom filesystem mount where you would join the Batch
compute nodes to an existing filesystem that is accessible (within the virtual
network or publicly). Note that if the required software and userland
utilities do not exist by default on the host, mounting of these custom
volumes will fail. Ensure that you have either populated the pool
`additional_node_prep_commands`:`pre` with the proper commands to install
the software or have prepared a custom image with the appropriate software.
These volumes have the following properties:
- (required) `volume_driver` property should be set as `custom_linux_mount`.
- (required) `container_path` is the path in the container to mount.
- (required) `fstab_entry` contains the required fstab components:
  - (required) `fs_spec` is the first field, which is the block special device or the remote filesystem to be mounted.
  - (required) `fs_vfstype` is the third field, which is the filesystem type.
  - (optional) `fs_mntops` is the fourth field, which is the mount options associated with the filesystem. If this is omitted, `defaults` is supplied. Note that the `mount_options` property used in other shared data volumes is not used.
  - (optional) `fs_freq` is the fifth field, which is used by dump.
  - (optional) `fs_passno` is the sixth field, which is used by fsck.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
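To make the field numbering of `fstab_entry` concrete, the Python sketch below renders the members as a single fstab-style line. It is a minimal illustration, not Batch Shipyard's code: the helper name `format_fstab_line` and the mount point `/mnt/lustre` are hypothetical (the second fstab field, the mount point, is not part of `fstab_entry`), and the zero defaults for `fs_freq` and `fs_passno` follow the usual fstab convention rather than anything stated on this page.

```python
def format_fstab_line(fstab_entry: dict, mount_point: str) -> str:
    """Hypothetical helper: render fstab_entry members as one fstab line."""
    fields = [
        fstab_entry["fs_spec"],                      # field 1: device or remote filesystem
        mount_point,                                 # field 2: mount point (not in fstab_entry)
        fstab_entry["fs_vfstype"],                   # field 3: filesystem type
        fstab_entry.get("fs_mntops") or "defaults",  # field 4: mount options
        str(fstab_entry.get("fs_freq") or 0),        # field 5: used by dump (assumed default 0)
        str(fstab_entry.get("fs_passno") or 0),      # field 6: used by fsck (assumed default 0)
    ]
    return " ".join(fields)


if __name__ == "__main__":
    entry = {
        "fs_spec": "10.1.0.4@tcp0:10.1.0.5@tcp0:/lustre",
        "fs_vfstype": "lustre",
        "fs_mntops": "defaults,_netdev",
        "fs_freq": 0,
        "fs_passno": 0,
    }
    # /mnt/lustre is a made-up mount point used only for illustration
    print(format_fstab_line(entry, "/mnt/lustre"))
```

Omitting `fs_mntops` in this sketch yields `defaults` in the fourth field, mirroring the behavior described for that property.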
Finally, note that `volumes` can be omitted completely, as can either or
both of `data_volumes` and `shared_data_volumes`, if you do not require
this functionality.
Full template
A full template of a global configuration file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.