Batch Shipyard Major Version Migration Guide

This guide is to help you understand potential issues and provide guidance when migrating your workload between major versions of Batch Shipyard. Please pay special attention to breaking changes and potential backward-incompatible actions.

Migrating from 2.x to 3.x

There are significant changes between 2.x and 3.x in terms of configuration format, options/properties and implied behavior. Please read through each section carefully.

Important Notes

  • If you have existing 2.x pools, do not issue pool del with the 3.x CLI until you have migrated all of your jobs to 3.x pools. Failure to do so will render your existing pools unable to resize up (either manually or via autoscale).
  • Do not mix 2.x and 3.x pools with the same storage account used for backing metadata used by Batch Shipyard (i.e., the batch_shipyard:storage_account_settings option in the global configuration file).
  • If you must use a mixed-mode environment, specify a different storage account (i.e., the batch_shipyard:storage_account_settings option) for metadata between the two versions in the global configuration file (see the sketch below).
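
For example, a minimal sketch of keeping metadata separated in a mixed-mode environment, assuming hypothetical storage account links named mystorage2x and mystorage3x are defined in the respective credentials files:

# in the 2.x global configuration file
batch_shipyard:
  storage_account_settings: mystorage2x

# in the 3.x global configuration file
batch_shipyard:
  storage_account_settings: mystorage3x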

YAML Configuration Support

Although you can still use your configuration files in JSON format, it is recommended to migrate to YAML, as all documentation and recipes are now shown in this format. You need not perform this conversion by hand. To perform automatic conversion, first install the converter program and then run it against each configuration file:

# recommended to install the following with --user or in a virtual env
# if using Python 3.x
pip3 install ruamel.yaml.cmd
# if using Python 2.7
pip install ruamel.yaml.cmd

yaml json credentials.json > credentials.yaml
yaml json config.json > config.yaml
yaml json pool.json > pool.yaml
yaml json jobs.json > jobs.yaml

# verify YAML conversion first, then delete json files (assuming the
# aforementioned json files are the only json files that exist in this
# directory)
rm *.json

You can create an automated conversion script to perform this across multiple files. For example, this simple script takes in a directory as its only argument and automatically performs conversion of all .json files found.

#!/usr/bin/env bash

# usage: convert.sh <directory containing .json config files>
# converts each .json file in the given directory to .yaml and
# removes the original .json file after conversion
for file in "$1"/*.json; do
    stem="${file%.*}"
    ymlfile="${stem}.yaml"
    yaml json "$file" > "$ymlfile"
    rm "$file"
done

You may wish to edit your YAML configuration files to reorder properties as you see fit (while still conforming to the configuration schema), as the converter orders properties alphabetically by key name.

Commandline Changes

CLI Docker Image Naming

The Docker image name for the CLI has changed. Batch Shipyard Docker images now follow a version-component naming convention for tags. Thus, the latest CLI version is now alfpark/batch-shipyard:latest-cli. This also applies to versioned CLI images; for example, version 3.0.0 is named alfpark/batch-shipyard:3.0.0-cli.
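
For example, pulling the CLI image under the new naming convention:

# latest CLI image
docker pull alfpark/batch-shipyard:latest-cli

# or pin to a specific release
docker pull alfpark/batch-shipyard:3.0.0-cli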

Command Renaming

Some commands have been placed under sub-commands for better hierarchy control in the CLI. The following is a table of remapped commands:

Old               | New
------------------|-------------------
data getfile      | data files task
data getfilenode  | data files node
data listfiles    | data files list
data stream       | data files stream
jobs deltasks     | jobs tasks del
jobs listtasks    | jobs tasks list
jobs termtasks    | jobs tasks term
pool asu          | pool user add
pool delnode      | pool nodes del
pool dsu          | pool user del
pool grls         | pool nodes grls
pool listimages   | pool images list
pool listnodes    | pool nodes list
pool rebootnodes  | pool nodes reboot
pool udi          | pool images update

The old commands no longer work with the new CLI. Please migrate to using the new commands instead.
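
For instance, listing tasks in a job maps as follows (assuming the CLI entry point is invoked as shipyard and a hypothetical job id of myjob):

# 2.x (no longer works)
shipyard jobs listtasks --jobid myjob

# 3.x equivalent
shipyard jobs tasks list --jobid myjob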

Environment Variable Renaming

Environment variable names have changed as configuration files are no longer exclusively JSON formatted. The _JSON suffix is now replaced with _CONF. The variable mapping is as follows:

Old                       | New
--------------------------|--------------------------
SHIPYARD_CREDENTIALS_JSON | SHIPYARD_CREDENTIALS_CONF
SHIPYARD_CONFIG_JSON      | SHIPYARD_CONFIG_CONF
SHIPYARD_POOL_JSON        | SHIPYARD_POOL_CONF
SHIPYARD_JOBS_JSON        | SHIPYARD_JOBS_CONF
SHIPYARD_FS_JSON          | SHIPYARD_FS_CONF
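
For example, in a shell:

# 2.x
export SHIPYARD_CREDENTIALS_JSON=credentials.json

# 3.x
export SHIPYARD_CREDENTIALS_CONF=credentials.yaml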

--configdir Default

--configdir (or SHIPYARD_CONFIGDIR environment variable) now defaults to the current working directory, i.e., ., if no other configuration file options are specified.
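
For example, assuming credentials.yaml, config.yaml, pool.yaml, and jobs.yaml reside in the current working directory:

cd /path/to/configs
# no --configdir or per-file options are needed; . is assumed
shipyard pool add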

Behavioral Changes

Azure File Mounts

Azure File mounts are now mounted directly on the host instead of on a per-container-invocation basis. A side effect of this change is that the Azure File share is now visible to all users on the host and to all container invocations. This change was necessitated by the unsupported Docker Azure File Volume driver and issues with concurrent tasks mounting the same file share.

General Configuration Changes

input_data with azure_storage

Due to the migration to blobxfer 1.0.0, any specification with data ingress from Azure Storage has been changed to take advantage of the new features.

The old configuration style:

"input_data": {
    "azure_storage": [
        {
            "storage_account_settings": "mystorageaccount",
            "container": "mycontainer",
            "include": ["data/*.bin"],
            "destination": "$AZ_BATCH_NODE_SHARED_DIR/mydata",
            "blobxfer_extra_options": null
        }
    ]
}

container or file_share is no longer a valid property. The new version of blobxfer allows specifying the exact remote Azure path from which to source data, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, container or file_share is now simply remote_path: the Azure storage path including the container or file share name, any virtual directories (if required), and, if downloading a single entity, the name of the remote object. For example, you can specify mycontainer/dir to download all of the blob objects within the dir virtual directory of mycontainer, or myfileshare/dir/myfile.dat to download just that single file. To specify that your remote_path is on Azure Files rather than Azure Blob Storage, set is_file_share to true.

include is now truly a list property where you can specify zero or more include filters to be applied to the remote_path. Additionally, there is now support for zero or more exclude filters (specified as a list, similar to include) which will be applied after all of the include filters are applied.

destination is now renamed as local_path to conform with the new blobxfer command structure.

For the example above, this old 2.x configuration should be converted to:

input_data:
  azure_storage:
  - storage_account_settings: mystorageaccount
    remote_path: mycontainer/data
    include:
    - '*.bin'
    is_file_share: false
    local_path: $AZ_BATCH_NODE_SHARED_DIR/mydata
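
As another sketch, downloading just the single file from the Azure Files example mentioned above (hypothetical paths):

input_data:
  azure_storage:
  - storage_account_settings: mystorageaccount
    remote_path: myfileshare/dir/myfile.dat
    is_file_share: true
    local_path: $AZ_BATCH_TASK_DIR/myfile.dat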

Credentials Configuration Changes

aad can be "globally" set

Most of the aad members can now be set at the global level under an aad property, which will apply to all services that can or must be accessed via Azure Active Directory. You should only apply this type of configuration if your service principal (application/client) has sufficient permissions for all of the actions and operations required. Please see the credentials documentation for more information.
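
A minimal sketch of this style, assuming service principal authentication with placeholder values (consult the credentials documentation for the authoritative member names):

credentials:
  aad:
    # applies to all services accessed via Azure Active Directory
    directory_id: <tenant id>
    application_id: <service principal application id>
    auth_key: <service principal auth key>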

Global Configuration Changes

docker_registry is no longer valid

Configuration for Docker image references to add to a pool has been greatly simplified. This section is no longer valid and should not be specified. Instead, specify fully qualified Docker image names within the docker_images property of global_resources; see the next section for more information. docker_registry under credentials is still required for registry servers requiring valid logins.

Fully-qualified Docker image names in docker_images

Images specified in the docker_images property of global_resources should be fully-qualified with any Docker registry server prepended to the image as if referencing this image on a local machine with docker pull or docker run. Image names with no server will default to Docker public hub.

If the Docker registry server where the image resides requires a login, then the server must have a corresponding credential in the credentials configuration under docker_registry.
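
For example, a sketch with one public Docker Hub image and one image on a hypothetical private registry:

global_resources:
  docker_images:
  # public Docker Hub image; no server prefix required
  - alfpark/batch-shipyard:3.0.0-cli
  # private registry image; myregistry.azurecr.io must have a
  # corresponding docker_registry credential entry if a login is required
  - myregistry.azurecr.io/myimage:mytag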

docker_volumes in global_resources has been renamed to volumes

The old docker_volumes property, supporting data_volumes and shared_data_volumes under global_resources, has been renamed to simply volumes (still nested under global_resources). This change accompanies the addition of Singularity support, as volumes supports binding paths for both container types.
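
A before-and-after sketch (volume definitions elided):

# 2.x
global_resources:
  docker_volumes:
    data_volumes:
      # ... volume definitions ...
    shared_data_volumes:
      # ... volume definitions ...

# 3.x: only the enclosing property name changes
global_resources:
  volumes:
    data_volumes:
      # ... volume definitions ...
    shared_data_volumes:
      # ... volume definitions ...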

Private registries backed to Azure Storage Blob

Private registries backed directly to Azure Storage Blob are no longer supported. This is not to be confused with a "Classic" Azure Container Registry which is still supported.

If you are still using this mechanism, please migrate your images to another Docker registry such as Azure Container Registry.

Additional registries

If you want to execute tasks referencing Docker images that are not specified in the docker_images property under global_resources but require valid logins, then you should specify these registry servers under the additional_registries property.
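
A sketch of this property, assuming it nests under global_resources with a docker server list and a hypothetical registry name:

global_resources:
  additional_registries:
    docker:
    # must have a corresponding docker_registry credential entry
    - myregistry.azurecr.io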

Pool Configuration Changes

Virtual Network

Virtual networks can now be specified with the ARM Subnet Id directly. Set arm_subnet_id to the full ARM Subnet Id. This will cause other properties within the virtual_network property to be ignored.

You can find an ARM Subnet Id through the portal by selecting Properties on the corresponding virtual network and then appending /subnets/<subnet_name> where <subnet_name> is the name of the subnet.
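
For example, the resulting value follows the standard ARM resource id format (placeholders shown):

virtual_network:
  arm_subnet_id: /subscriptions/<subscription id>/resourceGroups/<resource group>/providers/Microsoft.Network/virtualNetworks/<virtual network name>/subnets/<subnet name>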

Note that you must use an aad credential with your Batch account. Please see the Virtual Network guide for more information.

Custom Images

Custom images are now provisioned from an ARM Image Resource rather than a page blob VHD. Set the arm_image_id to the ARM Image Id. You can find an ARM Image Id through the portal by clicking on the ARM Image resource where it will be displayed as RESOURCE ID.
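
A sketch of the property with the standard ARM resource id format (placeholders shown; see the Custom Image guide for the exact surrounding pool schema):

arm_image_id: /subscriptions/<subscription id>/resourceGroups/<resource group>/providers/Microsoft.Compute/images/<image name>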

Please see the Custom Image guide for more information.

Native container support pools

Azure Batch can now provision pools with Docker container support built-in. You can specify the native property as true. Batch Shipyard will determine if the specified platform image is compatible with native container support and, if so, will enable it. Custom images can also be natively supported, but may fail provisioning if the requisite software is not installed. If you follow the Custom Image guide, then the image should be native compatible.
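
A minimal sketch with a hypothetical pool id:

pool_specification:
  id: mypool
  native: true
  # ... remaining pool properties ...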

Please see this FAQ item regarding when to choose native container support pools.

Jobs Configuration Changes

docker_image preferred over image

In the tasks array, docker_image is now preferred over image for disambiguation.

Fully-qualified Docker image name required

The docker_image (or deprecated image) name specified for the task must be fully qualified with any Docker registry server prefixed (e.g., as if you are on a local machine executing docker pull or docker run). Image names with no server will default to Docker public hub.
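
For example, a sketch of a task entry referencing an image on a hypothetical private registry server:

tasks:
- docker_image: myregistry.azurecr.io/myimage:mytag
  command: /opt/run.sh  # hypothetical command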

Specialized hardware flags

Both gpu and infiniband no longer need to be explicitly set to true. Batch Shipyard will automatically detect if these settings can be enabled and will apply them on your behalf. If you wish to explicitly disable exposing specialized hardware to the container, you can set either or both of these flags to false.
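
For example, a sketch explicitly opting out of specialized hardware exposure for a hypothetical task:

tasks:
- docker_image: myregistry.azurecr.io/myimage:mytag
  # disable automatic GPU and RDMA/InfiniBand exposure
  gpu: false
  infiniband: false
  command: /opt/run.sh  # hypothetical command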

output_data with azure_storage

Due to the migration to blobxfer 1.0.0, any specification within the tasks array with data egress to Azure Storage has been changed to take advantage of the new features.

The old configuration style:

"output_data": {
    "azure_storage": [
        {
            "storage_account_settings": "mystorageaccount",
            "container": "output",
            "source": null,
            "include": ["out*.bin"],
            "blobxfer_extra_options": null
        }
    ]
}

container or file_share is no longer a valid property. The new version of blobxfer allows specifying the exact remote Azure path in which to place data, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, container or file_share is now simply remote_path: the Azure storage path including the container or file share name, any virtual directories (if required), and, if uploading a single entity, the name of the remote object. For example, you can specify myfileshare/dir to upload all local files into the dir directory, or myfileshare/dir/myfile.dat to upload just that single file. If you are uploading a single entity, remember to add --rename to blobxfer_extra_options. To specify that your remote_path is on Azure Files rather than Azure Blob Storage, set is_file_share to true.

include is now truly a list property where you can specify zero or more include filters to be applied to the remote_path. Additionally, there is now support for zero or more exclude filters (specified as a list, similar to include) which will be applied after all of the include filters are applied.

source is now renamed as local_path to conform with the new blobxfer command structure. local_path can be empty (which will default to the task's directory, i.e., $AZ_BATCH_TASK_DIR).

For the example above, this old 2.x configuration should be converted to:

output_data:
  azure_storage:
  - storage_account_settings: mystorageaccount
    remote_path: output
    include:
    - 'out*.bin'
    is_file_share: false
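
And a sketch of uploading a single file with a rename, per the note above (hypothetical paths; assuming blobxfer_extra_options accepts the option as shown):

output_data:
  azure_storage:
  - storage_account_settings: mystorageaccount
    remote_path: myfileshare/dir/myfile.dat
    is_file_share: true
    local_path: $AZ_BATCH_TASK_DIR/output/myfile.dat
    blobxfer_extra_options: --rename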

File-based task_factory with azure_storage

Due to the migration to blobxfer 1.0.0, any specification within the tasks array with a file task_factory and azure_storage has been changed.

The old configuration style:

"task_factory": {
    "file": {
        "azure_storage": {
            "storage_account_settings": "mystorageaccount",
            "file_share": "myfileshare",
            "include": ["*.png"],
            "exclude": ["*.tmp"]
        },
        "task_filepath": "file_name"
    }
}

container or file_share is no longer a valid property. The new version of blobxfer allows specifying the exact remote Azure path from which to source data, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, container or file_share is now simply remote_path: the Azure storage path including the container or file share name and any virtual directories (if required). For example, you can specify mycontainer/dir to generate tasks based on all of the blob objects within the dir virtual directory of mycontainer. To specify that your remote_path is on Azure Files rather than Azure Blob Storage, set is_file_share to true.

For the example above, this old 2.x configuration should be converted to:

task_factory:
  file:
    azure_storage:
      storage_account_settings: mystorageaccount
      remote_path: myfileshare
      is_file_share: true
      exclude:
      - '*.tmp'
      include:
      - '*.png'
    task_filepath: file_name