Batch Shipyard Major Version Migration Guide
This guide is to help you understand potential issues and provide guidance when migrating your workload between major versions of Batch Shipyard. Please pay special attention to breaking changes and potential backward-incompatible actions.
Migrating from 2.x to 3.x
There are significant changes between 2.x and 3.x in terms of configuration format, options/properties and implied behavior. Please read through each section carefully.
Important Notes
- If you have existing 2.x pools, do not issue `pool del` with the 3.x CLI until you have migrated all of your jobs to 3.x pools. Failure to do so will render your existing pools unable to resize up (either manually or via autoscale).
- Do not mix 2.x and 3.x pools with the same storage account used for backing metadata used by Batch Shipyard (i.e., the `batch_shipyard`:`storage_account_settings` option in the global configuration file).
- If you must use a mixed-mode environment, please specify a different storage account (i.e., the `batch_shipyard`:`storage_account_settings` option) for metadata between the two versions in the global configuration file.
YAML Configuration Support
Although you can still use your configuration files in JSON format, it is recommended to migrate to YAML as all documentation and recipes are now shown in this format. You need not perform this conversion by hand. To perform automatic conversion, first install the converter program and then run it on each configuration file.
    # recommended to install the following with --user or in a virtual env
    # if using Python 3.x
    pip3 install ruamel.yaml.cmd
    # if using Python 2.7
    pip install ruamel.yaml.cmd
    yaml json credentials.json > credentials.yaml
    yaml json config.json > config.yaml
    yaml json pool.json > pool.yaml
    yaml json jobs.json > jobs.yaml
    # verify YAML conversion first, then delete json files (assuming the
    # aforementioned json files are the only json files that exist in this
    # directory)
    rm *.json
You can create an automated conversion script to perform this across
multiple files. For example, this simple script takes in a directory
as its only argument and automatically performs conversion of all .json
files found.
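    #!/usr/bin/env bash
    # Convert every .json file in the directory given as $1 to YAML.
    for file in "$1"/*.json; do
        # skip the unexpanded pattern when no .json files exist
        [ -e "$file" ] || continue
        ymlfile="${file%.*}.yaml"
        # remove the JSON file only if the conversion succeeded
        yaml json "$file" > "$ymlfile" && rm "$file"
    done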
You may wish to edit your YAML configuration files to reorder properties as you see fit (while still conforming to a correct configuration), because the converter sorts properties alphabetically by key name.
Commandline Changes
CLI Docker Image Naming
The Docker image name for the CLI has changed. Batch Shipyard Docker images now follow the `version-component` naming convention for tags. Thus, the `latest` CLI version will now be `alfpark/batch-shipyard:latest-cli`. This will also apply to versioned CLI images. For example, version `3.0.0` will be named `alfpark/batch-shipyard:3.0.0-cli`.
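As a small illustration of the tag convention described above, the following sketch builds the CLI image reference for a given version. The `cli_image` function is a hypothetical helper written for this example and is not part of Batch Shipyard.

```shell
# Hypothetical helper (not part of Batch Shipyard): build the CLI image
# reference using the version-component tag convention. With no argument,
# the "latest" version is assumed.
cli_image() {
    printf 'alfpark/batch-shipyard:%s-cli\n' "${1:-latest}"
}

cli_image          # image reference for the latest CLI
cli_image 3.0.0    # image reference for a versioned CLI
```

The resulting string can be passed to `docker pull` or `docker run` like any other image reference.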
Command Renaming
Some commands have been placed under sub-commands for better hierarchy control in the CLI. The following is a table of remapped commands:
Old | New |
---|---|
`data getfile` | `data files task` |
`data getfilenode` | `data files node` |
`data listfiles` | `data files list` |
`data stream` | `data files stream` |
`jobs deltasks` | `jobs tasks del` |
`jobs listtasks` | `jobs tasks list` |
`jobs termtasks` | `jobs tasks term` |
`pool asu` | `pool user add` |
`pool delnode` | `pool nodes del` |
`pool dsu` | `pool user del` |
`pool grls` | `pool nodes grls` |
`pool listimages` | `pool images list` |
`pool listnodes` | `pool nodes list` |
`pool rebootnodes` | `pool nodes reboot` |
`pool udi` | `pool images update` |
The old commands no longer work with the new CLI. Please migrate to using the new commands instead.
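If you have wrapper scripts that still use the 2.x command names, a lookup like the following sketch may help when updating them. The `shipyard_new_command` function is hypothetical and simply encodes the table above; commands that are not in the table are echoed back unchanged.

```shell
# Hypothetical helper: translate a 2.x "group command" pair into its 3.x
# equivalent according to the remapping table. Unlisted commands pass
# through unchanged.
shipyard_new_command() {
    case "$1 $2" in
        "data getfile")      echo "data files task" ;;
        "data getfilenode")  echo "data files node" ;;
        "data listfiles")    echo "data files list" ;;
        "data stream")       echo "data files stream" ;;
        "jobs deltasks")     echo "jobs tasks del" ;;
        "jobs listtasks")    echo "jobs tasks list" ;;
        "jobs termtasks")    echo "jobs tasks term" ;;
        "pool asu")          echo "pool user add" ;;
        "pool delnode")      echo "pool nodes del" ;;
        "pool dsu")          echo "pool user del" ;;
        "pool grls")         echo "pool nodes grls" ;;
        "pool listimages")   echo "pool images list" ;;
        "pool listnodes")    echo "pool nodes list" ;;
        "pool rebootnodes")  echo "pool nodes reboot" ;;
        "pool udi")          echo "pool images update" ;;
        *)                   echo "$1 $2" ;;
    esac
}

shipyard_new_command pool listnodes
shipyard_new_command jobs deltasks
```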
Environment Variable Renaming
Environment variable names have changed as configuration files are no longer exclusively JSON formatted. The `_JSON` suffix is now replaced with `_CONF`.
The variable mapping is as follows:
Old | New |
---|---|
`SHIPYARD_CREDENTIALS_JSON` | `SHIPYARD_CREDENTIALS_CONF` |
`SHIPYARD_CONFIG_JSON` | `SHIPYARD_CONFIG_CONF` |
`SHIPYARD_POOL_JSON` | `SHIPYARD_POOL_CONF` |
`SHIPYARD_JOBS_JSON` | `SHIPYARD_JOBS_CONF` |
`SHIPYARD_FS_JSON` | `SHIPYARD_FS_CONF` |
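For environments where the old variables are still exported (for example from a shell profile), the following sketch copies each `_JSON` variable to its `_CONF` counterpart. The `migrate_shipyard_env` function is a hypothetical helper, not something shipped with Batch Shipyard, and it deliberately leaves a `_CONF` variable alone if one is already set.

```shell
# Hypothetical helper: for each renamed variable, copy the old *_JSON value
# to the new *_CONF name (unless the new name is already set) and then
# unset the old name.
migrate_shipyard_env() {
    for name in CREDENTIALS CONFIG POOL JOBS FS; do
        old="SHIPYARD_${name}_JSON"
        new="SHIPYARD_${name}_CONF"
        # indirect lookup of the current values of both variables
        eval "old_val=\${$old:-}"
        eval "new_val=\${$new:-}"
        if [ -n "$old_val" ] && [ -z "$new_val" ]; then
            export "$new=$old_val"
            unset "$old"
        fi
    done
}

migrate_shipyard_env
```

The function needs to run in the current shell (for example by sourcing the file that defines it) so that the exported variables remain visible to later commands.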
--configdir Default
`--configdir` (or the `SHIPYARD_CONFIGDIR` environment variable) now defaults to the current working directory, i.e., `.`, if no other configuration file options are specified.
Behavioral Changes
Azure File Mounts
Azure File mounts now mount directly to the host instead of on a per-container invocation basis. A side effect of this change is that the Azure File share is now visible to all users on the host and to all container invocations. This change was necessitated by the unsupported Docker Azure File Volume driver and by issues with concurrent tasks mounting the same file share.
General Configuration Changes
input_data with azure_storage
Due to the migration to blobxfer 1.0.0, any specification with data ingress from Azure Storage has been changed to take advantage of the new features.
The old configuration style:
    "input_data": {
        "azure_storage": [
            {
                "storage_account_settings": "mystorageaccount",
                "container": "mycontainer",
                "include": ["data/*.bin"],
                "destination": "$AZ_BATCH_NODE_SHARED_DIR/mydata",
                "blobxfer_extra_options": null
            }
        ]
    }
`container` or `file_share` is no longer a valid property. The new version of `blobxfer` allows specifying the exact remote Azure path to source data from, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, `container` or `file_share` is now simply `remote_path`, which is the Azure storage path including the container or file share name with all virtual directories (if required) and, if downloading a single entity, the name of the remote object. For example, you can specify `mycontainer/dir` to download all of the blob objects within the `dir` directory of `mycontainer`, or `myfileshare/dir/myfile.dat` to download just the single file. To specify that your `remote_path` is on Azure Files rather than Azure Blob Storage, you will need to specify `is_file_share` as `true`.
`include` is now truly a list property where you can specify zero or more include filters to be applied to the `remote_path`. Additionally, there is now support for zero or more `exclude` filters (specified as a list, similar to `include`), which will be applied after all of the include filters are applied.
`destination` is now renamed as `local_path` to conform with the new `blobxfer` command structure.
For the example above, this old 2.x configuration should be converted to:
    input_data:
      azure_storage:
        - storage_account_settings: mystorageaccount
          remote_path: mycontainer/data
          include:
            - '*.bin'
          is_file_share: false
          local_path: $AZ_BATCH_NODE_SHARED_DIR/mydata
Credentials Configuration Changes
aad can be "globally" set
Most of the `aad` members can now be set at the global level under an `aad` property, which will apply to all services that can or must be accessed via Azure Active Directory. You should only apply this type of configuration if your service principal (application/client) has sufficient permissions for the operations required. Please see the credentials documentation for more information.
Global Configuration Changes
docker_registry is no longer valid
Configuration for Docker image references to add to a pool has now been greatly simplified. This section is no longer valid and should not be specified. Instead, please specify fully qualified Docker image names within the `docker_images` property of `global_resources`. See the next section for more information. `docker_registry` under credentials is still required for registry servers requiring valid logins.
Fully-qualified Docker image names in docker_images
Images specified in the `docker_images` property of `global_resources` should be fully-qualified with any Docker registry server prepended to the image as if referencing this image on a local machine with `docker pull` or `docker run`. Image names with no server will default to Docker public hub. If the Docker registry server where the image resides requires a login, then the server must have a corresponding credential in the credentials configuration under `docker_registry`.
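To check which registry server an image name refers to (and therefore which `docker_registry` credential it may need), the following sketch applies the usual Docker naming convention: the first path component is treated as a registry server only if it contains a `.` or `:` or is `localhost`. The `image_registry` function is a hypothetical helper, `myregistry.azurecr.io` is a placeholder name, and this is an approximation of Docker's behavior rather than Batch Shipyard's own parsing.

```shell
# Hypothetical helper: print the registry server implied by an image name.
# Assumption: Docker's convention that the first path component is a server
# only if it contains '.' or ':' or equals "localhost"; otherwise the image
# resolves to the public Docker Hub (docker.io).
image_registry() {
    first="${1%%/*}"
    case "$1" in
        */*)
            case "$first" in
                *.*|*:*|localhost)
                    echo "$first"
                    return
                    ;;
            esac
            ;;
    esac
    echo "docker.io"
}

image_registry alfpark/batch-shipyard:latest-cli
image_registry myregistry.azurecr.io/myrepo/myimage:1.0
```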
docker_volumes in global_resources has been renamed to volumes
The old `docker_volumes` property supporting `data_volumes` and `shared_data_volumes` under `global_resources` has been renamed to simply `volumes` (still nested under `global_resources`). This change was made because Singularity support has been added, and `volumes` supports binding paths under both container types.
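For YAML configuration files, the rename can be applied mechanically. The sketch below assumes block-style YAML where `docker_volumes:` appears as a key at the start of a line (after indentation). The `rename_docker_volumes` function is a hypothetical helper and does not handle JSON-formatted configuration.

```shell
# Hypothetical helper: rename the docker_volumes key to volumes in YAML read
# from stdin, preserving the original indentation. JSON files are not handled.
rename_docker_volumes() {
    sed 's/^\([[:space:]]*\)docker_volumes:/\1volumes:/'
}

# Example with an inline snippet; for a real file you would redirect, e.g.
#   rename_docker_volumes < config.yaml > config.new.yaml
printf 'global_resources:\n  docker_volumes:\n    data_volumes: {}\n' \
    | rename_docker_volumes
```

Review the output before replacing the original file, as with any automated edit of configuration.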
Private registries backed to Azure Storage Blob
Private registries backed directly to Azure Storage Blob are no longer supported. This is not to be confused with a "Classic" Azure Container Registry which is still supported.
If you are still using this mechanism, please migrate your images to another Docker registry such as Azure Container Registry.
Additional registries
If you want to execute tasks referencing Docker images that are not specified in the `docker_images` property under `global_resources` but require valid logins, then you should specify these registry servers under the `additional_registries` property.
Pool Configuration Changes
Virtual Network
Virtual networks can now be specified with the ARM Subnet Id directly. Set `arm_subnet_id` to the full ARM Subnet Id. This will cause other properties within the `virtual_network` property to be ignored. You can find an ARM Subnet Id through the portal by selecting Properties on the corresponding virtual network, copying the resource ID shown there, and then appending `/subnets/<subnet_name>` where `<subnet_name>` is the name of the subnet.
Note that you must use an `aad` credential with your Batch account. Please see the Virtual Network guide for more information.
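The construction of the ARM Subnet Id described above can be sketched as follows. The `arm_subnet_id` function is a hypothetical helper, and the angle-bracketed values in the example are placeholders to be replaced with your own subscription, resource group, virtual network and subnet names.

```shell
# Hypothetical helper: append /subnets/<subnet_name> to a virtual network
# resource ID (as shown under Properties in the portal). A trailing slash
# on the virtual network ID is tolerated.
arm_subnet_id() {
    printf '%s/subnets/%s\n' "${1%/}" "$2"
}

# Placeholder values; substitute your own identifiers.
arm_subnet_id \
    "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<vnet_name>" \
    "<subnet_name>"
```

The printed value is what would be assigned to `arm_subnet_id` in the pool configuration.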
Custom Images
Custom images are now provisioned from an ARM Image Resource rather than a page blob VHD. Set the `arm_image_id` to the ARM Image Id. You can find an ARM Image Id through the portal by clicking on the ARM Image resource where it will be displayed as `RESOURCE ID`.
Please see the Custom Image guide for more information.
Native container support pools
Azure Batch can now provision pools with Docker container support built-in. You can specify the `native` property as `true`. Batch Shipyard will determine if the specified platform image is compatible with native container support and will enable it, if so. Custom images can also be natively supported, but may fail provisioning if requisite software is not installed. If you follow the Custom Image guide then the image should be `native` compatible.
Please see this FAQ item regarding when to choose `native` container support pools.
Jobs Configuration Changes
docker_image required instead of image
In the tasks array, `docker_image` is now required over `image` for disambiguation.
Fully-qualified Docker image name required
The `docker_image` (or deprecated `image`) name specified for the task must be fully qualified with any Docker registry server prefixed (e.g., as if you are on a local machine executing `docker pull` or `docker run`). Image names with no server will default to Docker public hub.
Specialized hardware flags
Both `gpu` and `infiniband` no longer need to be explicitly set to `true`. Batch Shipyard will automatically detect if these settings can be enabled and will apply them on your behalf. If you wish to explicitly disable exposing specialized hardware to the container, you can set either or both of these flags to `false`.
output_data with azure_storage
Due to the migration to blobxfer 1.0.0, any specification within the tasks array with data egress to Azure Storage has been changed to take advantage of the new features.
The old configuration style:
    "output_data": {
        "azure_storage": [
            {
                "storage_account_settings": "mystorageaccount",
                "container": "output",
                "source": null,
                "include": ["out*.bin"],
                "blobxfer_extra_options": null
            }
        ]
    }
`container` or `file_share` is no longer a valid property. The new version of `blobxfer` allows specifying the exact remote Azure path to place data at, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, `container` or `file_share` is now simply `remote_path`, which is the Azure storage path including the container or file share name with all virtual directories (if required) and, if uploading a single entity, the name of the remote object. For example, you can specify `myfileshare/dir` to upload all local files to the `dir` directory directly, or `myfileshare/dir/myfile.dat` to upload just the single file. If you are uploading a single entity, remember to use `--rename` in the `blobxfer_extra_options` list. To specify that your `remote_path` is on Azure Files rather than Azure Blob Storage, you will need to specify `is_file_share` as `true`.
`include` is now truly a list property where you can specify zero or more include filters to be applied to the `remote_path`. Additionally, there is now support for zero or more `exclude` filters (specified as a list, similar to `include`), which will be applied after all of the include filters are applied.
`source` is now renamed as `local_path` to conform with the new `blobxfer` command structure. `local_path` can be empty (which will default to the task's directory, i.e., `$AZ_BATCH_TASK_DIR`).
For the example above, this old 2.x configuration should be converted to:
    output_data:
      azure_storage:
        - storage_account_settings: mystorageaccount
          remote_path: output
          include:
            - 'out*.bin'
          is_file_share: false
File-based task_factory with azure_storage
Due to the migration to blobxfer 1.0.0, any specification within the tasks array with a `file` `task_factory` and `azure_storage` has been changed.
The old configuration style:
    "task_factory": {
        "file": {
            "azure_storage": {
                "storage_account_settings": "mystorageaccount",
                "file_share": "myfileshare",
                "include": ["*.png"],
                "exclude": ["*.tmp"]
            },
            "task_filepath": "file_name"
        }
    }
`container` or `file_share` is no longer a valid property. The new version of `blobxfer` allows specifying the exact remote Azure path to source data from, which gives you greater control and flexibility without having to resort to individual configuration blocks with single include filters. Thus, `container` or `file_share` is now simply `remote_path`, which is the Azure storage path including the container or file share name with all virtual directories (if required). For example, you can specify `mycontainer/dir` to generate tasks based on all of the blob objects within the `dir` directory of `mycontainer`. To specify that your `remote_path` is on Azure Files rather than Azure Blob Storage, you will need to specify `is_file_share` as `true`.
For the example above, this old 2.x configuration should be converted to:
    task_factory:
      file:
        azure_storage:
          storage_account_settings: mystorageaccount
          remote_path: myfileshare
          is_file_share: true
          exclude:
            - '*.tmp'
          include:
            - '*.png'
        task_filepath: file_name