# Batch Shipyard Remote Filesystem Configuration
This page contains in-depth details on how to configure the remote filesystem json file for Batch Shipyard.
## Schema
The remote filesystem schema is as follows:
{ "remote_fs": { "resource_group": "my-resource-group", "location": "<Azure region, e.g., eastus>", "managed_disks": { "resource_group": "my-disk-resource-group", "premium": true, "disk_size_gb": 128, "disk_names": [ "p10-disk0a", "p10-disk1a", "p10-disk0b", "p10-disk1b" ] }, "storage_cluster": { "mystoragecluster": { "resource_group": "my-server-resource-group", "hostname_prefix": "mystoragecluster", "ssh": { "username": "shipyard", "ssh_public_key": "/path/to/rsa/publickey.pub", "ssh_public_key_data": "ssh-rsa ...", "ssh_private_key": "/path/to/rsa/privatekey", "generated_file_export_path": null }, "public_ip": { "enabled": true, "static": false }, "virtual_network": { "name": "myvnet", "resource_group": "my-vnet-resource-group", "existing_ok": false, "address_space": "10.0.0.0/16", "subnet": { "name": "my-server-subnet", "address_prefix": "10.0.0.0/24" } }, "network_security": { "ssh": ["*"], "nfs": ["1.2.3.0/24", "2.3.4.5"], "glusterfs": ["1.2.3.0/24", "2.3.4.5"], "smb": ["6.7.8.9"], "custom_inbound_rules": { "myrule": { "destination_port_range": "5000-5001", "source_address_prefix": ["1.2.3.4", "5.6.7.0/24"], "protocol": "*" } } }, "file_server": { "type": "glusterfs", "mountpoint": "/data", "mount_options": [ "noatime", "nodiratime" ], "server_options": { "glusterfs": { "volume_name": "gv0", "volume_type": "distributed", "transport": "tcp", "performance.cache-size": "1 GB" } }, "samba": { "share_name": "data", "account": { "username": "myuser", "password": "", "uid": 1002, "gid": 1002 }, "read_only": false, "create_mask": "0700", "directory_mask": "0700" } }, "vm_count": 2, "vm_size": "STANDARD_F8S", "fault_domains": 2, "vm_disk_map": { "0": { "disk_array": ["p10-disk0a", "p10-disk1a"], "filesystem": "btrfs", "raid_level": 0 }, "1": { "disk_array": ["p10-disk0b", "p10-disk1b"], "filesystem": "btrfs", "raid_level": 0 } } } } } }
## Details
The remote filesystem schema is constructed from two portions. The first
section specifies the Azure Managed Disks to use in the storage cluster. The
second section defines the storage cluster itself, including networking and
the virtual machine to disk mapping.
There are two properties which reside outside of these sections:
* (optional) `resource_group` is the default resource group to use for both
  the `managed_disks` and `storage_cluster` sections. This setting is only
  used if `resource_group` is not explicitly set in their respective
  configuration blocks.
* (required) `location` is the Azure region name for the resources, e.g.,
  `eastus` or `northeurope`. The `location` specified must be the same
  region as your Azure Batch account if linking a compute pool with a
  storage cluster.
### Managed Disks: `managed_disks`
This section defines the disks used by the file server as specified in the
storage_clusters
section. Not all disks specified here need to be used by
the storage cluster, but every disk in the storage cluster should be
defined in this section.
* (optional) `resource_group` is the resource group to use for the disks.
  If this is not specified, then the `resource_group` specified in the
  parent is used. At least one `resource_group` must be defined.
* (optional) `premium` defines if premium managed disks should be created.
  Premium storage provisions a guaranteed level of IOPS and bandwidth that
  scales with disk size. The default is `false`, which creates standard
  managed disks. Regardless of the type of storage used to back managed
  disks, all data written is durable and persistently backed to Azure
  Storage.
* (required) `disk_size_gb` is an integral value defining the size of the
  data disks to create. Note that for managed disks, you are billed rounded
  up to the nearest provisioned size. If you are unfamiliar with how Azure
  prices managed disks with regard to the size of disk chosen, please refer
  to this link.
* (required) `disk_names` is an array of disk names to create. All disks
  will be created identically with the properties defined in the
  `managed_disks` section.
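For example, a minimal sketch of a `managed_disks` block within `remote_fs`
that provisions two 512 GB standard (non-premium) disks might look like the
following; the disk names and resource group are illustrative placeholders:

```json
{
    "managed_disks": {
        "resource_group": "my-disk-resource-group",
        "premium": false,
        "disk_size_gb": 512,
        "disk_names": [
            "std-disk0",
            "std-disk1"
        ]
    }
}
```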
### Storage Clusters: `storage_cluster`
This section defines the storage clusters containing the file server
specification and disk mapping. This section cross-references the
`managed_disks` section, so both sections must be populated when performing
`fs cluster` actions.
You can specify multiple storage clusters in the `storage_cluster` section.
Each key in the `storage_cluster` dictionary is a unique id for the storage
cluster that you intend to create. This storage cluster id should be used as
the `STORAGE_CLUSTER_ID` argument for all `fs cluster` actions in the CLI,
along with any configuration specified for linking against Azure Batch pools
for `pool add`. `data ingress` will also take this storage cluster id as a
parameter if transferring to the file system. Each storage cluster id (key)
is paired with a json property specifying the following properties:
* (optional) `resource_group` is the resource group to use for the storage
  cluster. If this is not specified, then the `resource_group` specified in
  the parent is used. At least one `resource_group` must be defined.
* (required) `hostname_prefix` is the DNS label prefix to apply to each
  virtual machine and resource allocated for the storage cluster. It should
  be unique.
* (required) `ssh` is the SSH admin user to create on the machine. This is
  not optional in this configuration as it is in the pool specification. If
  you are running Batch Shipyard on Windows, please refer to these
  instructions on how to generate an SSH keypair for use with Batch
  Shipyard.
    * (required) `username` is the admin user to create on all virtual
      machines
    * (optional) `ssh_public_key` is the path to a pre-existing ssh public
      key to use. If this is not specified, an RSA public/private key pair
      will be generated for use in your current working directory (with a
      non-colliding name for auto-generated SSH keys for compute pools,
      i.e., `id_rsa_shipyard_remotefs`). On Windows only, if this option is
      not specified, the SSH keys are not auto-generated (unless
      `ssh-keygen.exe` can be invoked in the current working directory or
      is in `%PATH%`). This option cannot be specified with
      `ssh_public_key_data`.
    * (optional) `ssh_public_key_data` is the raw RSA public key data in
      OpenSSH format, e.g., a string starting with `ssh-rsa ...`. Only one
      key may be specified. This option cannot be specified with
      `ssh_public_key`.
    * (optional) `ssh_private_key` is the path to an existing SSH private
      key to use against either `ssh_public_key` or `ssh_public_key_data`
      for connecting to storage nodes and performing operations that
      require SSH, such as cluster resize and detail status. This option
      should only be specified if either `ssh_public_key` or
      `ssh_public_key_data` are specified.
    * (optional) `generated_file_export_path` is the path specifying where
      to create the RSA public/private key pair.
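For illustration, a minimal `ssh` block that lets Batch Shipyard generate the
RSA keypair sets only the required `username` (the value below is a
placeholder):

```json
{
    "ssh": {
        "username": "shipyard"
    }
}
```

The generated keypair would be written to the current working directory, or
to `generated_file_export_path` if that property is specified.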
* (optional) `public_ip` are the public IP properties for each virtual
  machine.
    * (optional) `enabled` designates if public IPs should be assigned. The
      default is `true`. Note that if public IP is disabled, then you must
      create an alternate means for accessing the storage cluster virtual
      machines through a "jumpbox" on the virtual network. If this property
      is set to `false` (disabled), then any action requiring SSH, or the
      SSH command itself, will occur against the private IP address of the
      virtual machine.
    * (optional) `static` specifies if static public IPs should be assigned
      to each virtual machine allocated. The default is `false`, which
      results in dynamic public IP addresses. A "static" FQDN will be
      provided per virtual machine, regardless of this setting, if public
      IPs are enabled.
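For example, to provision the storage cluster without public IPs (assuming
you have a jumpbox or other private access path on the virtual network), the
block reduces to:

```json
{
    "public_ip": {
        "enabled": false
    }
}
```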
* (required) `virtual_network` is the virtual network to use for the
  storage cluster.
    * (required) `name` is the virtual network name
    * (optional) `resource_group` is the resource group for the virtual
      network. If this is not specified, the resource group name falls back
      to the resource group specified in the storage cluster or its parent.
    * (optional) `existing_ok` allows use of a pre-existing virtual
      network. The default is `false`.
    * (required if creating, optional otherwise) `address_space` is the
      allowed address space for the virtual network.
    * (required) `subnet` specifies the subnet properties. This subnet must
      be exclusive to the storage cluster and cannot be shared with other
      resources, including Batch compute nodes. Batch compute nodes and
      storage clusters can co-exist on the same virtual network, but should
      be in separate subnets.
        * (required) `name` is the subnet name.
        * (required) `address_prefix` is the subnet address prefix to use
          for allocation of the storage cluster file server virtual
          machines.
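As a sketch, reusing a pre-existing virtual network while giving the file
server its own subnet could look like the following (all names and prefixes
are illustrative); `address_space` is omitted since the virtual network
already exists:

```json
{
    "virtual_network": {
        "name": "myvnet",
        "resource_group": "my-vnet-resource-group",
        "existing_ok": true,
        "subnet": {
            "name": "my-server-subnet",
            "address_prefix": "10.0.0.0/24"
        }
    }
}
```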
* (required) `network_security` defines the network security rules to apply
  to each virtual machine in the storage cluster.
    * (required) `ssh` is the rule for which address prefixes to allow for
      connecting to sshd port 22 on the virtual machine. In the example,
      `"*"` allows any IP address to connect. This is an array property
      which allows multiple address prefixes to be specified.
    * (optional) `nfs` rule allows the NFSv4 server port to be exposed to
      the specified address prefixes. Multiple address prefixes can be
      specified. This property is ignored for `glusterfs` clusters.
    * (optional) `glusterfs` rule allows the various GlusterFS management
      and brick ports to be exposed to the specified address prefixes.
      Multiple address prefixes can be specified. This property is ignored
      for `nfs` clusters.
    * (optional) `smb` rule allows the direct host SMB port to be exposed
      if a `samba` configuration is specified under `file_server`. This
      requires Windows 2000 or later. Please note the name of this rule is
      `smb`, which refers to the protocol rather than the `samba`
      implementation for providing this service on a non-Windows host.
    * (optional) `custom_inbound_rules` are custom inbound rules for other
      services that you need to expose.
        * (required) `<rule name>` is the name of the rule; the example
          uses `myrule`. Each rule name should be unique.
        * (required) `destination_port_range` is the ports on each virtual
          machine that will be exposed. This can be a single port and
          should be a string.
        * (required) `source_address_prefix` is an array of address
          prefixes to allow.
        * (required) `protocol` is the protocol to allow. Valid values are
          `tcp`, `udp` and `*` (which means any protocol).
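For instance, a more locked-down `network_security` block for an `nfs`
cluster might allow SSH and NFS access only from a single administrative
CIDR (the address prefix below is a placeholder):

```json
{
    "network_security": {
        "ssh": ["1.2.3.0/24"],
        "nfs": ["1.2.3.0/24"]
    }
}
```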
* (required) `file_server` is the file server specification.
    * (required) `type` is the type of file server to provision. Valid
      values are `nfs` and `glusterfs`. `nfs` will provision an NFSv4
      server. `glusterfs` will provision a GlusterFS server.
    * (required) `mountpoint` is the path to mount the filesystem. This
      will also be the export path from the server for NFS. Note that with
      GlusterFS, if the cluster is suspended then restarted or machines are
      rebooted, the local gluster volume mount will not automatically mount
      upon boot, but will mount upon first use. This only applies to local
      access to the gluster volume mountpath directly on the virtual
      machine itself.
    * (optional) `mount_options` are mount options as an array to specify
      when mounting the filesystem. The examples here, `noatime` and
      `nodiratime`, reduce file metadata updates for access times on files
      and directories.
    * (optional) `server_options` is a key-value array of server options
      with the key of the filesystem `type`. In this example, we are
      explicitly defining options for `glusterfs`. `volume_name`,
      `volume_type` and `transport` are all special keywords.
        * (optional) `volume_name` is the name of the gluster volume. The
          default is `gv0`.
        * (optional) `volume_type` is the type of volume to create. If not
          specified, the default is the gluster default of a distributed
          volume. Please note that the `volume_type` specified here will
          have significant impact on the performance and data availability
          delivered by GlusterFS for your workload. It is imperative to
          understand your data I/O and access patterns and to select the
          proper volume type to maximize performance and/or availability.
          Although written data is durable due to managed disks, VM
          availability can cause reliability issues if a virtual machine
          fails or becomes unavailable, thus resulting in unavailability of
          the brick hosting the data. You can view all of the available
          GlusterFS volume types here.
        * (optional) `transport` is the transport type to use. The default
          and only valid value is `tcp`.
        * (optional) Other GlusterFS tuning options can be further
          specified here as key-value pairs. You can find all of the tuning
          options here. Please note that nfs-related options, although they
          can be enabled, are not inherently supported by Batch Shipyard.
          Batch Shipyard automatically provisions the proper GlusterFS FUSE
          client on compute nodes that require access to GlusterFS-based
          storage clusters.
    * (optional) `samba` defines properties required for enabling SMB
      support on storage cluster nodes. This support is accomplished by
      running Samba alongside the NFS or GlusterFS server software. If this
      section is omitted, SMB access will be disabled.
        * (required) `share_name` is the name of the share. The path of
          this share is automatically mapped.
        * (optional) `account` is a user identity to mount the file share
          as. If this is not specified, the share will be created with
          guest access allowed and files and directories will be created
          and modified by the `nobody` account on the server.
            * (required) `username` is the username
            * (required) `password` is the password for the user. This
              cannot be null or empty.
            * (required) `uid` is the desired uid for the username
            * (required) `gid` is the desired gid for the username's group
        * (optional) `read_only` designates that the share is read only if
          this property is set to `true`. The default is `false`.
        * (optional) `create_mask` is the file creation mask as an octal
          string. The default is `"0700"`.
        * (optional) `directory_mask` is the directory creation mask as an
          octal string. The default is `"0700"`.
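To contrast with the GlusterFS example in the schema above, a minimal
`file_server` block for an NFSv4 server without SMB support might be:

```json
{
    "file_server": {
        "type": "nfs",
        "mountpoint": "/data",
        "mount_options": [
            "noatime",
            "nodiratime"
        ]
    }
}
```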
* (required) `vm_count` is the number of virtual machines to allocate for
  the storage cluster. For `nfs` file servers, the only valid value is 1;
  pNFS is not supported at this time. For `glusterfs` storage clusters,
  this value must be at least 2.
* (required) `vm_size` is the virtual machine instance size to use. To
  attach premium managed disks, you must use a premium storage compatible
  virtual machine size.
* (optional) `fault_domains` is the number of fault domains to configure
  for the availability set. This only applies to `vm_count` > 1 and must be
  in the range [2, 3]. The default is `2` if not specified. Note that some
  regions do not support 3 fault domains.
* (required) `vm_disk_map` is the virtual machine to managed disk mapping.
  The number of entries in this map must match the `vm_count`. A
  single-disk mapping example is sketched after this list.
    * (required) `<instance number>` is the virtual machine instance
      number. This value must be a string (although it is integral in
      nature).
        * (required) `disk_array` is the listing of managed disk names to
          attach to this instance. These disks must be provisioned before
          creating the storage cluster.
        * (required) `filesystem` is the filesystem to use. Valid values
          are `btrfs`, `ext4`, `ext3` and `ext2`. `btrfs` is generally
          stable for RAID-0, with better features and data integrity
          protection. `btrfs` also allows for RAID-0 expansion and is the
          only filesystem compatible with the `fs cluster expand` command.
        * (optional for single disk, required for multiple disks)
          `raid_level` is the RAID level to apply to the disks in the
          `disk_array`. The only valid value for multiple disks is `0`.
          Note that if you wish to expand the number of disks in the array
          in the future, you must use `btrfs` as the filesystem. At least
          two disks per virtual machine are required for RAID-0.
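For example, a single virtual machine `nfs` cluster with one attached disk
(the disk name is assumed to have been defined in `managed_disks`) could be
mapped as follows; `raid_level` is omitted since the disk array contains a
single disk:

```json
{
    "vm_count": 1,
    "vm_disk_map": {
        "0": {
            "disk_array": ["p10-disk0a"],
            "filesystem": "btrfs"
        }
    }
}
```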
## Remote Filesystems with Batch Shipyard Guide
Please see the full guide for information on how this feature works in Batch Shipyard.
## Full template
A full template of this configuration file can be found here. Note that this template cannot be used as-is and must be modified to fit your scenario.
## Sample Recipes
Sample recipes for both NFS and GlusterFS can be found in the recipes area.