Batch Shipyard Resource Monitoring Configuration
This page contains in-depth details on how to configure the resource monitoring configuration file for Batch Shipyard.
Schema
The monitoring schema is as follows:
monitoring:
  location: <Azure region, e.g., eastus>
  resource_group: my-prom-server-rg
  hostname_prefix: prom
  ssh:
    username: shipyard
    ssh_public_key: /path/to/rsa/publickey.pub
    ssh_public_key_data: ssh-rsa ...
    ssh_private_key: /path/to/rsa/privatekey
    generated_file_export_path: null
  public_ip:
    enabled: true
    static: false
  virtual_network:
    name: myvnet
    resource_group: my-vnet-resource-group
    existing_ok: false
    address_space: 10.0.0.0/16
    subnet:
      name: my-server-subnet
      address_prefix: 10.0.0.0/24
  network_security:
    ssh:
    - '*'
    grafana:
    - 1.2.3.0/24
    - 2.3.4.5
    prometheus:
    - 2.3.4.5
  vm_size: STANDARD_D2_V2
  accelerated_networking: false
  services:
    resource_polling_interval: 15
    lets_encrypt:
      enabled: true
      use_staging_environment: true
    prometheus:
      port: 9090
      scrape_interval: 10s
    grafana:
      additional_dashboards: {}
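As a quick sanity check before deploying, the required members described on this page can be verified programmatically. The following is a minimal sketch, not part of Batch Shipyard: it assumes the YAML file has already been parsed into a Python dict (for example with a YAML library), and the names REQUIRED_PATHS and missing_required are hypothetical, with the path list transcribed from the "(required)" markers in the member descriptions.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical list of required member paths (dotted), transcribed from the
# "(required)" markers in the member descriptions. Members that are only
# required inside an optional section (e.g., lets_encrypt.enabled) are omitted.
REQUIRED_PATHS = [
    "location",
    "resource_group",
    "hostname_prefix",
    "ssh.username",
    "virtual_network.name",
    "virtual_network.subnet.name",
    "virtual_network.subnet.address_prefix",
    "network_security.ssh",
    "vm_size",
    "services",
]


def missing_required(monitoring: Mapping[str, Any]) -> list[str]:
    """Return the dotted paths of required members absent from the config."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = monitoring
        for key in path.split("."):
            # Stop descending as soon as a level is missing or not a mapping.
            if not isinstance(node, Mapping) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

Pass it the dict found under the top-level monitoring key; an empty list means every member marked as required is present.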
The monitoring property has the following members:
- (required) location is the Azure region name for the resources, e.g., eastus or northeurope. The location specified must match the region of your Azure Batch account if monitoring compute pools, and/or the region of your storage clusters if monitoring storage clusters.
- (required) resource_group is the resource group to use for the monitoring resource.
- (required) hostname_prefix is the DNS label prefix to apply to each virtual machine and resource allocated for the monitoring resource. It should be unique.
- (required) ssh is the SSH admin user to create on the machine. Unlike in the pool specification, this is not optional in this configuration. If you are running Batch Shipyard on Windows, please refer to these instructions on how to generate an SSH keypair for use with Batch Shipyard.
  - (required) username is the admin user to create on all virtual machines.
  - (optional) ssh_public_key is the path to a pre-existing SSH public key to use. If this is not specified, an RSA public/private key pair will be generated for use in your current working directory (with a non-colliding name for auto-generated SSH keys for compute pools, i.e., id_rsa_shipyard_remotefs). On Windows only, if this option is not specified, the SSH keys are not auto-generated (unless ssh-keygen.exe can be invoked in the current working directory or is in %PATH%). This option cannot be specified with ssh_public_key_data.
  - (optional) ssh_public_key_data is the raw RSA public key data in OpenSSH format, e.g., a string starting with ssh-rsa .... Only one key may be specified. This option cannot be specified with ssh_public_key.
  - (optional) ssh_private_key is the path to an existing SSH private key to use against either ssh_public_key or ssh_public_key_data for connecting to storage nodes and performing operations that require SSH such as cluster resize and detail status. This option should only be specified if either ssh_public_key or ssh_public_key_data is specified.
  - (optional) generated_file_export_path is an optional path specifying where to create the RSA public/private key pair.
- (optional) public_ip are the public IP properties for the virtual machine.
  - (optional) enabled designates if public IPs should be assigned. The default is true. Note that if public IP is disabled, you must create an alternate means for accessing the resource monitor virtual machine, such as a "jumpbox" on the virtual network. If this property is set to false (disabled), then any action requiring SSH, or the SSH command itself, will occur against the private IP address of the virtual machine.
  - (optional) static specifies if static public IPs should be assigned to each virtual machine allocated. The default is false, which results in dynamic public IP addresses. If public IPs are enabled, a "static" FQDN will be provided per virtual machine regardless of this setting.
- (required) virtual_network is the virtual network to use for the resource monitor.
  - (required) name is the virtual network name.
  - (optional) resource_group is the resource group for the virtual network. If this is not specified, the resource group name falls back to the resource group specified in the resource monitor.
  - (optional) existing_ok allows use of a pre-existing virtual network. The default is false.
  - (required if creating, optional otherwise) address_space is the allowed address space for the virtual network.
  - (required) subnet specifies the subnet properties. This subnet should be exclusive to the resource monitor and cannot be shared with other resources, including Batch compute nodes. Batch compute nodes and storage clusters can co-exist on the same virtual network, but should be in separate subnets. It's recommended that the monitor VM be in a separate subnet as well.
    - (required) name is the subnet name.
    - (required) address_prefix is the subnet address prefix to use for allocating the resource monitor virtual machine.
- (required) network_security defines the network security rules to apply to the resource monitoring virtual machine.
  - (required) ssh is the rule for which address prefixes to allow for connecting to sshd port 22 on the virtual machine. In the example, "*" allows any IP address to connect. This is an array property which allows multiple address prefixes to be specified.
  - (optional) grafana is the rule for which address prefixes to allow for connecting to the Grafana HTTPS server port (443). Multiple address prefixes can be specified.
  - (optional) prometheus is the rule for which address prefixes to allow for connecting to the Prometheus server port. Multiple address prefixes can be specified.
  - (optional) custom_inbound_rules are custom inbound rules for other services that you need to expose.
    - (required) <rule name> is the name of the rule, e.g., myrule. Each rule name should be unique.
      - (required) destination_port_range specifies the ports on each virtual machine that will be exposed. This can be a single port or a port range, and should be specified as a string.
      - (required) source_address_prefix is an array of address prefixes to allow.
      - (required) protocol is the protocol to allow. Valid values are tcp, udp and * (which means any protocol).
- (required) vm_size is the virtual machine instance size to use.
- (optional) accelerated_networking enables or disables accelerated networking. The default is false if not specified.
- (required) services defines the behavior of the services that run on the monitoring resource virtual machine.
  - (optional) resource_polling_interval is the polling interval in seconds for monitored resource discovery. The default is 15 seconds.
  - (optional) lets_encrypt defines options for enabling Let's Encrypt on the nginx reverse proxy for TLS encryption. This can only be enabled if public_ip is enabled.
    - (required) enabled controls if Let's Encrypt is enabled or not. The default is true.
    - (optional) use_staging_environment forces the certificate request to happen against Let's Encrypt's staging servers. Although this will enable encryption over HTTPS, the staging CA is not trusted, so most browsers will display warnings when connecting to the service endpoints on the resource monitoring VM. This is useful to ensure your configuration is correct before switching to a production certificate. The default is true.
  - (optional) prometheus configures the Prometheus server endpoint on the resource monitoring VM. This section is not required; if it is omitted, the Prometheus server is not exposed.
    - (optional) port is the port to use. If this value is omitted, the Prometheus server is not exposed.
    - (optional) scrape_interval is the collector scrape interval to use. The default is 10s. Valid values are Prometheus duration strings, e.g., 10s or 1m.
  - (optional) grafana configures the Grafana endpoint on the resource monitoring VM.
    - (optional) additional_dashboards is a dictionary of additional Grafana dashboards to provision. The format of the dictionary is filename.json: URL. For example, my_custom_dash.json: https://some.url.
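Several of the members above constrain one another. The sketch below is a hypothetical pre-flight check (check_monitoring is not a Batch Shipyard function) that encodes a few of those rules on an already-parsed config dict: the mutually exclusive SSH public key options, the ssh_private_key dependency, Let's Encrypt requiring a public IP, the custom inbound rule fields, the scrape_interval format, and the filename.json keys of additional_dashboards. Two assumptions are made and flagged in the comments: existing_ok: false is taken to mean the virtual network will be created (so address_space is required), and the duration regex is only a simplified approximation of Prometheus's duration syntax.

```python
import re
from collections.abc import Mapping
from typing import Any

# Assumption: simplified approximation of a Prometheus duration string such
# as "10s", "5m" or "1h30m"; unit ordering is not enforced here.
_PROM_DURATION = re.compile(r"^(\d+(ms|s|m|h|d|w|y))+$")
_VALID_PROTOCOLS = {"tcp", "udp", "*"}


def check_monitoring(monitoring: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems found by a few cross-field checks."""
    problems = []

    ssh = monitoring.get("ssh", {})
    has_key = bool(ssh.get("ssh_public_key"))
    has_data = bool(ssh.get("ssh_public_key_data"))
    if has_key and has_data:
        problems.append("ssh_public_key and ssh_public_key_data are mutually exclusive")
    if ssh.get("ssh_private_key") and not (has_key or has_data):
        problems.append("ssh_private_key requires ssh_public_key or ssh_public_key_data")

    # public_ip.enabled and lets_encrypt.enabled both default to true.
    public_ip_on = monitoring.get("public_ip", {}).get("enabled", True)
    services = monitoring.get("services", {})
    lets_encrypt = services.get("lets_encrypt")
    if lets_encrypt is not None and lets_encrypt.get("enabled", True) and not public_ip_on:
        problems.append("lets_encrypt requires public_ip to be enabled")

    # Assumption: existing_ok false means the virtual network will be created.
    vnet = monitoring.get("virtual_network", {})
    if not vnet.get("existing_ok", False) and not vnet.get("address_space"):
        problems.append("virtual_network.address_space is required when creating")

    rules = monitoring.get("network_security", {}).get("custom_inbound_rules", {})
    for name, rule in rules.items():
        if not isinstance(rule.get("destination_port_range"), str):
            problems.append(f"{name}: destination_port_range should be a string")
        if not rule.get("source_address_prefix"):
            problems.append(f"{name}: source_address_prefix is required")
        if rule.get("protocol") not in _VALID_PROTOCOLS:
            problems.append(f"{name}: protocol must be tcp, udp or *")

    interval = services.get("prometheus", {}).get("scrape_interval", "10s")
    if not _PROM_DURATION.match(str(interval)):
        problems.append(f"prometheus.scrape_interval {interval!r} is not a duration")

    dashboards = services.get("grafana", {}).get("additional_dashboards", {})
    for filename in dashboards:
        if not str(filename).endswith(".json"):
            problems.append(f"dashboard {filename!r} should be a .json filename")

    return problems
```

Each check only inspects sections that are present, so the function can be run on partial configurations while editing; it does not replace the validation Batch Shipyard itself performs.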
Resource Monitoring with Batch Shipyard Guide
Please see the full guide for information on how this feature works in Batch Shipyard.
Full template
A full template of a resource monitoring configuration file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.
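Because the templates must be modified before use, it can help to scan a filled-in file for leftover placeholder values. The sketch below assumes placeholders look like the <Azure region, e.g., eastus> value in the schema on this page, i.e., strings wrapped in angle brackets; find_placeholders is a hypothetical helper operating on an already-parsed config, and elided values such as ssh-rsa ... are not detected.

```python
from typing import Any


def find_placeholders(node: Any, path: str = "") -> list[str]:
    """Return dotted paths whose string values look like '<...>' placeholders."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif isinstance(node, str):
        # Assumption: template placeholders are wrapped in angle brackets.
        text = node.strip()
        if text.startswith("<") and text.endswith(">"):
            found.append(path)
    return found
```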