Batch Shipyard Resource Monitoring Configuration
This page describes in detail how to write the resource monitoring configuration file for Batch Shipyard.
Schema
The monitoring schema is as follows:
monitoring:
  location: <Azure region, e.g., eastus>
  resource_group: my-prom-server-rg
  hostname_prefix: prom
  ssh:
    username: shipyard
    ssh_public_key: /path/to/rsa/publickey.pub
    ssh_public_key_data: ssh-rsa ...
    ssh_private_key: /path/to/rsa/privatekey
    generated_file_export_path: null
  public_ip:
    enabled: true
    static: false
  virtual_network:
    name: myvnet
    resource_group: my-vnet-resource-group
    existing_ok: false
    address_space: 10.0.0.0/16
    subnet:
      name: my-server-subnet
      address_prefix: 10.0.0.0/24
  network_security:
    ssh:
    - '*'
    grafana:
    - 1.2.3.0/24
    - 2.3.4.5
    prometheus:
    - 2.3.4.5
  vm_size: STANDARD_D2_V2
  accelerated_networking: false
  services:
    resource_polling_interval: 15
    lets_encrypt:
      enabled: true
      use_staging_environment: true
    prometheus:
      port: 9090
      scrape_interval: 10s
    grafana:
      additional_dashboards: {}
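For comparison, the following is a minimal sketch that keeps only the members marked (required) in the member list on this page and leaves every optional member at its default. The values are the placeholders from the schema above, with eastus substituted for the region. Whether Batch Shipyard accepts a services section with no members has not been verified here, so one member is kept as an assumption.

```yaml
monitoring:
  location: eastus                # must match the region of the monitored resources
  resource_group: my-prom-server-rg
  hostname_prefix: prom
  ssh:
    username: shipyard            # no key options given, so a key pair is auto-generated
  virtual_network:
    name: myvnet
    address_space: 10.0.0.0/16    # required because the virtual network is being created
    subnet:
      name: my-server-subnet
      address_prefix: 10.0.0.0/24
  network_security:
    ssh:
    - '*'                         # any source address may reach port 22
  vm_size: STANDARD_D2_V2
  services:
    resource_polling_interval: 15 # assumption: kept so that services is not empty
```

With the prometheus section omitted under services, the Prometheus server is not exposed, as described in the member list.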
The monitoring property has the following members:
- (required) location is the Azure region name for the resources, e.g., eastus or northeurope. The location specified must be in the same region as your Azure Batch account if monitoring compute pools and/or in the same region as your storage clusters if monitoring storage clusters.
- (required) resource_group is the resource group to use for the monitoring resource.
- (required) hostname_prefix is the DNS label prefix to apply to each virtual machine and resource allocated for the monitoring resource. It should be unique.
- (required) ssh is the SSH admin user to create on the machine. This is not optional in this configuration as it is in the pool specification. If you are running Batch Shipyard on Windows, please refer to these instructions on how to generate an SSH keypair for use with Batch Shipyard.
  - (required) username is the admin user to create on all virtual machines.
  - (optional) ssh_public_key is the path to a pre-existing SSH public key to use. If this is not specified, an RSA public/private key pair will be generated for use in your current working directory (with a name that does not collide with the auto-generated SSH keys for compute pools, i.e., id_rsa_shipyard_remotefs). On Windows only, if this option is not specified, the SSH keys are not auto-generated (unless ssh-keygen.exe can be invoked in the current working directory or is in %PATH%). This option cannot be specified with ssh_public_key_data.
  - (optional) ssh_public_key_data is the raw RSA public key data in OpenSSH format, e.g., a string starting with ssh-rsa .... Only one key may be specified. This option cannot be specified with ssh_public_key.
  - (optional) ssh_private_key is the path to an existing SSH private key to use against either ssh_public_key or ssh_public_key_data for connecting to storage nodes and performing operations that require SSH such as cluster resize and detail status. This option should only be specified if either ssh_public_key or ssh_public_key_data is specified.
  - (optional) generated_file_export_path is the path in which to create the RSA public/private key pair.
- (optional) public_ip are public IP properties for the virtual machine.
  - (optional) enabled designates if public IPs should be assigned. The default is true. Note that if public IP is disabled, then you must create an alternate means for accessing the resource monitor virtual machine through a "jumpbox" on the virtual network. If this property is set to false (disabled), then any action requiring SSH, or the SSH command itself, will occur against the private IP address of the virtual machine.
  - (optional) static is to specify if static public IPs should be assigned to each virtual machine allocated. The default is false, which results in dynamic public IP addresses. A "static" FQDN will be provided per virtual machine, regardless of this setting, if public IPs are enabled.
- (required) virtual_network is the virtual network to use for the resource monitor.
  - (required) name is the virtual network name.
  - (optional) resource_group is the resource group for the virtual network. If this is not specified, the resource group name falls back to the resource group specified in the resource monitor.
  - (optional) existing_ok allows use of a pre-existing virtual network. The default is false.
  - (required if creating, optional otherwise) address_space is the allowed address space for the virtual network.
  - (required) subnet specifies the subnet properties. This subnet should be exclusive to the resource monitor and cannot be shared with other resources, including Batch compute nodes. Batch compute nodes and storage clusters can co-exist on the same virtual network, but should be in separate subnets. It's recommended that the monitor VM be in a separate subnet as well.
    - (required) name is the subnet name.
    - (required) address_prefix is the subnet address prefix from which the resource monitor virtual machine is allocated.
- (required) network_security defines the network security rules to apply to the resource monitoring virtual machine.
  - (required) ssh is the rule for which address prefixes to allow for connecting to sshd port 22 on the virtual machine. In the example, "*" allows any IP address to connect. This is an array property which allows multiple address prefixes to be specified.
  - (optional) grafana rule allows the Grafana HTTPS (443) server port to be exposed to the specified address prefix. Multiple address prefixes can be specified.
  - (optional) prometheus rule allows the Prometheus server port to be exposed to the specified address prefix. Multiple address prefixes can be specified.
  - (optional) custom_inbound_rules are custom inbound rules for other services that you need to expose.
    - (required) <rule name> is the name of the rule, for example, myrule. Each rule name should be unique.
      - (required) destination_port_range specifies the ports on each virtual machine that will be exposed. This can be a single port and should be a string.
      - (required) source_address_prefix is an array of address prefixes to allow.
      - (required) protocol is the protocol to allow. Valid values are tcp, udp and * (which means any protocol).
- (required) vm_size is the virtual machine instance size to use.
- (optional) accelerated_networking enables or disables accelerated networking. The default is false if not specified.
- (required) services defines the behavior of the services that run on the monitoring resource virtual machine.
  - (optional) resource_polling_interval is the polling interval in seconds for monitored resource discovery. The default is 15 seconds.
  - (optional) lets_encrypt defines options for enabling Let's Encrypt on the nginx reverse proxy for TLS encryption. This can only be enabled if public_ip is enabled.
    - (required) enabled controls if Let's Encrypt is enabled or not. The default is true.
    - (optional) use_staging_environment forces the certificate request to happen against Let's Encrypt's staging servers. Although this will enable encryption over HTTP, since the CA is fake, warnings will appear with most browsers when attempting to connect to the service endpoints on the resource monitoring VM. This is useful to ensure your configuration is correct before switching to a production certificate. The default is true.
  - (optional) prometheus configures the Prometheus server endpoint on the resource monitoring VM. Note that it is not required to define this section. If it is omitted, then the Prometheus server is not exposed.
    - (optional) port is the port to use. If this value is omitted, the Prometheus server is not exposed.
    - (optional) scrape_interval is the collector scrape interval to use. The default is 10s. Note that valid values are Prometheus duration strings.
  - (optional) grafana configures the Grafana endpoint on the resource monitoring VM.
    - (optional) additional_dashboards is a dictionary of additional Grafana dashboards to provision. The format of the dictionary is filename.json: URL. For example, my_custom_dash.json: https://some.url.
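The schema listing does not show custom_inbound_rules or a populated additional_dashboards, so the fragment below sketches how those members might be written, assembled only from the member descriptions above. The port 8080 and the source address prefix are placeholders chosen for illustration. The rule name myrule and the dashboard entry are the examples given in the descriptions. Required members outside network_security and services are omitted for brevity, so this is not a complete configuration.

```yaml
monitoring:
  network_security:
    ssh:
    - '*'
    prometheus:
    - 2.3.4.5                        # only this address may reach the Prometheus port
    custom_inbound_rules:
      myrule:                        # rule name, should be unique
        destination_port_range: "8080"   # placeholder port, given as a string
        source_address_prefix:       # array of allowed address prefixes
        - 1.2.3.0/24
        protocol: tcp                # tcp, udp or '*'
  services:
    prometheus:
      port: 9090                     # omit port (or the section) to leave Prometheus unexposed
      scrape_interval: 10s           # a Prometheus duration string
    grafana:
      additional_dashboards:
        my_custom_dash.json: https://some.url   # filename.json: URL
```

Note that the prometheus rule under network_security and the prometheus section under services are separate settings: the first controls which addresses may reach the port, the second whether the server is exposed and on which port.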
Resource Monitoring with Batch Shipyard Guide
Please see the full guide for information on how this feature works in Batch Shipyard.
Full template
A full template of a resource monitoring configuration file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.