Description

This module creates a TPU nodeset that can be added to a Slurm partition. TPUs are Google's custom-developed application-specific integrated circuits (ASICs) for accelerating machine learning workloads.

Example

The following snippet creates a TPU partition with these attributes:

  • The TPU nodeset module is connected to the network module.
  • The TPU nodeset uses node type v2-8 with TensorFlow version 2.10.0. See https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for other supported configurations.
  • Public IPs are assigned to the TPU VMs (disable_public_ips is set to false).
  • The TPU VMs are preemptible.
  • preserve_tpu is set to false, so suspended TPU VMs are deleted rather than stopped.
  • The partition module uses the tpu_nodeset module defined here, and the resulting partition is accessible as the tpu partition.

  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      disable_public_ips: false
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
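
As a further sketch, the nodeset below keeps one TPU VM running permanently (node_count_static), allows up to four auto-scaling nodes (node_count_dynamic_max), and stops rather than deletes suspended TPU VMs (preserve_tpu: true). The node counts are arbitrary illustrative values, not recommendations; the setting names come from the Inputs table. The nodeset can be attached to a partition in the same way as in the example above.

```yaml
  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      # Illustrative values: one always-on node plus up to four auto-scaling nodes
      node_count_static: 1
      node_count_dynamic_max: 4
      # Suspended TPU VMs are stopped instead of deleted
      preserve_tpu: true
```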

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3 |

Providers

No providers.

Modules

No modules.

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| accelerator_config | Nodeset accelerator config. See https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | object({ topology = string, version = string }) | { "topology": "", "version": "" } | no |
| data_disks | The data disks to include in the TPU node. | list(string) | [] | no |
| disable_public_ips | If set to false, the node group VMs will have a random public IP assigned to them. Ignored if access_config is set. | bool | true | no |
| docker_image | The GCP container registry ID of the Docker image to use in the TPU VMs. Defaults to gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-4-tf-<var.tf_version>. | string | null | no |
| name | Name of the nodeset. Automatically populated from the module id if not set. If setting it manually, ensure the value is unique across all nodesets. | string | n/a | yes |
| node_count_dynamic_max | Maximum number of auto-scaling nodes allowed in this partition. | number | 5 | no |
| node_count_static | Number of nodes to be statically created. | number | 0 | no |
| node_type | The node type on which to base the VM configuration. | string | n/a | yes |
| preemptible | Whether to use preemptible VMs to burst. | bool | false | no |
| preserve_tpu | Whether TPU VMs are preserved on suspend. If true, a suspended VM is stopped; if false, it is deleted. | bool | false | no |
| project_id | Project ID to create resources in. | string | n/a | yes |
| reserved | Whether TPU VMs in this nodeset are created under a reservation. | bool | false | no |
| service_account | Service account to attach to the TPU VMs. If none is given, the default service account and scopes are used. | object({ email = string, scopes = set(string) }) | null | no |
| subnetwork_self_link | The name of the subnetwork to attach the TPU VMs of this nodeset to. | string | n/a | yes |
| tf_version | Nodeset TensorFlow version. See https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | string | "2.14.0" | no |
| zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | string | n/a | yes |
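
The object-typed service_account input and the docker_image default can be easier to follow as configuration. The sketch below attaches an explicit service account and spells out the image that the docker_image description implies for the default tf_version of 2.14.0, so it should behave the same as leaving docker_image unset. The service account email is a hypothetical placeholder, and the cloud-platform scope is an assumed choice; substitute values appropriate to your project.

```yaml
  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.14.0
      # Follows the documented default pattern:
      # gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-4-tf-<var.tf_version>
      docker_image: gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-4-tf-2.14.0
      # Hypothetical service account; scopes is a set of OAuth scope URLs
      service_account:
        email: tpu-nodes@my-project.iam.gserviceaccount.com
        scopes:
          - https://www.googleapis.com/auth/cloud-platform
```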

Outputs

| Name | Description |
|------|-------------|
| nodeset_tpu | Details of the TPU nodeset. Typically used as input to schedmd-slurm-gcp-v6-partition. |