This module creates a TPU nodeset that can be used in a Slurm partition. TPUs (Tensor Processing Units) are Google's custom-developed application-specific integrated circuits (ASICs) that accelerate machine learning workloads.

The following code snippet creates a TPU partition with the following attributes:

- The TPU nodeset module is connected to the `network` module.
- The TPU nodeset is of type `v2-8` and uses TensorFlow version `2.10.0`. Other supported configurations are listed at https://cloud.google.com/tpu/docs/supported-tpu-configurations.
- The TPU VMs are preemptible.
- `preserve_tpu` is set to `false`, so suspended VMs are deleted.
- The partition module uses the `tpu_nodeset` module defined here, and the resulting partition can be accessed as the `tpu` partition.

```yaml
  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      disable_public_ips: false
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```

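
As a variation on the example above, the sketch below keeps suspended TPU VMs instead of deleting them by setting `preserve_tpu: true` and turning off preemption. The module ids `tpu_keep` and `tpu_keep_partition` and the partition name `tpukeep` are hypothetical. All settings are taken from this module's inputs, but whether this combination suits a given workload is an assumption to verify.

```yaml
  # Hypothetical variant: TPU VMs are stopped, not deleted, when suspended.
  - id: tpu_keep
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      preemptible: false
      preserve_tpu: true          # suspended VMs are stopped instead of deleted
      node_count_dynamic_max: 2   # at most two auto-scaling nodes

  - id: tpu_keep_partition
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_keep]
    settings:
      partition_name: tpukeep
```
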
Name | Version |
---|---|
terraform | >= 1.3 |
No providers.
No modules.
No resources.
Name | Description | Type | Default | Required |
---|---|---|---|---|
accelerator_config | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | `object({` | `{` | no |
data_disks | The data disks to include in the TPU node. | `list(string)` | `[]` | no |
disable_public_ips | If set to false, each node group VM is assigned a random public IP. Ignored if access_config is set. | `bool` | `true` | no |
docker_image | The GCP container registry ID of the Docker image to use in the TPU VMs. Defaults to `gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-4-tf-<var.tf_version>`. | `string` | `null` | no |
name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
node_count_dynamic_max | Maximum number of auto-scaling nodes allowed in this partition. | `number` | `5` | no |
node_count_static | Number of nodes to be statically created. | `number` | `0` | no |
node_type | Node type on which to base the VM configuration. | `string` | n/a | yes |
preemptible | Whether to use preemptible VMs for bursting. | `bool` | `false` | no |
preserve_tpu | Whether TPU VMs are preserved on suspend. If true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `false` | no |
project_id | Project ID to create resources in. | `string` | n/a | yes |
reserved | Whether TPU VMs in this nodeset are created under a reservation. | `bool` | `false` | no |
service_account | Service account to attach to the TPU VM. If none is given, the default service account and scopes will be used. | `object({` | `null` | no |
subnetwork_self_link | The name of the subnetwork to attach the TPU VMs of this nodeset to. | `string` | n/a | yes |
tf_version | Nodeset TensorFlow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
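
To illustrate how several of these inputs combine, the following sketch sizes a nodeset with one static node and up to four auto-scaling nodes, and spells out the Docker image that the `docker_image` description gives as the default for `tf_version` `2.14.0`. The module id `tpu_sized` is hypothetical. The sketch also assumes that the remaining required inputs (`project_id`, `zone`, `subnetwork_self_link`) are supplied by the blueprint's deployment variables and the `network` module, and that `v2-8` is available with this TensorFlow version.

```yaml
  - id: tpu_sized                 # hypothetical module id
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.14.0          # default value from the inputs table
      node_count_static: 1        # one statically created node
      node_count_dynamic_max: 4   # upper bound on auto-scaling nodes
      disable_public_ips: true    # default value, stated explicitly
      # Documented default image pattern with tf_version substituted in:
      docker_image: gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-4-tf-2.14.0
```

Setting `docker_image` explicitly is shown only for illustration; according to the table, leaving it unset should resolve to the same image.
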
Name | Description |
---|---|
nodeset_tpu | Details of the TPU nodeset. Typically used as input to `schedmd-slurm-gcp-v6-partition`. |