Use TensorBoard notebook extension
PiperOrigin-RevId: 683778406
Minwoo Park authored and copybara-github committed Oct 10, 2024
1 parent df55634 commit e30e042
Showing 4 changed files with 76 additions and 47 deletions.
@@ -132,10 +132,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -180,7 +178,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
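The one-line change in this hunk fixes a real bug: when the notebook does `import datetime` (the module), the `now()` constructor lives on the `datetime.datetime` class, so the old `datetime.now()` call raises `AttributeError`. A minimal sketch of the corrected timestamp and bucket-name logic; the `BUCKET_URI` value is a hypothetical placeholder:

```python
import datetime

# With `import datetime`, the class must be qualified: datetime.datetime.now().
# Calling datetime.now() on the module object raises AttributeError.
now = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Hypothetical bucket URI for illustration. Keeping the first three
# "/"-separated pieces of a gs:// URI yields the scheme plus bucket name.
BUCKET_URI = "gs://my-bucket/experiments/run-1"
BUCKET_NAME = "/".join(BUCKET_URI.split("/")[:3])
print(BUCKET_NAME)  # gs://my-bucket
```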
@@ -582,14 +580,22 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display an updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
+"# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
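The rewritten cell in the hunk above is the heart of this commit: instead of printing a `tensorboard` command to paste into Cloud Shell, the notebook reports the job's status and then embeds TensorBoard inline via the `%tensorboard` line magic from the TensorBoard notebook extension. A sketch of the elapsed-time logic, under the assumption that `train_job` exposes a timezone-aware `end_time` and a `has_failed` flag (stubbed here with a fake object):

```python
import datetime

# Stand-in for the Vertex AI training job object the notebook uses; only the
# two attributes this cell reads are modeled here, which is an assumption.
class FakeTrainJob:
    end_time = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(minutes=42)
    has_failed = False

train_job = FakeTrainJob()

# Both datetimes are timezone-aware, so subtracting them is well defined;
# mixing an aware datetime with a naive one would raise TypeError.
now = datetime.datetime.now(tz=datetime.timezone.utc)

if train_job.end_time is not None:
    min_since_end = int((now - train_job.end_time).total_seconds() // 60)
    print(f"Training Job finished {min_since_end} minutes ago.")

if train_job.has_failed:
    print("The job has failed.")

# In a notebook, TensorBoard itself would then be embedded with the line
# magic (after `%load_ext tensorboard`, if the extension is not auto-loaded):
#   %tensorboard --logdir gs://my-bucket/output/logs
```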
@@ -819,11 +825,12 @@
 "# endpoint = aiplatform.Endpoint(aip_endpoint_name)\n",
 "\n",
 "prompt = \"What is a car?\" # @param {type: \"string\"}\n",
-"# @markdown If you encounter the issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
+"# @markdown If you encounter an issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
 "max_tokens = 50 # @param {type:\"integer\"}\n",
 "temperature = 1.0 # @param {type:\"number\"}\n",
 "top_p = 1.0 # @param {type:\"number\"}\n",
 "top_k = 1 # @param {type:\"integer\"}\n",
+"# @markdown Set `raw_response` to `True` to obtain the raw model output. Set `raw_response` to `False` to apply additional formatting in the structure of `\"Prompt:\\n{prompt.strip()}\\nOutput:\\n{output}\"`.\n",
 "raw_response = False # @param {type:\"boolean\"}\n",
 "\n",
 "# Overrides parameters for inferences.\n",
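The `raw_response` documentation added in this hunk spells out the output contract. A small sketch of the formatting branch it describes; the `output` string is a made-up model response, not real endpoint output:

```python
prompt = "What is a car?  "  # trailing whitespace, to show the strip
output = "A car is a wheeled motor vehicle used for transportation."
raw_response = False

if raw_response:
    text = output  # raw model output, returned verbatim
else:
    # The structure quoted in the new @markdown line:
    # "Prompt:\n{prompt.strip()}\nOutput:\n{output}"
    text = f"Prompt:\n{prompt.strip()}\nOutput:\n{output}"

print(text)
```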
@@ -129,10 +129,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -177,7 +175,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
@@ -642,19 +640,28 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display an updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
+"# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
+"language": "python",
 "metadata": {
 "cellView": "form",
 "id": "qmHW6m8xG_4U"
@@ -860,6 +867,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
+"language": "python",
 "metadata": {
 "cellView": "form",
 "id": "2UYUNn60G_4U"
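The two metadata-only hunks above just add a `"language": "python"` field to code-cell JSON. Since an .ipynb file is plain JSON, the same annotation can be applied with the stdlib `json` module; the notebook dict below is a stripped-down stand-in, not a complete .ipynb:

```python
import json

# Stripped-down stand-in for a notebook document; real .ipynb files carry
# more fields (nbformat version, kernelspec metadata, outputs, ...).
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {"cellView": "form"}, "source": []},
    ]
}

# Tag each code cell with an explicit language, mirroring the hunks above.
for cell in nb["cells"]:
    if cell["cell_type"] == "code":
        cell["language"] = "python"

serialized = json.dumps(nb, indent=1)
```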
@@ -126,10 +126,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -174,7 +172,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
@@ -559,14 +557,22 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display an updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
+"# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
@@ -777,11 +783,12 @@
 "# endpoint = aiplatform.Endpoint(aip_endpoint_name)\n",
 "\n",
 "prompt = \"What is a car?\" # @param {type: \"string\"}\n",
-"# @markdown If you encounter the issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
+"# @markdown If you encounter an issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
 "max_tokens = 50 # @param {type:\"integer\"}\n",
 "temperature = 1.0 # @param {type:\"number\"}\n",
 "top_p = 1.0 # @param {type:\"number\"}\n",
 "top_k = 1 # @param {type:\"integer\"}\n",
+"# @markdown Set `raw_response` to `True` to obtain the raw model output. Set `raw_response` to `False` to apply additional formatting in the structure of `\"Prompt:\\n{prompt.strip()}\\nOutput:\\n{output}\"`.\n",
 "raw_response = False # @param {type:\"boolean\"}\n",
 "\n",
 "# Overrides parameters for inferences.\n",