Add Cloud Composer Vertex AI Integration DAG #605
base: master
Conversation
Regarding Prerequisites:
You can delete these items since we can assume they are already done in the ASL environment:
- A Google Cloud Project with billing enabled.
- Vertex AI API enabled in your GCP project.
- BigQuery API enabled in your GCP project.
Also, could you make this a separate cell and write a step-by-step guide in this notebook? If additional IAM setup is required, write the command in the setup script (a possible sketch follows the item below).
- A Cloud Composer environment provisioned in your GCP project. (This notebook assumes the Cloud Composer instance has already been created by following the instructions in Run an Apache Airflow DAG in Cloud Composer. If you haven't run it, please create a Cloud Composer instance using those instructions.)
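For reference, a minimal sketch of the kind of IAM command that could go in the setup script, assuming Composer runs on the default Compute Engine service account and that `roles/aiplatform.user` and `roles/bigquery.dataEditor` are the roles the DAG needs (both are assumptions to verify against the DAG's operators):

```bash
# Hypothetical IAM setup for the setup script; verify the service account and
# roles actually required by the DAG's operators before adding this.
PROJECT=$(gcloud config get-value core/project)
PROJECT_NUMBER=$(gcloud projects describe "${PROJECT}" --format="value(projectNumber)")
COMPOSER_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"  # assumed default worker SA

gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member="serviceAccount:${COMPOSER_SA}" --role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member="serviceAccount:${COMPOSER_SA}" --role="roles/bigquery.dataEditor"
```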
Regarding the yaml file, I think it's also better to make it a separate step-by-step guide and write commands to 1) create a bucket if it doesn't exist, 2) run the asl-ml-immersion/notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb notebook, and 3) copy the yaml file from the solution directory to the GCS bucket.
- A compiled Kubeflow Pipeline YAML file uploaded to a GCS bucket (e.g., gs://your-bucket/covertype_kfp_pipeline.yaml). This file should define all the steps of your Vertex AI Pipeline. It's recommended to use the lab "Continuous Training with Kubeflow Pipeline and Vertex AI" from the "asl-ml-immersion/notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb" notebook to create "covertype_kfp_pipeline.yaml".
1) Create the bucket.

```python
import os

PROJECT = !(gcloud config get-value core/project)
PROJECT = PROJECT[0]
BUCKET = PROJECT  # defaults to PROJECT
REGION = "us-central1"  # adjust to your region
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
```

```bash
%%bash
exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)
if [ -n "$exists" ]; then
    echo -e "Bucket gs://${BUCKET} already exists."
else
    echo "Creating a new GCS bucket."
    gsutil mb -l ${REGION} gs://${BUCKET}
    echo -e "\nHere are your current buckets:"
    gsutil ls
fi
```

2) Run the kfp_pipeline_vertex_lightweight.ipynb solution notebook to compile covertype_kfp_pipeline.yaml (a sketch of running it headlessly follows this list).

3) Copy the yaml file.

```python
!gsutil cp ../../../pipelines/solutions/covertype_kfp_pipeline.yaml gs://$BUCKET
```
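For step 2, one possible way to run the solution notebook non-interactively is jupyter nbconvert; running it interactively in JupyterLab works just as well, and the path below assumes the repository root as the working directory:

```bash
# Hypothetical headless run of the notebook that compiles covertype_kfp_pipeline.yaml.
jupyter nbconvert --to notebook --execute \
    notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb \
    --output kfp_pipeline_vertex_lightweight_out.ipynb
```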
Also, in Setup and Configuration:
1) GCS_VERTEX_AI_PIPELINE_YAML and GCS_TRAIN_DATASET_PATH can be prefilled with the bucket name created above. It seems the pipeline fails if BIGQUERY_DATASET_ID doesn't exist. Please add a step to create the dataset with the bq mk command (see the sketch after this list).
2) The IAM section can be removed. If additional IAM is required, add it to the setup script.
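A minimal sketch of such a dataset-creation step, where the dataset ID shown is a placeholder for whatever BIGQUERY_DATASET_ID is set to in the DAG:

```bash
# Hypothetical example: create the BigQuery dataset the DAG loads into.
# Rerunning this when the dataset already exists just prints an error and is harmless.
PROJECT=$(gcloud config get-value core/project)
BIGQUERY_DATASET_ID="covertype_dataset"   # placeholder: match the DAG configuration
bq --location=US mk --dataset "${PROJECT}:${BIGQUERY_DATASET_ID}"
```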
Explain where to find the DAG bucket path, or explicitly import the Python file using the gcloud composer environments storage dags import command (sketched below).
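For example, a minimal sketch of the import command, where the environment name, location, and DAG file name are placeholders to replace with the actual values:

```bash
# Hypothetical example: upload the DAG file into the Composer environment's dags/ folder.
gcloud composer environments storage dags import \
    --environment=my-composer-env \
    --location=us-central1 \
    --source=cloud_composer_orchestration_vertex.py
```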
* **Trigger**: Executes once `start_vertex_ai_pipeline` successfully completes.

5. **`delete_vertex_ai_pipeline_job`**:
   * **Operator**: `DeletePipelineJobOperator`
I think deleting the pipeline job is not necessary. Vertex AI Pipelines is a serverless service, and its resources are automatically shut down after execution.
This deletion step appears to delete the job record (not the resources) from the pipeline history, which is not ideal for logging purposes.
This pull request introduces an example Cloud Composer DAG that orchestrates a Vertex AI pipeline, including loading data into BigQuery, triggering a Vertex AI pipeline run, and managing the pipeline job lifecycle.