
Conversation

olex-snk

This pull request introduces an example of a Cloud Composer DAG that orchestrates a Vertex AI pipeline, including loading data into BigQuery, triggering the Vertex AI pipeline, and managing the pipeline job lifecycle.

@olex-snk olex-snk marked this pull request as draft May 19, 2025 12:01
@olex-snk olex-snk requested review from takumiohym and removed request for takumiohym May 22, 2025 15:16

@olex-snk olex-snk changed the title added_cloud_composer_vertex_ai_integration_dag WIP: Add Cloud Composer Vertex AI Integration DAG Jun 4, 2025
@olex-snk olex-snk self-assigned this Jun 11, 2025
@olex-snk olex-snk marked this pull request as ready for review June 11, 2025 11:22
@olex-snk olex-snk changed the title WIP: Add Cloud Composer Vertex AI Integration DAG Add Cloud Composer Vertex AI Integration DAG Jun 16, 2025
@@ -0,0 +1,443 @@
{

@takumiohym takumiohym Jun 17, 2025


Regarding Prerequisites:

You can delete these items since we can assume they are already done in the ASL environment:

  • Google Cloud Project with billing enabled.
  • Vertex AI API enabled in your GCP project.
  • BigQuery API enabled in your GCP project.

Also, could you make this a separate cell and write a step-by-step guide in this notebook? If additional IAM setup is required, write the command in the setup script.

  • Cloud Composer environment provisioned in your GCP project. (This notebook assumes the Cloud Composer instance has already been created by following the instructions in the "Run an Apache Airflow DAG in Cloud Composer" lab. If you haven't run it, please create the Cloud Composer instance using those instructions.)
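For that cell, a quick existence check could be something like the following; the location is a placeholder and should match wherever the Composer environment was created:

%%bash
# List existing Composer environments (assumes us-central1; change if yours differs).
gcloud composer environments list --locations=us-central1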

Regarding the yaml file, I think it's also better to make it a separate step-by-step guide and write commands to 1) create a bucket if one doesn't exist, 2) run the asl-ml-immersion/notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb notebook, and 3) copy the yaml file from the solution directory to the GCS bucket.

  • A compiled Kubeflow Pipeline YAML file uploaded to a GCS bucket (e.g., gs://your-bucket/covertype_kfp_pipeline.yaml). This file should define all the steps of your Vertex AI Pipeline. It's recommended to use the lab "Continuous Training with Kubeflow Pipeline and Vertex AI" (asl-ml-immersion/notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb) to create covertype_kfp_pipeline.yaml.

1) Create a bucket.

import os

PROJECT = !(gcloud config get-value core/project)
PROJECT = PROJECT[0]
BUCKET = PROJECT  # defaults to PROJECT
REGION = "us-central1"  # assumed region; adjust as needed

os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION

%%bash
# Create the bucket only if it doesn't already exist.
exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)

if [ -n "$exists" ]; then
    echo -e "Bucket gs://${BUCKET} already exists."
else
    echo "Creating a new GCS bucket."
    gsutil mb -l ${REGION} gs://${BUCKET}
    echo -e "\nHere are your current buckets:"
    gsutil ls
fi

2) Run the asl-ml-immersion/notebooks/kubeflow_pipelines/pipelines/solutions/kfp_pipeline_vertex_lightweight.ipynb notebook to compile covertype_kfp_pipeline.yaml.

3) Copy the yaml file.

!gsutil cp ../../../pipelines/solutions/covertype_kfp_pipeline.yaml gs://$BUCKET

Also, in Setup and Configuration:

1) GCS_VERTEX_AI_PIPELINE_YAML and GCS_TRAIN_DATASET_PATH can be prefilled with the bucket name created above.

It seems the pipeline fails if BIGQUERY_DATASET_ID doesn't exist. Please add a step to create the dataset with the bq mk command (see the sketch after point 2 below).

2) The IAM section can be removed. If additional IAM setup is required, add it to the setup script.
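A minimal sketch of the prefilled paths and the dataset-creation step, assuming the PROJECT/BUCKET variables defined earlier and a hypothetical dataset ID of covertype_dataset (align these names with the notebook's configuration cell):

# Hypothetical values; adjust to match the notebook's configuration variables.
BIGQUERY_DATASET_ID = "covertype_dataset"
GCS_VERTEX_AI_PIPELINE_YAML = f"gs://{BUCKET}/covertype_kfp_pipeline.yaml"
GCS_TRAIN_DATASET_PATH = f"gs://{BUCKET}/data/training/"  # assumed layout

# Create the BigQuery dataset (this errors if the dataset already exists).
!bq mk --dataset $PROJECT:$BIGQUERY_DATASET_ID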



@@ -0,0 +1,443 @@
{

@takumiohym takumiohym Jun 17, 2025


It seems this DAG doesn't contain tasks to create the validation dataset CSV?
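If the DAG exports the training CSV from BigQuery, a matching validation-export task could look roughly like this; the table name, bucket path, and task id are placeholders, not names from the PR, and PROJECT_ID, BIGQUERY_DATASET_ID, and BUCKET are assumed to be defined in the DAG file:

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

# Hypothetical task: export a validation split from BigQuery to GCS as CSV.
export_validation_dataset = BigQueryToGCSOperator(
    task_id="export_validation_dataset_to_gcs",
    source_project_dataset_table=f"{PROJECT_ID}.{BIGQUERY_DATASET_ID}.covertype_validation",
    destination_cloud_storage_uris=[f"gs://{BUCKET}/data/validation/dataset.csv"],
    export_format="CSV",
    print_header=False,
)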



@@ -0,0 +1,443 @@
{

@takumiohym takumiohym Jun 17, 2025


Explain where to find the DAG bucket path, or explicitly import the Python file using the gcloud composer environments storage dags import command.
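The import step could be shown along these lines; the environment name, location, and DAG file name are placeholders, not values from the PR:

%%bash
# Upload the DAG file to the Composer environment's DAGs bucket.
gcloud composer environments storage dags import \
  --environment my-composer-env \
  --location us-central1 \
  --source composer_vertex_ai_integration_dag.py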



* **Trigger**: Executes once `start_vertex_ai_pipeline` successfully completes.

5. **`delete_vertex_ai_pipeline_job`**:
* **Operator**: `DeletePipelineJobOperator`

I think deleting the pipeline job is not necessary. Vertex AI Pipelines is a serverless service, and the resources are automatically shut down after execution.
This deletion step appears to delete the job record (not the resources) from the pipeline history, which is not ideal for logging purposes.

@olex-snk olex-snk marked this pull request as draft July 15, 2025 16:29
@takumiohym takumiohym added the new label Jul 15, 2025