@AidenYu1673 (Contributor) commented Oct 15, 2025

Description

This PR refactors the maxtext_configs_aot DAG by splitting it into two more focused DAGs:

  1. maxtext_configs_aot: Now exclusively handles all TPU configuration tests.
  2. maxtext_configs_aot_gpu: A new DAG dedicated to running GPU AOT tests.
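
For illustration only, the split can be pictured as routing each AOT test to a DAG according to its accelerator type. The sketch below is not the code in this PR: the `AotTest` class, the `group_tests_by_dag` helper, and the example test names are hypothetical, and it deliberately avoids any Airflow API. Only the two DAG ids are taken from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass

# DAG ids as named in the PR description; the mapping itself is an assumption.
DAG_ID_BY_ACCELERATOR = {
    "tpu": "maxtext_configs_aot",
    "gpu": "maxtext_configs_aot_gpu",
}


@dataclass(frozen=True)
class AotTest:
    """Hypothetical stand-in for one AOT test configuration."""
    name: str
    accelerator: str  # expected to be "tpu" or "gpu"


def group_tests_by_dag(tests):
    """Return {dag_id: [test names]} so each DAG holds one accelerator type."""
    grouped = defaultdict(list)
    for test in tests:
        try:
            dag_id = DAG_ID_BY_ACCELERATOR[test.accelerator]
        except KeyError:
            raise ValueError(f"unknown accelerator: {test.accelerator!r}") from None
        grouped[dag_id].append(test.name)
    return dict(grouped)


# Hypothetical test names, used only to exercise the helper.
EXAMPLE_TESTS = [
    AotTest("aot-tpu-example-a", "tpu"),
    AotTest("aot-gpu-example-a", "gpu"),
    AotTest("aot-tpu-example-b", "tpu"),
]

if __name__ == "__main__":
    for dag_id, names in group_tests_by_dag(EXAMPLE_TESTS).items():
        print(dag_id, names)
```

Keeping the accelerator-to-DAG mapping in one place means a failure in the GPU group does not affect which tests the TPU DAG runs, which is the isolation this PR is after.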

Why is this change being made?

  • Isolation: Separates TPU and GPU test runs, allowing them to be triggered, monitored, and re-run independently.

Validation Summary

This refactor was tested in the ml-automation-solutions-dev environment. The TPU tests have been validated, while the GPU tests will be addressed later according to team priorities.

Test Results

  • TPU DAG (maxtext_configs_aot): Success ✅
    • All test cases in the refactored TPU DAG passed.
  • GPU DAG (maxtext_configs_aot_gpu): Failed (Deprioritized) ⚠️
    • The new, dedicated GPU DAG failed during its run.
    • Next Steps: The team's current focus is on TPU stability, so this failure is acknowledged and the GPU tests will be addressed later according to team priorities.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Separate GPU AOT tests from the main AOT DAG
@andrewyct andrewyct merged commit 80244e6 into GoogleCloudPlatform:master Oct 16, 2025
26 checks passed