The title of this issue needs to be improved, but the thought here is that the current PyTorch CI build and test pipeline assumes that it is NOT running inside a container; it expects either a VM or a dedicated host. When we tried in 2024H1 to migrate to ARC (Actions Runner Controller), a container-based runner autoscaler backed by Kubernetes, this caused us some issues: we then also needed to support a multi-level nested container pipeline, because scripts in PyTorch assume they can just run docker build and docker run as part of the build pipeline.
We had to use things like DinD (Docker-in-Docker) at multiple nested container levels, which forced us to write many workaround scripts to support this effort.
The goal of this issue is to discuss how we can drop the assumption that a job can run docker build or docker run, and move to a more GHA-native way to build and run PyTorch containers for the build and test pipelines. This would allow us to more easily adopt container-based self-hosted runner autoscalers.