Skip to content

sidnb13/toolbox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

toolbox

Containerization and setup tools for bootstrapping my ML research projects.

Backlog:

  • Skypilot integration to spin up instances from cli
  • Use a per-project config file for advanced configuration
  • integration with ray job queue and gpuboard project for observability
  • cleaner, less hardcoded defaults for dockerfiles, etc.
  • Rich devcontainer support
graph TB
    classDef controller fill:#2E5EAA,stroke:#173A7B,color:white
    classDef ray fill:#0194E2,stroke:#015B8C,color:white
    classDef container fill:#67B7D1,stroke:#4A8396,color:white
    classDef storage fill:#FFA500,stroke:#CC8400,color:white
    classDef cli fill:#4CAF50,stroke:#357935,color:white

    Storage[(Remote Storage<br>FUSE/S3)]
    Controller[Controller Service<br>Job Monitor & Resource Manager]:::controller
    CLI[Local CLI Tool]:::cli

    subgraph Instance1
        I1_Ray[Ray Cluster]:::ray
        I1_C1[Container 1]:::container
        I1_C2[Container 2]:::container
    end

    subgraph Instance2
        I2_Ray[Ray Cluster]:::ray
        I2_C1[Container 1]:::container
    end

    subgraph InstanceN
        IN_Ray[Ray Cluster]:::ray
        IN_C1[Container 1]:::container
        IN_C2[Container 2]:::container
    end

    CLI -->|Commands & Monitoring| Controller
    Controller -->|Monitor/Control| Instance1
    Controller -->|Monitor/Control| Instance2
    Controller -->|Monitor/Control| InstanceN

    I1_C1 -->|Submit Jobs| I1_Ray
    I1_C2 -->|Submit Jobs| I1_Ray

    I2_C1 -->|Submit Jobs| I2_Ray

    IN_C1 -->|Submit Jobs| IN_Ray
    IN_C2 -->|Submit Jobs| IN_Ray

    Instance1 -.->|Data| Storage
    Instance2 -.->|Data| Storage
    InstanceN -.->|Data| Storage

    Controller -.->|Job Status & Metrics| CLI
Loading