refactor: migrate .runpod directory to .flash #221

Open
deanq wants to merge 2 commits into main from deanq/ae-2257-use-dot-flash

@deanq deanq commented Feb 25, 2026

Summary

  • Consolidate all local state under .flash/ directory, eliminating the split .runpod/ vs .flash/ paradigm
  • Update resource persistence (resources.pkl), container archive path, ignore patterns, skeleton templates, tests, and docs
  • No legacy/migration handling -- clean replacement per ticket scope

Closes AE-2257

Changed files

Source (4 files): resource_manager.py, run.py, preview.py, ignore.py
Config/templates (3 files): .gitignore, skeleton_template/.gitignore, skeleton_template/.flashignore
Tests (3 files): conftest.py, test_resource_manager.py, test_scanner.py
Docs (4 files): Flash_Deploy_Guide.md, flash-run.md, flash-undeploy.md, undeploy.py docstring

Test plan

  • make quality-check passes (format, lint, typecheck, tests, coverage)
  • 1235 tests pass, 72.96% coverage (threshold 65%)
  • No remaining .runpod directory references in codebase (verified via grep)
  • CI passes on GitHub

@deanq deanq changed the title from "refactor: migrate .runpod directory to .flash (AE-2257)" to "refactor: migrate .runpod directory to .flash" Feb 25, 2026
@deanq deanq requested a review from Copilot February 25, 2026 16:56
Copilot AI left a comment

Pull request overview

This PR consolidates all Flash local state under the .flash/ directory, eliminating the previous split paradigm where some state was stored in .runpod/ and other state in .flash/. This is a clean refactoring that improves consistency and simplifies the directory structure for the project.

Changes:

  • Updated all source code references from .runpod/ to .flash/ for resource state files and container archives
  • Removed .runpod/ from ignore patterns across all configuration files and templates
  • Renamed test fixtures and updated test paths to reflect the new directory structure
  • Updated all documentation references to use .flash/resources.pkl instead of .runpod/resources.pkl

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/runpod_flash/core/resources/resource_manager.py | Changed RUNPOD_FLASH_DIR constant from .runpod to .flash |
| src/runpod_flash/cli/commands/run.py | Updated _RESOURCE_STATE_FILE path from .runpod/resources.pkl to .flash/resources.pkl |
| src/runpod_flash/cli/commands/preview.py | Changed CONTAINER_ARCHIVE_PATH from /root/.runpod/artifact.tar.gz to /root/.flash/artifact.tar.gz |
| src/runpod_flash/cli/utils/ignore.py | Removed .runpod/ from the always_ignore patterns list |
| src/runpod_flash/cli/commands/undeploy.py | Updated docstring reference from .runpod/resources.pkl to .flash/resources.pkl |
| .gitignore | Consolidated ignore patterns to only use .flash/ instead of separate .runpod/ and .flash/logs/ entries |
| src/runpod_flash/cli/utils/skeleton_template/.gitignore | Removed .runpod/ from template ignore patterns |
| src/runpod_flash/cli/utils/skeleton_template/.flashignore | Changed .runpod/ to .flash/ in flashignore patterns |
| tests/conftest.py | Renamed worker_runpod_dir fixture to worker_flash_dir and updated all path references |
| tests/unit/resources/test_resource_manager.py | Updated mock resource file path from .runpod/resources.pkl to .flash/resources.pkl |
| tests/unit/cli/commands/build_utils/test_scanner.py | Removed test for .runpod directory exclusion (now redundant with existing .flash test) |
| src/runpod_flash/cli/docs/flash-undeploy.md | Updated all documentation references from .runpod/resources.pkl to .flash/resources.pkl |
| src/runpod_flash/cli/docs/flash-run.md | Updated documentation references from .runpod/resources.pkl to .flash/resources.pkl |
| docs/Flash_Deploy_Guide.md | Updated all technical documentation references from .runpod/resources.pkl to .flash/resources.pkl |

@deanq deanq force-pushed the deanq/ae-2257-use-dot-flash branch from 93543f4 to 3fa85d8 Compare February 26, 2026 02:10
@runpod-Henrik
Contributor

QA Report

Status: PASS
PR: #221 — refactor: migrate .runpod directory to .flash
Agent: flash-qa (PR mode)

Targeted Test Results

| Area | Tests | Passed | Failed | Skipped |
| --- | --- | --- | --- | --- |
| test_resource_manager.py | 19 | 19 | 0 | 0 |
| test_scanner.py | 41 | 41 | 0 | 0 |
| Total targeted | 60 | 60 | 0 | 0 |

Full Suite Results

| Mode | Passed | Failed | Skipped | Notes |
| --- | --- | --- | --- | --- |
| Parallel (-n 4, non-serial) | 1217 | 2 | 0 | Pre-existing failures in test_class_execution_integration.py (not modified by PR) |
| Serial | 24 | 10 | 1 | Pre-existing failures in test_remote_concurrency.py (not modified by PR) |

All 12 failures are in files NOT touched by this PR and reproduce on the base branch. CI passes across all 5 Python versions (3.10-3.14).

Migration Completeness

  • All .runpod/ directory path references migrated to .flash/
  • No hardcoded .runpod directory strings remain in src/, tests/, docs/, .gitignore, or .flashignore
  • All remaining .runpod substring matches are legitimate (API URLs like api.runpod.ai, console.runpod.io, docs.runpod.io, or Python module imports like from .runpod import ...)
  • Scanner test updated: test_exclude_runpod_directory removed, test_exclude_flash_directory already exists
  • ignore.py correctly removes .runpod/ from always-ignore list (.flash/ was already present)
  • Test fixtures renamed: worker_runpod_dir -> worker_flash_dir, isolate_resource_state_file updated
  • Skeleton template .flashignore and .gitignore updated
  • No backward compatibility for existing .runpod/resources.pkl (see note below)
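
The check described in the list above, separating real `.runpod` directory references from legitimate substrings such as API hostnames and relative imports, can be approximated with a small scanner. This is a minimal sketch under assumptions: it is not the grep command the QA agent ran, and `LEGACY_DIR_RE` and `find_legacy_refs` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical pattern: ".runpod" used as a path component. It must not follow
# a word character or a dot (this rules out api.runpod.ai and package.runpod),
# and it must be followed by "/", a closing quote, or end of line (this rules
# out relative imports such as "from .runpod import x").
LEGACY_DIR_RE = re.compile(r"""(?<![\w.])\.runpod(?=/|["']|$)""")


def find_legacy_refs(lines):
    """Return (line_number, text) pairs that still mention the .runpod directory."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if LEGACY_DIR_RE.search(text.rstrip("\n"))
    ]


# Illustrative input only; these lines mirror the kinds of matches named above.
sample = [
    'RUNPOD_FLASH_DIR = Path(".runpod")',
    "CONTAINER_ARCHIVE_PATH = '/root/.runpod/artifact.tar.gz'",
    "API_URL = 'https://api.runpod.ai/v2'",
    "from .runpod import client",
    ".runpod/",
]
hits = find_legacy_refs(sample)
```

Under these assumptions only the first, second, and last sample lines are flagged; the API URL and the relative import pass through untouched.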

PR Diff Analysis

Files changed: 14 (7 source, 3 docs, 2 templates, 2 test files)

What changed:

  • RUNPOD_FLASH_DIR constant in resource_manager.py: Path(".runpod") -> Path(".flash")
  • _RESOURCE_STATE_FILE in run.py: Path(".runpod") -> Path(".flash")
  • CONTAINER_ARCHIVE_PATH in preview.py: /root/.runpod/artifact.tar.gz -> /root/.flash/artifact.tar.gz
  • .runpod/ removed from always-ignore list in ignore.py (.flash/ already present)
  • .runpod/ removed from skeleton .gitignore and .flashignore
  • Docs updated: 4 references in Flash_Deploy_Guide.md, all flash-run.md and flash-undeploy.md references
  • test_exclude_runpod_directory removed from scanner tests (no longer applicable)
  • test_resource_manager.py updated: ".runpod" -> ".flash" in mock path

Backward compatibility note: There is no migration logic for users who have an existing .runpod/resources.pkl file. After upgrading, the ResourceManager will not find the old state file and will start fresh, losing tracked endpoint references. This means:

  • Users must re-run flash deploy to re-track endpoints (endpoints are NOT deleted, just the local tracking is lost)
  • Users can manually mv .runpod/resources.pkl .flash/resources.pkl to preserve state

This is acceptable for a development tool where re-deploy is cheap, but worth noting in release notes.

Recommendation

MERGE — Clean migration with no regressions. All directory path references are consistently updated. The lack of backward compatibility for .runpod/resources.pkl is a minor operational concern that should be mentioned in release notes.


Generated by flash-qa agent

@runpod-Henrik
Contributor

QA Report — PR #221: Migrate .runpod directory to .flash

Agent: test-qa | Date: 2026-02-27 | Scope: ESCALATED (14 files, cross-cutting config path migration)


Quality Gate

| Check | Result |
| --- | --- |
| ruff format --check | PASS (1290 files formatted) |
| ruff check | PASS (all checks passed) |

Test Results

| Suite | Passed | Failed | Skipped | Result |
| --- | --- | --- | --- | --- |
| Full (tests/ -n 4) | 1239 | 14 | 1 | See below |
| Changed files only | 60 | 0 | 0 | PASS |

All 14 failures are pre-existing (not introduced by this PR):

| Test File | Failures | Status |
| --- | --- | --- |
| test_class_execution_integration.py | 2 | P4 known: _class_type / _constructor_args AttributeError |
| test_remote_concurrency.py | 10 | P4 known: asyncio.gather unhashable/not-awaitable TypeError |
| test_file_locking.py | 2 | Pre-existing flaky: platform-dependent file lock timing |

Zero new failures introduced by this PR.


Flaky Test Check

test_resource_manager.py — 3/3 runs passed (19/19 tests each). No flakiness detected.


Coverage

| Metric | Value |
| --- | --- |
| Overall coverage | 72.27% |
| Threshold | 65% |
| Status | PASS |

No coverage regression from this PR.


PR Diff Analysis

Completeness Checklist

| Item | Status | Notes |
| --- | --- | --- |
| All .runpod/ dir refs replaced in src/ | PASS | Zero remaining .runpod/ path refs in source |
| All .runpod/ dir refs replaced in tests/ | PASS | Zero remaining .runpod/ path refs in tests |
| All .runpod/ dir refs replaced in docs | PASS | Flash_Deploy_Guide.md updated correctly |
| .gitignore updated | PASS | .runpod/ removed, .flash/ covers everything |
| Skeleton templates updated | PASS | Both .flashignore and .gitignore templates updated |
| ignore.py always-ignore list updated | PASS | .runpod/ entry removed (.flash/ already present) |
| resource_manager.py constant updated | PASS | RUNPOD_FLASH_DIR = Path(".flash") |
| run.py state file path updated | PASS | _RESOURCE_STATE_FILE = Path(".flash") / "resources.pkl" |
| preview.py container archive path updated | PASS | CONTAINER_ARCHIVE_PATH = "/root/.flash/artifact.tar.gz" |
| undeploy.py docstring updated | PASS | References .flash/resources.pkl |
| conftest.py fixture names/paths updated | PASS | worker_runpod_dir -> worker_flash_dir |
| Scanner test updated | PASS | test_exclude_runpod_directory removed (redundant with test_exclude_flash_directory) |
| Test resource manager paths updated | PASS | tmp_path / ".flash" / "resources.pkl" |

Findings

  1. No migration/backward compatibility: The PR explicitly states "No legacy/migration handling." Users with existing .runpod/resources.pkl will lose their cached resource state. This is acceptable for a development-time cache (resources can be re-provisioned), but could cause a one-time re-deployment for users upgrading. This should be noted in release notes.

  2. Container archive path change (preview.py): Changed from /root/.runpod/artifact.tar.gz to /root/.flash/artifact.tar.gz. This is only used by flash deploy --preview (Docker Compose local preview). The flash-worker Docker images that handle production deployment do NOT reference this path — they use FLASH_* environment variables. No cross-repo impact.

  3. Remaining .runpod strings are all correct: All remaining matches are RunPod API URLs (api.runpod.ai, console.runpod.io, docs.runpod.io) or Python module imports (from runpod_flash.core.api.runpod). These are unrelated to the .runpod/ directory.

  4. Test removed (test_exclude_runpod_directory): This test verified that the scanner excluded .runpod/ directories. Since .runpod/ is no longer a project directory, this test is correctly removed. The equivalent test_exclude_flash_directory test remains and passes.

  5. No anti-patterns detected: No bare except, no print statements, no mutable defaults, no hardcoded secrets, no missing awaits.


Known Issue Check

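| Known Issue | Affected? |
| --- | --- |
| P4: test_class_execution_integration.py failures | Pre-existing, not related to PR |
| P4: test_remote_concurrency.py failures | Pre-existing, not related to PR |
| test_file_locking.py flaky tests | Pre-existing, not related to PR |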
test_file_locking.py flaky tests Pre-existing, not related to PR

Recommendation

APPROVE — Clean migration with complete coverage of all .runpod/ directory references. All changed tests pass consistently. No new failures, no coverage regression, no lint/format issues. The only consideration is documenting the lack of backward migration in release notes for users with existing .runpod/ state directories.

@deanq deanq force-pushed the deanq/ae-2257-use-dot-flash branch from d6e172d to af7dffc Compare March 2, 2026 06:44
@runpod-Henrik runpod-Henrik left a comment

Code Review: PR #221, .runpod/ → .flash/ migration

Large refactor with significant overlap with PRs #220 and #235. Found three high-severity issues and one medium-severity issue:


Bug 1 (HIGH): Authorization header sent to presigned S3 URLs — breaks upload AND download

app.py: upload_build() and download_tarball()

Both methods switched from hand-crafted requests calls (no Authorization) to get_authenticated_requests_session(), which injects Authorization: Bearer <key> as a default session header.

# OLD — no auth sent to S3 presigned URL
headers = {"User-Agent": get_user_agent(), "Content-Type": TARBALL_CONTENT_TYPE}
resp = requests.put(url, data=fh, headers=headers)

# NEW — auth header sent to S3 presigned URL
with get_authenticated_requests_session() as session:
    session.headers["Content-Type"] = TARBALL_CONTENT_TYPE
    resp = session.put(url, data=fh)  # sends Authorization: Bearer ... to S3

AWS S3 rejects requests with both query-string signature AND Authorization header → HTTP 400 InvalidArgument: "Only one auth mechanism allowed". Build uploads and artifact downloads will fail for all users.

Fix: Use a plain session (no auth) for presigned URL requests, or add include_auth=False parameter to get_authenticated_requests_session().
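
One way to picture the suggested fix is a small guard that drops the Authorization header whenever the target URL already carries a query-string signature. The sketch below is an assumption, not the behaviour of get_authenticated_requests_session(); `is_presigned` and `headers_for` are hypothetical helpers, and the URLs and key are placeholders. It uses only the standard library so it stays independent of the HTTP client.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that indicate the URL already carries its own signature
# (SigV4 presigned URLs use X-Amz-Signature; legacy SigV2 ones use Signature).
_SIGNATURE_PARAMS = ("X-Amz-Signature", "Signature")


def is_presigned(url: str) -> bool:
    """True when the URL's query string already contains a request signature."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(param in query for param in _SIGNATURE_PARAMS)


def headers_for(url: str, session_headers: dict) -> dict:
    """Copy the session headers, dropping Authorization for presigned URLs."""
    headers = dict(session_headers)
    if is_presigned(url):
        headers = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    return headers


# Placeholder values for illustration only.
session_headers = {"Authorization": "Bearer example-key", "User-Agent": "flash-cli"}
presigned_url = (
    "https://example-bucket.s3.amazonaws.com/build.tar.gz"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=abc123"
)
api_url = "https://api.example.com/v1/builds"

upload_headers = headers_for(presigned_url, session_headers)
api_headers = headers_for(api_url, session_headers)
```

With a guard like this the authenticated session can stay in place for API calls, while presigned uploads and downloads receive only the non-auth headers.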


Bug 2 (HIGH): No migration for existing .runpod/resources.pkl — users lose all endpoint state

resource_manager.py

RUNPOD_FLASH_DIR = Path(".flash")  # was Path(".runpod")

No migration code. Existing .runpod/resources.pkl with tracked endpoints is silently orphaned. flash undeploy lists nothing. Endpoints keep running and billing on RunPod with no CLI management.

Fix: Add migration on first startup:

_LEGACY_STATE_FILE = Path(".runpod") / "resources.pkl"
if not RESOURCE_STATE_FILE.exists() and _LEGACY_STATE_FILE.exists():
    RUNPOD_FLASH_DIR.mkdir(parents=True, exist_ok=True)
    shutil.copy2(_LEGACY_STATE_FILE, RESOURCE_STATE_FILE)
    log.info("Migrated state from .runpod/ to .flash/")
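
The snippet above depends on module-level names (RESOURCE_STATE_FILE, RUNPOD_FLASH_DIR, log) from resource_manager.py. A self-contained version of the same idea, written so it can be exercised against a temporary directory, might look like the sketch below. The function name `migrate_legacy_state` and the `project_root` parameter are assumptions for illustration, not part of the PR.

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger(__name__)


def migrate_legacy_state(project_root: Path) -> bool:
    """Copy .runpod/resources.pkl to .flash/resources.pkl if only the old file exists.

    Returns True when a copy was made, False otherwise.
    """
    legacy_file = project_root / ".runpod" / "resources.pkl"
    state_file = project_root / ".flash" / "resources.pkl"
    if state_file.exists() or not legacy_file.exists():
        return False  # new state already present, or nothing to migrate
    state_file.parent.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file metadata and leaves the legacy file in place as a backup
    shutil.copy2(legacy_file, state_file)
    log.info("Migrated state from %s to %s", legacy_file, state_file)
    return True
```

Copying rather than moving means a downgrade still finds the old file; the check on `state_file.exists()` makes a second call a no-op.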

Bug 3 (HIGH): Per-request API key propagation from LB to workers silently removed

Deleted: api_key_context.py, extract_api_key_middleware in lb_handler.py

The middleware extracted the caller's API key from Authorization: Bearer header and propagated it to downstream QB worker calls via ContextVar. Now removed — worker calls fall back to the server-process RUNPOD_API_KEY env var only.

In multi-tenant or API-key-scoped deployments, worker calls use the wrong key. When the server process has no RUNPOD_API_KEY, worker calls fail with 401 even if the client sent a valid key.

Fix: Either preserve the per-request key propagation, or document that LB now requires RUNPOD_API_KEY set server-side.
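
For readers unfamiliar with the mechanism being discussed, per-request key propagation through a ContextVar with an environment-variable fallback can be sketched as follows. This is an illustration under assumptions, not the deleted api_key_context.py or extract_api_key_middleware code; every function name here is hypothetical, and `handle_request` stands in for a real middleware plus handler.

```python
import os
from contextvars import ContextVar
from typing import Optional

# Hypothetical per-request slot; None means "no caller key was supplied".
_request_api_key = ContextVar("request_api_key", default=None)


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def resolve_api_key() -> Optional[str]:
    """Prefer the caller's key for this request, else the process-wide env var."""
    return _request_api_key.get() or os.environ.get("RUNPOD_API_KEY")


def handle_request(authorization: Optional[str]) -> Optional[str]:
    """Stand-in for middleware: bind the key, run the handler, then restore."""
    token = _request_api_key.set(extract_bearer_token(authorization))
    try:
        return resolve_api_key()  # a real handler would call workers here
    finally:
        _request_api_key.reset(token)
```

Without the context variable, `resolve_api_key()` can only ever return the server-side value, which is the fallback-only behaviour the review describes.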


Bug 4 (MEDIUM): Container archive path renamed but Docker images not updated

preview.py

CONTAINER_ARCHIVE_PATH = "/root/.flash/artifact.tar.gz"  # was /root/.runpod/

If Flash worker Docker images still look for /root/.runpod/artifact.tar.gz, flash deploy --preview will silently fail — containers start but find no user code.

Fix: Verify and update Docker image entrypoints in sync with this PR.
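
If the images and the CLI cannot be updated in lockstep, one transitional option is for the consumer of the archive to probe the new path first and fall back to the legacy one. The sketch below is purely hypothetical: nothing in this PR confirms that worker images read either path, and `ARCHIVE_CANDIDATES` and `locate_archive` are illustrative names.

```python
from pathlib import Path
from typing import Iterable, Optional

# New location first, legacy location second (both paths appear in this PR).
ARCHIVE_CANDIDATES = (
    Path("/root/.flash/artifact.tar.gz"),
    Path("/root/.runpod/artifact.tar.gz"),
)


def locate_archive(candidates: Iterable[Path] = ARCHIVE_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none do."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising lets the caller fail loudly with its own message, which avoids the silent "container starts but finds no user code" case.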

deanq added 2 commits March 12, 2026 11:34
Rename all references from .runpod to .flash across the codebase:
- Update CLI commands, docs, and skeleton templates
- Update .gitignore and .flashignore patterns
- Update ResourceManager config directory path
- Update test fixtures and conftest helpers
- Remove obsolete scanner tests for deleted .runpod patterns
Rename all references from .runpod to .flash across the codebase:
- Update CLI commands, docs, and skeleton templates
- Update .gitignore and .flashignore patterns
- Update ResourceManager config directory path
- Update test fixtures and conftest helpers
- Remove obsolete scanner tests for deleted .runpod patterns
@deanq deanq force-pushed the deanq/ae-2257-use-dot-flash branch from 48b0653 to 3ba9612 Compare March 12, 2026 19:12

3 participants