2 changes: 1 addition & 1 deletion docs/Cross_Endpoint_Routing.md
@@ -174,7 +174,7 @@ Each deployed function is wrapped with a production wrapper that:

Cross-endpoint calls require authentication. Flash handles this automatically:

-1. The deploying environment sets `RUNPOD_API_KEY` as an env var on each endpoint
+1. Flash injects `RUNPOD_API_KEY` as an env var on endpoints where `makes_remote_calls=True`
2. At runtime, the wrapper reads `RUNPOD_API_KEY` from the environment
3. All outbound calls to sibling endpoints include the API key
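Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not Flash's production wrapper: `outbound_headers` is a hypothetical helper name, and attaching the key as `Authorization: Bearer <key>` is an assumption about how outbound requests carry it.

```python
import os
from typing import Mapping, Optional


def outbound_headers(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build headers for a call to a sibling endpoint (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Step 2: read the key that was injected at deploy time.
    api_key = env.get("RUNPOD_API_KEY")
    if not api_key:
        # Only endpoints deployed with makes_remote_calls=True receive the key.
        raise RuntimeError(
            "RUNPOD_API_KEY is not set; was this endpoint deployed with makes_remote_calls=True?"
        )
    # Step 3: attach the key to every outbound call (Bearer scheme is an assumption).
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


headers = outbound_headers({"RUNPOD_API_KEY": "rp_xxx"})
print(headers["Authorization"])  # Bearer rp_xxx
```

Failing fast when the key is missing would surface a `makes_remote_calls` misconfiguration at the first outbound call, rather than as an authentication error returned by the sibling endpoint.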

4 changes: 2 additions & 2 deletions docs/Deployment_Architecture.md
@@ -143,7 +143,7 @@ flash deploy --env production
│ │ ├── If new: create endpoint via GraphQL API
│ │ ├── If exists + config drift: update endpoint
│ │ └── If exists + no drift: skip
-│ └── Set environment variables on each endpoint
+│ └── Set env vars on each endpoint (explicit env={} + system vars like RUNPOD_API_KEY)
├── 4. Register with State Manager
│ └── Store endpoint IDs for cross-endpoint routing
@@ -186,7 +186,7 @@ When deploying to an environment that already has endpoints, Flash compares the

When `flash deploy` provisions endpoints:

-1. Each endpoint gets `RUNPOD_API_KEY` injected as an env var
+1. Endpoints with `makes_remote_calls=True` get `RUNPOD_API_KEY` injected automatically
2. Each endpoint gets the `flash_manifest.json` included in its artifact
3. The State Manager stores `{environment_id, resource_name} -> endpoint_id`
4. At runtime, the `ServiceRegistry` uses the manifest + State Manager to route calls
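The lookup in steps 3 and 4 can be pictured as two dictionary hops: the manifest maps a function to its resource name, then the State Manager maps `(environment_id, resource_name)` to an endpoint ID. In this sketch `StateManagerSketch`, `resolve_endpoint_id`, and the manifest shape `{"functions": {function_name: resource_name}}` are all hypothetical; the real `flash_manifest.json` layout and `ServiceRegistry` API may differ.

```python
from dataclasses import dataclass, field


@dataclass
class StateManagerSketch:
    """Hypothetical stand-in for the State Manager's endpoint table."""

    endpoints: dict[tuple[str, str], str] = field(default_factory=dict)

    def register(self, environment_id: str, resource_name: str, endpoint_id: str) -> None:
        # Step 3: store {environment_id, resource_name} -> endpoint_id at deploy time.
        self.endpoints[(environment_id, resource_name)] = endpoint_id

    def lookup(self, environment_id: str, resource_name: str) -> str:
        return self.endpoints[(environment_id, resource_name)]


def resolve_endpoint_id(
    manifest: dict, state: StateManagerSketch, environment_id: str, function_name: str
) -> str:
    # Step 4: the manifest says which resource hosts the function (assumed shape)...
    resource_name = manifest["functions"][function_name]
    # ...and the State Manager says which endpoint serves that resource in this environment.
    return state.lookup(environment_id, resource_name)


state = StateManagerSketch()
state.register("env-prod", "my-gpu", "ep-123")
manifest = {"functions": {"infer": "my-gpu"}}
print(resolve_endpoint_id(manifest, state, "env-prod", "infer"))  # ep-123
```

Keying on the environment ID as well as the resource name is what lets one manifest route to different endpoints in, for example, staging and production.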
2 changes: 1 addition & 1 deletion docs/Flash_SDK_Reference.md
@@ -48,7 +48,7 @@ Endpoint(
| `accelerate_downloads` | `bool` | `True` | Enable accelerated downloads. |
| `volume` | `NetworkVolume` | `None` | Network volume for persistent storage. |
| `datacenter` | `DataCenter` | `EU_RO_1` | Preferred datacenter. |
-| `env` | `dict[str, str]` | `None` | Environment variables for the endpoint. |
+| `env` | `dict[str, str]` | `None` | Environment variables for the deployed endpoint. This is the only way to pass env vars to deployed workers; `.env` files are not carried to endpoints. |
| `gpu_count` | `int` | `1` | GPUs per worker. |
| `execution_timeout_ms` | `int` | `0` | Max execution time in ms. 0 = no limit. |
| `flashboot` | `bool` | `True` | Enable Flashboot fast startup. |
2 changes: 1 addition & 1 deletion docs/LoadBalancer_Runtime_Architecture.md
@@ -26,7 +26,7 @@ graph TD
- Entrypoint: Loads manifest and starts FastAPI server on port 8000
- Runpod exposes this via HTTPS endpoint URL
- Health check: Runpod polls `/ping` every 30 seconds
-- Environment: `RUNPOD_API_KEY` injected for outgoing cross-endpoint calls
+- Environment: `FLASH_MODULE_PATH` injected automatically, plus any explicit `env={}` vars

### What Gets Deployed

11 changes: 5 additions & 6 deletions docs/Resource_Config_Drift_Detection.md
@@ -82,15 +82,14 @@ Changes to these `Endpoint` parameters trigger drift detection and endpoint upda
| `flashboot=` | `flashboot` | Yes |
| `volume=` | `networkVolume` | Yes |
| `template=` | `template` | Partially (template is a runtime field for some resource types) |
-| `env=` | `env` | **No** (excluded from hash) |
+| `env=` | `env` | **Conditionally** -- included when non-None for base `ServerlessResource` (via `exclude_none`); explicitly excluded for `CpuServerlessEndpoint` |
| `name=` | `name` | **No** (identity only) |

-### Why env is Excluded
+### How env Affects the Hash

-Environment variables are excluded from the hash because:
-- Different processes may load `.env` files with different values
-- Changing env vars shouldn't trigger a full endpoint redeploy
-- Env vars are updated separately via the API
+For the base `ServerlessResource`, `env` defaults to `None` and is excluded via `exclude_none=True`. When a user sets `env={"HF_TOKEN": "..."}`, it IS included in the hash and triggers drift detection. `CpuServerlessEndpoint` explicitly excludes `env` from its hash by only hashing a fixed set of CPU-relevant fields.
+
+`.env` files only populate `os.environ` for CLI and local dev; they are not carried to deployed endpoints.
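A stdlib sketch of the two hashing behaviours described above. `config_hash` is a hypothetical helper, SHA-256 over sorted-key JSON is an assumed hashing scheme, and `CPU_FIELDS` is a placeholder rather than the real CPU field list. Dropping `None`-valued fields before hashing plays the role of `exclude_none=True`.

```python
import hashlib
import json
from typing import Any, Iterable, Optional


def config_hash(config: dict[str, Any], only: Optional[Iterable[str]] = None) -> str:
    """Hypothetical drift hash: SHA-256 over sorted-key JSON of the config."""
    if only is not None:
        # CpuServerlessEndpoint-style: hash a fixed field set, so env never counts.
        config = {key: config.get(key) for key in only}
    # Base ServerlessResource-style: like exclude_none=True, drop None-valued fields.
    config = {key: value for key, value in config.items() if value is not None}
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


base = {"flashboot": True, "networkVolume": None}
no_env = config_hash({**base, "env": None})
with_env = config_hash({**base, "env": {"HF_TOKEN": "hf_abc123"}})
print(no_env == config_hash(base))  # True: env=None never reaches the hash
print(no_env == with_env)  # False: a declared env changes the hash, so drift is detected

CPU_FIELDS = ("flashboot", "networkVolume")  # placeholder, not the real CPU field list
print(config_hash({**base, "env": {"A": "1"}}, CPU_FIELDS) == config_hash(base, CPU_FIELDS))  # True
```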

## CPU LoadBalancer Special Case

144 changes: 144 additions & 0 deletions docs/plans/2026-03-05-env-separation-design.md
@@ -0,0 +1,144 @@
# Design: Separate .env from Resource Deployment Env

**Date:** 2026-03-05
**Status:** Approved
**Branch base:** `deanq/ae-1549-env-vars-from-cwd-first`

## Problem

`ServerlessResource.env` defaults to `get_env_vars()`, which reads the entire `.env` file via `dotenv_values()`. Every key-value pair from `.env` (HF_TOKEN, WANDB_API_KEY, dev-only vars, etc.) gets baked into the manifest and sent to RunPod's `saveTemplate` mutation -- even if the user only intended those vars for local CLI usage.

This causes:
- Platform-injected vars (`PORT`, `PORT_HEALTH`) overwritten on template updates
- False config drift from runtime var injection into `self.env`
- User confusion about what actually reaches deployed workers
- The entire class of bugs addressed in the ae-1549 branch

## Solution

Clean separation between two concerns, plus a preview that makes the result visible:

1. **`.env` = CLI/runtime only.** Populates `os.environ` via `load_dotenv()` at import time. Used by `get_api_key()`, CLI commands, local dev server. Never auto-carried to deployed endpoints.

2. **Resource `env={}` = explicit deploy-time vars.** Users declare exactly what goes to each endpoint. Flash injects runtime vars (`RUNPOD_API_KEY`, `FLASH_MODULE_PATH`) into `template.env` separately via existing `_inject_runtime_template_vars()`.

3. **Deploy-time env preview table.** Before provisioning, render a Rich table per resource showing all env vars (user-declared + flash-injected). Secret masking applied.

## Detailed Changes

### Core: `env` field default

- `ServerlessResource.env` default changes from `Field(default_factory=get_env_vars)` to `Field(default=None)`
- Delete `get_env_vars()` function in `serverless.py`
- Delete `EnvironmentVars` class and `environment.py` file entirely
- `load_dotenv(find_dotenv(usecwd=True))` in `__init__.py` stays unchanged
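The effect of the default change can be shown with a stdlib `dataclasses` analog of the pydantic field. `OldResource`, `NewResource`, and `read_dotenv` are hypothetical stand-ins, and `read_dotenv` returns a fixed dict in place of parsing a real `.env` file.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stands in for the parsed contents of a developer's .env file.
DOTENV_CONTENTS = {"HF_TOKEN": "hf_abc123", "DEV_ONLY_FLAG": "1"}


def read_dotenv() -> dict[str, str]:
    # Plays the role of get_env_vars(): every .env entry, intended for deploy or not.
    return dict(DOTENV_CONTENTS)


@dataclass
class OldResource:
    name: str
    # Before: like Field(default_factory=get_env_vars), .env is captured implicitly.
    env: Optional[dict[str, str]] = field(default_factory=read_dotenv)


@dataclass
class NewResource:
    name: str
    # After: like Field(default=None), nothing is sent unless declared.
    env: Optional[dict[str, str]] = None


print(OldResource("my-gpu").env)  # {'HF_TOKEN': 'hf_abc123', 'DEV_ONLY_FLAG': '1'}
print(NewResource("my-gpu").env)  # None
print(NewResource("my-gpu", env={"MODEL_ID": "llama-3"}).env)  # {'MODEL_ID': 'llama-3'}
```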

### Manifest pipeline

- `_extract_deployment_config` in `manifest.py`: reads `resource.env` as-is (now `None` or explicit dict). If `None` or otherwise falsy, the manifest omits the `"env"` key.
- Remove the existing `RUNPOD_API_KEY` stripping logic -- it won't be in user env anymore, and runtime injection handles it via `_inject_runtime_template_vars()`.
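The manifest rule reduces to a single truthiness check. `extract_env_config` is a hypothetical reduction of `_extract_deployment_config` to its env handling only; the real function builds the rest of the deployment config as well.

```python
from typing import Any, Optional


def extract_env_config(resource_name: str, env: Optional[dict[str, str]]) -> dict[str, Any]:
    """Hypothetical, env-only slice of _extract_deployment_config."""
    config: dict[str, Any] = {"name": resource_name}
    # None and {} are both falsy, so neither produces an "env" key.
    if env:
        config["env"] = dict(env)
    # Per the plan, no RUNPOD_API_KEY stripping happens here any more.
    return config


print(extract_env_config("my-cpu", None))  # {'name': 'my-cpu'}
print(extract_env_config("my-cpu", {}))  # {'name': 'my-cpu'}
print(extract_env_config("my-gpu", {"MODEL_ID": "llama-3"}))  # {'name': 'my-gpu', 'env': {'MODEL_ID': 'llama-3'}}
```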

### Template creation

- `serverless.py:_create_new_template`: change `env=KeyValuePair.from_dict(self.env or get_env_vars())` to `env=KeyValuePair.from_dict(self.env or {})`
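Sketched with a hypothetical `KeyValue` dataclass standing in for `KeyValuePair` (its real fields are assumed here to be a key and a value), the new template path converts only the declared env and falls back to an empty list instead of to `.env`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyValue:
    """Hypothetical stand-in for KeyValuePair."""

    key: str
    value: str


def template_env(env: Optional[dict[str, str]]) -> list[KeyValue]:
    # Mirrors KeyValuePair.from_dict(self.env or {}): no get_env_vars() fallback.
    return [KeyValue(key, value) for key, value in (env or {}).items()]


print(template_env(None))  # []
print(template_env({"MODEL_ID": "llama-3"}))  # [KeyValue(key='MODEL_ID', value='llama-3')]
```

Runtime vars such as `RUNPOD_API_KEY` and `FLASH_MODULE_PATH` do not appear here because, per the plan, `_inject_runtime_template_vars()` adds them to `template.env` separately.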

### Deploy-time env preview

New functionality in deploy command (either in `deploy.py` or a new `cli/utils/env_preview.py`):

- Collect final env per resource: user-declared env + flash-injected runtime vars
- Render Rich table before proceeding with deployment
- Flash-injected vars labeled with `(injected by flash)` suffix
- Mask values where key matches `KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL` pattern: show first 6 chars + `****`
- Show all other values in full

Example output:
```
Resource: my-gpu
  Environment Variables:
    HF_TOKEN          = hf_abc****
    MODEL_ID          = llama-3
    RUNPOD_API_KEY    = rp_xxx****  (injected by flash)
    FLASH_MODULE_PATH = app.model  (injected by flash)

Resource: my-cpu
  Environment Variables:
    (none)
```
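The preview steps listed above can be sketched in plain Python. This is a sketch under assumptions: `mask_value` and `render_env_preview` are hypothetical names, and the output is plain text rather than the Rich table the design calls for, so that it runs with the standard library alone.

```python
import re
from typing import Optional

# Keys containing any of these words (case-insensitive) are treated as secrets.
SECRET_KEY_PATTERN = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def mask_value(key: str, value: str) -> str:
    """Show the first 6 characters plus '****' for secret-looking keys."""
    if SECRET_KEY_PATTERN.search(key):
        return value[:6] + "****"
    return value


def render_env_preview(
    resource_name: str,
    user_env: Optional[dict[str, str]],
    injected: dict[str, str],
) -> str:
    """Render one resource's final env: user-declared vars, then flash-injected ones."""
    rows = [(k, v, False) for k, v in (user_env or {}).items()]
    rows += [(k, v, True) for k, v in injected.items()]
    lines = [f"Resource: {resource_name}", "  Environment Variables:"]
    if not rows:
        lines.append("    (none)")
        return "\n".join(lines)
    width = max(len(key) for key, _, _ in rows)  # align the '=' column
    for key, value, is_injected in rows:
        suffix = "  (injected by flash)" if is_injected else ""
        lines.append(f"    {key.ljust(width)} = {mask_value(key, value)}{suffix}")
    return "\n".join(lines)


print(render_env_preview(
    "my-gpu",
    {"HF_TOKEN": "hf_abc123", "MODEL_ID": "llama-3"},
    {"RUNPOD_API_KEY": "rp_xxx_example", "FLASH_MODULE_PATH": "app.model"},
))
print(render_env_preview("my-cpu", None, {}))
```

One consequence of the "first 6 characters" rule that may be worth checking: a secret value of six characters or fewer would be printed in full.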

### Unchanged

- `load_dotenv()` in `__init__.py` -- `os.environ` population for CLI
- `get_api_key()` in `credentials.py` -- credential resolution (env -> credentials.toml)
- `_inject_runtime_template_vars()` -- runtime var injection into `template.env`
- `skip_env` logic in `update()` -- platform var preservation (`PORT`, `PORT_HEALTH`)
- `flash env` CLI -- unrelated (deployment environments)
- ae-1549 branch fixes: `_inject_template_env()`, `_inject_runtime_template_vars()`, `skip_env`
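The platform-var preservation that `skip_env` provides can be illustrated with a hypothetical merge helper. `merge_template_env` and its precedence rule (the platform's existing value always wins for the listed keys) are assumptions for illustration, not the `update()` implementation.

```python
from typing import Iterable, Optional

PLATFORM_VARS = ("PORT", "PORT_HEALTH")


def merge_template_env(
    existing: dict[str, str],
    desired: Optional[dict[str, str]],
    skip_env: Iterable[str] = PLATFORM_VARS,
) -> dict[str, str]:
    """Hypothetical merge: desired env replaces the old one, except platform vars."""
    merged = dict(desired or {})
    for key in skip_env:
        if key in existing:
            # Keep the platform-injected value instead of overwriting or dropping it.
            merged[key] = existing[key]
    return merged


current = {"PORT": "8000", "PORT_HEALTH": "8001", "STALE_VAR": "x"}
print(merge_template_env(current, {"MODEL_ID": "llama-3"}))
# {'MODEL_ID': 'llama-3', 'PORT': '8000', 'PORT_HEALTH': '8001'}
```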

### Breaking change strategy

Hard break, no deprecation period. The deploy-time preview table communicates the change clearly -- users see exactly what env vars go to each endpoint. If `env` is empty and `.env` exists, the preview shows only flash-injected vars, making it obvious no user vars are being sent.

## Files to Modify

### Flash repo

| File | Action |
|------|--------|
| `src/runpod_flash/core/resources/environment.py` | Delete entirely |
| `src/runpod_flash/core/resources/serverless.py` | Remove `get_env_vars()`, change `env` default to `None`, update `_create_new_template` |
| `src/runpod_flash/cli/commands/build_utils/manifest.py` | Remove `RUNPOD_API_KEY` stripping |
| `src/runpod_flash/cli/commands/deploy.py` or new `cli/utils/env_preview.py` | Deploy-time env preview table |
| `tests/unit/test_dotenv_loading.py` | Remove or rewrite |
| Tests importing `get_env_vars`/`EnvironmentVars` | Update |
| New tests for env preview | Mask logic, rendering, injected-var labeling |
| `docs/API_Key_Management.md` | Update to reflect runtime injection via `get_api_key()` |

### Flash-examples repo

| File | Action |
|------|--------|
| `CONTRIBUTING.md` | Clarify `.env` is for CLI auth, not endpoint env |
| `README.md` | Same clarification |
| `CLI-REFERENCE.md` | Update `.env` section |
| `docs/cli/commands.md` | Show explicit `env={}` for deploy-time vars |
| `docs/cli/workflows.md` | Update `.env` example section |
| `docs/cli/troubleshooting.md` | Update API key troubleshooting: `flash login` primary, `.env` for CLI only |
| `docs/cli/getting-started.md` | Update setup instructions |

## Secret Masking Strategy

Mask values where the key contains (case-insensitive): `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`.

Format: first 6 characters + `****`. All other values shown in full.

## User-Facing Example

Before (implicit):
```python
# .env
HF_TOKEN=hf_abc123
MODEL_ID=llama-3

# gpu_worker.py
@Endpoint(name="my-gpu", gpu=GpuGroup.ANY)
async def infer(prompt: str) -> dict:
    ...
# Both HF_TOKEN and MODEL_ID silently sent to endpoint
```

After (explicit):
```python
# .env
RUNPOD_API_KEY=rp_xxx # CLI/auth only, handled by flash login or get_api_key()
HF_TOKEN=hf_abc123 # loaded into os.environ locally; reaches the endpoint only via env={}

# gpu_worker.py
import os

@Endpoint(
    name="my-gpu",
    gpu=GpuGroup.ANY,
    env={"HF_TOKEN": os.environ["HF_TOKEN"], "MODEL_ID": "llama-3"},
)
async def infer(prompt: str) -> dict:
    ...
# Only declared vars sent to endpoint, visible in deploy preview
```