
hotfix(api): reduce validate-upload memory use on staging #224

Merged
vishpillai123 merged 5 commits into staging from hotfix/validation-upload-memory-staging
Mar 26, 2026

Conversation

@chapmanhk
Contributor

changes

  • StorageControl._run_validation_and_get_normalized_df: Stop using blob.open("r") (which led to a full in-memory read via _path_for_edvise_read). Download the unvalidated object with blob.download_to_filename() into a private temp CSV, run validate_file_reader on that path, and remove the file in a finally block.
  • StorageControl._write_dataframe_to_gcs_as_csv: Stop building the entire validated CSV in a StringIO and upload_from_string. Write to_csv to a temp file and upload_from_filename, then unlink in finally.
  • Helpers: Add _unlink_if_exists and _download_blob_to_temp_csv_path so the main validation method stays small, unlink logic is shared, and failed downloads log then clean up the temp file before re-raising.
  • Observability: On OSError from download_to_filename or to_csv, log file_name / blob name, path, errno, and strerror with exc_info=True, then re-raise (behavior unchanged aside from logging).
  • Tests: Update mocks for download_to_filename / upload_from_filename; add cases for download/to_csv OSError (unlink + no validator / no upload), download error logging, and upload failure after to_csv (temp still removed).
  • Style: Black/Ruff on gcsutil_test.py.

context

Large PDP course CSVs (e.g. ~244 MB) triggered Cloud Run memory limit terminations during validate-upload (platform log: instance using too much memory). The previous path held extra full-file copies (string buffer + pandas + another full CSV string for upload), which increased peak RSS beyond the container limit, especially under concurrency.

This change keeps validation semantics the same (still validate_file_reader / edvise PDP path) but removes two large redundant in-memory representations by using disk-backed temp files on the same instance and cleaning them up on success and failure.

questions

  • None

- Download unvalidated blob to a temp file and validate by path instead of
  blob.open().read() via _path_for_edvise_read (avoids a full in-RAM copy).
- Write validated CSV to a temp file and upload_from_filename instead of
  building the entire CSV in a StringIO string.

Branched from develop (repo has no dev branch).

Made-with: Cursor
Helps distinguish ENOSPC vs other failures in Cloud Run logs; re-raises unchanged.

Made-with: Cursor
…load

- Download OSError: unlink temp, skip validate_file_reader, log errno
- to_csv OSError: unlink temp, no upload, log errno
- Upload failure after to_csv: temp still unlinked

Made-with: Cursor
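The failure cases listed above could be exercised with `unittest.mock` along these lines. This is a self-contained sketch: the inlined helper is a minimal stand-in for the PR's `_download_blob_to_temp_csv_path`, not the real implementation, and the real tests live in `gcsutil_test.py`:

```python
import os
import tempfile
from unittest import mock


def _download_blob_to_temp_csv_path(blob) -> str:
    # Minimal stand-in for the PR helper: just enough to test cleanup-on-error.
    fd, local_csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        blob.download_to_filename(local_csv_path)
    except OSError:
        os.unlink(local_csv_path)
        raise
    return local_csv_path


def test_download_oserror_unlinks_temp():
    blob = mock.Mock()
    blob.name = "unvalidated/course.csv"
    # Simulate ENOSPC from download_to_filename.
    blob.download_to_filename.side_effect = OSError(28, "No space left on device")

    raised = False
    try:
        _download_blob_to_temp_csv_path(blob)
    except OSError:
        raised = True
    assert raised

    # The temp path handed to the (failed) download must be gone afterwards.
    (path,), _ = blob.download_to_filename.call_args
    assert not os.path.exists(path)
```

The upload-failure case follows the same shape: make `upload_from_filename` raise after a successful `to_csv` and assert the temp file was still removed.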
Aligns with universal-principles: keep _run_validation_and_get_normalized_df
under 50 lines, reduce nesting, replace tmp_path with local_csv_path naming.

Made-with: Cursor
@vishpillai123 vishpillai123 merged commit 8eba633 into staging Mar 26, 2026
5 of 6 checks passed
