hotfix(api): reduce validate-upload memory use on staging #224
Merged
vishpillai123 merged 5 commits into staging on Mar 26, 2026
Conversation
- Download the unvalidated blob to a temp file and validate by path instead of blob.open().read() via _path_for_edvise_read (avoids a full in-RAM copy).
- Write the validated CSV to a temp file and upload_from_filename instead of building the entire CSV in a StringIO string.

Branched from develop (repo has no dev branch). Made-with: Cursor
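The two changes above can be sketched roughly as follows. This is a simplified sketch, not the PR's actual code: `validate_file_reader` is the project's validator, taken here as a plain callable, and the function names and temp-file handling are illustrative.

```python
import os
import tempfile

import pandas as pd


def run_validation_from_blob(blob, validate_file_reader):
    """Download the unvalidated object to a private temp CSV and
    validate it by path, avoiding a full in-RAM copy of the file."""
    fd, local_csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # Streams the object to disk instead of blob.open().read().
        blob.download_to_filename(local_csv_path)
        return validate_file_reader(local_csv_path)
    finally:
        if os.path.exists(local_csv_path):
            os.unlink(local_csv_path)


def write_dataframe_to_gcs_as_csv(df, blob):
    """Serialize with to_csv to a temp file and upload_from_filename,
    instead of building the whole CSV as one in-memory string."""
    fd, local_csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        df.to_csv(local_csv_path, index=False)
        blob.upload_from_filename(local_csv_path)
    finally:
        if os.path.exists(local_csv_path):
            os.unlink(local_csv_path)
```

Peak memory now scales with pandas' parsed frame rather than frame plus two extra full-file string copies; the temp files live on the instance's disk and are removed on both paths.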
Helps distinguish ENOSPC vs other failures in Cloud Run logs; re-raises unchanged. Made-with: Cursor
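That errno-aware logging could look roughly like this; the logger name and message format are assumptions, and only the errno/strerror fields and the unchanged re-raise come from the PR description.

```python
import errno
import logging

logger = logging.getLogger(__name__)


def download_with_errno_logging(blob, local_csv_path):
    """Log errno and strerror on OSError so ENOSPC (disk full) is
    distinguishable from other failures in the logs, then re-raise
    the exception unchanged."""
    try:
        blob.download_to_filename(local_csv_path)
    except OSError as exc:
        logger.error(
            "download failed blob=%s path=%s errno=%s (%s) strerror=%s",
            blob.name,
            local_csv_path,
            exc.errno,
            errno.errorcode.get(exc.errno, "?"),
            exc.strerror,
            exc_info=True,
        )
        raise
```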
…load
- Download OSError: unlink temp, skip validate_file_reader, log errno.
- to_csv OSError: unlink temp, no upload, log errno.
- Upload failure after to_csv: temp still unlinked.

Made-with: Cursor
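Taken together, the failure cases amount to "the temp file never survives an error". A minimal sketch of that shape, under assumed names (`_unlink_if_exists` matches the PR's helper; the rest is illustrative):

```python
import os
import tempfile


def _unlink_if_exists(path):
    # Shared cleanup helper: tolerate the file already being gone.
    if os.path.exists(path):
        os.unlink(path)


def validate_via_temp_file(blob, validate_file_reader):
    fd, local_csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        blob.download_to_filename(local_csv_path)
    except OSError:
        # Download failed: remove the partial temp file and skip the
        # validator entirely; the error propagates to the caller.
        _unlink_if_exists(local_csv_path)
        raise
    try:
        return validate_file_reader(local_csv_path)
    finally:
        # Success or validator failure: the temp file is removed.
        _unlink_if_exists(local_csv_path)
```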
Aligns with universal-principles: keep _run_validation_and_get_normalized_df under 50 lines, reduce nesting, replace tmp_path with local_csv_path naming. Made-with: Cursor
Made-with: Cursor
vishpillai123 approved these changes on Mar 26, 2026
changes
- StorageControl._run_validation_and_get_normalized_df: stop using blob.open("r") (which led to a full in-memory read via _path_for_edvise_read). Download the unvalidated object with blob.download_to_filename() into a private temp CSV, run validate_file_reader on that path, and remove the file in a finally block.
- StorageControl._write_dataframe_to_gcs_as_csv: stop building the entire validated CSV in a StringIO and upload_from_string. Write to_csv to a temp file and upload_from_filename, then unlink in finally.
- New helpers _unlink_if_exists and _download_blob_to_temp_csv_path, so the main validation method stays small, unlink logic is shared, and failed downloads log then clean up the temp file before re-raising.
- On OSError from download_to_filename or to_csv, log file_name / blob name, path, errno, and strerror with exc_info=True, then re-raise (behavior unchanged aside from logging).
- Tests mock download_to_filename / upload_from_filename; add cases for download/to_csv OSError (unlink + no validator / no upload), download error logging, and upload failure after to_csv (temp still removed), in gcsutil_test.py.

context
Large PDP course CSVs (e.g. ~244 MB) triggered Cloud Run memory-limit terminations during validate-upload (platform log: "instance using too much memory"). The previous path held extra full-file copies (string buffer + pandas + another full CSV string for upload), which pushed peak RSS beyond the container limit, especially under concurrency.

This change keeps validation semantics the same (still validate_file_reader / the edvise PDP path) but removes two large redundant in-memory representations by using disk-backed temp files on the same instance, cleaning them up on both success and failure.

questions