feat: add feed availability task and scheduler#1705
Open
davidgamez wants to merge 16 commits into

Summary:
This PR adds the cloud function and scheduler that check and persist GTFS feed availability. It expands the scope of the issue to include a quick ZIP content check. The check was initially intended to use only HTTP HEAD requests, but while testing in DEV I found that some servers (~160) don't support HEAD. As a workaround, the check sends a HEAD request and, if that fails, falls back to a GET request.
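The HEAD-then-GET workaround described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in this PR: `default_fetch`, `check_feed`, and the shape of the result dictionary are hypothetical names chosen for the example, and the ZIP check is assumed to compare only the first four bytes of the GET body against the ZIP signatures.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# ZIP signatures: local file header, and end of central directory (empty archive).
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")

FetchResult = Tuple[Optional[int], bytes]


def default_fetch(url: str, method: str, timeout: float) -> FetchResult:
    """Send one request; return (status code or None, first 4 body bytes)."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Only GET needs a body prefix; a HEAD response has no body.
            prefix = response.read(4) if method == "GET" else b""
            return response.status, prefix
    except urllib.error.HTTPError as err:
        return err.code, b""  # the server answered with a 4xx/5xx status
    except OSError:
        return None, b""  # e.g. DNS failure, timeout, TLS error


def check_feed(
    url: str,
    fetch: Callable[[str, str, float], FetchResult] = default_fetch,
    timeout: float = 10.0,
) -> dict:
    """Try HEAD first; if it does not succeed, fall back to GET plus a ZIP check."""
    status: Optional[int] = None
    for method in ("HEAD", "GET"):
        status, prefix = fetch(url, method, timeout)
        ok = status is not None and 200 <= status < 300
        if method == "GET":
            # Hypothetical rule: a GET only counts if the body looks like a ZIP.
            ok = ok and prefix.startswith(ZIP_MAGIC)
        if ok:
            return {"available": True, "method": method, "status": status}
    return {"available": False, "method": "GET", "status": status}
```

Passing the fetch function as a parameter keeps the fallback logic testable without network access; a real implementation would likely also record latency and error details.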
From our AI friend
This pull request introduces a new system for checking the availability of GTFS feeds via HTTP HEAD (with GET fallback), refactors and modularizes HTTP and SSL utilities, and documents the new feed availability check task. The changes improve reliability, security, and observability of feed health checks, and make the codebase more maintainable.
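For the SSL side of the refactoring, a context that tolerates older servers and can optionally skip certificate validation might look like the sketch below. It is an assumption about the general approach, not the PR's `create_feed_ssl_context`: the function name `make_feed_ssl_context`, its `verify` parameter, and the reading of "legacy server compatibility" as OpenSSL's legacy renegotiation option are all hypothetical.

```python
import ssl

# Value of OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT. The ssl module exposes it
# as ssl.OP_LEGACY_SERVER_CONNECT only from Python 3.12, so fall back to 0x4.
OP_LEGACY_SERVER_CONNECT = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


def make_feed_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build a client context for fetching feeds (hypothetical helper)."""
    context = ssl.create_default_context()
    # Assumption: allow connecting to servers without secure renegotiation.
    context.options |= OP_LEGACY_SERVER_CONNECT
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Such a context can be passed to `urllib.request.urlopen(request, context=context)`. Disabling verification removes protection against man-in-the-middle attacks, so it would normally be limited to feeds known to have broken certificates.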
New GTFS Feed Availability Check System:
- Adds a new task, `check_gtfs_feed_availability`, to the task executor. It checks the availability of published GTFS feeds using HTTP HEAD requests, with an optional GET fallback that reads only the ZIP magic bytes. Results are stored in the `gtfs_feed_availability_check` table, and the task is fully documented in `README.md` with parameters and sample responses.
HTTP/SSL Utility Refactoring and Enhancements:
- Refactors `functions-python/helpers/utils.py` to modularize HTTP and SSL logic:
  - `create_feed_ssl_context`, supporting legacy server compatibility and optional certificate validation disabling.
  - `build_feed_request_params` for constructing headers and URLs with flexible authentication.
  - `perform_request` for performing availability checks with fallback and result normalization.
  - `download_and_get_hash`, updated to use the new modular utilities for improved clarity and maintainability.
Documentation Improvements:
- Updates `README.md` to document the new feed availability check task, including configuration parameters, usage examples, and sample output for both normal and verbose modes.
Expected behavior:
The GTFS availability is persisted in the DB.
Testing tips:
Internal team: this can be tested via Retool in the dev environment.
Please make sure these boxes are checked before submitting your pull request - thanks!
- Run `./scripts/api-tests.sh` to make sure you didn't break anything