Skip to content

fix: escape file paths in GraphQL queries and add batch-size limits#300

Open
travellingsoldier85 wants to merge 6 commits intoentrius:testfrom
travellingsoldier85:fix/graphql-file-path-escaping-and-batch-limits
Open

fix: escape file paths in GraphQL queries and add batch-size limits#300
travellingsoldier85 wants to merge 6 commits intoentrius:testfrom
travellingsoldier85:fix/graphql-file-path-escaping-and-batch-limits

Conversation

@travellingsoldier85
Copy link

Rebased version of #275 — conflicts resolved.

Changes

  • Escape special characters in file paths for GraphQL queries
  • Add batch-size limits to prevent oversized requests

Previous PR was closed due to merge conflicts. This is a clean rebase on latest test branch.

claytonlin1110 and others added 4 commits March 17, 2026 10:37
Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>
Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>
Fix two issues in the file content fetching functions used for
token-based PR scoring:

1. **GraphQL injection from unescaped file paths**: File paths containing
   double quotes or backslashes are interpolated directly into GraphQL
   query strings, breaking query syntax and causing the entire file
   content fetch to fail silently. This means PRs touching files with
   special characters in their paths get scored as 0.
   - Add _escape_graphql_expression() helper that escapes \ and "
   - Apply escaping in both fetch_file_contents_batch and
     fetch_file_contents_with_base

2. **No batch-size limit for large PRs**: PRs with many files generate
   a single GraphQL query with one object lookup per file. GitHub's
   GraphQL API has query complexity limits, so large PRs can trigger
   502 errors and lose all file contents for scoring.
   - Add _MAX_FILES_PER_GRAPHQL_BATCH = 50 constant
   - Split both fetch functions into batched requests
   - Extract _fetch_file_contents_with_base_batch() as internal helper

Tests added for escaping correctness, batch splitting behavior,
special character handling, and edge cases (empty input, added/removed
files, failed batches).
@travellingsoldier85
Copy link
Author

This is a rebased replacement for #275 — conflicts with test branch have been resolved. Ready for review! @anderdc

@anderdc anderdc force-pushed the test branch 2 times, most recently from 34f0be2 to d4815ff Compare March 20, 2026 19:34
@anderdc
Copy link
Collaborator

anderdc commented Mar 27, 2026

Have we ever actually seen a PR where we need to escape characters? do you have one where gittensor miners needed this? what happens when that issue occurs? do we fail the entire PR or just grading that file?

also I do like the graphql query limits though, not sure if I like the file path escape portion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants