-
Notifications
You must be signed in to change notification settings - Fork 669
Extract links #655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract links #655
Conversation
🦋 Changeset detectedLatest commit: a6a1184 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR enhances the extraction functionality by integrating link extraction support and updating the metadata and accessibility tree processing.
- Updated /lib/inference.ts to add a new 'url_field' in the metadata schema and return it as 'urlField'.
- Modified /lib/prompt.ts to adjust extraction prompts for DOM-based link extraction and instruct returning ID-only link outputs.
- Enhanced /lib/a11y/utils.ts to remove redundant StaticText and extract URLs from node properties.
- Introduced a recursive ID-to-URL replacement in /lib/handlers/extractHandler.ts.
- Extended type definitions in /types/context.ts to support URL mapping.
5 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile
# Conflicts: # lib/a11y/utils.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
(updates since last review)
Added two new evaluation tasks to test URL extraction functionality and implemented the core URL extraction mechanism in the extract handler.
- Added
extract_single_link.ts
andextract_jfk_links.ts
evaluation tasks that test extracting URLs usingz.string().url()
schema type - Implemented schema transformation in
extractHandler.ts
that convertsz.string().url()
fields toz.number()
during processing - Added URL injection mechanism that replaces numeric IDs with actual URLs in the extraction results
- Modified
evals.config.json
to include the new evaluation tasks under the 'extract' category - Note: URL extraction is implemented in
domExtract
method but not intextExtract
method which is used by default
Greptile AI
6 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
const mappedUrl = idToUrlMapping[String(fieldValue)]; | ||
record[key] = mappedUrl ?? ``; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: If mappedUrl is undefined, this sets the field to an empty string. Consider returning null or undefined instead, or adding a warning log when a URL mapping is missing.
@seanmcguire12 which models have been used to test this? I've tried |
why
usage
z.string().url()
example:
how it works
z.string().url()
, inside the extract handler, that field type is converted intoz.number()
, and the path to the converted field is storednumber
) in the converted fieldidToUrl
mappingz.string().url()
as we gowhat changed
extractHandler
to recurse through the user defined schema, and transform any fields of typez.string().url()
to typez.number()
while recording their path within the schematest plan