feat: Add GitHub integration with agent_prompts and github_components #1637

Status: Open (40 commits to merge into base: main)


julian-risch (Member) commented Apr 10, 2025

Related Issues

Proposed Changes:

  • Move github_components from experimental to a new integration
  • Move agent_prompts from experimental to a new integration

The idea is to enable users to run the example notebook (or a version with updated imports) after installing this new integration.

How did you test it?

New unit tests and I ran all usage examples successfully with a test repo.

I haven't tested it with the notebook yet, which we would need to update first (tracked by deepset-ai/haystack-cookbook#183).

Notes for the reviewer

  • I suggest we rename the github_token parameter to api_key for consistency with many other integrations.
  • While we could find a way to set up integration tests, I would rather leave them out of this PR.
  • GithubRepositoryViewer has a branch parameter in the run method, which could also be named ref to make it clearer that it can also be a tag or commit hash. I prefer keeping the parameter name branch.
  • Some components have github_token: Optional[Secret] = None because they can work without any token, while others use Secret.from_env_var("GITHUB_TOKEN"). I suggest we use Secret.from_env_var("GITHUB_TOKEN", strict=False) where we currently have None as the default.
  • The internal implementations of the components differ in how they build request headers: some use _get_headers, some use _get_request_headers, and some define headers inline. We could refactor that.
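
The last two notes can be pictured with a small sketch. This is not the integration's code: resolve_token and get_request_headers are hypothetical helpers, written with the standard library only, that mimic a non-strict environment lookup (a missing GITHUB_TOKEN yields None instead of an error) and a single shared place for building headers. The User-Agent value is made up for illustration.

```python
import os
from typing import Dict, Optional


def resolve_token(env_var: str = "GITHUB_TOKEN") -> Optional[str]:
    """Return the token from the environment, or None if it is unset or empty.

    Hypothetical helper: it imitates a non-strict lookup, where a missing
    variable is not treated as an error.
    """
    return os.environ.get(env_var) or None


def get_request_headers(token: Optional[str]) -> Dict[str, str]:
    """Build request headers in one place so every component shares the logic."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "example-github-client",  # made-up value for illustration
    }
    # Only authenticate when a token is available; public endpoints work without one.
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

With a helper like this, components that work anonymously and components that need a token would differ only in whether they accept None as the resolved token.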

Checklist

github-actions bot added the type:documentation label Apr 10, 2025
julian-risch marked this pull request as ready for review April 25, 2025 10:28
@@ -0,0 +1,21 @@
# github-haystack
Contributor

I think it would be good to add usage examples here in the README, or to link to the tutorial/Google Colab that shows how to use it.

Another relevant detail, I think, is that these prompts were optimized using Anthropic models. That could be useful for users to know.

Member Author

A Google Colab in the cookbook and some more examples on an integration page is what I imagine. We currently don't fill out the READMEs; for example, see: https://pypi.org/project/opensearch-haystack/
It might be a good idea to change that and use a copy of the integrations page. I don't see a good reason to keep it empty, but I would prefer a consistent solution. I'll talk to Bilge.

sjrl (Contributor) commented Apr 29, 2025

@julian-risch maybe a general comment on the structure here. I see that the prompts aren't being used within the library and I understand they will be used in a future tutorial/colab.

I wonder then if it would be helpful to instead pre-assemble the tools within the repo so users could easily import the tools and immediately pass them to an Agent. What do you think?

sjrl (Contributor) commented Apr 29, 2025

@julian-risch overall this looks really good! I mostly have minor comments and only one larger conceptual one, about maybe providing users with Tools directly instead of requiring them to compose the Tools themselves.

I didn't comb through every line since there is a lot, but it's well tested so it's good to go from my perspective! We can always make quick updates to this if things arise and depending on usage.

Comment on lines 14 to 17
class GitHubFileEditorTool(ComponentTool):
    """
    A Haystack tool for editing files in GitHub repositories.
    """
sjrl (Contributor), May 5, 2025

Oh interesting, I wasn't thinking of inheriting from ComponentTool but of doing something like

@component
class GitHubFileEditor:
    ...

GitHubFileEditorTool = ComponentTool(GitHubFileEditor(), ...)

and then people could import the pre-made GitHubFileEditorTool but I can see how this version would be more customizable.
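
To make the trade-off concrete, here is a minimal sketch of the two options using standard-library stand-ins only. SimpleTool and FileEditor are hypothetical placeholders, not Haystack's ComponentTool or the real GitHubFileEditor, and the names file_editor and owner/repo are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool wrapper (not Haystack's ComponentTool)."""
    name: str
    description: str
    function: Callable[..., Dict[str, Any]]


class FileEditor:
    """Hypothetical stand-in for a component exposing a run method."""

    def __init__(self, repo: str = "owner/repo", branch: str = "main"):
        self.repo = repo
        self.branch = branch

    def run(self, command: str) -> Dict[str, Any]:
        return {"result": f"{command} on {self.repo}@{self.branch}"}


# Option A: subclass the tool wrapper, so users pass their own settings
# when they construct the tool.
class FileEditorTool(SimpleTool):
    def __init__(self, repo: str = "owner/repo", branch: str = "main"):
        editor = FileEditor(repo=repo, branch=branch)
        super().__init__(
            name="file_editor",
            description="Edit files in a repository.",
            function=editor.run,
        )


# Option B: ship one pre-made instance that users import as-is.
file_editor_tool = SimpleTool(
    name="file_editor",
    description="Edit files in a repository.",
    function=FileEditor().run,
)
```

Option B is the shortest path to a working tool, while Option A lets each user choose the repository and branch without rebuilding the wrapper by hand.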

Comment on lines +51 to +67
    def to_dict(self) -> Dict[str, Any]:
        """
        Serializes the tool to a dictionary.

        :returns:
            Dictionary with serialized data.
        """
        return default_to_dict(
            self,
            name=self.name,
            description=self.description,
            parameters=self.parameters,
            github_token=self.github_token.to_dict(),
            repo=self.repo,
            branch=self.branch,
            raise_on_failure=self.raise_on_failure,
        )
Contributor

If we go this route of inheriting from ComponentTool, I don't think we'll be able to use this to_dict method. At least, we use different serde methods for Tools, with this dict structure:

{"type": generate_qualified_class_name(type(self)), "data": serialized}

so we'd probably need to follow that as well, right?

When deserializing this in a pipeline, I believe we will eventually call deserialize_tools_or_toolset_inplace. Do these methods work in that case?

Contributor

@julian-risch okay, interesting that this works, given the pipeline serialization test you added.

For consistency, I still wonder whether we should rename init_parameters to data in the serialized dict, since that appears to be the pattern we use in our other tools and expect in deserialize_tools_or_toolset_inplace.
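
The difference between the two layouts is small but visible in the serialized dict. The sketch below uses only the standard library: qualified_class_name is a hypothetical stand-in for generate_qualified_class_name, EditorTool is an invented class, and to_component_style only illustrates the init_parameters layout that, as far as I know, default_to_dict produces.

```python
from typing import Any, Dict


def qualified_class_name(cls: type) -> str:
    """Hypothetical stand-in for generate_qualified_class_name."""
    return f"{cls.__module__}.{cls.__name__}"


class EditorTool:
    """Invented example class with two init parameters."""

    def __init__(self, repo: str, branch: str = "main"):
        self.repo = repo
        self.branch = branch

    def to_dict(self) -> Dict[str, Any]:
        # Tool-style layout: the parameters live under the "data" key.
        serialized = {"repo": self.repo, "branch": self.branch}
        return {"type": qualified_class_name(type(self)), "data": serialized}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EditorTool":
        # Reads the same "data" key that to_dict wrote.
        return cls(**data["data"])


def to_component_style(tool_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Same payload, but under "init_parameters" instead of "data"."""
    return {"type": tool_dict["type"], "init_parameters": dict(tool_dict["data"])}
```

Either layout round-trips as long as to_dict and from_dict agree on the key; the consistency argument is about matching what the shared deserialization helper expects.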

julian-risch (Member Author)

@sjrl I added a test called test_pipeline_serialization and added _get_request_headers to all components. Here is the example notebook, with the code updated up to the point where the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me, and I need to ask @mathislucka for advice.

https://colab.research.google.com/drive/1ktlwQ-CDLGDs2uZXvzgG8XspfjPidYqZ?usp=sharing

@sjrl If GitHubFileEditorTool looks good to you, I will add tools for all other components and probably update the directory structure a bit.

sjrl (Contributor) commented May 6, 2025

@sjrl I added a test called test_pipeline_serialization and added _get_request_headers to all components. Here is the example notebook, with the code updated up to the point where the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me, and I need to ask @mathislucka for advice.

@julian-risch This is related to the change we made to tools, which introduced a new parameter called outputs_to_string. So the Google Colab code should be updated to:

    ...
    outputs_to_state={
        #"message": {"source": "documents", "handler": message_handler}, TODO
        "documents": {"source": "documents"},
    },
    outputs_to_string={"source": "documents", "handler": message_handler}
    ...
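
For orientation, a handler of the kind referenced above might look like the following sketch. The notebook's actual message_handler is not shown in this thread, so the body here is an assumption: it receives the value of the documents output and returns a single string. Doc is a hypothetical stand-in for a document object with a content attribute, and max_chars is an invented parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a document with text content."""
    content: str


def message_handler(documents: List[Doc], max_chars: int = 200) -> str:
    """Turn a list of documents into one string for the tool result.

    Assumed behavior: number each document and truncate long contents
    so the resulting message stays short.
    """
    parts = []
    for index, doc in enumerate(documents, start=1):
        parts.append(f"[{index}] {doc.content[:max_chars]}")
    return "\n".join(parts)
```

In this reading, outputs_to_state keeps the raw documents available in the state, while outputs_to_string controls only the text that is sent back as the tool's message.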

Labels: topic:CI, type:documentation
2 participants