

themightychris
Collaborator

This captures some unfinished early work of mine on extracting information about the content of referenced Git repos. It works by fetching the latest commit of each repo's default branch into a scratch repo.
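Here is a rough sketch of the scratch-repo technique. It shows the general idea only and is not the code in this PR. It makes these assumptions:

* Node.js 18+ and a git binary on the PATH are available.
* `readFileAtRemoteHead` is a hypothetical helper name.
* The 60-second fetch timeout is an arbitrary cap.

The sketch creates an empty repo and shallow-fetches the remote's HEAD, which git records in FETCH_HEAD. It reads one file from that commit and then deletes the scratch directory.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const run = promisify(execFile);

interface RemoteHeadFile {
  sha: string; // commit SHA the remote's HEAD (default branch) points at
  content: string | null; // file contents at that commit, or null if not readable
}

// Hypothetical helper: shallow-fetch the remote's HEAD into a throwaway
// repository, read one file from that commit, then delete the repository.
async function readFileAtRemoteHead(url: string, filePath: string): Promise<RemoteHeadFile> {
  const scratch = await mkdtemp(join(tmpdir(), "scratch-repo-"));
  try {
    await run("git", ["init", "--quiet", scratch]);
    // Fetch only the tip commit of the remote's HEAD; git stores it in FETCH_HEAD.
    // The timeout is an assumed hard cap so a stuck remote cannot block forever.
    await run("git", ["-C", scratch, "fetch", "--quiet", "--depth=1", url, "HEAD"], {
      timeout: 60_000,
    });
    const { stdout: sha } = await run("git", ["-C", scratch, "rev-parse", "FETCH_HEAD"]);
    let content: string | null = null;
    try {
      const shown = await run("git", ["-C", scratch, "show", `FETCH_HEAD:${filePath}`]);
      content = shown.stdout;
    } catch {
      content = null; // treat a failed lookup (e.g. a missing path) as absent
    }
    return { sha: sha.trim(), content };
  } finally {
    await rm(scratch, { recursive: true, force: true });
  }
}
```

This is exactly the code path that can hang when a non-GitHub remote is unhealthy. The timeout only stops the wait; it does not explain why the remote failed.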

What I ran into, and didn't finish resolving before I stopped working on this, is that some projects reference Git repos outside GitHub that are down or return malformed responses. The git CLI is bad at reporting these failures when we run git fetch or git ls-remote against them, and in some cases it just hangs indefinitely.

There are two things we could/should do next:

  • Add a special-case handler for GitHub repositories that uses the GitHub API to fetch the latest commit and individual files in the repo, instead of pulling the latest commit over the git protocol. Even a shallow fetch of just the latest commit makes git download far more content, and exposes us to far more failure modes, than the GitHub API does. The vast majority of repos are on GitHub, so while we ultimately want to support all sorts of Git hosts, GitHub is a special case worth optimizing for by using its API.
  • Make fetching the latest commit and content via git more resilient for cases where we can't use the GitHub API. I was thinking we might either:
    • Explore implementing our own timeout around the child_process calls that invoke the local git client to probe repositories
    • OR, which might be quicker, open our own fast HTTP or SSH connection before invoking git. It would just check that a connection can be established and that the response carries some fingerprint showing a git server is on the other end
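For the GitHub special case in the first bullet, here is a minimal sketch. It relies on three endpoints of the public GitHub REST API:

* `GET /repos/{owner}/{repo}` for `default_branch`.
* `GET /repos/{owner}/{repo}/commits/{ref}` for the commit `sha`.
* `GET /repos/{owner}/{repo}/contents/{path}?ref=` for base64 file contents.

Some parts are assumptions:

* All function names are hypothetical.
* The URL patterns cover only the common GitHub remote shapes.
* The 10-second timeout is arbitrary.
* The code assumes Node.js 18+ for the global `fetch` and `AbortSignal.timeout`.

```typescript
interface GitHubRepoRef {
  owner: string;
  repo: string;
}

// Hypothetical helper: recognize common GitHub remote URL shapes (HTTPS,
// scp-style SSH, ssh://). Returns null so callers can fall back to git.
function parseGitHubRepo(url: string): GitHubRepoRef | null {
  const patterns = [
    /^https?:\/\/(?:www\.)?github\.com\/([^/]+)\/([^/]+?)(?:\.git)?\/?$/i,
    /^git@github\.com:([^/]+)\/([^/]+?)(?:\.git)?$/i,
    /^ssh:\/\/git@github\.com\/([^/]+)\/([^/]+?)(?:\.git)?\/?$/i,
  ];
  for (const pattern of patterns) {
    const match = pattern.exec(url.trim());
    if (match) return { owner: match[1], repo: match[2] };
  }
  return null;
}

// Minimal GET against the GitHub REST API with a hard timeout.
async function githubGet<T>(path: string, token?: string, timeoutMs = 10_000): Promise<T> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  const res = await fetch(`https://api.github.com${path}`, {
    headers,
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`GitHub API GET ${path} failed with HTTP ${res.status}`);
  return (await res.json()) as T;
}

// Hypothetical helper: resolve the default branch, then its latest commit SHA.
async function fetchLatestGitHubCommit(ref: GitHubRepoRef, token?: string) {
  const base = `/repos/${ref.owner}/${ref.repo}`;
  const repo = await githubGet<{ default_branch: string }>(base, token);
  const commit = await githubGet<{ sha: string }>(`${base}/commits/${repo.default_branch}`, token);
  return { defaultBranch: repo.default_branch, sha: commit.sha };
}

// Hypothetical helper: read one file at a given commit via the contents API.
async function fetchGitHubFile(ref: GitHubRepoRef, path: string, sha: string, token?: string) {
  const file = await githubGet<{ content: string; encoding: string }>(
    `/repos/${ref.owner}/${ref.repo}/contents/${path}?ref=${sha}`,
    token,
  );
  // Very large files may not be returned inline as base64.
  if (file.encoding !== "base64") throw new Error(`Unexpected encoding: ${file.encoding}`);
  // The base64 payload contains line breaks; Buffer's decoder skips whitespace.
  return Buffer.from(file.content, "base64").toString("utf8");
}
```

Returning null from `parseGitHubRepo` keeps the git CLI path as the fallback for every non-GitHub host. Unauthenticated calls are heavily rate-limited, so passing a token is probably necessary at any real volume.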
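For the child_process timeout idea, here is one possible shape. It uses the `timeout` option that Node's `execFile` already provides: when the child runs too long, Node sends it `killSignal` (SIGTERM by default) and marks the error with `killed`. It also sets a few things that are real git and OpenSSH settings, though using them here is an assumption about what helps in this case:

* `GIT_TERMINAL_PROMPT=0`, so git fails instead of waiting on a credential prompt.
* `http.lowSpeedLimit`/`http.lowSpeedTime`.
* SSH `BatchMode`/`ConnectTimeout`.

The helper names and every numeric limit are hypothetical.

```typescript
import { execFile } from "node:child_process";

// Hypothetical helper: run a command and reject if it runs longer than
// timeoutMs. execFile's `timeout` option sends killSignal to the child.
function runWithTimeout(
  cmd: string,
  args: string[],
  timeoutMs: number,
  env: NodeJS.ProcessEnv = process.env,
): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(cmd, args, { timeout: timeoutMs, env }, (error, stdout, stderr) => {
      if (!error) {
        resolve(stdout);
      } else if (error.killed) {
        // `killed` is set when Node terminated the child itself, e.g. on timeout.
        reject(new Error(`${cmd} ${args.join(" ")} timed out after ${timeoutMs}ms`));
      } else {
        reject(new Error(`${cmd} exited with an error: ${stderr.trim() || error.message}`));
      }
    });
  });
}

// Hypothetical probe: ask a remote which commit its HEAD points at,
// without fetching any objects.
async function lsRemoteHead(url: string, timeoutMs = 15_000): Promise<string | null> {
  const env: NodeJS.ProcessEnv = {
    ...process.env,
    // Fail instead of blocking on an interactive username/password prompt.
    GIT_TERMINAL_PROMPT: "0",
    // Assumed defaults for SSH remotes: no prompts, bounded connect time.
    GIT_SSH_COMMAND: process.env.GIT_SSH_COMMAND ?? "ssh -o BatchMode=yes -o ConnectTimeout=10",
  };
  const args = [
    // Abort HTTP transfers that stay below 1 byte/s for more than 10 seconds.
    "-c", "http.lowSpeedLimit=1",
    "-c", "http.lowSpeedTime=10",
    "ls-remote", url, "HEAD",
  ];
  const out = await runWithTimeout("git", args, timeoutMs, env);
  // Matching output lines look like "<sha>\tHEAD".
  const line = out.split("\n").find((l) => l.endsWith("\tHEAD"));
  return line ? line.split("\t")[0] : null;
}
```

The hard timeout is the backstop. The git and SSH settings just make many failures surface sooner and with a clearer error than a timeout would give.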
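For the quick pre-flight connection idea, an HTTP(S) remote could be checked against git's "smart" HTTP protocol. The fingerprint works like this:

* A smart server answers `GET <repo>/info/refs?service=git-upload-pack` with the Content-Type `application/x-git-upload-pack-advertisement`.
* The response body starts with a pkt-line. That line opens with four hex digits giving its length (counting those digits), followed by `# service=git-upload-pack`.

This sketch has gaps and assumptions:

* It does not recognize "dumb" HTTP servers, `git://` remotes, or SSH, which would need their own checks.
* Function names and timeouts are hypothetical.
* It assumes Node.js 18+ `fetch`.

```typescript
const ADVERTISEMENT_TYPE = "application/x-git-upload-pack-advertisement";

// Check the Content-Type and the first pkt-line of an info/refs response.
function looksLikeGitAdvertisement(contentType: string | null, head: string): boolean {
  if (!contentType?.toLowerCase().startsWith(ADVERTISEMENT_TYPE)) return false;
  // The trailing LF on the service line is optional in pkt-line text.
  const match = /^([0-9a-f]{4})(# service=git-upload-pack\n?)/.exec(head);
  // The 4-digit hex length must cover itself plus the service line.
  return match !== null && parseInt(match[1], 16) === 4 + match[2].length;
}

// Hypothetical pre-flight check to run before invoking git on an http(s) remote.
async function probeSmartHttp(repoUrl: string, timeoutMs = 5_000): Promise<boolean> {
  const url = `${repoUrl.replace(/\/+$/, "")}/info/refs?service=git-upload-pack`;
  try {
    // The abort signal also cancels a body that stalls mid-stream.
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok || !res.body) return false;
    // Read only enough to see the first pkt-line; full ref advertisements
    // for large repos can be big.
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let head = "";
    while (head.length < 64) {
      const { value, done } = await reader.read();
      if (done) break;
      head += decoder.decode(value, { stream: true });
    }
    await reader.cancel();
    return looksLikeGitAdvertisement(res.headers.get("content-type"), head);
  } catch {
    // DNS failure, refused connection, TLS error, or timeout.
    return false;
  }
}
```

A false result here could route the repo to a "host unreachable or not a git server" outcome without ever spawning git.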
