enhancements suggested for the users_companies table #4

Open
fhoffa opened this issue Feb 22, 2019 · 2 comments
Labels
bigquery (need bigquery sql help), enhancement (New feature or request)

Comments

fhoffa commented Feb 22, 2019

Thanks for sharing this!

https://bigquery.cloud.google.com/table/public-github-adobe:github_archive_query_views.users_companies?pli=1&tab=details

Suggestions:

  • Add the field 'user_id', as people can change their nick (but not their id).
  • Add the field 'crawled_at'. Accounts will have multiple companies over their lifetime, and this will allow you to attribute commits that happened x years ago to the right company.

With 'crawled_at' you'll have to allow multiple entries per user, and adjust queries later. For example, the easiest queries would go through a view that just gives the latest company per user.
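
To make the two query shapes concrete, here is a minimal sketch in Python rather than BigQuery SQL. The row layout, the sample data and the function names are all hypothetical, and attributing a commit to the most recent crawl at or before its date is an assumed rule, not something the table defines today.

```python
from datetime import date

# Hypothetical rows: (user_id, login, company, crawled_at).
# The same user_id may appear several times, once per crawl.
rows = [
    (1, "alice", "Acme", date(2017, 5, 1)),
    (1, "alice-dev", "Globex", date(2019, 1, 15)),  # same id, new nick and company
    (2, "bob", "Initech", date(2018, 3, 9)),
]


def latest_company_per_user(rows):
    """What a 'latest company per user' view would return: one company per user_id."""
    latest = {}
    for user_id, _login, company, crawled_at in rows:
        current = latest.get(user_id)
        if current is None or crawled_at > current[0]:
            latest[user_id] = (crawled_at, company)
    return {user_id: company for user_id, (_, company) in latest.items()}


def company_at(rows, user_id, when):
    """Company from the most recent crawl at or before `when`, or None if none exists."""
    candidates = [
        (crawled_at, company)
        for uid, _login, company, crawled_at in rows
        if uid == user_id and crawled_at <= when
    ]
    return max(candidates)[1] if candidates else None
```

Keying on user_id rather than the login is what keeps the two rows for user 1 together after the nick change.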

fhoffa changed the title from "enhancements suggested for the users table" to "enhancements suggested for the users_companies table" on Feb 22, 2019
filmaj added the enhancement (New feature or request) label on Feb 23, 2019
Contributor

filmaj commented Feb 23, 2019

Great idea, I've had this come up a few times already in questions on Twitter.

This will take some modification to the node script, as it keeps an in-memory DB of user:(current) company associations, which is written out to a huge JSON file between program runs. Sounds like the users_companies table would diverge from this somewhat. Need to consider if/how the schemas for the MySQL db, the JSON file and the users_companies table change.
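
As a rough illustration of how the user:(current) company map might turn into a per-user history, here is a sketch in Python (not the project's node code). The record shape, the `record_crawl` helper and the choice to skip a crawl when the company is unchanged are all assumptions made for the example.

```python
import json


def record_crawl(history, user_id, login, company, crawled_at):
    """Append a crawl result to a user's history; return True if a new entry was added.

    `history` maps str(user_id) -> list of {"login", "company", "crawled_at"} dicts.
    Keys are strings so the structure survives a JSON round trip unchanged.
    Assumption: a crawl that finds the same company as the last entry is skipped.
    """
    entries = history.setdefault(str(user_id), [])
    if entries and entries[-1]["company"] == company:
        return False
    entries.append({"login": login, "company": company, "crawled_at": crawled_at})
    return True


# Example: two crawls with the same company, then a change of nick and company.
history = {}
record_crawl(history, 1, "alice", "Acme", "2017-05-01T00:00:00Z")
record_crawl(history, 1, "alice", "Acme", "2018-05-01T00:00:00Z")
record_crawl(history, 1, "alice-dev", "Globex", "2019-01-15T00:00:00Z")

# What would be written out between program runs.
serialized = json.dumps(history, indent=2)
```

Skipping unchanged crawls keeps the file smaller but loses the "last seen" date for a company; storing every crawl is the other obvious option.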

Contributor

filmaj commented Feb 23, 2019

This probably necessitates a larger discussion about the CLI interface. The node commands are essentially data transfer scripts between BigQuery, MySQL and JSON, plus the github.com profile scraper. The set of transfers between these formats is not complete: there's db-to-json and json-to-bigquery, but not vice versa or anything else. Would you find those commands helpful?

filmaj added the bigquery (need bigquery sql help) label on Mar 2, 2019