Consider using SQLite to store analyses #129

Open
MattWindsor91 opened this issue Feb 2, 2021 · 1 comment
Labels
enhancement (New feature or request), moonshot (Very long-term suggestion), question (Further information is requested)

Comments

@MattWindsor91
Collaborator

While writing some boilerplate to allow c4f-stat to output statistics, it occurred to me that I'm hardcoding a lot of specific analyses (such as 'give me all mutations that were hit, but not killed'), and also that it is almost always the case that the stats persister doesn't have the stats I need for a given paper.

While it's not going to be possible for me to do so any time soon, I wonder if it's worth replacing the stat persister (which is constantly storing specific views on analyses) with a SQLite database that just logs the analyses in full every time it observes them, then offers the stats in the form of SQL queries. This would have several advantages:

  • analyses get grouped together in one place;
  • we can do complex aggregation over analyses (such as 'give me the minimum, maximum, and other counts of observations', something I needed to do for a paper but had to do by manually walking the filesystem);
  • as we add new stats, we can reuse old data, instead of needing to rerun experiments;
  • we can refactor the stats persister to instead perform queries over the analysis database, while retaining the same outward API onto it.
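
A minimal sketch of how this might look, assuming a much simpler analysis shape than c4t really has. It uses Python's built-in `sqlite3` module purely for illustration (c4t itself is Go), and every table, column, and function name here is hypothetical. Each observed analysis is logged in full, and the stats become queries over the log:

```python
import sqlite3

# Hypothetical schema: one row per analysis, one row per mutant seen in it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS analysis (
    id      INTEGER PRIMARY KEY,
    machine TEXT    NOT NULL,
    cycle   INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS mutation_obs (
    analysis_id INTEGER NOT NULL REFERENCES analysis(id),
    mutant      INTEGER NOT NULL,
    hits        INTEGER NOT NULL,
    kills       INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the analysis database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_analysis(conn, machine, cycle, observations):
    """Log one analysis in full; observations is a list of (mutant, hits, kills)."""
    cur = conn.execute(
        "INSERT INTO analysis (machine, cycle) VALUES (?, ?)", (machine, cycle)
    )
    conn.executemany(
        "INSERT INTO mutation_obs (analysis_id, mutant, hits, kills) "
        "VALUES (?, ?, ?, ?)",
        [(cur.lastrowid, m, h, k) for (m, h, k) in observations],
    )
    conn.commit()


def hit_but_not_killed(conn):
    """'Give me all mutations that were hit, but not killed', as a query."""
    rows = conn.execute(
        "SELECT mutant FROM mutation_obs GROUP BY mutant "
        "HAVING SUM(hits) > 0 AND SUM(kills) = 0 ORDER BY mutant"
    )
    return [r[0] for r in rows]


def hit_stats(conn):
    """Per-mutant (mutant, min hits, max hits, number of observations)."""
    return conn.execute(
        "SELECT mutant, MIN(hits), MAX(hits), COUNT(*) "
        "FROM mutation_obs GROUP BY mutant ORDER BY mutant"
    ).fetchall()
```

With a layout like this, a new statistic would be a new query over data that has already been logged, rather than a new field in the persister followed by a rerun of the experiments.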

A disadvantage is the massive dependency this would introduce: SQLite is usually a cgo dependency. Perhaps we could use other SQL or NoSQL databases, but I really don't want to make c4t dependent on having a database set up.

@MattWindsor91 MattWindsor91 added enhancement New feature or request question Further information is requested moonshot Very long-term suggestion labels Feb 2, 2021
@MattWindsor91
Collaborator Author

Even the analysis stage sheds a lot of information, e.g. by aggregating compilations down to min/mean/max duration slots. A database with its own aggregation facilities makes sense here, I think.
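
As a rough illustration of that point, here is a sketch (again in Python's built-in `sqlite3`, with a hypothetical `compilation` table and hypothetical function names, not c4t's actual data model) in which every compilation's duration is kept and the min/mean/max summary is derived on demand by SQL aggregates instead of being baked in at analysis time:

```python
import sqlite3


def open_compilation_db(path=":memory:"):
    """Create a hypothetical table holding one row per compilation."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compilation ("
        " compiler    TEXT    NOT NULL,"
        " subject     TEXT    NOT NULL,"
        " duration_ms INTEGER NOT NULL)"
    )
    return conn


def log_compilations(conn, rows):
    """Store raw compilations; rows is a list of (compiler, subject, duration_ms)."""
    conn.executemany(
        "INSERT INTO compilation (compiler, subject, duration_ms) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def duration_summary(conn):
    """Per-compiler (compiler, min, mean, max) duration, computed at query time."""
    return conn.execute(
        "SELECT compiler, MIN(duration_ms), AVG(duration_ms), MAX(duration_ms) "
        "FROM compilation GROUP BY compiler ORDER BY compiler"
    ).fetchall()
```

Because the raw rows are retained, other summaries (per subject, percentiles computed client-side, and so on) could be added later without recompiling anything.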
