Discussion: can we have programmatic output as well as PDF/LaTeX? #814
Comments
I added all the SamTags, many of the SamHeader tags, three enums, and the BAM magic string to three different YAML files. I then ran reconstant on those files and obtained the autogenerated code. reconstant doesn't currently have an R or a LaTeX output mode, but it's simple enough code that it could be ingested into this code base and modified, or I could submit a PR to it and we could continue using it from its current location. This PR is meant to provide something to discuss in the next meeting; it's not ready for merging.
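The BAM magic string is one of the constants listed above. As a minimal sketch, assuming a generated Python module exposes it under the hypothetical name BAM_MAGIC, a consumer could use it to sniff a file. The value itself, "BAM" followed by the byte 1, comes from the SAM/BAM specification; everything else here is illustrative.

```python
import gzip

# Hypothetical generated constant; the value is the BAM magic string
# from the SAM/BAM specification: "BAM" followed by the byte 0x01.
BAM_MAGIC = b"BAM\x01"


def has_bam_magic(path: str) -> bool:
    """Return True if the decompressed stream at `path` starts with BAM_MAGIC.

    BAM files are BGZF-compressed. BGZF is a series of gzip members, so the
    standard-library gzip module can read the uncompressed bytes.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(len(BAM_MAGIC)) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip/BGZF data (gzip.BadGzipFile subclasses OSError) or truncated.
        return False
```

The point is only that the magic bytes are written once, in the generated constant, instead of being retyped in every package that checks them.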
I like this idea for its formality, but I'm not sure language-specific details or implementations belong with the specification. Shouldn't libraries, in whatever language, be providing such constants in the first place? For example, the Rust library noodles defines SAM data field tags, SAM record flags, etc. Compared to the Rust output in #815, there are differences in nomenclature (e.g., the data tags use full names rather than short codes) and in type definitions (e.g., flags have a type-safe wrapper rather than being an integer). Regarding enums, note that the SAM/BAM specification maintainers don't consider field values to be closed sets (see #725 (comment)). Because of this argument, noodles changed enums to common string constants (e.g., SAM header read group platform values).
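To make the two design points concrete, here is a rough Python analogue; it is not the noodles API, which is Rust. Python's enum.IntFlag stands in for a type-safe flag wrapper instead of a bare integer, and read group platform values are plain string constants checked against a known set without rejecting anything else. The flag bit values are from the SAM specification; the member names, the helper is_known_platform, and the choice of a platform subset are assumptions for illustration.

```python
from enum import IntFlag


class SamFlag(IntFlag):
    """SAM record FLAG bits (bit values from the SAM specification)."""
    PAIRED = 0x1           # template has multiple segments
    PROPER_PAIR = 0x2      # each segment properly aligned
    UNMAPPED = 0x4         # segment unmapped
    MATE_UNMAPPED = 0x8    # next segment in the template unmapped
    REVERSE = 0x10         # SEQ is reverse complemented
    MATE_REVERSE = 0x20    # SEQ of the next segment is reverse complemented
    READ1 = 0x40           # first segment in the template
    READ2 = 0x80           # last segment in the template
    SECONDARY = 0x100      # secondary alignment
    QC_FAIL = 0x200        # not passing quality controls
    DUPLICATE = 0x400      # PCR or optical duplicate
    SUPPLEMENTARY = 0x800  # supplementary alignment


# Platform values as plain string constants (a subset of those the spec lists).
PLATFORM_ILLUMINA = "ILLUMINA"
PLATFORM_PACBIO = "PACBIO"
PLATFORM_ONT = "ONT"
KNOWN_PLATFORMS = frozenset(
    {PLATFORM_ILLUMINA, PLATFORM_PACBIO, PLATFORM_ONT, "LS454", "SOLID", "CAPILLARY"}
)


def is_known_platform(value: str) -> bool:
    """True if `value` is a listed platform. False does not mean invalid:
    the set is open, so callers should keep the value rather than reject it."""
    return value in KNOWN_PLATFORMS
```

For example, SamFlag(99) contains PAIRED, PROPER_PAIR, MATE_REVERSE and READ1, while a platform string added in a later spec version is still accepted, just reported as not (yet) known.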
I understand your hesitation, and I definitely do not want to pretend that the implementation provided in #815 is ideal or even good. The point of that PR was to provide an implementation that would clarify my intention regarding how hts-specs might provide a single-point definition of constants. The details of the implementation can be discussed in that PR, after we discuss here whether the idea is worthwhile. The reason I thought it would make sense for hts-specs to provide a definitive set of constants is that it would make such a library easy to include and to recognize. I've seen many (mostly Python) packages that re-define the hts-specs constants they need. This provides ample opportunity for mistakes and misunderstandings when reading or using these packages. If the consensus is that such a collection of small libraries is pointless, I'm happy to close this issue. I also accept that I'm a little late to the game and that existing libraries are unlikely to adopt the ones we might release here, but I am still curious to see what the community thinks.
I can see the use of something like this: for example, to use @zaeleus's usual bugbear 😄, it would be useful to provide an up-to-date list of the valid […]. I could support adding something lighter-weight, listing the tags and keywords that are currently defined by the specification, with the understanding that these lists may be added to in future. For example, for SAM/BAM/CRAM this could be a JSON file such as pub/sam.json.
IMHO that would suffice, and it would be best to leave it up to implementations what, if anything, they want to do with the data in such a file. I don't think it would be worthwhile to have the LaTeX spec derive these items from the machine-readable version; e.g., we have textual descriptions for some of the platform values that would be non-trivial to generate with LaTeX code. So I don't think adding something like this JSON file would be a big maintenance burden, even though the tags and value keywords would be duplicated in it.
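As an illustration of leaving the use of such a file up to implementations, here is a minimal sketch of one consumer. The JSON layout and its key names (data_fields, read_group_platforms) are hypothetical; no such file format has been published. The rule that tags starting with X, Y or Z, or containing a lowercase letter, are reserved for local use is from the SAM specification.

```python
import json

# Hypothetical layout for a file such as pub/sam.json; the key names and
# contents are illustrative assumptions, not a published format.
EXAMPLE_SAM_JSON = """
{
  "data_fields": ["AS", "MD", "NM", "RG"],
  "read_group_platforms": ["ILLUMINA", "ONT", "PACBIO"]
}
"""


def load_spec(text: str) -> dict[str, frozenset[str]]:
    """Parse the JSON and turn each list into a set for fast lookups."""
    return {key: frozenset(values) for key, values in json.loads(text).items()}


def classify_tag(tag: str, spec: dict[str, frozenset[str]]) -> str:
    """Classify a two-character SAM data field tag.

    Returns "defined" if the file lists it, "local" if the SAM spec reserves
    it for local use (starts with X, Y or Z, or contains a lowercase letter),
    and "unknown" otherwise. "unknown" is not an error: the lists are open,
    and a newer spec version may define the tag.
    """
    if len(tag) != 2:
        raise ValueError(f"SAM tags are two characters long: {tag!r}")
    if tag in spec["data_fields"]:
        return "defined"
    if tag[0] in "XYZ" or any(c.islower() for c in tag):
        return "local"
    return "unknown"
```

An implementation could, for instance, log a warning for "unknown" tags instead of rejecting the record, which fits the view that these value sets are not closed.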
Reality also does not consider these field values to be closed sets.
Thanks @jmarshall for the thoughts and comments. I agree that the autogenerated LaTeX is possibly a step too far, and without it there's no need for the descriptions and types. The reason I suggested autogenerated code was that it would then be relatively straightforward to autogenerate a collection of libraries/packages (one per language) that could be included in a project with the language-appropriate packaging tool. I like the idea of a JSON file with the tags/values; I was simply unaware of a good way of including one in a code project. That is not so surprising, given that I'm far from being an expert on software packaging. Do you or anyone else know of a good way of packaging a JSON file as a first-class citizen in different programming languages?
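One possible answer for Python, sketched under assumptions: ship the JSON file as package data inside a package (the name hts_constants_demo used below is hypothetical), declare it as package data in the build configuration, and read it with the standard-library importlib.resources API so that no filesystem path is hard-coded. Other ecosystems have comparable mechanisms, such as Rust's include_str! macro or Java's Class.getResourceAsStream.

```python
import json
from importlib import resources


def load_packaged_json(package: str, resource: str = "sam.json") -> dict:
    """Read a JSON file shipped as package data inside `package`.

    `package` is an importable package name. The JSON file must be declared
    as package data in the build configuration so that it is installed
    alongside the package's modules.
    """
    text = resources.files(package).joinpath(resource).read_text(encoding="utf-8")
    return json.loads(text)
```

With this approach the JSON stays the single source of data, and each language package only adds a thin loader around it.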
There are many constants defined in hts-specs, but currently the only way to use them is to manually copy and update them in one's own implementation. If we were to publish one or more artifacts containing these constants, maintainers would be able to import them and use them in their code.
I think there are two general parts needed here:
Ideally, each of the hts-constants would be defined in a single "original" place, and all the other uses would be automatically generated from that.
Here's my idea for an implementation (based on https://github.com/aantn/reconstant):
Have a configuration file that contains the constants of interest, for example the SAM tags, SAM header tags, and "magic" strings.
This configuration file will be the only definitive place for adding or modifying constants.
Artifacts in different languages (Python, Java, C, Rust, LaTeX, R) will be generated via the Makefile.
Maintainers include and use the resulting artifact.
I'm mostly thinking about the SAM spec, but, of course, each of the other sub-specs (for example, VCF, refget, etc.) could individually choose whether or not to use this mechanism.
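The single-source workflow proposed here (one configuration file, per-language artifacts generated from it) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not reconstant's implementation or output format: reconstant reads YAML, while this sketch uses JSON to stay within the standard library, and the group and constant names are hypothetical. The flag bits and header record types themselves are from the SAM specification.

```python
import json

# Hypothetical single source of truth (reconstant itself reads YAML).
SOURCE = """
{
  "sam_flags": {"PAIRED": 1, "UNMAPPED": 4, "REVERSE": 16},
  "header_records": {"HEADER": "HD", "REFERENCE_SEQUENCE": "SQ",
                     "READ_GROUP": "RG", "PROGRAM": "PG", "COMMENT": "CO"}
}
"""


def to_python(groups: dict) -> str:
    """Emit a Python module with one GROUP_NAME = value line per constant."""
    lines = ["# Auto-generated; do not edit by hand."]
    for group, constants in groups.items():
        for name, value in constants.items():
            lines.append(f"{group.upper()}_{name} = {value!r}")
    return "\n".join(lines) + "\n"


def to_c(groups: dict) -> str:
    """Emit a C header with one #define per constant (ASCII strings only)."""
    lines = ["/* Auto-generated; do not edit by hand. */"]
    for group, constants in groups.items():
        for name, value in constants.items():
            # json.dumps yields a double-quoted literal that is valid C for plain ASCII.
            literal = json.dumps(value) if isinstance(value, str) else hex(value)
            lines.append(f"#define {group.upper()}_{name} {literal}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    groups = json.loads(SOURCE)
    print(to_python(groups))
    print(to_c(groups))
```

A Makefile rule could run a script like this to regenerate files such as sam_constants.py and sam_constants.h (hypothetical names) whenever the source file changes, so the constants are only ever edited in one place.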