Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report, rename --gff-out to --gff, add startup logging, sequence counts in output filen... #65

Merged
Adamtaranto merged 18 commits into main from copilot/fix-incorrect-flank-reporting
Apr 20, 2026

Conversation

Contributor

Copilot AI commented Apr 19, 2026

  • Add GFF output write inside the per-pair pairing-map loop, writing to pair_outDir when --gff is set — the GFF block was previously inside if not pairing_map: so it was never executed in pairing-map mode
  • Sequence count is now appended to interleaved_flanks and target_sites filenames (e.g. {prefix}{pair_label}_target_sites_{N}.fasta, {prefix}{pair_label}_interleaved_flanks_{N}.fasta)

Copilot AI and others added 2 commits April 19, 2026 08:17
… sub-directories, and summary reports

- Add --flanks flag to enable writing flanks for all hits
- Add --flanks-paired flag (replaces --flank-paired-only) for paired-only flanks
- Change --flank-len default to 50
- Handle symmetric FF/RR orientations: write both flanks with warning
- Write per-pair target site FASTA files (one per model pair)
- Write target sites as single-line non-wrapped FASTA
- Create sub-directories per pair when using --pairing-map
- Write summary reports for each model pair
- Update tests for new API
- Update documentation

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/470e9689-0d39-4230-8ada-2b3526f50273

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix incorrect number of flanks reported and non-external flanks" to "Fix incorrect flank reporting: only external flanks, per-pair output, summary reports" Apr 19, 2026
Copilot AI requested a review from Adamtaranto April 19, 2026 08:20
@Adamtaranto
Owner

@copilot if input blast or nhmmer results contain hits to multiple models, but the --pairing-map only contains pairs that use a subset of those models, only extract hit sequences and run pairing and other processing tasks for the models in the pairing map. Check the pairing map for valid model names before any processing, report the count of models ignored, and ignore hits for models that are not in the pairing map. If no pairing map is given, all hits may be processed.

When --pairing-map is provided, only hits for models referenced in the
pairing map are kept before any downstream processing (coverage calc,
e-value filter, writeTIRs, pairing, etc.). Models present in hits but
absent from the pairing map are reported with a warning and their hit
count, then excluded. If no pairing map is provided, all hits proceed
as before.

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/ac004274-bb89-48ec-a78e-e255ec4fa6f4

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 19, 2026

@copilot if input blast or nhmmer results contain hits to multiple models, but the --pairing-map only contains pairs that use a subset of those models, only extract hit sequences a...

Done in commit 713284e. When --pairing-map is provided, the hitTable is now filtered immediately after import — before coverage calculation, e-value filtering, writeTIRs, or any pairing work — to retain only models referenced in the pairing map. Models present in hits but absent from the map trigger a warning listing their names and the number of hits excluded. If no pairing map is given, all hits are processed as before.
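
The filtering step described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name `filter_hits_to_pairing_map`, the list-of-dicts hit representation, and the `model` key are assumptions made for the example (the real hitTable may well be a DataFrame).

```python
import logging
from collections import Counter

logger = logging.getLogger("pairing_map_filter_sketch")


def filter_hits_to_pairing_map(hits, pairing_map):
    """Keep only hits whose model is referenced in the pairing map.

    hits: list of dicts with a 'model' key (hypothetical layout).
    pairing_map: list of (left_model, right_model) tuples, or None/empty.
    Returns (kept_hits, excluded_counts), where excluded_counts maps each
    ignored model name to the number of hits dropped for it.
    """
    if not pairing_map:
        # No pairing map supplied: every hit proceeds unchanged.
        return list(hits), {}

    wanted = {model for pair in pairing_map for model in pair}
    kept = [hit for hit in hits if hit["model"] in wanted]
    excluded = Counter(hit["model"] for hit in hits if hit["model"] not in wanted)

    # Report each ignored model together with its excluded hit count.
    for model, n_hits in sorted(excluded.items()):
        logger.warning("Model %s not in pairing map: excluding %d hits", model, n_hits)

    return kept, dict(excluded)


hits = [
    {"model": "TIR_A", "target": "chr1"},
    {"model": "TIR_B", "target": "chr1"},
    {"model": "TIR_C", "target": "chr2"},
    {"model": "TIR_C", "target": "chr2"},
    {"model": "TIR_A", "target": "chr3"},
]
kept, excluded = filter_hits_to_pairing_map(hits, [("TIR_A", "TIR_B")])
```

Running the filter before any coverage or e-value step means the later counts in the summary only ever refer to models that take part in a requested pair.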

Copilot AI changed the title from "Fix incorrect flank reporting: only external flanks, per-pair output, summary reports" to "Fix incorrect flank reporting: only external flanks, per-pair output, summary reports, pairing-map model filtering" Apr 19, 2026
@Adamtaranto
Owner

@copilot

For standard --flanks and --flanks-paired output do not write internal hit flanks to file, only flanks from the side of the terminus hit that would face away from the interior of the transposon.

One interleaved flanks output incorrectly identifies flanks external to the termini. The second interleaved termini output prints correctly. For left and right models in F,F orientation on the positive strand, we want the left flank of the left model and the right flank of the right model.

Output sub-directories per model pair are created, but duplicates of all output files are also written to the base output dir. If a model pair output dir is created then only write relevant output there.

Include detailed summary of criteria on which hits were filtered in the summary.txt file

…nk orientation, enhance summary filter details

- writeTIRs now writes to per-pair subdir in pairing_map mode; all post-loop
  output writes (writePairedTIRs, writeElements, writeFlanks, writeTargetSites,
  GFF) are guarded with 'if not pairing_map:' to prevent duplicates in base outDir
- writeFlanks: symmetric same-strand (F,F/R,R) unpaired hits now skipped (not
  written) rather than writing both left+right flanks which would include internal
  flanks; warning updated to advise --flanks-paired
- _write_pair_summary gains optional filter_stats dict; summary.txt now includes
  a 'Filtering criteria applied' section with thresholds and excluded hit counts
  for pairing-map model filter, coverage filter, e-value filter, and anchor offset
  filter; filter_stats is collected incrementally in main() and passed to all
  summary calls
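
As a rough illustration of the 'Filtering criteria applied' section, the sketch below renders a stats dict into summary text. The dict layout (`initial_hits`, `steps`, `name`, `threshold`, `excluded`) and the function name `render_filter_section` are hypothetical; the actual filter_stats structure and summary.txt wording may differ.

```python
def render_filter_section(filter_stats):
    """Render a filter-stats dict as a plain-text summary section.

    filter_stats layout (assumed for this sketch):
      {"initial_hits": int,
       "steps": [{"name": str, "threshold": str, "excluded": int}, ...]}
    """
    remaining = filter_stats["initial_hits"]
    lines = [
        "Filtering criteria applied",
        f"  Initial hits: {remaining}",
    ]
    for step in filter_stats["steps"]:
        # Each filter step removes its excluded hits from the running total.
        remaining -= step["excluded"]
        lines.append(
            f"  {step['name']} (threshold: {step['threshold']}): "
            f"excluded {step['excluded']}"
        )
    lines.append(f"  Remaining after all filters: {remaining}")
    return "\n".join(lines)


example_stats = {
    "initial_hits": 120,
    "steps": [
        {"name": "pairing-map model filter", "threshold": "n/a", "excluded": 20},
        {"name": "coverage filter", "threshold": "0.8", "excluded": 30},
        {"name": "e-value filter", "threshold": "1e-3", "excluded": 10},
    ],
}
section = render_filter_section(example_stats)
```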

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/75009e5a-f10a-493c-aa7b-f5f00ef0099f

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 19, 2026

@copilot

For standard --flanks and --flanks-paired output do not write internal hit flanks to file, only flanks from the side of the terminus hit that would face away from the interior of the ...

Addressed in commit 7e0b861:

  1. External flanks only: For --flanks (write_all), unpaired hits in symmetric same-strand (F,F / R,R) mode are now skipped rather than writing both left and right flanks (which would include an internal flank). Paired hits continue to write only the outer flank — left external for the left terminus hit and right external for the right terminus hit. Warning updated to direct users to --flanks-paired.

  2. No duplicate output files in pairing-map mode: writeTIRs is now called inside the per-pair loop, writing only the relevant model hits to pair_outDir. All post-loop output writes (writePairedTIRs, writeElements, writeFlanks, writeTargetSites, GFF) are gated with if not pairing_map:, so nothing is duplicated to the base output directory.

  3. Summary filter details: summary.txt now includes a Filtering criteria applied section listing the threshold and excluded hit count for each filter step (pairing-map model filter, coverage filter, e-value filter, anchor offset filter), plus initial hit count and total remaining after all filters.
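
The "outer flank only" rule from point 1 can be sketched as a small coordinate helper. This is an assumption-laden illustration, not the writeFlanks implementation: the function name `external_flank_coords`, the 0-based half-open coordinate convention, and the `side` argument are all invented for the example.

```python
def external_flank_coords(hit_start, hit_end, side, flank_len, seq_len):
    """Return (start, end) of the flank facing away from the element interior.

    Coordinates are 0-based, half-open (an assumption for this sketch).
    side: 'left' for the terminus hit at the left end of the element,
          'right' for the terminus hit at the right end.
    The flank is clipped to the bounds of the target sequence.
    """
    if side == "left":
        # Left terminus: the external flank lies upstream of the hit.
        return max(0, hit_start - flank_len), hit_start
    if side == "right":
        # Right terminus: the external flank lies downstream of the hit.
        return hit_end, min(seq_len, hit_end + flank_len)
    raise ValueError(f"side must be 'left' or 'right', got {side!r}")


# A paired element on a 520 bp contig, with a 50 bp flank length.
left_flank = external_flank_coords(100, 120, "left", 50, 520)
right_flank = external_flank_coords(480, 500, "right", 50, 520)
```

An unpaired hit in a symmetric same-strand configuration has no known `side`, which is why such hits are skipped rather than guessed at.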

Copilot AI and others added 2 commits April 19, 2026 15:25
…t duplicate output files

In writeTargetSites, the pair_key for grouping output records now uses the
canonical model order from PairingConfig (config.left_model, config.right_model)
instead of the genomic-position-based order from flipTIRs. This ensures only
ONE target_sites file and ONE interleaved_flanks file per canonical model pair.

In writeFlanks, paired hit flanks are now grouped by config-canonical model names
(config.left_model for left flanks, config.right_model for right flanks) for
asymmetric pairings, preventing spurious extra flank files when the genomic
position order differs from the config's model assignment.
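
A minimal sketch of the grouping change, assuming a simplified stand-in for PairingConfig that carries only `left_model` and `right_model`. The class `PairingConfigSketch`, the function `group_by_canonical_pair`, and the tuple layout of the paired hits are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairingConfigSketch:
    """Hypothetical stand-in for PairingConfig: just the two model names."""

    left_model: str
    right_model: str


def group_by_canonical_pair(paired_hits, configs):
    """Group sequences by the config's model order, not genomic order.

    paired_hits: list of (first_model, second_model, sequence), where the
    first two fields follow genomic position and may be in either order.
    """
    # Look up the config by the unordered pair of model names.
    lookup = {frozenset((c.left_model, c.right_model)): c for c in configs}
    groups = defaultdict(list)
    for first_model, second_model, seq in paired_hits:
        config = lookup[frozenset((first_model, second_model))]
        groups[(config.left_model, config.right_model)].append(seq)
    return dict(groups)


configs = [PairingConfigSketch("TIR_L", "TIR_R")]
paired_hits = [
    ("TIR_L", "TIR_R", "ACGTAC"),  # genomic order matches the config
    ("TIR_R", "TIR_L", "TTGACA"),  # genomic order is reversed
]
groups = group_by_canonical_pair(paired_hits, configs)
```

Keying on genomic order would have produced two groups here, and therefore two output files for what is a single model pair.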

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/b8293eeb-cbc5-4f59-a4f6-6cbcb7a6dcf8

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
@Adamtaranto
Owner

@copilot paired terminus hits are currently written as a single file containing hits to both left and right models. Paired hits should be written only to two separate files, one for left hits and one for right.

Ensure mkdocs tutorial for tirmite pair uses updated args and fully demonstrates flank extraction and insertion site reconstruction.

…mite-pair tutorial

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/02571512-36fb-4831-914a-f2b151a8ec28

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Copilot AI changed the title from "Fix incorrect flank reporting: only external flanks, per-pair output, summary reports, pairing-map model filtering" to "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering" Apr 19, 2026
@Adamtaranto
Owner

@copilot the --report flag with options {all,paired,unpaired}, controls the records that are written into the gff output. Update arg name to --gff-report and update documentation and README.md to use correct arg names.

Contributor Author

Copilot AI commented Apr 20, 2026

@copilot the --report flag with options {all,paired,unpaired}, controls the records that are written into the gff output. Update arg name to --gff-report and update documentation and README.md to ...

Done in 36fdc68. --report has been renamed to --gff-report (dest='gff_report') in both hmm_pair.py and legacy.py, with all args.report call sites updated to args.gff_report. The tutorial (tirmite-pair.md) and README.md now use --gff-report throughout.
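
For reference, a renamed option of this shape might look like the following argparse sketch. Only the flag name, `dest='gff_report'`, and the three choices come from the thread; the parser name, the default value, and the help text are assumptions.

```python
import argparse

# Hypothetical parser; only the flag name, dest, and choices are from the PR.
parser = argparse.ArgumentParser(prog="tirmite-pair-sketch")
parser.add_argument(
    "--gff-report",
    dest="gff_report",
    choices=["all", "paired", "unpaired"],
    default="all",  # assumed default, not confirmed by the thread
    help="Which records to write to the GFF output.",
)

args = parser.parse_args(["--gff-report", "paired"])
defaults = parser.parse_args([])
```

Call sites then read `args.gff_report` rather than `args.report`.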

Copilot AI changed the title from "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering" to "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report" Apr 20, 2026
@Adamtaranto
Owner

@copilot change --gff-out bool option to --gff, update in docs and readme.

Add logging summary of all input args, noting anything changed from defaults, when tirmite pair is called. Also log the full command, the path to the tirmite package, and the package version.

Log a check for any required software and report its path if found, e.g. blastdbcmd if a blastdb is provided.

Copilot AI and others added 2 commits April 20, 2026 00:44
…args; check blastdbcmd

Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/d105c84e-c21e-4ee4-9042-af6fa1f0857d

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/d105c84e-c21e-4ee4-9042-af6fa1f0857d

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 20, 2026

@copilot change --gff-out bool option to --gff, update in docs and readme.

Add logging summary of all input args, noting anything changed from defaults, when tirmite pair is...

Done in abdfdc9 (with a follow-up fix in 7f2cee2):

  1. --gff-out → --gff: renamed in both hmm_pair.py and legacy.py (using dest='gff_out' so no internal code changes were needed); updated all docs and README.

  2. Startup logging (log_startup_info() in hmm_pair.py): after init_logging is called, the following are now logged at INFO level:

    • TIRmite package location: <path>
    • TIRmite-pair version: <version>
    • Command: <full sys.argv>
    • Non-default arguments (each arg where the supplied value differs from the parser default), or a note that all args are at defaults.
  3. blastdbcmd check: when --blastdb is provided, shutil.which('blastdbcmd') is called and the resolved path is logged at INFO, or a WARNING is emitted if it is not found in PATH.
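
The startup behaviour listed above could be approximated as in the sketch below. It is not the body of log_startup_info(): the helper names `non_default_args` and `log_startup`, the logger name, and the small demo parser are assumptions, and the package location and version lines are omitted. `ArgumentParser.get_default` and `shutil.which` are standard library calls.

```python
import argparse
import logging
import shutil

logger = logging.getLogger("startup_sketch")


def non_default_args(parser, args):
    """Return {dest: (default, supplied)} for args that differ from defaults."""
    changed = {}
    for dest, value in sorted(vars(args).items()):
        default = parser.get_default(dest)
        if value != default:
            changed[dest] = (default, value)
    return changed


def log_startup(parser, args, argv):
    """Log the command, non-default arguments, and the blastdbcmd check."""
    logger.info("Command: %s", " ".join(argv))
    changed = non_default_args(parser, args)
    if changed:
        for dest, (default, value) in changed.items():
            logger.info("Non-default argument %s: %r (default %r)", dest, value, default)
    else:
        logger.info("All arguments are at their default values")
    if getattr(args, "blastdb", None):
        # Only check for blastdbcmd when a BLAST database was supplied.
        path = shutil.which("blastdbcmd")
        if path:
            logger.info("blastdbcmd found: %s", path)
        else:
            logger.warning("blastdbcmd not found in PATH")


# Demo parser: flag names follow the thread, everything else is illustrative.
parser = argparse.ArgumentParser(prog="tirmite-pair-sketch")
parser.add_argument("--gff", dest="gff_out", action="store_true")
parser.add_argument("--flank-len", type=int, default=50)
parser.add_argument("--blastdb", default=None)

argv = ["--gff", "--flank-len", "100"]
args = parser.parse_args(argv)
changed = non_default_args(parser, args)
log_startup(parser, args, ["tirmite", "pair", *argv])
```

Keeping `dest='gff_out'` on the renamed `--gff` flag means the reported name is the internal attribute, so a real implementation might prefer to map it back to the user-facing flag.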

Copilot AI changed the title from "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report" to "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report, rename --gff-out to --gff, add startup logging" Apr 20, 2026
Agent-Logs-Url: https://github.com/Adamtaranto/TIRmite/sessions/6fff4ee8-7897-479e-be05-625f8010d5aa

Co-authored-by: Adamtaranto <2160099+Adamtaranto@users.noreply.github.com>
@Adamtaranto
Owner

@copilot add sequence count to the end of file names for "interleaved flanks" and for "target sites"

Contributor Author

Copilot AI commented Apr 20, 2026

@copilot add sequence count to the end of file names for "interleaved flanks" and for "target sites"

Done in 933e3d8. Both interleaved flanks and target sites filenames now include the sequence count before the extension, e.g. {prefix}{pair_label}_target_sites_12.fasta and {prefix}{pair_label}_interleaved_flanks_12.fasta. Docs updated to match.
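
A small sketch of the naming scheme, following the `{prefix}{pair_label}_target_sites_{N}.fasta` pattern quoted above. The helper names `counted_output_name` and `format_single_line_fasta`, and the example prefix and pair label, are hypothetical.

```python
def format_single_line_fasta(records):
    """Format (name, sequence) pairs as unwrapped, single-line FASTA."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def counted_output_name(prefix, pair_label, kind, records):
    """Build an output filename that ends with the sequence count.

    kind is 'target_sites' or 'interleaved_flanks'.
    """
    return f"{prefix}{pair_label}_{kind}_{len(records)}.fasta"


records = [("site_1", "ACGTACGT"), ("site_2", "TTGACCAA")]
name = counted_output_name("run1_", "TIR_L_TIR_R", "target_sites", records)
fasta_text = format_single_line_fasta(records)
```

Because the count is part of the name, the filename can only be built once all records for the pair have been collected.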

Copilot AI changed the title from "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report, rename --gff-out to --gff, add startup logging" to "Fix incorrect flank reporting: only external flanks, per-pair output, split paired TIR files, summary reports, pairing-map model filtering, rename --report to --gff-report, rename --gff-out to --gff, add startup logging, sequence counts in output filen..." Apr 20, 2026
Adamtaranto marked this pull request as ready for review April 20, 2026 01:15
Adamtaranto merged commit 1885563 into main Apr 20, 2026
Adamtaranto deleted the copilot/fix-incorrect-flank-reporting branch April 20, 2026 01:15

Development

Successfully merging this pull request may close these issues.

Bug: Incorrect number of flanks reported & non-external flanks reported

2 participants