Description of bug
- Compared to the main branch, the E. coli TruSeq 100x dataset loses its longest contig.
main: NODE_1_length_405507_cov_64.322436 (405507 bp)
try-no-mapper: NODE_1_length_361537_cov_63.875850 (361537 bp)
See alignment viewer
- It appears that path extend does not have enough information to thread through a big loop, ending at the edge with ID 2006. The proper path continues to 1429, but path extend loops back to 381962 (see attached PDF).
- The main cause of the lost paired information (381962 -> 1429 and 128578 -> 1429) is that 3 out of 6 reads do not map to these edges. In fact, these reads still map, but with much smaller ranges (text file attached).
- In turn, the key reason is a high number of mismatches at the end of edge 381962 (see IGV screenshot). Previously, the k-mer mapper helped to map these reads.
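To illustrate why a few mismatches near the end of an edge can shrink the mapped range so much, here is a toy sketch. It is not SPAdes code: `exact_kmer_range`, `substitute`, and the sequences are hypothetical, and it assumes a mapper that only accepts exact k-mer matches at a known offset.

```python
def exact_kmer_range(read: str, edge: str, offset: int, k: int):
    """Longest run of consecutive read k-mers that match the edge exactly.

    The read is assumed to start at `offset` on the edge. Returns a half-open
    (start, end) range in read coordinates, or None if no k-mer matches.
    """
    best = None
    run_start = None
    n = len(read) - k + 1
    for i in range(n + 1):  # the extra iteration closes a run at the read end
        ok = i < n and read[i:i + k] == edge[offset + i:offset + i + k]
        if ok:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            cand = (run_start, i - 1 + k)
            if best is None or cand[1] - cand[0] > best[1] - best[0]:
                best = cand
            run_start = None
    return best


def substitute(seq: str, pos: int) -> str:
    """Return seq with the base at `pos` replaced by a different base."""
    new_base = "C" if seq[pos] != "C" else "A"
    return seq[:pos] + new_base + seq[pos + 1:]


edge = "ACGTACGTTGCAAGCTTAGGCTAACGGATC"
read = edge[5:25]                               # 20 bp read, no errors
noisy = substitute(substitute(read, 12), 17)    # two mismatches near the end

print(exact_kmer_range(read, edge, 5, 5))   # (0, 20): the whole read maps
print(exact_kmer_range(noisy, edge, 5, 5))  # (0, 12): the range stops before the mismatches
```

Each mismatch invalidates up to k consecutive k-mers, so two mismatches close to each other remove the whole tail of the read from the mapped range.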
Proposed ideas for fix:
- Improve mapping to thread reads through mismatches
- Currently, the mismatch corrector selects candidate mismatch positions based on the k-mers from the k-mer mapper. It also uses only reads that map entirely inside one edge. It needs to use proper read mapping to correct all possible mismatches.
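A rough sketch of the second idea, assuming read alignments are available as per-edge segments so that a read spanning several edges contributes one segment to each. This is not the existing mismatch corrector: `mismatch_candidates`, the input layout, and the thresholds `min_support` and `min_fraction` are all hypothetical.

```python
from collections import Counter, defaultdict


def mismatch_candidates(edge_seqs, alignments, min_support=2, min_fraction=0.5):
    """Pick candidate mismatch positions from a pileup of aligned read segments.

    edge_seqs:  {edge_id: edge sequence}
    alignments: iterable of (edge_id, edge_start, read_segment); a read that
                spans several edges is passed as one segment per edge.
    Returns a list of (edge_id, position, edge_base, read_base).
    """
    pileup = defaultdict(Counter)  # (edge_id, position) -> base counts
    for edge_id, edge_start, segment in alignments:
        for i, base in enumerate(segment):
            pileup[(edge_id, edge_start + i)][base] += 1

    candidates = []
    for (edge_id, pos), counts in sorted(pileup.items()):
        edge_base = edge_seqs[edge_id][pos]
        base, support = counts.most_common(1)[0]
        depth = sum(counts.values())
        # Report a position only when a non-edge base clearly dominates.
        if base != edge_base and support >= min_support and support / depth > min_fraction:
            candidates.append((edge_id, pos, edge_base, base))
    return candidates


edge_seqs = {1: "ACGTACGT", 2: "TTGGCCAA"}
alignments = [
    (1, 4, "ACGA"),  # first part of a read that continues into edge 2
    (2, 0, "TTGG"),  # second part of the same read
    (1, 5, "CGA"),   # read entirely inside edge 1
    (1, 6, "GT"),    # read entirely inside edge 1
]
print(mismatch_candidates(edge_seqs, alignments))      # [(1, 7, 'T', 'A')]
print(mismatch_candidates(edge_seqs, alignments[2:]))  # []
```

In this toy input the mismatch at the end of edge 1 is only found when the edge-spanning read is included, which is the situation described for edge 381962.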
alignment_viewer.zip
E.coli 100x lost paired info.pdf
pe_fill.txt
spades.log
params.txt
SPAdes version
4.2.0
Operating System
Linux-6.8.0-65-generic-x86_64-with-glibc2.35
Python Version
3.10.12
Method of SPAdes installation
manual, try-no-mapper branch
No errors reported in spades.log