-
Notifications
You must be signed in to change notification settings - Fork 174
Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain #214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain #214
Conversation
|
I was wondering why my own implementation (https://github.com/maxbachmann/RapidFuzz) was faster for very dissimilar sequences at some point. And indeed similar to this PR I store the deltas separately, so I do not have to split them. I was unaware this saved me so much time 👍 After this PR they are pretty much have the same performance. E.g. for completely different strings with length 1m I get:
The implementation for small sequences in edlib is far from optimal anyways, since in this case the whole accounting for the band size becomes way to expensive. |
Ok this will be great then! Yup, for smaller sequences the housekeeping becomes too much. I think the best approach is to use different algorithm (like LandauVishkin) for smaller sequences. Idea was to put this in Edlib and have Edlib choose the algorithm based on the sequence length, but haven't implemented that yet. It is a bit more work because it needs to support everything the current algorithm does. But that seems like the most obvious way to go to me. |
I personally use a couple implementations for the standard Levenshtein distance:
All of this is done inside the library, as long as the user calls it in an appropriate way. E.g. I can only use the simd implementation if the user calls the library with multiple strings. |
|
@Martinsos I just came across this again during the development of sassy. Would you merge this? I think it's pretty safe and simple? Otherwise I'd have to decide whether to bench edlib with or without this patch. |
|
Hey @RagnarGrootKoerkamp , I will be looking into this in the following days! My apologies for not handling it sooner -> I had hard time finding time for bigger Edlib changes due to other obligations, but I should be able to carve some time now. I expect it to be a relatively straightforward merge, but I want to make sure I study it properly. |
|
Sounds great! |
Martinsos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RagnarGrootKoerkamp reviewed the changes, please take a look!
In general, all makes sense, I love the idea. I mostly commented on smaller stuff + some code reorganization for consistency / readability. Let me know how you like it, and if you are able to apply those changes.
| * @param [out] Phout Bitset, 00..01 when hout == +1. | ||
| * @param [out] Mhout Bitset, 00..01 when hout == -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a single output, a pair, not two outputs, so let's show it as such here, as a single @param?
| * @param [in] Phin Bitset, 00..01 when hin == +1. | ||
| * @param [in] Mhin Bitset, 00..01 when hin == -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param [in] Phin Bitset, 00..01 when hin == +1. | |
| * @param [in] Mhin Bitset, 00..01 when hin == -1. | |
| * @param [in] Phin Bitset, 00..01 when hin == +1, otherwise 00..00. | |
| * @param [in] Mhin Bitset, 00..01 when hin == -1, otherwise 00..00. |
| * @param [out] Phout Bitset, 00..01 when hout == +1. | ||
| * @param [out] Mhout Bitset, 00..01 when hout == -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param [out] Phout Bitset, 00..01 when hout == +1. | |
| * @param [out] Mhout Bitset, 00..01 when hout == -1. | |
| * @param [out] Phout Bitset, 00..01 when hout == +1, otherwise 00..00. | |
| * @param [out] Mhout Bitset, 00..01 when hout == -1, otherwise 00..00. |
| // hin can be 1, -1 or 0. | ||
| // 1 -> 00...01 | ||
| // 0 -> 00...00 | ||
| // -1 -> 11...11 (2-complement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not that meaningful any more, we can maybe rewrite it like this:
| // hin can be 1, -1 or 0. | |
| // 1 -> 00...01 | |
| // 0 -> 00...00 | |
| // -1 -> 11...11 (2-complement) | |
| // (Phin, Mhin) can have one of three possible values: | |
| // - (1, 0) if hin is 1. | |
| // - (0, 1) if hin is -1. | |
| // - (0, 0) if hin is 0. |
| int hout = 0; | ||
| // This is instruction below written using 'if': if (Ph & HIGH_BIT_MASK) hout = 1; | ||
| hout = (Ph & HIGH_BIT_MASK) >> (WORD_SIZE - 1); | ||
| Word Phout = Ph >> (WORD_SIZE - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you removed & HIGH_BIT_MASK, and I was trying to figure why, but now that I am looking at it, it seems it was never useful, was it? So this was always redundant not a change caused by the other changes here, right?
Same for Mhout below.
| int bestScore = -1; | ||
| vector<int> positions; // TODO: Maybe put this on heap? | ||
| const int startHout = mode == EDLIB_MODE_HW ? 0 : 1; // If 0 then gap before query is not penalized; | ||
| const Word PstartHout = mode == EDLIB_MODE_HW ? 0 : 1; // If 0 then gap before query is not penalized; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think now is a bit harder to track what is start state, since only P part of hout is set at start, and what it the relationship between the start hout and Phout and Mhout.
What if we did this instead:
int bestScore = -1;
vector<int> positions; // TODO: Maybe put this on heap?
// PMhout is (Phout, Mhout): (1,0) if hout is 1, (0,1) if hout is -1, (0,0) if hout is 0.
// Also, hout == Phout - Mhout.
const pair<Word,Word> startPMhout = mode == EDLIB_MODE_HW ? {0,0} : {1,0}; // If 0 then gap before query is not penalized;
const unsigned char* targetChar = target;
for (int c = 0; c < targetLength; c++) { // for each column
const Word* Peq_c = Peq + (*targetChar) * maxNumBlocks;
//----------------------- Calculate column -------------------------//
pair<Word,Word> PMhout = startPMhout;
bl = blocks + firstBlock;
Peq_c += firstBlock;
for (int b = firstBlock; b <= lastBlock; b++) {
PMhout = calculateBlock(bl->P, bl->M, *Peq_c, PMhout.first, PMhout.second, bl->P, bl->M);
bl->score += PMhout.first - PMhout.second; // PMhout.first - PMhout.second == hout
bl++; Peq_c++;
}
bl--; Peq_c--;
const int hout = PMhout.first - PMhout.second;
//------------------------------------------------------------------//So now we have startPMhout at the start, as a single value that describes starting condition, then we have PMhout to track how hout develops as we calculate block by block, and I avoided recalculating hout in every step, we don't really need it in that format, PMhout is actually the format we need. Only at the very end we actually calculate hout, because it is useful for the readability further below.
| pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M); | ||
| bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we apply the changes I suggested above, then we also need to adjust this part a bit:
| pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M); | |
| bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second; | |
| pair<Word, Word> nextPMhout = calculateBlock(bl->P, bl->M, *Peq_c, PMhout.first, PMhout.second, bl->P, bl->M); | |
| const int nextHout = nextPMhout.first - nextPMhout.second; | |
| bl->score = (bl - 1)->score - hout + WORD_SIZE + nextHout; |
|
|
||
| //----------------------- Calculate column -------------------------// | ||
| int hout = 1; | ||
| Word Phout = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If changes I suggested above make sense, we could do the same thing here -> have only PMhout here, no Phout, Mhout, and hout, till we are done with calculating blocks, only then we calculate hout, and below we go with nextPMhout and nextHout.
Oh I see now I used newHout here already -> but let's rename that to nextHout, that is a better name (since it is "next" block).
|
(I'll look more tomorrow, but in the meantime, feel free to push to the branch directly (you should have permission).) |
No worries take your time, and I do want you to take a look anyway before the changes I suggested are applied! |
This encodes horizontal input/output deltas using
PhinandMhinindicator words with the lowest bit set ifhinis+1or-1respectively. This gives up to20%speed gains on large inputs where the bitpacking block computation takes relatively longer than the accounting for band-size.Some small alignments get up to 2% slower but I suspect this is just noise.
