Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain #214

RagnarGrootKoerkamp · 2023-04-06T20:57:15Z

This encodes horizontal input/output deltas using Phin and Mhin indicator words with the lowest bit set if hin is +1 or -1 respectively. This gives up to 20% speed gains on large inputs where the bitpacking block computation takes relatively longer than the accounting for band-size.

Some small alignments get up to 2% slower but I suspect this is just noise.

maxbachmann · 2023-04-18T22:22:16Z

I was wondering why my own implementation (https://github.com/maxbachmann/RapidFuzz) was faster for very dissimilar sequences at some point. And indeed similar to this PR I store the deltas separately, so I do not have to split them. I was unaware this saved me so much time 👍

After this PR they are pretty much have the same performance. E.g. for completely different strings with length 1m I get:

edlib_old: 21.7s
edlib_new: 16.6s
rapidfuzz: 16.9s

The implementation for small sequences in edlib is far from optimal anyways, since in this case the whole accounting for the band size becomes way to expensive.

Martinsos · 2023-04-19T11:57:22Z

I was wondering why my own implementation (https://github.com/maxbachmann/RapidFuzz) was faster for very dissimilar sequences at some point. And indeed similar to this PR I store the deltas separately, so I do not have to split them. I was unaware this saved me so much time +1

After this PR they are pretty much have the same performance. E.g. for completely different strings with length 1m I get:

edlib_old: 21.7s

edlib_new: 16.6s

rapidfuzz: 16.9s

The implementation for small sequences in edlib is far from optimal anyways, since in this case the whole accounting for the band size becomes way to expensive.

Ok this will be great then!

Yup, for smaller sequences the housekeeping becomes too much. I think the best approach is to use different algorithm (like LandauVishkin) for smaller sequences. Idea was to put this in Edlib and have Edlib choose the algorithm based on the sequence length, but haven't implemented that yet. It is a bit more work because it needs to support everything the current algorithm does. But that seems like the most obvious way to go to me.

maxbachmann · 2023-04-19T12:21:15Z

Yup, for smaller sequences the housekeeping becomes too much. I think the best approach is to use different algorithm (like LandauVishkin) for smaller sequences. Idea was to put this in Edlib and have Edlib choose the algorithm based on the sequence length, but haven't implemented that yet. It is a bit more work because it needs to support everything the current algorithm does. But that seems like the most obvious way to go to me.

I personally use a couple implementations for the standard Levenshtein distance:

if the allowed edits are <= 3 use a brute force approach testing all possible edit combination
if sequence length < 64 use hyrroes/myers algorithm, but without blocks
if the band size is < 64 use an implementation calculating the bit vectors on the fly
if both length and band size is greater use the blockwise implementation
I provide a way to cache the bitvectors. While the creation of bitvectors is relatively cheap to create for long sequences, this can mean a significant speedup when comparing multiple shorter sequences.
when using short sequences < 64 characters I provide a simd implementation using sse2 and avx2 to compare multiple sequences in parallel

All of this is done inside the library, as long as the user calls it in an appropriate way. E.g. I can only use the simd implementation if the user calls the library with multiple strings.

RagnarGrootKoerkamp · 2025-05-06T11:04:11Z

@Martinsos I just came across this again during the development of sassy.

Would you merge this? I think it's pretty safe and simple?

Otherwise I'd have to decide whether to bench edlib with or without this patch.
If I include it it's confusing because it's not edlib master, but if I exclude it, you loose some perf (well good for me, but edlib could just be better.)

Martinsos · 2025-05-11T21:52:22Z

Hey @RagnarGrootKoerkamp , I will be looking into this in the following days! My apologies for not handling it sooner -> I had hard time finding time for bigger Edlib changes due to other obligations, but I should be able to carve some time now. I expect it to be a relatively straightforward merge, but I want to make sure I study it properly.

RagnarGrootKoerkamp · 2025-05-11T22:24:16Z

Sounds great!

Martinsos

@RagnarGrootKoerkamp reviewed the changes, please take a look!

In general, all makes sense, I love the idea. I mostly commented on smaller stuff + some code reorganization for consistency / readability. Let me know how you like it, and if you are able to apply those changes.

Martinsos · 2025-05-15T19:06:15Z

edlib/src/edlib.cpp

+ * @param [out] Phout  Bitset, 00..01 when hout == +1.
+ * @param [out] Mhout  Bitset, 00..01 when hout == -1.


This is actually a single output, a pair, not two outputs, so let's show it as such here, as a single @param?

Martinsos · 2025-05-15T19:10:12Z

edlib/src/edlib.cpp

+ * @param [in] Phin  Bitset, 00..01 when hin == +1.
+ * @param [in] Mhin  Bitset, 00..01 when hin == -1.


Suggested change

* @param [in] Phin Bitset, 00..01 when hin == +1.

* @param [in] Mhin Bitset, 00..01 when hin == -1.

* @param [in] Phin Bitset, 00..01 when hin == +1, otherwise 00..00.

* @param [in] Mhin Bitset, 00..01 when hin == -1, otherwise 00..00.

Martinsos · 2025-05-15T19:10:50Z

edlib/src/edlib.cpp

+ * @param [out] Phout  Bitset, 00..01 when hout == +1.
+ * @param [out] Mhout  Bitset, 00..01 when hout == -1.


Suggested change

* @param [out] Phout Bitset, 00..01 when hout == +1.

* @param [out] Mhout Bitset, 00..01 when hout == -1.

* @param [out] Phout Bitset, 00..01 when hout == +1, otherwise 00..00.

* @param [out] Mhout Bitset, 00..01 when hout == -1, otherwise 00..00.

Martinsos · 2025-05-15T19:19:15Z

edlib/src/edlib.cpp

    // hin can be 1, -1 or 0.
    // 1  -> 00...01
    // 0  -> 00...00
    // -1 -> 11...11 (2-complement)


This is not that meaningful any more, we can maybe rewrite it like this:

Suggested change

// hin can be 1, -1 or 0.

// 1 -> 00...01

// 0 -> 00...00

// -1 -> 11...11 (2-complement)

// (Phin, Mhin) can have one of three possible values:

// - (1, 0) if hin is 1.

// - (0, 1) if hin is -1.

// - (0, 0) if hin is 0.

Martinsos · 2025-05-15T19:26:11Z

edlib/src/edlib.cpp

-    int hout = 0;
    // This is instruction below written using 'if': if (Ph & HIGH_BIT_MASK) hout = 1;
-    hout = (Ph & HIGH_BIT_MASK) >> (WORD_SIZE - 1);
+    Word Phout = Ph >> (WORD_SIZE - 1);


I see that you removed & HIGH_BIT_MASK, and I was trying to figure why, but now that I am looking at it, it seems it was never useful, was it? So this was always redundant not a change caused by the other changes here, right?
Same for Mhout below.

Martinsos · 2025-05-15T19:52:42Z

edlib/src/edlib.cpp

    int bestScore = -1;
    vector<int> positions; // TODO: Maybe put this on heap?
-    const int startHout = mode == EDLIB_MODE_HW ? 0 : 1; // If 0 then gap before query is not penalized;
+    const Word PstartHout = mode == EDLIB_MODE_HW ? 0 : 1; // If 0 then gap before query is not penalized;


I think now is a bit harder to track what is start state, since only P part of hout is set at start, and what it the relationship between the start hout and Phout and Mhout.

What if we did this instead:

int bestScore = -1; vector<int> positions; // TODO: Maybe put this on heap? // PMhout is (Phout, Mhout): (1,0) if hout is 1, (0,1) if hout is -1, (0,0) if hout is 0. // Also, hout == Phout - Mhout. const pair<Word,Word> startPMhout = mode == EDLIB_MODE_HW ? {0,0} : {1,0}; // If 0 then gap before query is not penalized; const unsigned char* targetChar = target; for (int c = 0; c < targetLength; c++) { // for each column const Word* Peq_c = Peq + (*targetChar) * maxNumBlocks; //----------------------- Calculate column -------------------------// pair<Word,Word> PMhout = startPMhout; bl = blocks + firstBlock; Peq_c += firstBlock; for (int b = firstBlock; b <= lastBlock; b++) { PMhout = calculateBlock(bl->P, bl->M, *Peq_c, PMhout.first, PMhout.second, bl->P, bl->M); bl->score += PMhout.first - PMhout.second; // PMhout.first - PMhout.second == hout bl++; Peq_c++; } bl--; Peq_c--; const int hout = PMhout.first - PMhout.second; //------------------------------------------------------------------//

So now we have startPMhout at the start, as a single value that describes starting condition, then we have PMhout to track how hout develops as we calculate block by block, and I avoided recalculating hout in every step, we don't really need it in that format, PMhout is actually the format we need. Only at the very end we actually calculate hout, because it is useful for the readability further below.

Martinsos · 2025-05-15T20:14:10Z

edlib/src/edlib.cpp

+            pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M);
+            bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second;


If we apply the changes I suggested above, then we also need to adjust this part a bit:

Suggested change

pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M);

bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second;

pair<Word, Word> nextPMhout = calculateBlock(bl->P, bl->M, *Peq_c, PMhout.first, PMhout.second, bl->P, bl->M);

const int nextHout = nextPMhout.first - nextPMhout.second;

bl->score = (bl - 1)->score - hout + WORD_SIZE + nextHout;

Martinsos · 2025-05-15T20:20:48Z

edlib/src/edlib.cpp


        //----------------------- Calculate column -------------------------//
        int hout = 1;
+        Word Phout = 1;


If changes I suggested above make sense, we could do the same thing here -> have only PMhout here, no Phout, Mhout, and hout, till we are done with calculating blocks, only then we calculate hout, and below we go with nextPMhout and nextHout.
Oh I see now I used newHout here already -> but let's rename that to nextHout, that is a better name (since it is "next" block).

RagnarGrootKoerkamp · 2025-05-15T21:10:00Z

(I'll look more tomorrow, but in the meantime, feel free to push to the branch directly (you should have permission).)

Martinsos · 2025-05-15T22:27:21Z

(I'll look more tomorrow, but in the meantime, feel free to push to the branch directly (you should have permission).)

No worries take your time, and I do want you to take a look anyway before the changes I suggested are applied!

Encode horizontal deltas using +1/-1 indicators

f2a4653

Martinsos reviewed May 15, 2025

View reviewed changes

		* @param [out] Phout Bitset, 00..01 when hout == +1.
		* @param [out] Mhout Bitset, 00..01 when hout == -1.

		* @param [in] Phin Bitset, 00..01 when hin == +1.
		* @param [in] Mhin Bitset, 00..01 when hin == -1.

		pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M);
		bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second;

-            pair<Word, Word> PMhout = calculateBlock(bl->P, bl->M, *Peq_c, Phout, Mhout, bl->P, bl->M);
-            bl->score = (bl - 1)->score - hout + WORD_SIZE + PMhout.first - PMhout.second;
+            pair<Word, Word> nextPMhout = calculateBlock(bl->P, bl->M, *Peq_c, PMhout.first, PMhout.second, bl->P, bl->M);
+            const int nextHout = nextPMhout.first - nextPMhout.second;
+            bl->score = (bl - 1)->score - hout + WORD_SIZE + nextHout;

Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain #214

Are you sure you want to change the base?

Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain #214

Uh oh!

Conversation

RagnarGrootKoerkamp commented Apr 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maxbachmann commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Martinsos commented Apr 19, 2023

Uh oh!

maxbachmann commented Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RagnarGrootKoerkamp commented May 6, 2025

Uh oh!

Martinsos commented May 11, 2025

Uh oh!

RagnarGrootKoerkamp commented May 11, 2025

Uh oh!

Martinsos left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RagnarGrootKoerkamp commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Martinsos commented May 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RagnarGrootKoerkamp commented Apr 6, 2023 •

edited

Loading

maxbachmann commented Apr 18, 2023 •

edited

Loading

maxbachmann commented Apr 19, 2023 •

edited

Loading

RagnarGrootKoerkamp commented May 15, 2025 •

edited

Loading