This is a specialized family of algorithms that requires a list to be uniquely partitioned (see also set partition). They don't need to know the partitioning predicate (they're predicate-agnostic), as long as equality is total. However, knowing the predicate allows significant generalization.
Note
There is an alternative formulation that uses approximate equality, but since ≈ isn't transitive, the list must be sorted (or at least partitioned by approximate comparison, which makes "similar" partitions contiguous).
In general, `s` is uniquely-partitioned if all values with a common property are contiguous; less generally and informally, each run must "consolidate its representative value". Examples:

- Good: `[0,0,1,1]` (2 distinct runs)
- Bad: `[0,1,1,0]` (`1` is distinct, but `0` is duped across 2 runs)
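This property is easy to check mechanically. Here's a minimal Python sketch (the function name is my own) that verifies a list is uniquely-partitioned by ensuring no value starts a second run:

```python
from itertools import groupby

def is_uniquely_partitioned(s):
    """True iff every value in `s` occurs in exactly one contiguous run."""
    seen = set()
    for value, _run in groupby(s):
        if value in seen:
            return False  # this value already had an earlier run: duped partition
        seen.add(value)
    return True

# Matches the examples above:
# is_uniquely_partitioned([0, 0, 1, 1]) -> True   (good)
# is_uniquely_partitioned([0, 1, 1, 0]) -> False  (bad: 0 is duped)
```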
This implies many things:

- Any slice of `s` is uniquely-partitioned, which implies trivial parallelization!
- Any sorted list (by the standard numeric (scalar, not vectorial) comparison-function) is uniquely-partitioned
- There cannot be more than 1 partition with the same mode
- If all partitions in `s` are isometric, there's a bijection between the modes of `s` and its partitions
These algorithms exploit the following lemmas (theorems?):
- Multiplication is faster than repeated addition
- Bin exponentiation is faster than repeated multiplication
- Iterating over all elements of a list is `O(n)`, but finding values in a sorted list can be as fast as `O(log(n))` (bin-search)
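To illustrate how the last lemma is exploited, here's a minimal Python sketch (names are my own, not taken from the reference implementations) of the basic jump loop. Within a uniquely-partitioned list, `s[mid] == s[i]` holds for a prefix of the indices after `i` and never again, so a bisection can skip over the whole run:

```python
def run_end(s, i):
    """Bisect for the end (exclusive) of the run starting at index i.
    Valid because equal values are contiguous: s[mid] == s[i] holds for
    a prefix of [i, len(s)) and for nothing after it."""
    lo, hi = i + 1, len(s)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[mid] == s[i]:
            lo = mid + 1  # still inside the current run
        else:
            hi = mid      # overshot into a later partition
    return lo

def partition_starts(s):
    """Jump from boundary to boundary instead of scanning every element."""
    i, starts = 0, []
    while i < len(s):
        starts.append(i)
        i = run_end(s, i)
    return starts

# partition_starts([0, 0, 1, 1, 1, 2]) -> [0, 2, 5]
```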
See the reference implementations here. They're designed to complement each other, because I want them to be examples of various use-cases.
I hope that this proof-of-concept can help optimize many other programs that deal with grouped values of any type, such as a specialized RLE compressor.
Most discussion about partition-jumping happened in LLVM issue #118731 and in this disbloat guild's #cs-theory channel.
Note
I don't like disbloat, but it was my only choice at the time
TLDR: average time is O(part_count * lb(n)), but it's more nuanced than that. Best-case is O(1). Worst-case is O(n * lb(n)) for bin-search, or O(n) for exp-search.
The runtime of these algorithms is dominated by the boundary (partition point) count (part_count - 1). So for the overhead to be worthwhile, the number of unique (according to some "key function") values should be low (relative to the length of s)
It's worth noting that the aforementioned lb (binary logarithm) is misleading. The 1st bisection is O(lb(n)), but the next is O(lb(n - part_len_0)), then O(lb(n - part_len_0 - part_len_1)), and so on, until it becomes O(lb(part_len_last)) (if we ignore the target == s[-1] check, which simplifies the last bisection into O(1)). That's only true for bin-search; for exp-search it's O(lb(part_len_i)) (worst-case), and O(1) (best-case) with target == s[-1].
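For comparison, the exp-search variant can be sketched like this (again a Python sketch with hypothetical names): it gallops forward in doubling strides, so each boundary costs O(lb(part_len_i)) probes rather than O(lb(n - i)):

```python
def run_end_exp(s, i):
    """Exponential (galloping) search for the end of the run starting at i."""
    n = len(s)
    step = 1
    while i + step < n and s[i + step] == s[i]:
        step *= 2  # still inside the run: double the stride
    # the boundary now lies in (i + step // 2, min(i + step, n)]
    lo, hi = i + step // 2 + 1, min(i + step, n)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[mid] == s[i]:
            lo = mid + 1
        else:
            hi = mid
    return lo
```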
As for the space complexity, it's O(1) for fixed-precision numbers. The basic implementation doesn't allocate auxiliary memory, but I'm working on one that does (see below)
The impl with aux-mem will use a data-structure to track the known "sub-partitions" that it encounters while bisecting (I call those "witnesses" or "bystanders"). For example:
s := [0,0,1,1,1,1,1]
With target := 0, a probe overshoots to index 3. After finding the boundary at index 2, we can remember that there are at least 2 instances of 1, even before we set 1 as our target, just because we happened to visit it while searching for 0.
If you only want to track one bystander at a time, your aux-mem will be O(1). But my plan is to remember all bystanders, so I need something like a hash-map. I'm not sure if it's worth it, as maps have overhead
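As a sketch of that idea (hypothetical names; this is not the planned implementation), the bisection loop can record every non-target value it probes into a map from value to the index span seen so far:

```python
def run_end_tracking(s, i, witnesses):
    """Plain bisection for the end of the run starting at i, but every probe
    that lands on a different value records it as a bystander: a free lower
    bound on where that value's partition lies."""
    lo, hi = i + 1, len(s)
    while lo < hi:
        mid = (lo + hi) // 2
        v = s[mid]
        if v == s[i]:
            lo = mid + 1
        else:
            # remember the widest index span at which we've seen v
            span = witnesses.setdefault(v, [mid, mid])
            span[0] = min(span[0], mid)
            span[1] = max(span[1], mid)
            hi = mid
    return lo

# With s = [0, 0, 1, 1, 1, 1, 1] and target 0, the probes at indices 4 and 2
# both hit 1, so witnesses becomes {1: [2, 4]}: at least 3 instances of 1
# are known before 1 is ever the target.
```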
By treating the list as a "flattened" N-ary tree (such as a BST), there's no need for partitions to be unique! However, this requires partitions to be ordered in such a way that bisections never mistakenly believe that a parent node only contains occurrences of the same value.
To understand why, here's how this alt differs:
- If the 1st and last values of this section of the list are the same, assume all values in-between are the same; if not, continue
- Bisect this section and perform step#1 on each side
If there's even a single different value in a "blind spot", the result could be wildly incorrect!
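Here's a minimal Python sketch of that alternative (my own naming), which collects the set of values in a section; the final comment shows how a blind spot breaks it when the ordering requirement is violated:

```python
def section_values(s, lo=0, hi=None, out=None):
    """Recursive 'flattened tree' traversal: if a section's endpoints match,
    assume the whole section holds that one value (step #1); otherwise
    bisect and recurse on both halves (step #2)."""
    if hi is None:
        hi = len(s)
    if out is None:
        out = set()
    if lo >= hi:
        return out
    if s[lo] == s[hi - 1]:    # step #1: assume the section is uniform
        out.add(s[lo])
        return out
    mid = (lo + hi) // 2      # step #2: bisect, recurse on each side
    section_values(s, lo, mid, out)
    section_values(s, mid, hi, out)
    return out

# section_values([0, 0, 1, 1]) -> {0, 1}   correct
# section_values([0, 1, 0, 0]) -> {0}      wrong: the 1 hides in a blind spot!
```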
I still don't know how to describe the requirement, but I believe it's practical to satisfy.
BTW, if the flat-tree is a heap (ahnentafel if you will), then it'd be cache-optimal! See also this HN comment