# Super Scalar Sample Sort

Super Scalar Sample Sort is a sorting algorithm optimized for modern-ish hardware. This is an implementation in C++14. It is faster than `std::sort` in many cases, often taking only half to two thirds of the time. However, ssssort uses quite a bit of additional memory (up to 2-3x the input size). This means that it's not applicable in all situations, but when it is, it's pretty quick!
But you shouldn't use this code, because there are even quicker methods around. In particular, IPS⁴o outperforms this code almost all of the time (benchmarks can be found in the paper, which is freely available) and doesn't have as much memory overhead. You should go and use IPS⁴o instead, and disregard this repository. Its main purpose is to serve as an implementation of Super Scalar Sample Sort for those who want to compare their sorter to SSSS specifically.
tl;dr: use IPS⁴o instead of this code if you want a fast sorting algorithm
We performed some tests sorting integers and compared Super Scalar Sample Sort to `std::sort`. Most notably, when sorting random integers, our implementation ran in 50 to 65% of the time taken by `std::sort`! The plot below shows the time divided by `n log(n)`, where `n` is the input size. We chose this normalization because `n log(n)` is the lower bound on comparison-based sorting you may remember from your algorithms class; thus the plot shows the time spent per required comparison.
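In case the normalization sounds more involved than it is, it's just the measured time divided by `n * log2(n)`. A trivial sketch of the idea (an illustration only, not the actual code from `benchmark.h`):

```cpp
#include <cmath>
#include <cstddef>

// Time per "required" comparison: measured time divided by n * log2(n).
// This mirrors the normalization described above; it is not the benchmark code.
inline double time_per_comparison(double nanoseconds, std::size_t n) {
    return nanoseconds / (static_cast<double>(n) * std::log2(static_cast<double>(n)));
}
```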
`std::sort` is awfully fast on data that is already sorted (both in the right and in reverse order). We can't match that. However, as soon as even 0.1% of the elements aren't in the right place, its advantage breaks down immediately! This is also true when the first 99% of the array is sorted and only the last 1% contains random data.
You can find plots for some more workloads in the `plots` folder, or suggest new benchmarks by filing an issue. The file `speed.pdf` also contains the same plots with additional ±1 standard deviation lines.
We performed our experiments on a Haswell Core i7-4790T machine with 16 GiB of DDR3-1600, but only used one core to keep things reproducible. All numbers are averages over several runs: for the randomized inputs, 100 different inputs for the small instances down to 25 different ones for the larger ones, each with 10 repetitions; for the deterministic input generators, between 1000 and 100 iterations, again depending on input size. You can find the exact logic in `benchmark.h`.
Just include `ssssort.h` and use `ssssort::ssssort(Iterator begin, Iterator end)`. Or, if you want the output to be written somewhere else, use the version with three iterators: `ssssort::ssssort(InputIt begin, InputIt end, OutputIt out_begin)`.
Note that the input range will be in an arbitrary order after calling this.
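A minimal usage sketch (the input data and the relative include path are assumptions for illustration; the header name and both call signatures are as stated above):

```cpp
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

#include "ssssort.h"

int main() {
    // Some random integers as example input.
    std::mt19937_64 rng(12345);
    std::vector<std::uint64_t> data(1 << 20);
    for (auto &x : data) x = rng();

    // Two-iterator interface: sorts the range in place.
    ssssort::ssssort(data.begin(), data.end());

    // Three-iterator interface: writes the sorted output to a separate range;
    // the input range is left in an arbitrary order afterwards.
    std::vector<std::uint64_t> input(1 << 20), output(input.size());
    for (auto &x : input) x = rng();
    ssssort::ssssort(input.begin(), input.end(), output.begin());

    std::cout << "smallest element: " << output.front() << "\n";
}
```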
The implementation is fairly close to the paper, but uses `std::sort` as the base case for sorting fewer than 1024 elements. As-is, the code technically requires a C++14 compiler, even though `g++` is happy to compile it with `-std=c++11`. The requirement stems from the use of a variable declaration in the `find_bucket` function, which is marked `constexpr`. You can simply replace `constexpr` with `inline` to make it valid C++11.
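To illustrate the language issue with a generic stand-in (this is not the actual `find_bucket` code, just a function with the same property): a `constexpr` function whose body declares a local variable is only valid from C++14 on, while the `inline` variant is fine under C++11:

```cpp
#include <cstddef>
#include <iostream>

// C++14: constexpr function bodies may contain local variables and loops.
// Under -std=c++11 this is strictly ill-formed (C++11 constexpr bodies are
// essentially limited to a single return statement), even if some compilers
// accept it anyway.
constexpr std::size_t log2_floor(std::size_t x) {
    std::size_t result = 0;  // local variable declaration: requires C++14
    while (x >>= 1) ++result;
    return result;
}
static_assert(log2_floor(1024) == 10, "usable in constant expressions");

// The C++11-friendly workaround described above: replace constexpr with
// inline. The function behaves identically at runtime but can no longer be
// used in constant expressions.
inline std::size_t log2_floor_cxx11(std::size_t x) {
    std::size_t result = 0;
    while (x >>= 1) ++result;
    return result;
}

int main() {
    std::cout << log2_floor(1024) << " == " << log2_floor_cxx11(1024) << "\n";
}
```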