
Conversation

@vanvoorden

@vanvoorden vanvoorden commented Dec 29, 2025

There seems to be an accidental quadratic happening in the MinMax methods.1

We document our worst-case performance as O(n log n). That is our runtime if we sort the entire collection of values, and our current threshold for sorting everything is ten percent of the total count.

If we do not sort the entire collection, our runtime is O(k log k + nk). The problem is the nk term: when k is ten percent of n, that term is n^2/10, which still grows quadratically with large n, so we are not meeting our promise of O(n log n) worst case. For example, with n = 1,000,000 and k = 100,000, nk is 10^11 operations while n log n is only about 2×10^7.

One option is to change the logic that computes our threshold: instead of a threshold that scales linearly with n, compute one that scales logarithmically with n. With k capped at log n, the nk term is bounded by n log n, which meets our promised worst case.
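As a rough sketch of the two policies side by side (the function names here are made up for illustration):

import Foundation

// Current policy: fall back to a full sort once the prefix exceeds 10% of the count.
func linearThreshold(count: Int) -> Int {
  count / 10
}

// Proposed policy: fall back once the prefix exceeds the floored binary logarithm.
func logarithmicThreshold(count: Int) -> Int {
  count > 1 ? Int(log2(Double(count))) : 0
}

// For one million elements the linear threshold admits prefixes of up to
// 100,000 elements, which is what makes the nk term quadratic; the
// logarithmic threshold caps the prefix at 19 elements.
print(linearThreshold(count: 1_000_000))       // 100000
print(logarithmicThreshold(count: 1_000_000))  // 19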

The current implementation presented in this diff uses the underscored _binaryLogarithm method from the Standard Library. If we wanted to avoid the underscored API, we could try something similar to what RandomSample does and switch between methods from Darwin and Numerics.2 Or we could just write our own here in this file.
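A hand-rolled version might look something like this (the extension name is made up for illustration, and it assumes positive counts):

// A minimal sketch of a local replacement for _binaryLogarithm().
extension Int {
  var flooredBinaryLogarithm: Int {
    precondition(self > 0, "the binary logarithm is undefined for non-positive values")
    return Int.bitWidth - 1 - leadingZeroBitCount
  }
}

let count = 1_000_000
print(count.flooredBinaryLogarithm)  // 19, because 1_000_000 needs 20 bits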

Here are some benchmarks sampled locally with this new threshold:

import Algorithms
import Foundation

// Reports the average wall-clock duration of `body` over `cycles` iterations,
// keeping `setUp` and `tearDown` outside the timed region and validating each
// result with `condition`.
func Measure<T>(
  _ label: String,
  _ cycles: Int,
  setUp: () -> () = { },
  body: () -> (T),
  condition: (T) -> Bool = { _ in true },
  tearDown: () -> () = { },
) {
  var duration = Duration.nanoseconds(0)
  
  for _ in (1 ... cycles) {
    setUp()
    let clock = ContinuousClock()
    let instant = clock.now
    let result = body()
    duration += instant.duration(to: clock.now)
    precondition(condition(result))
    tearDown()
  }
  
  duration /= cycles
  
  let microseconds = duration.formatted(
    .units(
      allowed: [.microseconds],
      fractionalPart: .show(length: 3)
    )
  )
  print("\(label): \(microseconds)")
}

// Compares sorting the whole shuffled array against `min(count:)`, checking
// that both find the element at index k of the sorted order, where k is one
// less than the floored binary logarithm of n.
func Benchmark(
  n: Int,
  cycles: Int
) {
  let a = Array(1 ... n)
  let k = a.count._binaryLogarithm() - 1
  do {
    var temp = a
    Measure("Sort", cycles) {
      temp.shuffle()
    } body: {
      temp.sort()
      return temp[k]
    } condition: { result in
      result == a[k]
    }
  }
  do {
    var temp = a
    Measure("Min", cycles) {
      temp.shuffle()
    } body: {
      return temp.min(count: k + 1)[k]
    } condition: { result in
      result == a[k]
    }
  }
}

func main() {
  let sizes = [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]
  for size in sizes {
    print("--- Size: \(size) ---")
    Benchmark(
      n: size,
      cycles: 100
    )
    print()
  }
}

main()

//
//  swift run -c release
//
//  --- Size: 100 ---
//  Sort: 1.907 μs
//  Min: 0.594 μs
//
//  --- Size: 1000 ---
//  Sort: 26.847 μs
//  Min: 2.236 μs
//
//  --- Size: 10000 ---
//  Sort: 375.711 μs
//  Min: 12.840 μs
//
//  --- Size: 100000 ---
//  Sort: 4,738.126 μs
//  Min: 109.005 μs
//
//  --- Size: 1000000 ---
//  Sort: 56,317.570 μs
//  Min: 1,045.142 μs
//
//  --- Size: 10000000 ---
//  Sort: 682,109.937 μs
//  Min: 10,462.117 μs
//

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

Footnotes

  1. https://forums.swift.org/t/benchmarks-selection-linear-time-order-statistics/83852

  2. https://github.com/apple/swift-algorithms/blob/1.2.1/Sources/Algorithms/RandomSample.swift#L12-L41

  // If we're attempting to prefix more than log n of the collection, it's
  // faster to sort everything.
- guard prefixCount < (self.count / 10) else {
+ guard prefixCount <= self.count._binaryLogarithm() else {
Contributor

@xwu xwu Dec 30, 2025


You don't need underscored API for this:

Suggested change
- guard prefixCount <= self.count._binaryLogarithm() else {
+ guard prefixCount <= (Int.bitWidth - self.count.leadingZeroBitCount) else {

@xwu
Contributor

xwu commented Dec 30, 2025

The current implementation presented in this diff uses the underscored _binaryLogarithm method from the Standard Library.

For a fixed-width type like Int, the floor of log2(x) is the bit width minus the leading zero bit count and can be written out as such. There is no need to call an underscored API that does the same thing.
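For a concrete illustration of the quantity in the suggested change (assuming a 64-bit Int, with my own example value):

let count = 1_000_000                            // needs 20 significant bits
print(count.leadingZeroBitCount)                 // 44 on a 64-bit platform
print(Int.bitWidth - count.leadingZeroBitCount)  // 20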

  [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/).
  The total complexity is `O(k log k + nk)`, which will result in a runtime close
- to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of
+ to `O(n)` if *k* is a small amount. If *k* is a large amount (more than log n of
Contributor


Nit: Your implementation considers anything more than log2(n) to be “large” (not ln or log10); log2 isn’t one of the bases abbreviated “log”.
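For a sense of how much the base matters (illustrative arithmetic only, not from the review):

import Foundation

let n = 1_000_000.0
print(log2(n))   // ≈ 19.9, the base this diff uses
print(log(n))    // ≈ 13.8, natural logarithm
print(log10(n))  // ≈ 6.0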

Author

@vanvoorden vanvoorden Dec 30, 2025


@xwu Ahh… that's a good point!

TBH… I'm also open to hearing better ideas for how to compute this new threshold. Taking the binary logarithm guarantees our O(n log n) worst-case complexity but might also be more conservative than necessary. For an Array of one million elements this only gives us the first 19 elements before we sort the whole Array.
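For a sense of how slowly that threshold grows across the benchmark sizes above (illustrative only):

import Foundation

// The floored binary logarithm barely moves while n grows by five orders of magnitude.
for n in [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000] {
  print(n, Int(log2(Double(n))))
}
// 100 6
// 1000 9
// 10000 13
// 100000 16
// 1000000 19
// 10000000 23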

Adding one to the result of the binary logarithm gives us approximately log1.41 for the first 20 elements. We can keep going down that road and adding more… but I don't have a great answer for how to analytically solve for the optimal threshold: one that guarantees our O(n log n) worst-case complexity but keeps us on the "fast path" as much as possible.

@natecook1000
Member

From those benchmarks it seems like there’s quite a bit of headroom to raise the threshold — are you able to do a bit of experimentation to find where that line actually is?

@vanvoorden
Author

From those benchmarks it seems like there’s quite a bit of headroom to raise the threshold — are you able to do a bit of experimentation to find where that line actually is?

https://gist.github.com/vanvoorden/57a2a9768713907677f9fbb258e2c382

@natecook1000 Here are some experiments running different log bases against different collection sizes.

It looks like log1.25 performs better than sorting across the board.

It looks like log1.000244140625 performs worse than sorting across the board.

Picking something in between like log1.00390625 performs better than sorting on collections of 100K and larger and worse than sorting on collections of 10K and smaller.

BTW these are all collections of Int values. Results could be different for Element types with much more expensive comparisons, for example because their memory footprint is much larger.

I'm not sure how else to analytically solve for the "optimal" threshold… any more ideas about that? Is this sort of thing usually found through trial and error and inductive reasoning or more like analytic deductive reasoning?
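For reference, the thresholds in those experiments can be sketched like this (illustrative only, not the code from the gist):

import Foundation

// Threshold as a logarithm with an arbitrary base, as in the experiments above.
func threshold(count: Int, base: Double) -> Int {
  guard count > 1 else { return 0 }
  return Int(log(Double(count)) / log(base))
}

print(threshold(count: 1_000_000, base: 2.0))   // 19
print(threshold(count: 1_000_000, base: 1.25))  // 61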

@mdznr mdznr changed the title from "solve for accidental quadratic performance from mix max methods" to "solve for accidental quadratic performance from min max methods" on Jan 8, 2026