Skip to content

modulo operation is very expensive in maglev implementation #44167

@etruong42

Description

@etruong42

Internal performance profiling indicates that maglev_lb.cc consumes some of the most CPU cycles for jobs running the Envoy binary.

Taking a closer look at maglev_lb.cc, we can see that there is an expensive modulo operation for every iteration over the maglev table. Some possible explanations for this cost is

  • the default maglev table size being 65537 setting a large minimum number of iterations
  • changing backends can trigger table rebuilds often

One possible way to improve the efficiency of building the maglev table is to avoid the use of modulo.

Benchmarks indicate a possible 15% improvement in efficiency:

Baseline

Benchmark Time CPU Iterations Memory Memory per Host
BuildTable/100 2.68 ms 2.67 ms 266 67.8k 678
BuildTable/200 2.42 ms 2.42 ms 284 77.656k 388
BuildTable/500 2.60 ms 2.60 ms 269 90.584k 181

Avoid use of modulo

Benchmark Time CPU Iterations Memory Memory per Host
BuildTable/100 2.22 ms 2.22 ms 322 67.8k 678
BuildTable/200 2.05 ms 2.05 ms 342 77.656k 388
BuildTable/500 2.22 ms 2.21 ms 314 90.584k 181

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions