Internal performance profiling indicates that maglev_lb.cc consumes some of the most CPU cycles for jobs running the Envoy binary.
Taking a closer look at maglev_lb.cc, we can see that there is an expensive modulo operation for every iteration over the maglev table. Some possible explanations for this cost is
- the default maglev table size being 65537 setting a large minimum number of iterations
- changing backends can trigger table rebuilds often
One possible way to improve the efficiency of building the maglev table is to avoid the use of modulo.
Benchmarks indicate a possible 15% improvement in efficiency:
Baseline
| Benchmark |
Time |
CPU |
Iterations |
Memory |
Memory per Host |
| BuildTable/100 |
2.68 ms |
2.67 ms |
266 |
67.8k |
678 |
| BuildTable/200 |
2.42 ms |
2.42 ms |
284 |
77.656k |
388 |
| BuildTable/500 |
2.60 ms |
2.60 ms |
269 |
90.584k |
181 |
Avoid use of modulo
| Benchmark |
Time |
CPU |
Iterations |
Memory |
Memory per Host |
| BuildTable/100 |
2.22 ms |
2.22 ms |
322 |
67.8k |
678 |
| BuildTable/200 |
2.05 ms |
2.05 ms |
342 |
77.656k |
388 |
| BuildTable/500 |
2.22 ms |
2.21 ms |
314 |
90.584k |
181 |
Internal performance profiling indicates that maglev_lb.cc consumes some of the most CPU cycles for jobs running the Envoy binary.
Taking a closer look at maglev_lb.cc, we can see that there is an expensive modulo operation for every iteration over the maglev table. Some possible explanations for this cost is
One possible way to improve the efficiency of building the maglev table is to avoid the use of modulo.
Benchmarks indicate a possible 15% improvement in efficiency:
Baseline
Avoid use of modulo