[experiment] Try to make index vec smaller when possible#157799
[experiment] Try to make index vec smaller when possible#157799panstromek wants to merge 22 commits into
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (76803df): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.9%, secondary -1.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 519.113s -> 519.594s (0.09%) |
This comment was marked as resolved.
This comment was marked as resolved.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (e8c5fee): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.9%, secondary -1.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 3.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 526.872s -> 519.342s (-1.43%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (3ca59f1): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.6%, secondary -0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 488.059s -> 489.607s (0.32%) |
|
I'm a bit frustrated by the results tbh. The last two commits which avoid the ub_checks in as_slice, as_slice_mut and _into_vec fixed the majority of the regression in I tried to remove one more bounds check in ast lowering, let's see. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (84df2e4): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.2%, secondary -2.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 485.944s -> 485.892s (-0.01%) |
…n loading dep graph
|
Detailed results show that Based on results from other queries and local profiling with cachegrind, I see the problem is often in places that iterate over the vec and index into it in each iteration through I tried to change one of those loops to create those slices before the loop explicitly to test this hypothesis. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…try> [experiment] Try to make index vec smaller when possible
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (2e6fea9): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -2.7%, secondary -3.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 1.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 485.944s -> 488.554s (0.54%) |
View all comments
This is very ugly and unsound, but I want to check CI and perf to see whether there's at least something to gain.
Notes
TODO