self-profile: add LD_PRELOAD Support #2

Open

@yskelg yskelg commented Aug 8, 2024

This code is a simple implementation of my idea in #1, with a focus on making self-profile more portable.
It seems useful, even if there's a call to the main function, because this method seems to reduce overhead compared to using `perf record` directly. We can insert the code directly, in keeping with its original purpose.
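For readers unfamiliar with the approach, here is a minimal sketch of how a preloaded library can count CPU cycles without touching the target program. It is not the code in this PR: the function names (`counter_requested`, `open_cycle_counter`, `profile_begin`, `profile_end`) are hypothetical, the rule that any value other than unset, empty, or `0` enables the counter is an assumption, and the sketch counts over the whole process lifetime, which may differ from the region this PR measures.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* File descriptor of the cycle counter, or -1 when profiling is off. */
static int cycles_fd = -1;

/* Assumption: unset, "" and "0" mean "off"; anything else means "on". */
static int counter_requested(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Open a user-space CPU cycle counter for the calling process. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1; /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* glibc has no wrapper; pid = 0 and cpu = -1 mean "this process, any CPU". */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Runs when the dynamic loader maps the library, before main(). */
__attribute__((constructor))
static void profile_begin(void)
{
    if (!counter_requested(getenv("PERF_COUNT_HW_CPU_CYCLES")))
        return;

    cycles_fd = open_cycle_counter();
    if (cycles_fd < 0) {
        perror("perf_event_open");
        return;
    }
    ioctl(cycles_fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(cycles_fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Runs at normal process exit, after main() returns. */
__attribute__((destructor))
static void profile_end(void)
{
    uint64_t count = 0;

    if (cycles_fd < 0)
        return;

    ioctl(cycles_fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(cycles_fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        fprintf(stderr, "cycles: %llu\n", (unsigned long long)count);
    close(cycles_fd);
    cycles_fd = -1;
}
```

Saved as a hypothetical `shim.c`, this could be built with `gcc -shared -fPIC -o shim.so shim.c` and loaded with `LD_PRELOAD=./shim.so ./some_program`. Opening the counter may fail with a permission error depending on the system's `perf_event_paranoid` setting, in which case the sketch reports the error and the program runs unprofiled.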

Here are the test results from my Raspberry Pi 5, built with gcc version 12.2.0 (Debian 12.2.0-14):

$ uname -a
Linux paran 6.10.1-v8-16k+ #1 SMP PREEMPT Sat Jul 27 17:52:03 KST 2024 aarch64 GNU/Linux

$ make run
export PERF_COUNT_HW_CPU_CYCLES=1; ./test_profile
Sorting...
00: { "H", 107, 0.900000 }
01: { "I", 111, 0.900000 }
02: { "G", 117, 0.900000 }
03: { "E", 127, 0.900000 }
04: { "F", 147, 0.900000 }
05: { "A", 157, 0.900000 }
06: { "K", 157, 0.900000 }
07: { "L", 157, 0.900000 }
08: { "M", 157, 0.900000 }
09: { "N", 157, 0.900000 }
10: { "O", 157, 0.900000 }
11: { "P", 157, 0.900000 }
12: { "Z", 157, 0.900000 }
13: { "C", 175, 0.900000 }
14: { "J", 227, 0.900000 }
15: { "B", 517, 0.900000 }
16: { "D", 571, 0.900000 }
PERF_COUNT_HW_CPU_CYCLES(0): 7970
export PERF_COUNT_HW_CPU_CYCLES=1; LD_PRELOAD=self-profile.so ./preload_test_profile
Sorting...
PERF_COUNT_HW_CPU_CYCLES(0): 7444
export PERF_COUNT_HW_CPU_CYCLES=1; LD_PRELOAD=self-profile.so ./bsearch
Sorting...
PERF_COUNT_HW_CPU_CYCLES(0): 6779

Signed-off-by: Yunseong Kim <[email protected]>