Would it be good to create a library object and use LD_PRELOAD to support self-profiling?
#1
Comments
I'm pleased that you like it!
Are you suggesting creating a library to be pre-loaded with a Do you see significant advantage to using
Tell me more about what you are suggesting here. The current implementation requires that the code to be profiled be instrumented with
This code is a simple implementation of my idea, with a focus on making the self-profile portable.

It seems useful, even if there's a call to the main function, because this method seems to reduce overhead compared to using "perf record" directly. We can directly insert the code according to its original purpose.

Here are the test results from my Raspberry Pi 5, gcc version 12.2.0 (Debian 12.2.0-14)

$ uname -a
Linux paran 6.10.1-v8-16k+ #1 SMP PREEMPT Sat Jul 27 17:52:03 KST 2024 aarch64 GNU/Linux

$ make run
export PERF_COUNT_HW_CPU_CYCLES=1; ./test_profile
Sorting...
00: { "H", 107, 0.900000 }
01: { "I", 111, 0.900000 }
02: { "G", 117, 0.900000 }
03: { "E", 127, 0.900000 }
04: { "F", 147, 0.900000 }
05: { "A", 157, 0.900000 }
06: { "K", 157, 0.900000 }
07: { "L", 157, 0.900000 }
08: { "M", 157, 0.900000 }
09: { "N", 157, 0.900000 }
10: { "O", 157, 0.900000 }
11: { "P", 157, 0.900000 }
12: { "Z", 157, 0.900000 }
13: { "C", 175, 0.900000 }
14: { "J", 227, 0.900000 }
15: { "B", 517, 0.900000 }
16: { "D", 571, 0.900000 }
PERF_COUNT_HW_CPU_CYCLES(0): 7970
export PERF_COUNT_HW_CPU_CYCLES=1; LD_PRELOAD=self-profile.so ./preload_test_profile
Sorting...
PERF_COUNT_HW_CPU_CYCLES(0): 7444
export PERF_COUNT_HW_CPU_CYCLES=1; LD_PRELOAD=self-profile.so ./bsearch
Sorting...
PERF_COUNT_HW_CPU_CYCLES(0): 6779

Signed-off-by: Yunseong Kim <[email protected]>
Thank you @ThinkOpenly for your comments, which have helped me to articulate the self-profiling project more clearly. One of the key strengths of this project, in my opinion, is the ability to focus profiling specifically on the code where it's needed most. As you know, If used alongside production code, I believe we can divide the activation into macros and build options—similar to how static trace points are activated with This project has reminded me of the importance of understanding the underlying principles to explore new approaches, rather than always relying on existing tools passively.
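To make the "macros and build options" idea concrete, here is a minimal sketch of region markers that compile to nothing unless a build flag is set. The names `ENABLE_SELF_PROFILE`, `SP_BEGIN`, `SP_END`, and `sum_to` are hypothetical and not part of this project. For portability the sketch times the region with `clock_gettime` instead of reading hardware counters, so it only shows the activation pattern, not the project's measurement path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std modes */
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#ifdef ENABLE_SELF_PROFILE /* hypothetical build option: cc -DENABLE_SELF_PROFILE ... */
/* Monotonic timestamp in nanoseconds (a stand-in for a hardware counter read). */
static inline uint64_t sp_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
/* Record the start time in a local variable named after the region. */
#define SP_BEGIN(name) uint64_t sp_t0_##name = sp_now_ns()
/* Print the elapsed time for the region to stderr. */
#define SP_END(name)                                   \
    fprintf(stderr, #name ": %llu ns\n",               \
            (unsigned long long)(sp_now_ns() - sp_t0_##name))
#else
/* Profiling disabled: the markers expand to no-ops. */
#define SP_BEGIN(name) ((void)0)
#define SP_END(name) ((void)0)
#endif

/* Example workload: sum of the integers 0 .. n-1, with the loop marked. */
uint64_t sum_to(uint64_t n)
{
    uint64_t total = 0;
    SP_BEGIN(sum_to);
    for (uint64_t i = 0; i < n; i++)
        total += i;
    SP_END(sum_to);
    return total;
}
```

Built without the flag, `sum_to` contains no profiling code at all. Built with `-DENABLE_SELF_PROFILE`, the same source reports the marked region, which is the split between production and profiling builds described above.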
I think my focus is on portability with other executable programs. My PR is a Proof of Concept based on what I’ve implemented so far, and I’m happy to update it with any additional ideas you might have. In #2, I implemented the ability to measure the original main function.
If there’s a specific function the user wants to profile, similar to the main function in

P.S. Once again, thank you for the inspiration.
Wow, this is a really great idea. Thank you for the inspiration, @ThinkOpenly.
Using LD_PRELOAD to execute at the start with constructor and terminate with atexit would be very convenient for profiling other programs!

If we consider the interface of that library, we could also measure specific functions.
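As an illustration of the constructor/atexit idea, here is a minimal sketch of a preloadable library that opens a CPU-cycle counter with the Linux `perf_event_open` system call when it is loaded and prints the count at process exit. This is an assumption about how such a library could look, not the project's actual source. The names `sp_open_cycles`, `sp_read`, `sp_report`, and `sp_init` are hypothetical, and the output format only loosely imitates the results posted in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int cycles_fd = -1; /* -1 means the counter is unavailable */

/* Open a user-space CPU-cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if the kernel refuses. */
int sp_open_cycles(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* count user-space cycles only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: this thread on any CPU; no group, no flags. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Read the current counter value. Returns 0 on success, -1 on failure. */
int sp_read(int fd, uint64_t *value)
{
    if (fd < 0 || value == NULL)
        return -1;
    return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value) ? 0 : -1;
}

/* Runs at process exit: stop the counter and print the total. */
static void sp_report(void)
{
    uint64_t cycles = 0;
    if (cycles_fd < 0)
        return;
    ioctl(cycles_fd, PERF_EVENT_IOC_DISABLE, 0);
    if (sp_read(cycles_fd, &cycles) == 0)
        fprintf(stderr, "PERF_COUNT_HW_CPU_CYCLES: %llu\n",
                (unsigned long long)cycles);
    close(cycles_fd);
    cycles_fd = -1;
}

/* Runs when the library is loaded, before the target's main(). */
__attribute__((constructor)) static void sp_init(void)
{
    cycles_fd = sp_open_cycles();
    if (cycles_fd < 0) {
        perror("perf_event_open"); /* e.g. blocked by perf_event_paranoid */
        return;
    }
    atexit(sp_report);
    ioctl(cycles_fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(cycles_fd, PERF_EVENT_IOC_ENABLE, 0);
}
```

Assuming the file is saved as `sketch.c`, it could be built with `gcc -shared -fPIC -o sketch.so sketch.c` and used as `LD_PRELOAD=./sketch.so ./some_program`. As written, the counter follows only the thread that ran the constructor, and the measured span includes program startup and teardown, so it is coarser than instrumenting a specific region.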