-
Notifications
You must be signed in to change notification settings - Fork 601
i#7685: Initial implementation of the BBV client #7687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Initial implementation of client that generates Basic Block Vectors (BBVs), which capture Basic Blocks (BBs) execution frequencies within a user-provided instruction interval. The client registers two events: - `drmgr_register_bb_instrumentation_event()` to instrument BBs. - `drmgr_register_exit_event()` to save the collected BBV data to a file. Example usage: ``` ./bin64/drrun -c api/bin/libbbv.so -instr_interval 10000 -- ls . ``` Generates a proc.APP_NAME.PID.UNIQUE_ID.bb file. The instrumentation is a single clean call to a function. Because clean calls are expensive we plan to inline the counter updates in the instrumented BBs themselves and jump to the rest of the instrumentation as a cleaan call only when the given instruction interval is reached. Currently we generate a unique ID for BBs using the BB first instruction PC (its absolute value). This is not a reliable method. We plan to use the `drmodtrack` library to compute instruction module offsets instead. Adds a new `hashtable_t` iterator: `hashtable_apply_to_all_key_payload_pairs_user_data()` that exposes the key of each payload when iterating over the elements of an `hashtable_t`. This is necessary as we need to save both the BB ID (key) and hit count (payload) once we reach the end of an instruction interval. The alternative would be to add the key to the payload, but it seems unnecessary and consumes more memory. Issue #7685
|
The fail on macos does not seem related to this change. |
api/samples/bbv.cpp
Outdated
| @@ -0,0 +1,342 @@ | |||
| /* ********************************************************** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a first-class DR tool in clients/drpoints/. api/samples/* are smaller, less polished, meant more as starting points for users to modify than final tools, and isolated with no auxiliary support. drbbv OTOH is meant to be used as-is with end to end support for feeding to post processing frameworks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be additional post-processing tools included in this repository, or the client itself may span multiple files? That also makes a case for a first-class top-level tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we're not planning additional files or post-processing tools here in the dynamorio repo (internally we will process the basic block vectors generated by this client though).
There might be additional files/tools in the future.
Still ok with me to move it into clients/drpoints though.
Or does your opinion change if it's only one file?
Perhaps we should just call this the BBV client and DrPoints the processed basic block vectors (into instruction intervals), since those are the actual "points"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to client/drpoints.
api/docs/release.dox
Outdated
| PC in the trace, which is from the next instruction or the next | ||
| #dynamorio::drmemtrace::TRACE_MARKER_TYPE_KERNEL_EVENT marker, whichever comes | ||
| first. | ||
| - Added a new hashtable_t iterator hashtable_apply_to_all_key_payload_pairs_user_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#hashtable_t should make a link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to different PR.
api/samples/bbv.cpp
Outdated
| @@ -0,0 +1,342 @@ | |||
| /* ********************************************************** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a mismatch to call it DrPoints but have it in a file named "bbv.cpp". Once in clients/drpoints/, will there be additional pieces and the BBV client is just one piece? Then bbv.cpp for the client instead of drpoints.cpp could make sense; but drpoints.cpp seems to fit better if that's the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to drpoints.cpp.
| * @param user_data User data that is available when iterating through payloads. | ||
| */ | ||
| void | ||
| hashtable_apply_to_all_key_payload_pairs_user_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to move this to its own PR with a test added to ./suite/tests/client-interface/drcontainers-test.dll.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/samples/bbv.cpp
Outdated
|
|
||
| // Global hash table that maps the PC of a BB's first instruction to a unique, | ||
| // increasing ID that comes from unique_bb_count. | ||
| static hashtable_t pc_to_id_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe the thread safety aspects of these global vars. Do they use the internal hashtable lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
They don't use any lock.
We only support single threaded applications at the moment, so no need for a locking mechanism (unless DR does some of these things in parallel that I missed? Like instrumenting BBs in parallel?).
api/samples/bbv.cpp
Outdated
| { | ||
| // By default drmgr enables auto-predication, which predicates all instructions | ||
| // with the predicate of the current instruction on ARM. We disable it here | ||
| // because we want to unconditionally execute the following instrumentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you'll include unexecuted predicated instructions in the bb size count? Worth a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Yes, from my testing it seems that perf will count predicated instructions with a false predicate as well (probably as a no-op), so we want to include them to match the perf instruction count (necessary when we validate the resulting representative intervals).
api/samples/bbv.cpp
Outdated
| (int *)hashtable_lookup(&hit_count_table, (void *)(ptr_int_t)bb_id); | ||
| if (hit_count_ptr == NULL) { | ||
| // If hit count is not mapped to this BB, then add a new count to the table. | ||
| int *hit_count = (int *)dr_global_alloc(sizeof(int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be the value instead of needing the indirection? If it starts at 1 I guess so there's a sentinel value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to start at 0 though, because we use that as sentinel value to filter out BBs that were not executed in the current instruction interval. If we use another sentinel value like -1 that will complicate the instrumentation logic that increases the hit_count at runtime. I think it's better we pay the cost of indirection and allocate a counter at instrumentation time instead.
api/samples/bbv.cpp
Outdated
| } | ||
|
|
||
| // Get the number of instructions in the BB. | ||
| int *bb_size_ptr = (int *)hashtable_lookup(&bb_size_table, (void *)(ptr_int_t)bb_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically unsafe as the same start PC could have a different sized block after it, even w/o a JIT: app could load lib A then unload and load lib B in overlapping place; or even re-load A but run code in a different order; or client could flush all the app code and then it runs in different order.
Better: pass the bb size as the user_data (from analysis event) and get rid of this hashtable altogether: both more efficient and more correct in corner cases.
Once this is inlined the size will also be known as a literal right there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Though the actual issue is above in pc_to_id_map, that's where we use the first BB instruction PC to create a sequential BB ID that starts from 1 (as required by SimpointToolkit).
I have not yet addressed this, but I think this PR is getting quite large, so I'd rather defer this to later.
There is a TODO comment about it.
api/samples/bbv.cpp
Outdated
| instr_count += bb_size; | ||
|
|
||
| // We reached the end of the instruction interval. | ||
| if (instr_count > (int)instr_interval.get_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will not hit the interval exactly: that will be a problem? Do we need perfect precision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem. We can't really split BBs, so whatever BB reaches or goes over the instr_interval is the cutoff for that interval and its corresponding BBV. From what I've seen so far, BBV go over instr_interval by a single digit most of the time; highest I've seen is ~25 instructions.
api/samples/bbv.cpp
Outdated
| DROPTION_SCOPE_CLIENT, "instr_interval", 100000000 /*=100M instructions*/, | ||
| "The instruction interval to generate BBVs", | ||
| "Divides the program execution in instruction intervals of the specified size and " | ||
| "generates BBVs using the BB hit count frequency within the interval and the number " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/and/multiplied by/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Add inlined counter update for X86_64. Makes counters 64 bit.
edeiana
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet done. Adding tests, but wanted to push what I have right now since it's been a while.
api/docs/release.dox
Outdated
| PC in the trace, which is from the next instruction or the next | ||
| #dynamorio::drmemtrace::TRACE_MARKER_TYPE_KERNEL_EVENT marker, whichever comes | ||
| first. | ||
| - Added a new hashtable_t iterator hashtable_apply_to_all_key_payload_pairs_user_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to different PR.
api/samples/bbv.cpp
Outdated
| @@ -0,0 +1,342 @@ | |||
| /* ********************************************************** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to client/drpoints.
api/samples/bbv.cpp
Outdated
| @@ -0,0 +1,342 @@ | |||
| /* ********************************************************** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to drpoints.cpp.
api/samples/bbv.cpp
Outdated
| /* DrPoints: Basic Block Vector (BBV) Client. | ||
| * | ||
| * Given a user-defined instruction interval, computes the BBVs (histogram of BB | ||
| * frequencies within the interval) of a program execution and outputs them in a .bb file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of.
The SimpointToolkit has an example of BBV input file and its name is sample.bb (https://github.com/hanhwi/SimPoint/blob/master/input/sample.bb), so I mirrored that.
The file extension doesn't really matter for SimpointToolkit though, so we can use .bbv, which I agree makes more sense.
Changed suffix to .bbv and prefix to drpoints. instead of proc..
api/samples/bbv.cpp
Outdated
| DROPTION_SCOPE_CLIENT, "instr_interval", 100000000 /*=100M instructions*/, | ||
| "The instruction interval to generate BBVs", | ||
| "Divides the program execution in instruction intervals of the specified size and " | ||
| "generates BBVs using the BB hit count frequency within the interval and the number " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/samples/bbv.cpp
Outdated
| static hashtable_t bb_size_table; | ||
|
|
||
| // Global unique BB counter used as ID. | ||
| static int unique_bb_count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/samples/bbv.cpp
Outdated
| // Key: unique BB ID, value: hit count. | ||
| static hashtable_t hit_count_table; | ||
|
|
||
| // Global hash table to save the instruction size of each BB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of inlined increments, the issue was that I was incrementing the BB execution count by 1 and then multiplying that by the BB size (which I had to save somewhere, hence this table) at the end of the instruction interval, while I should have just incremented by BB size in the first place.
Anyway, this table is gone.
api/samples/bbv.cpp
Outdated
|
|
||
| // Global hash table that maps the PC of a BB's first instruction to a unique, | ||
| // increasing ID that comes from unique_bb_count. | ||
| static hashtable_t pc_to_id_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
They don't use any lock.
We only support single threaded applications at the moment, so no need for a locking mechanism (unless DR does some of these things in parallel that I missed? Like instrumenting BBs in parallel?).
api/samples/bbv.cpp
Outdated
|
|
||
| // Global instruction counter to keep track of when we reach the end of the user-defined | ||
| // instruction interval. | ||
| static int instr_count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment (they are not). See above.
api/samples/bbv.cpp
Outdated
| } | ||
|
|
||
| // Get the number of instructions in the BB. | ||
| int *bb_size_ptr = (int *)hashtable_lookup(&bb_size_table, (void *)(ptr_int_t)bb_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Though the actual issue is above in pc_to_id_map, that's where we use the first BB instruction PC to create a sequential BB ID that starts from 1 (as required by SimpointToolkit).
I have not yet addressed this, but I think this PR is getting quite large, so I'd rather defer this to later.
There is a TODO comment about it.
| scheduling of non-i-filtered thread-sharded traces that returns the next continuous | ||
| PC in the trace, which is from the next instruction or the next | ||
| #dynamorio::drmemtrace::TRACE_MARKER_TYPE_KERNEL_EVENT marker, whichever comes | ||
| first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which capture Basic Blocks (BBs) execution frequencies within a
nit: s/Blocks/Block/; s/BBs/BB/
| first. | ||
| - Added a new #hashtable_t iterator hashtable_apply_to_all_key_payload_pairs_user_data() | ||
| that exposes the key of every element in the table. | ||
| - Added a new Dynamorio client "DrPoints" in clients/drpoints to compute the Basic Block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DynamoRIO
| first. | ||
| - Added a new #hashtable_t iterator hashtable_apply_to_all_key_payload_pairs_user_data() | ||
| that exposes the key of every element in the table. | ||
| - Added a new Dynamorio client "DrPoints" in clients/drpoints to compute the Basic Block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point to the doxygen page for drpoints (see other comment on adding .dox page) -- unless not part of this PR.
| ########################################################################### | ||
| # drpoints tests | ||
|
|
||
| if (X86 OR AARCH64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO i#nnn comment on expanding the tests to all platforms.
| # drpoints tests | ||
|
|
||
| if (X86 OR AARCH64) | ||
| if (NOT WINDOWS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Windows? Needs explanation and probably a XXX i# comment; this is a pretty simple client and it should be straightforward to make it cross-platform.
| instr_count += bb_size; | ||
|
|
||
| // We reached the end of the instruction interval. | ||
| if (instr_count > instr_interval.get_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>= ?
| // If the user defined instruction interval is reached, jump to a clean call of the | ||
| // instrumentation function that saves the current BBV, otherwise jump to the rest of | ||
| // the BB and continue. | ||
| MINSERT(bb, inst, INSTR_CREATE_jcc(drcontext, OP_jle, opnd_create_instr(skip_call))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why skip when the count equals the interval? Aren't we done then?
| @@ -0,0 +1,4 @@ | |||
| T:1:5 :2:5 :3:5 [ ]* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the brackets for? This is a regex, right? So just * to match zero or more spaces.
| if (X86 OR AARCH64) | ||
| if (NOT WINDOWS) | ||
| torunonly_ci(tool.drpoints.simple ${ci_shared_app} drpoints | ||
| client-interface/${ci_shared_app} # for .expect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this doesn't test that drpoints did anything. Please add an output template for a sanity check that drpoints actually wrote something. It may be nicer to not print to stdout and instead write the file and add a CMake script checking the file actually exists: so there's an end to end test on the output file being written.
| DROPTION_SCOPE_CLIENT, "no_out_bbv_file", false, | ||
| "Disables the generation of the output .bbv file", | ||
| "Disables the generation of the output .bbv file, but still runs the client. Useful " | ||
| "for unit tests or paired with -print_to_stdout. Default is false."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the file normally written? I don't see any control over that: best to add a parameter for that.
Initial implementation of a client that generates Basic Block Vectors (BBVs),
which capture Basic Blocks (BBs) execution frequencies within a
user-provided instruction interval.
The client registers three events:
drmgr_register_bb_instrumentation_event()to instrument BBs.drmgr_register_exit_event()to save the collected BBV data to a file.drmgr_register_thread_init_event()to check for multi-threaded programsand abort if so (multi-threaded program are not currently supported).
Example usage:
Generates a
drpoints.APP_NAME.PID.UNIQUE_ID.bbvfile.This is an initial, working implementation, but there are missing pieces:
Instrumentation optimizations for AARCH64 and 32 bit architectures.
The instrumentation is a single clean call to a function in this case.
Because clean calls are expensive, for X86_64, we inline the counter updates in
the instrumented BBs themselves and jump to the rest of the instrumentation
as a clean call only when the given instruction interval is reached. We plan to
do the same for, at least, AARCH64 architectures.
Use of absolute PC values.
Currently we generate a unique ID for BBs using the BB first instruction PC
(its absolute value). This is not a reliable method. We plan to use the
drmodtracklibrary to compute instruction module offsets instead.
Issue #7685