Everett
Regions are contiguous sequences of bytes which are conveniently handled as a whole by the rule writer. Each region is associated with:
- a payload
- a size in bytes
- a type (see below)
Regions can be nested within other regions; therefore, the payload can be either a raw sequence of bytes or a list of child regions. As such, a region also has:
- a parent region (if any)
- an offset from the beginning of the parent region (defaults to 0)
- a list of child regions (if any)
Note that, by the definition of a region, the child regions, taken together, must define a contiguous sequence of bytes.
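To make the contiguity invariant concrete, here is a minimal sketch of how a region could be represented in C, together with a check that the children of a region tile it without gaps or overlaps. The `rgn_t` layout (a doubly linked list of children, a `bytes` pointer for leaf payloads), the `rgn_type` enumerators and the helper `rgn_children_contiguous` are hypothetical, not Hijacker's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region types; the real enumeration may differ. */
typedef enum { RGN_RAW, RGN_SECTION, RGN_FUNCTION, RGN_BLOCK, RGN_INSTRUCTION } rgn_type;

typedef struct rgn {
    rgn_type type;
    size_t size;                /* size in bytes */
    size_t offset;              /* offset from the beginning of the parent (0 by default) */
    const unsigned char *bytes; /* raw payload, used by leaf regions only */
    struct rgn *parent;         /* NULL for the root region */
    struct rgn *first, *last;   /* child list, empty for leaf regions */
    struct rgn *prev, *next;    /* siblings within the parent, in byte order */
} rgn_t;

/* Checks that the children of r, in list order, cover [0, r->size)
 * with no gaps and no overlaps. A leaf region trivially satisfies it. */
static bool rgn_children_contiguous(const rgn_t *r)
{
    if (r->first == NULL)
        return true;
    size_t expected = 0;
    for (const rgn_t *c = r->first; c != NULL; c = c->next) {
        if (c->parent != r || c->offset != expected)
            return false;
        expected += c->size;
    }
    return expected == r->size;
}
```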
Regions can be parsed according to a view, that is, an interpretation of the bytes contained in the region. Parsing a region according to a view generates a new region with child regions.
For example, for code we have the following views:
- the section view parses a region `r` into a region `r'` whose children are section regions, i.e., regions of type `SECTION`;
- the function view parses a region `r` into a region `r'` whose children are function regions, i.e., regions of type `FUNCTION`;
- the basic-block view parses a region `r` into a region `r'` whose children are basic-block regions, i.e., regions of type `BLOCK`;
- the instruction view parses a region `r` into a region `r'` whose children are instruction regions, i.e., regions of type `INSTRUCTION`.
For raw data, other views may be defined, e.g., to parse debug information or other objects.
Upon parsing an object file, Hijacker does the following:
- Creates a unique region `r` for the whole object file
- Applies the section view on `r` to obtain `r'`
- For each child `q` of `r'`, applies the instruction view on `q` to obtain `q'`
The regions obtained in this phase are sticky, and every other view is built on top of them. As such, the contents of the object file can be modified either by operating directly on these regions or by applying views over them.
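The three parsing steps above can be sketched as follows. This is a toy model under loudly stated assumptions: section sizes come from a caller-supplied table instead of real object-file headers, instructions are assumed to be a fixed 4 bytes wide (real x86 code is variable-length), and each view attaches its children directly to the region it parses, so `r'` is `r` itself. All names (`rgn_add_child`, `view_sections`, `view_instructions`, `parse_object`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { RGN_FILE, RGN_SECTION, RGN_INSTRUCTION } rgn_type;

typedef struct rgn {
    rgn_type type;
    size_t size, offset;
    struct rgn *parent, *first, *last, *next;
} rgn_t;

/* Appends a new child right after the parent's current last child. */
static rgn_t *rgn_add_child(rgn_t *parent, rgn_type type, size_t size)
{
    rgn_t *c = calloc(1, sizeof *c);
    if (c == NULL)
        abort();
    c->type = type;
    c->size = size;
    c->parent = parent;
    c->offset = parent->last ? parent->last->offset + parent->last->size : 0;
    if (parent->last)
        parent->last->next = c;
    else
        parent->first = c;
    parent->last = c;
    return c;
}

/* Toy section view: sizes come from a table, not from real section headers. */
static void view_sections(rgn_t *file, const size_t *sizes, size_t count)
{
    for (size_t i = 0; i < count; i++)
        rgn_add_child(file, RGN_SECTION, sizes[i]);
}

/* Toy instruction view: assumes fixed 4-byte instructions. */
static void view_instructions(rgn_t *section)
{
    for (size_t off = 0; off < section->size; off += 4) {
        size_t len = section->size - off < 4 ? section->size - off : 4;
        rgn_add_child(section, RGN_INSTRUCTION, len);
    }
}

/* Root region, section view on it, instruction view on each section. */
static rgn_t *parse_object(size_t file_size, const size_t *section_sizes, size_t count)
{
    rgn_t *r = calloc(1, sizeof *r);
    if (r == NULL)
        abort();
    r->type = RGN_FILE;
    r->size = file_size;
    view_sections(r, section_sizes, count);
    for (rgn_t *q = r->first; q != NULL; q = q->next)
        view_instructions(q);
    return r;
}
```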
The following basic operations can be performed on a region.
`rgn_t *region_attach(rgn_t *new, rgn_t *pivot, rgn_insert_mode mode)`

Links regions together, using the pivot region as the reference for the insertion mode:
- `RGN_ATTACH_BEFORE`: the new region is inserted as a sibling of the pivot, right before it. The update involves the pivot's parent region.
- `RGN_ATTACH_AFTER`: the new region is inserted as a sibling of the pivot, right after it. The update involves the pivot's parent region.
- `RGN_ATTACH_FIRST`: the new region is inserted as the first child region of the pivot. The update involves the pivot's current first child region.
- `RGN_ATTACH_LAST`: the new region is inserted as the last child region of the pivot. The update involves the pivot's current last child region.
Possible applications of this function include:
- Inserting an instruction region right before another instruction region. Note that, depending on the implicit view, this may or may not affect the basic-block view and, hence, the update of jump instruction immediates.
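A minimal sketch of `region_attach` over a doubly linked list of children. Two points are assumptions not stated above: the function returns the newly attached region, and after every insertion the offsets of the parent's children and the sizes of all ancestors are recomputed by a hypothetical helper, `rgn_relayout`. The first parameter is renamed `new_rgn` to avoid clashing with the C++ keyword `new`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RGN_ATTACH_BEFORE, RGN_ATTACH_AFTER, RGN_ATTACH_FIRST, RGN_ATTACH_LAST } rgn_insert_mode;

typedef struct rgn {
    size_t size, offset;
    struct rgn *parent, *first, *last, *prev, *next;
} rgn_t;

/* Recomputes child offsets and sizes from r up to the root. */
static void rgn_relayout(rgn_t *r)
{
    for (; r != NULL; r = r->parent) {
        size_t off = 0;
        for (rgn_t *c = r->first; c != NULL; c = c->next) {
            c->offset = off;
            off += c->size;
        }
        r->size = off;
    }
}

static rgn_t *region_attach(rgn_t *new_rgn, rgn_t *pivot, rgn_insert_mode mode)
{
    assert(new_rgn->parent == NULL && "detach the region first");
    int sibling = (mode == RGN_ATTACH_BEFORE || mode == RGN_ATTACH_AFTER);
    rgn_t *parent = sibling ? pivot->parent : pivot;
    assert(parent != NULL && "a sibling insertion needs a parent");

    rgn_t *prev, *next;
    switch (mode) {
    case RGN_ATTACH_BEFORE: prev = pivot->prev; next = pivot;        break;
    case RGN_ATTACH_AFTER:  prev = pivot;       next = pivot->next;  break;
    case RGN_ATTACH_FIRST:  prev = NULL;        next = pivot->first; break;
    default:                prev = pivot->last; next = NULL;         break;
    }
    new_rgn->parent = parent;
    new_rgn->prev = prev;
    new_rgn->next = next;
    if (prev) prev->next = new_rgn; else parent->first = new_rgn;
    if (next) next->prev = new_rgn; else parent->last = new_rgn;
    rgn_relayout(parent);
    return new_rgn;
}
```

For instance, attaching `a` as the first child of an empty `root`, then `c` after `a`, then `b` before `c` yields the child order `a`, `b`, `c` with offsets recomputed accordingly.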
`void region_detach(rgn_t *target)`

Removes a region from its parent's chain of child regions, without deallocating its memory. The update involves the target's parent region.
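Under the same hypothetical doubly linked representation, `region_detach` can be sketched as unlinking the target from its siblings, clearing its links, and recomputing the offsets in the former parent. The target's own size and payload are left untouched and nothing is freed; `rgn_relayout` is a hypothetical helper, as before.

```c
#include <assert.h>
#include <stddef.h>

typedef struct rgn {
    size_t size, offset;
    struct rgn *parent, *first, *last, *prev, *next;
} rgn_t;

/* Recomputes child offsets and sizes from r up to the root. */
static void rgn_relayout(rgn_t *r)
{
    for (; r != NULL; r = r->parent) {
        size_t off = 0;
        for (rgn_t *c = r->first; c != NULL; c = c->next) {
            c->offset = off;
            off += c->size;
        }
        r->size = off;
    }
}

static void region_detach(rgn_t *target)
{
    rgn_t *parent = target->parent;
    if (parent == NULL)
        return; /* root or already detached */
    if (target->prev) target->prev->next = target->next; else parent->first = target->next;
    if (target->next) target->next->prev = target->prev; else parent->last = target->prev;
    target->parent = target->prev = target->next = NULL;
    target->offset = 0;
    rgn_relayout(parent); /* memory is not released */
}
```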
`rgn_t *region_split(rgn_t *target, rgn_t *pivot, rgn_split_mode mode)`

Splits a non-leaf region at the pivot child region according to the splitting mode:
- `RGN_SPLIT_FIRST`: the pivot region becomes the first child region of the right-hand region resulting from the split. The update involves the target's parent region.
- `RGN_SPLIT_LAST`: the pivot region becomes the last child region of the left-hand region resulting from the split. The update involves the target's parent region.
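One possible reading of `region_split` is sketched below: the target keeps the left-hand children, a newly allocated sibling placed right after it receives the right-hand ones, and that new region is returned (`NULL` when one side would be empty). Which half is returned, the use of `calloc`, and the `rgn_relayout` helper are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { RGN_SPLIT_FIRST, RGN_SPLIT_LAST } rgn_split_mode;

typedef struct rgn {
    size_t size, offset;
    struct rgn *parent, *first, *last, *prev, *next;
} rgn_t;

/* Recomputes child offsets and sizes from r up to the root. */
static void rgn_relayout(rgn_t *r)
{
    for (; r != NULL; r = r->parent) {
        size_t off = 0;
        for (rgn_t *c = r->first; c != NULL; c = c->next) {
            c->offset = off;
            off += c->size;
        }
        r->size = off;
    }
}

static rgn_t *region_split(rgn_t *target, rgn_t *pivot, rgn_split_mode mode)
{
    assert(pivot->parent == target && "pivot must be a child of target");
    assert(target->parent != NULL && "the new half needs a parent");
    /* First child of the right-hand half. */
    rgn_t *head = (mode == RGN_SPLIT_FIRST) ? pivot : pivot->next;
    if (head == NULL || head == target->first)
        return NULL; /* one side would be empty */
    rgn_t *right = calloc(1, sizeof *right);
    if (right == NULL)
        return NULL;

    /* Move children [head, last] under the new region. */
    right->first = head;
    right->last = target->last;
    target->last = head->prev;
    head->prev->next = NULL;
    head->prev = NULL;
    for (rgn_t *c = head; c != NULL; c = c->next)
        c->parent = right;

    /* Link the new region right after target in target's parent. */
    right->parent = target->parent;
    right->prev = target;
    right->next = target->next;
    if (target->next) target->next->prev = right; else target->parent->last = right;
    target->next = right;

    rgn_relayout(right);
    rgn_relayout(target);
    return right;
}
```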
`rgn_t *region_split_leaf(rgn_t *target, addr_t offset)`

Splits a leaf region at an offset according to the splitting mode:
- `RGN_SPLIT_FIRST`: the right-hand region resulting from the split begins at the given offset. The update involves the target's parent region.
- `RGN_SPLIT_LAST`: the left-hand region resulting from the split ends at the given offset. The update involves the target's parent region.
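The signature above takes no mode parameter, so this sketch follows it literally and implements the `RGN_SPLIT_FIRST` behaviour: the new right-hand leaf begins at `offset`. It assumes a hypothetical `addr_t` typedef and lets both halves share the original payload buffer rather than copying it. Since no bytes are added or removed, the parent's size and the offsets of later siblings stay the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef size_t addr_t; /* hypothetical; the real addr_t may differ */

typedef struct rgn {
    size_t size, offset;
    const unsigned char *bytes; /* leaf payload (not owned) */
    struct rgn *parent, *first, *last, *prev, *next;
} rgn_t;

static rgn_t *region_split_leaf(rgn_t *target, addr_t offset)
{
    assert(target->first == NULL && "target must be a leaf");
    if (offset == 0 || offset >= target->size)
        return NULL; /* one side would be empty */
    rgn_t *right = calloc(1, sizeof *right);
    if (right == NULL)
        return NULL;

    right->bytes = target->bytes + offset; /* shares the buffer */
    right->size = target->size - offset;
    target->size = offset;

    /* Same parent; right begins exactly where target now ends. */
    right->parent = target->parent;
    right->offset = target->offset + offset;
    right->prev = target;
    right->next = target->next;
    if (target->next) target->next->prev = right;
    else if (target->parent) target->parent->last = right;
    target->next = right;
    return right;
}
```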
`rgn_t *region_merge(rgn_t *a, rgn_t *b)`

Merges two regions into a new region, which is returned. The update involves the input regions' common parent region; the input regions themselves are not changed.
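A minimal sketch of `region_merge` for two adjacent leaf siblings, assuming that "the input regions are not changed" means their own fields are left as they were while the new region takes their place in the common parent's child list. Copying both payloads into a freshly allocated buffer, and rejecting non-leaf inputs, are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct rgn {
    size_t size, offset;
    unsigned char *bytes; /* leaf payload */
    struct rgn *parent, *first, *last, *prev, *next;
} rgn_t;

static rgn_t *region_merge(rgn_t *a, rgn_t *b)
{
    assert(a->parent == b->parent && a->next == b && "a, b must be adjacent siblings");
    assert(a->first == NULL && b->first == NULL && "this sketch merges leaves only");
    rgn_t *m = calloc(1, sizeof *m);
    unsigned char *bytes = malloc(a->size + b->size);
    if (m == NULL || bytes == NULL) {
        free(m);
        free(bytes);
        return NULL;
    }
    memcpy(bytes, a->bytes, a->size);
    memcpy(bytes + a->size, b->bytes, b->size);
    m->bytes = bytes;
    m->size = a->size + b->size;
    m->offset = a->offset;

    /* Splice m in place of a and b; a and b keep their own fields. */
    m->parent = a->parent;
    m->prev = a->prev;
    m->next = b->next;
    if (a->prev) a->prev->next = m; else if (m->parent) m->parent->first = m;
    if (b->next) b->next->prev = m; else if (m->parent) m->parent->last = m;
    return m;
}
```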
`list_t<rgn_t *> region_parse(rgn_t *target, rgn_parse_kernel kernel)`

Parses a target region into a list of regions according to a parse kernel (i.e., a view):
- `rgn_parse_section`
- `rgn_parse_function`
- `rgn_parse_block`
- `rgn_parse_instruction`
Custom kernels can also be defined. For example, if we are only interested in transactions, we can write a parse kernel that returns a list of transaction regions, which in turn contain instruction regions as children. In this case, a list of multiple regions is returned (as opposed to a list containing a single region), since transactions are not necessarily byte-contiguous in the input object file.
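The transaction example above can be sketched as a custom kernel. To keep the snippet self-contained, `list_t` is replaced by a caller-provided array, instruction regions carry a single `opcode` byte, and transactions are delimited by two hypothetical marker opcodes (`OP_TX_BEGIN`, `OP_TX_END`). Each transaction region only points at the first and last instruction it spans, so the sticky instruction regions keep their original parent. `rgn_parse_transaction` and this `region_parse` signature are illustrative, not Hijacker's API.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_TX_BEGIN = 0xF0, OP_TX_END = 0xF1 }; /* hypothetical markers */

typedef enum { RGN_INSTRUCTION, RGN_TRANSACTION } rgn_type;

typedef struct rgn {
    rgn_type type;
    unsigned char opcode;     /* simplified instruction payload */
    struct rgn *first, *last; /* for a transaction: instructions it spans */
    struct rgn *next;         /* next sibling in byte order */
} rgn_t;

/* A kernel writes up to max regions into out and returns how many. */
typedef size_t (*rgn_parse_kernel)(rgn_t *target, rgn_t *out, size_t max);

static size_t region_parse(rgn_t *target, rgn_parse_kernel kernel, rgn_t *out, size_t max)
{
    return kernel(target, out, max);
}

/* One transaction region per BEGIN..END pair among target's children. */
static size_t rgn_parse_transaction(rgn_t *target, rgn_t *out, size_t max)
{
    size_t n = 0;
    rgn_t *open = NULL;
    for (rgn_t *i = target->first; i != NULL && n < max; i = i->next) {
        if (i->opcode == OP_TX_BEGIN) {
            open = i;
        } else if (i->opcode == OP_TX_END && open != NULL) {
            out[n++] = (rgn_t){ .type = RGN_TRANSACTION, .first = open, .last = i };
            open = NULL;
        }
    }
    return n;
}
```

Two transactions separated by ordinary code yield two separate regions, which is why the kernel returns a list rather than a single region.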
Hijacker users may want to select a different output emission mode for each instrumentation version. To understand why, consider this futuristic tool-chain:
COMPILER -> ASSEMBLER -> INSTRUMENTER -> OPTIMIZER -> LINKER
In this tool-chain, the assembler always emits unoptimized code. This is done both to aid the instrumenter in its code analysis phase, so as to avoid mistakes, and because instrumentation itself may undo optimizations performed in earlier phases.
By making the optimization phase a first-class member of the default tool-chain, optimizations can be postponed until Hijacker's emit step, which has all the information needed to apply code optimizations that take the injected instrumentation code into account.
To support this scenario, one might imagine at least two output emission modes:
- Naive emission: Code is emitted in its byte order, thus relying heavily on regions for this purpose.
- Optimized emission: Code is emitted in an order that may differ from the byte order, thus depending less on regions.
As an example of naive emission, Hijacker does the following:
- For each child `r'` of `r`, merges all child regions of `r'`
- Merges all child regions of `r`
- Writes `r` back into a file
This guarantees that byte order is preserved and is also quite intuitive to understand and implement.
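The naive emission steps above can be approximated by a depth-first traversal that copies every leaf payload to its offset within its parent: the resulting buffer matches what merging all child regions bottom-up would produce, without allocating intermediate regions. `rgn_flatten` and `rgn_write_file` are hypothetical names; the sketch assumes that children are stored in byte order and that every byte of a non-leaf region is covered by some child, as required by the definition of regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct rgn {
    size_t size, offset;
    const unsigned char *bytes; /* leaf payload */
    struct rgn *first, *next;   /* children and next sibling */
} rgn_t;

/* Copies r's subtree into out, which must hold r->size bytes. */
static void rgn_flatten(const rgn_t *r, unsigned char *out)
{
    if (r->first == NULL) {
        memcpy(out, r->bytes, r->size);
        return;
    }
    for (const rgn_t *c = r->first; c != NULL; c = c->next)
        rgn_flatten(c, out + c->offset);
}

/* Flattens the root region and writes it to path; returns 0 on success. */
int rgn_write_file(const rgn_t *root, const char *path)
{
    unsigned char *buf = malloc(root->size);
    if (buf == NULL)
        return -1;
    rgn_flatten(root, buf);
    FILE *f = fopen(path, "wb");
    int ok = f != NULL && fwrite(buf, 1, root->size, f) == root->size;
    if (f != NULL && fclose(f) != 0)
        ok = 0;
    free(buf);
    return ok ? 0 : -1;
}
```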
As an example of optimized emission, Hijacker can exploit control-flow information at the basic-block level to swap the relative order of some regions or shift their positions within the parent region, producing an object file that is better optimized in terms of memory alignment, static branch prediction, and so on.