Everett

Hijacker

Regions

Definition

Regions are contiguous sequences of bytes which are conveniently handled as a whole by the rule writer. Each region is associated with:

a payload
a size in bytes
a type (see below)

Regions can be nested into other regions, therefore the payload can be either a raw sequence of bytes or a list of child regions. As such a region also has:

a parent region (if any)
an offset from the beginning of the parent region (defaults to 0)
a list of child regions (if any)

Note that, by the definition of regions, the child regions of a region must together form a contiguous sequence of bytes.
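
The attributes listed above can be sketched as a C structure. This is only an illustration: the field names, the `rgn_type` enumeration and the helper `region_children_contiguous` are hypothetical and are not taken from Hijacker's sources. The check also assumes that the children tile the parent exactly (first child at offset 0, last child ending at the parent's size), which is a stricter reading of the contiguity requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region types; the names are chosen for this sketch only. */
typedef enum { RGN_RAW, RGN_SECTION, RGN_FUNCTION, RGN_BLOCK, RGN_INSTRUCTION } rgn_type;

typedef struct rgn {
    rgn_type type;              /* type of the region */
    size_t size;                /* size in bytes */
    size_t offset;              /* offset from the beginning of the parent (0 by default) */
    const unsigned char *bytes; /* raw payload, meaningful for leaf regions only */
    struct rgn *parent;         /* parent region, if any */
    struct rgn *first_child;    /* head of the list of child regions, if any */
    struct rgn *next;           /* next sibling in the parent's list */
} rgn_t;

/* Returns true when the children of r form a contiguous byte sequence:
 * each child starts where the previous one ends. Under the assumption
 * stated above, the children must also cover r entirely.
 * A leaf region is trivially contiguous. */
bool region_children_contiguous(const rgn_t *r)
{
    assert(r != NULL);
    if (r->first_child == NULL)
        return true;

    size_t expected = 0;
    for (const rgn_t *c = r->first_child; c != NULL; c = c->next) {
        if (c->parent != r || c->offset != expected)
            return false;
        expected += c->size;
    }
    return expected == r->size;
}
```

A check of this kind could be run after every structural operation to catch gaps or overlaps early.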

Views

Regions can be parsed according to a view, that is an interpretation of the bytes that are contained in the region. The parsing of a region according to a view generates a new region with child regions.

For example, for code we have the following views:

the section view parses a region r into a region r' whose children are section regions, that is, regions whose type is SECTION
the function view parses a region r into a region r' whose children are function regions, that is, regions whose type is FUNCTION
the basic-block view parses a region r into a region r' whose children are basic-block regions, that is, regions whose type is BLOCK
the instruction view parses a region r into a region r' whose children are instruction regions, that is, regions whose type is INSTRUCTION

As for raw data, we might have other views to parse, e.g., debug information or other objects.

Upon parsing an object file, Hijacker does the following:

Creates a unique region r for the whole object file
Applies the section view on r to obtain r'
For each child q of r', applies the instruction view on q to obtain q'

The regions obtained in this phase are sticky and every other view is built upon these regions. As such, side-effects on the contents of the object file are enabled by operating directly on those regions or by applying views over them.

Operations

There are some basic operations that can be performed on a region.

rgn_t *region_attach(rgn_t *new, rgn_t *pivot, rgn_insert_mode mode)

Link regions together using the pivot region as a reference for the insertion mode:

RGN_ATTACH_BEFORE: the new region is inserted as a sibling of the pivot, right before it. The update involves the pivot's parent region.
RGN_ATTACH_AFTER: the new region is inserted as a sibling of the pivot, right after it. The update involves the pivot's parent region.
RGN_ATTACH_FIRST: the new region is inserted as the first child region of the pivot. The update involves the pivot's current first child region.
RGN_ATTACH_LAST: the new region is inserted as the last child region of the pivot. The update involves the pivot's current last child region.

A possible application of this function is the following:

To insert an instruction region right before another instruction region. Note that, depending on the implicit view, this may or may not affect the basic-block view, and hence the update of jump-instruction immediates.
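
The four insertion modes can be sketched over a doubly linked list of siblings. This is a minimal illustration, not Hijacker's implementation: the layout of `rgn_t`, the helper `link_between` and the propagation of sizes to the ancestors are assumptions. The first parameter is named `new_rgn` here instead of `new` only to keep the snippet valid in C++ as well.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    RGN_ATTACH_BEFORE, RGN_ATTACH_AFTER, RGN_ATTACH_FIRST, RGN_ATTACH_LAST
} rgn_insert_mode;

/* Hypothetical layout: children are kept in a doubly linked list. */
typedef struct rgn {
    struct rgn *parent, *first_child, *last_child, *prev, *next;
    size_t size;
} rgn_t;

/* Links n under parent, between prev and next (either may be NULL). */
static void link_between(rgn_t *n, rgn_t *parent, rgn_t *prev, rgn_t *next)
{
    n->parent = parent;
    n->prev = prev;
    n->next = next;
    if (prev != NULL) prev->next = n; else parent->first_child = n;
    if (next != NULL) next->prev = n; else parent->last_child = n;
}

rgn_t *region_attach(rgn_t *new_rgn, rgn_t *pivot, rgn_insert_mode mode)
{
    assert(new_rgn != NULL && pivot != NULL);
    switch (mode) {
    case RGN_ATTACH_BEFORE: /* sibling of the pivot, right before it */
        assert(pivot->parent != NULL);
        link_between(new_rgn, pivot->parent, pivot->prev, pivot);
        break;
    case RGN_ATTACH_AFTER:  /* sibling of the pivot, right after it */
        assert(pivot->parent != NULL);
        link_between(new_rgn, pivot->parent, pivot, pivot->next);
        break;
    case RGN_ATTACH_FIRST:  /* first child of the pivot */
        link_between(new_rgn, pivot, NULL, pivot->first_child);
        break;
    case RGN_ATTACH_LAST:   /* last child of the pivot */
        link_between(new_rgn, pivot, pivot->last_child, NULL);
        break;
    }
    /* Assumption: every ancestor grows by the size of the new region. */
    for (rgn_t *p = new_rgn->parent; p != NULL; p = p->parent)
        p->size += new_rgn->size;
    return new_rgn;
}
```

Sibling offsets are deliberately left out of this sketch; a complete version would also shift the offsets of the regions that follow the insertion point.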

void region_detach(rgn_t *target)

Remove a region from its parent's chain of child regions, without de-allocating memory. The update involves the target's parent region.
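
A detach operation along these lines could look as follows. The `rgn_t` layout is a hypothetical doubly linked list of siblings, and shrinking the ancestors by the size of the detached region is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: children are kept in a doubly linked list. */
typedef struct rgn {
    struct rgn *parent, *first_child, *last_child, *prev, *next;
    size_t size;
} rgn_t;

/* Unlinks target from its parent's chain. Nothing is freed: the target
 * keeps its own children and size, so it can be attached elsewhere. */
void region_detach(rgn_t *target)
{
    assert(target != NULL);
    rgn_t *parent = target->parent;
    if (parent == NULL)
        return; /* already detached */

    if (target->prev != NULL) target->prev->next = target->next;
    else parent->first_child = target->next;
    if (target->next != NULL) target->next->prev = target->prev;
    else parent->last_child = target->prev;

    /* Assumption: every ancestor shrinks by the size of the target. */
    for (rgn_t *p = parent; p != NULL; p = p->parent)
        p->size -= target->size;

    target->parent = NULL;
    target->prev = NULL;
    target->next = NULL;
}
```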

rgn_t *region_split(rgn_t *target, rgn_t *pivot, rgn_split_mode mode)

Split a non-leaf region along the pivot according to the splitting mode:

RGN_SPLIT_FIRST: the pivot region is considered the first child region of the right-hand region resulting from the split operation. The update involves the target's parent region.
RGN_SPLIT_LAST: the pivot region is considered the last child region of the left-hand region resulting from the split operation. The update involves the target's parent region.
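
The split of a non-leaf region can be sketched as cutting the chain of children in two. Several points here are assumptions rather than documented behavior: the target keeps the left-hand children, a newly allocated right-hand region is returned and linked as the next sibling of the target, and RGN_SPLIT_LAST is read as "the pivot stays in the left-hand region as its last child". The `rgn_t` layout is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { RGN_SPLIT_FIRST, RGN_SPLIT_LAST } rgn_split_mode;

/* Hypothetical layout: children are kept in a doubly linked list. */
typedef struct rgn {
    struct rgn *parent, *first_child, *last_child, *prev, *next;
    size_t size;
} rgn_t;

/* Returns the new right-hand region, or NULL if one side would be
 * empty or if allocation fails. The caller owns the returned region. */
rgn_t *region_split(rgn_t *target, rgn_t *pivot, rgn_split_mode mode)
{
    assert(target != NULL && pivot != NULL && pivot->parent == target);

    /* head is the first child that moves to the right-hand region. */
    rgn_t *head = (mode == RGN_SPLIT_FIRST) ? pivot : pivot->next;
    if (head == NULL || head->prev == NULL)
        return NULL;

    rgn_t *right = calloc(1, sizeof *right);
    if (right == NULL)
        return NULL;

    /* Cut the chain of children right before head. */
    right->first_child = head;
    right->last_child = target->last_child;
    target->last_child = head->prev;
    head->prev->next = NULL;
    head->prev = NULL;

    /* Re-parent the moved children and rebalance the sizes. */
    for (rgn_t *c = head; c != NULL; c = c->next) {
        c->parent = right;
        right->size += c->size;
    }
    target->size -= right->size;

    /* Link the right-hand region as the sibling following the target. */
    right->parent = target->parent;
    right->prev = target;
    right->next = target->next;
    if (target->next != NULL) target->next->prev = right;
    else if (target->parent != NULL) target->parent->last_child = right;
    target->next = right;
    return right;
}
```

The total size seen by the parent does not change, so no ancestor needs a size update in this sketch.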

rgn_t *region_split_leaf(rgn_t *target, addr_t offset)

Split a leaf region along an offset according to the splitting mode:

RGN_SPLIT_FIRST: the right-hand region resulting from the split operation begins at the offset passed as input. The update involves the target's parent region.
RGN_SPLIT_LAST: the left-hand region resulting from the split operation ends at the offset passed as input. The update involves the target's parent region.
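
Since the signature above takes no mode parameter, the following sketch implements a single behavior: the left-hand region keeps the bytes before the offset and the right-hand region begins at the offset. The `rgn_t` layout, the `addr_t` typedef and the choice to share the payload buffer instead of copying it are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef size_t addr_t; /* assumption: an unsigned byte offset */

/* Hypothetical layout: a leaf region views a range of bytes. */
typedef struct rgn {
    struct rgn *parent, *first_child, *last_child, *prev, *next;
    const unsigned char *bytes; /* borrowed payload of a leaf region */
    size_t size;
    size_t offset;              /* offset from the beginning of the parent */
} rgn_t;

/* Splits a leaf region at offset. The target becomes the left-hand
 * region; the newly allocated right-hand region is returned, or NULL
 * if one side would be empty or if allocation fails. */
rgn_t *region_split_leaf(rgn_t *target, addr_t offset)
{
    assert(target != NULL && target->first_child == NULL);
    if (offset == 0 || offset >= target->size)
        return NULL;

    rgn_t *right = calloc(1, sizeof *right);
    if (right == NULL)
        return NULL;

    /* Both halves view the same buffer; no bytes are copied. */
    right->bytes = target->bytes + offset;
    right->size = target->size - offset;
    right->offset = target->offset + offset;
    target->size = offset;

    /* Link the right-hand region as the sibling following the target. */
    right->parent = target->parent;
    right->prev = target;
    right->next = target->next;
    if (target->next != NULL) target->next->prev = right;
    else if (target->parent != NULL) target->parent->last_child = right;
    target->next = right;
    return right;
}
```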

rgn_t *region_merge(rgn_t *a, rgn_t *b)

Merge two regions into a new region, which is returned. The update involves the input regions' common parent region. The input regions aren't changed.
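
One possible reading of this description, restricted to two adjacent leaf regions, is sketched below. The new region owns a copy of the concatenated payloads and takes the place of both inputs in the parent's chain, while the input structures themselves are left untouched. The `rgn_t` layout and the requirement that `b` immediately follows `a` are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a leaf region carries a payload buffer. */
typedef struct rgn {
    struct rgn *parent, *first_child, *last_child, *prev, *next;
    unsigned char *bytes;
    size_t size;
    size_t offset;
} rgn_t;

/* Merges two adjacent sibling leaves into a newly allocated region.
 * Returns NULL on allocation failure. The caller owns the result and
 * its payload buffer. */
rgn_t *region_merge(rgn_t *a, rgn_t *b)
{
    assert(a != NULL && b != NULL);
    assert(a->parent == b->parent && a->next == b);

    rgn_t *m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    m->size = a->size + b->size;
    m->bytes = malloc(m->size > 0 ? m->size : 1);
    if (m->bytes == NULL) {
        free(m);
        return NULL;
    }
    if (a->size > 0) memcpy(m->bytes, a->bytes, a->size);
    if (b->size > 0) memcpy(m->bytes + a->size, b->bytes, b->size);
    m->offset = a->offset;

    /* Splice m into the parent's chain in place of a and b. Only the
     * neighbours and the parent are updated, not a and b themselves. */
    m->parent = a->parent;
    m->prev = a->prev;
    m->next = b->next;
    if (a->prev != NULL) a->prev->next = m;
    else if (m->parent != NULL) m->parent->first_child = m;
    if (b->next != NULL) b->next->prev = m;
    else if (m->parent != NULL) m->parent->last_child = m;
    return m;
}
```

The parent's size is unchanged, because the merged region covers exactly the bytes of its two inputs.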

list_t<rgn_t *> region_parse(rgn_t *target, rgn_parse_kernel kernel)

Parse a target region into a list of regions according to a parse kernel (i.e., a view):

rgn_parse_section
rgn_parse_function
rgn_parse_block
rgn_parse_instruction

Note that custom kernels can be defined. For example, if we are only interested in transactions, we can write a parse kernel which returns a list of transaction regions, which in turn contain instruction regions as children. In this case a list of multiple regions is returned, rather than a list containing a single region, since transactions aren't necessarily byte-contiguous in the input object file.
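
To make the idea of a custom kernel concrete, the sketch below models a parse kernel as a function pointer. Everything in it is hypothetical: the `rgn_t` layout, the kernel signature, the helper `region_free` and the toy kernel `parse_fixed_width`, which pretends that every instruction is 4 bytes wide so that no disassembler is needed. The list of regions is represented by the head of a chain linked through `next`; this kernel returns a chain of length one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { RGN_RAW, RGN_INSTRUCTION } rgn_type;

/* Hypothetical layout; payloads are borrowed, never owned. */
typedef struct rgn {
    rgn_type type;
    const unsigned char *bytes;
    size_t size, offset;
    struct rgn *parent, *first_child, *last_child, *next;
} rgn_t;

/* A kernel maps a target region to the head of a list of new regions. */
typedef rgn_t *(*rgn_parse_kernel)(const rgn_t *target);

/* Frees a list of regions together with all their descendants. */
void region_free(rgn_t *r)
{
    while (r != NULL) {
        rgn_t *next = r->next;
        region_free(r->first_child);
        free(r);
        r = next;
    }
}

enum { INSN_WIDTH = 4 }; /* toy assumption: fixed-width instructions */

/* Toy instruction view: one child region per 4-byte instruction. */
rgn_t *parse_fixed_width(const rgn_t *target)
{
    assert(target != NULL && target->size % INSN_WIDTH == 0);
    rgn_t *view = calloc(1, sizeof *view);
    if (view == NULL)
        return NULL;
    view->type = target->type;
    view->bytes = target->bytes;
    view->size = target->size;

    for (size_t off = 0; off < target->size; off += INSN_WIDTH) {
        rgn_t *insn = calloc(1, sizeof *insn);
        if (insn == NULL) {
            region_free(view);
            return NULL;
        }
        insn->type = RGN_INSTRUCTION;
        insn->bytes = target->bytes + off;
        insn->size = INSN_WIDTH;
        insn->offset = off;
        insn->parent = view;
        if (view->last_child != NULL) view->last_child->next = insn;
        else view->first_child = insn;
        view->last_child = insn;
    }
    return view;
}

/* Applies a kernel to a target region. */
rgn_t *region_parse(const rgn_t *target, rgn_parse_kernel kernel)
{
    assert(target != NULL && kernel != NULL);
    return kernel(target);
}
```

A transaction kernel would follow the same shape but return a chain of several regions, one per transaction.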

References

Output emission modes

It can be interesting for Hijacker users to select different output emission modes for each instrumentation version. To understand why, consider this futuristic tool-chain:

COMPILER -> ASSEMBLER -> INSTRUMENTER -> OPTIMIZER -> LINKER

In this tool-chain, the assembler always emits code with no optimizations. This is done both to aid the instrumenter in its code analysis phase, so as to avoid mistakes, and because instrumentation itself may harm optimizations performed in previous phases.

By making the optimization phase a first-class member of the default tool-chain, optimizations could be postponed until Hijacker's emit step, which now has all the available information to perform code optimization techniques which take into account the injected instrumentation code.

To support this scenario, one might imagine at least two output emission modes:

Naive emission: Code is emitted in its byte order, thus relying heavily on regions for this purpose.
Optimized emission: Code is emitted in an order which might be different from the byte order, thus relying less on regions.

Naive emission example

As an example of naive emission, Hijacker does the following:

For each child r' of r, merges all child regions of r'
Merges all child regions of r
Writes r back into a file

This guarantees that byte order is preserved and is also quite intuitive to understand and implement.
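
The effect of the steps above can be sketched as a depth-first walk that copies the leaf payloads into an output buffer in byte order. Rather than merging regions pairwise, this illustration flattens the tree directly, which yields the same byte sequence. The `rgn_t` layout and the function `region_flatten` are hypothetical, and writing to a memory buffer instead of a file is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: leaves carry bytes, inner regions carry children. */
typedef struct rgn {
    const unsigned char *bytes;
    size_t size;
    struct rgn *first_child, *next;
} rgn_t;

/* Appends the bytes of r to out, leaf by leaf, in child order.
 * Returns 0 on success and -1 if the buffer is too small. */
static int emit(const rgn_t *r, unsigned char *out, size_t cap, size_t *used)
{
    if (r->first_child == NULL) {
        if (r->size > cap - *used)
            return -1;
        if (r->size > 0)
            memcpy(out + *used, r->bytes, r->size);
        *used += r->size;
        return 0;
    }
    for (const rgn_t *c = r->first_child; c != NULL; c = c->next) {
        if (emit(c, out, cap, used) != 0)
            return -1;
    }
    return 0;
}

/* Flattens the whole tree rooted at r into out, preserving byte order.
 * On success, *written holds the number of bytes produced. */
int region_flatten(const rgn_t *r, unsigned char *out, size_t cap, size_t *written)
{
    assert(r != NULL && out != NULL && written != NULL);
    *written = 0;
    return emit(r, out, cap, written);
}
```

The resulting buffer could then be written to the output file in a single call, for instance with `fwrite`.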

Optimized emission example

As an example of optimized emission, Hijacker can exploit control-flow information at the basic-block level to swap the relative order of some regions or shift their positions with respect to the parent region, so as to create an object file that is better optimized in terms of memory alignment, static branch prediction, and so on.
