Davide Cingolani edited this page Apr 22, 2016 · 1 revision

Hijacker's IBR

The IBR (Intermediate Binary Representation) is the basic means of representing the whole input file in a form suited to the rule-driven instrumentation process.

There are two levels of abstraction associated with the IBR: a physical level and a logical level. The former reflects the actual structure of the input file. It contains the elements and their relationships "as is", strictly following parsing order. The physical representation persists during the whole instrumentation process, and it is the basic structure from which further abstractions, namely the views, are built. A logical representation is a means of traversing the IBR and depends on the specific depth at which it is interpreted. It makes it possible to focus only on the subset of objects that is useful for a given task.

Nodes

The Hijacker’s IBR comprises three basic descriptors, each of which keeps track of the relevant metadata of the entity it describes. Each of these descriptors can be linked to the others within a network of relationships via a graph node, called a nut.

Physically, a node is no more than a simple connector used to build the whole graph. Therefore, a nut does not contain any relevant information about the entity it represents. The real payload is held by the descriptor of the entity, which the nut points to.
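
The split between connector and payload can be sketched in C as follows. This is only an illustration: the type and field names (`nut`, `payload`, `descriptor`, `descriptor_kind`) and the helper functions are assumptions made for the example, not Hijacker's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: a sketch, not Hijacker's real layout. */
typedef enum { DESC_INSTRUCTION, DESC_DATA, DESC_FUNCTION } descriptor_kind;

/* The descriptor carries the metadata, i.e. the real payload. */
typedef struct descriptor {
    descriptor_kind kind;
    const char *name;        /* stand-in for the entity's metadata */
} descriptor;

/* The nut is only a connector: links plus a pointer to the descriptor. */
typedef struct nut {
    struct nut *prev;
    struct nut *next;
    descriptor *payload;
} nut;

/* Link b right after a in a chain of nuts. */
static void nut_link(nut *a, nut *b) {
    assert(a != NULL && b != NULL);
    a->next = b;
    b->prev = a;
}

/* Count the nuts reachable from head by following next pointers. */
static size_t nut_chain_length(const nut *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Since the nut holds nothing but pointers, discarding a chain of nuts leaves the descriptors untouched.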

A descriptor is the container of all the relevant metadata related to a specific object of the IBR. In Hijacker there are three main types of descriptors:

  • Instruction descriptor
  • Data descriptor
  • Function descriptor

Descriptors are persistent objects, whereas nodes are tied to the concept of views. A view is a logical interpretation of the IBR which focuses on specific aspects of it. Each view is ephemeral: it is meant as a tool, and different tasks are carried out through different views.

The physical representation of the IBR objects is called the naive representation, and its lifetime spans the whole instrumentation process.

A descriptor can reference other descriptors, both instructions and data, realizing a relocation relationship. The possible references are the following:

  • Instruction to instruction, e.g. a jmp or call to a specific target label in the code
  • Instruction to data, e.g. a mov of a constant quantity
  • Data to instruction, e.g. for a switch statement

According to the type of each of the two endpoints, it is possible to infer which kind of relationship there is between them.
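
As a minimal sketch of this inference, assuming each endpoint only exposes whether it is an instruction or a data descriptor, the relationship can be classified as below. The enum and function names are hypothetical. The data-to-data case is not listed above, so the sketch reports it as unclassified rather than guessing its meaning.

```c
#include <assert.h>

/* Hypothetical names, introduced only for this example. */
typedef enum { KIND_INSTRUCTION, KIND_DATA } endpoint_kind;

typedef enum {
    REF_INSN_TO_INSN,    /* e.g. jmp or call to a code label */
    REF_INSN_TO_DATA,    /* e.g. mov of a constant quantity */
    REF_DATA_TO_INSN,    /* e.g. switch statement */
    REF_UNCLASSIFIED     /* data to data: not covered by the list above */
} reference_kind;

/* Infer the kind of relationship from the types of its two endpoints. */
static reference_kind classify_reference(endpoint_kind from, endpoint_kind to) {
    if (from == KIND_INSTRUCTION)
        return to == KIND_INSTRUCTION ? REF_INSN_TO_INSN : REF_INSN_TO_DATA;
    return to == KIND_INSTRUCTION ? REF_DATA_TO_INSN : REF_UNCLASSIFIED;
}
```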

A descriptor object, regardless of its specific kind (instruction, data or function), has a standard header that contains a bunch of hazelnuts. These are simple connectors to other hazelnuts within foreign descriptors. They allow a descriptor to reference another one, and hence to create the basic network of relationships as parsed from the input file; in other words, the naive representation of the IBR. This structure persists throughout the whole instrumentation process. From this naive representation it is possible to create several logical views dynamically, on demand, as needed for the task at hand.
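
One possible shape for such a header is sketched below. Everything here is an assumption made for illustration: the fixed capacity `MAX_HAZELNUTS`, the `owner` and `peer` fields, and the two helper functions are not taken from Hijacker's sources.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HAZELNUTS 4   /* arbitrary capacity chosen for the sketch */

struct descriptor_header;

/* A hazelnut connects its owner to a hazelnut in a foreign descriptor. */
typedef struct hazelnut {
    struct descriptor_header *owner;
    struct hazelnut *peer;
} hazelnut;

/* Standard header shared by every kind of descriptor (hypothetical layout). */
typedef struct descriptor_header {
    hazelnut refs[MAX_HAZELNUTS];
    size_t ref_count;
} descriptor_header;

/* Take the next free hazelnut of d, or NULL when the header is full. */
static hazelnut *header_acquire(descriptor_header *d) {
    if (d->ref_count == MAX_HAZELNUTS)
        return NULL;
    hazelnut *h = &d->refs[d->ref_count++];
    h->owner = d;
    h->peer = NULL;
    return h;
}

/* Record a reference between two descriptors; returns 0 on success. */
static int header_reference(descriptor_header *from, descriptor_header *to) {
    assert(from != NULL && to != NULL);
    hazelnut *src = header_acquire(from);
    if (src == NULL)
        return -1;
    hazelnut *dst = header_acquire(to);
    if (dst == NULL) {
        from->ref_count--;   /* undo the first acquisition */
        return -1;
    }
    src->peer = dst;
    dst->peer = src;
    return 0;
}
```

Following `peer` and then `owner` from any hazelnut leads to the referenced descriptor, which is enough to walk the naive network of relationships.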

Instruction descriptor

The instruction chain collects the assembly instructions of all the text sections of the input file. Each assembly instruction is associated with a descriptor, which is pointed to by the payload field of the current view's nut.

Each instruction descriptor has a set of fields used to keep track of the metadata associated with that instruction, which is needed in order to apply the instrumentation rules. At the physical level, instruction descriptors can be linked to other descriptors, both instruction and data, through the reference hazelnuts. These relationships realize relocations.

The instruction descriptor has basically the following fields:

  • Offset, the original offset within the input file’s section
  • Raw bytes
  • Mnemonic
  • Size
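
The four fields listed above could be laid out as in the following sketch. The struct name, the field types and the fixed buffer sizes are assumptions; in particular, the 15-byte bound assumes an x86 target, where no instruction is longer than that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSN_BYTES 15   /* x86 upper bound; an assumption of this sketch */
#define MAX_MNEMONIC   16   /* arbitrary buffer size for the sketch */

/* Hypothetical layout of an instruction descriptor. */
typedef struct insn_descriptor {
    uint64_t offset;                   /* original offset within the section */
    uint8_t  raw[MAX_INSN_BYTES];      /* raw bytes as read from the input */
    char     mnemonic[MAX_MNEMONIC];   /* e.g. "mov", "call" */
    uint8_t  size;                     /* instruction length in bytes */
} insn_descriptor;

/* Fill a descriptor; returns 0 on success, -1 if an argument does not fit. */
static int insn_init(insn_descriptor *d, uint64_t offset,
                     const uint8_t *bytes, size_t size, const char *mnemonic) {
    assert(d != NULL && bytes != NULL && mnemonic != NULL);
    if (size == 0 || size > MAX_INSN_BYTES)
        return -1;
    if (strlen(mnemonic) >= MAX_MNEMONIC)
        return -1;
    memset(d, 0, sizeof *d);
    d->offset = offset;
    memcpy(d->raw, bytes, size);
    strcpy(d->mnemonic, mnemonic);     /* length checked above */
    d->size = (uint8_t)size;
    return 0;
}
```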

Special mock instruction descriptors, namely peanuts, are employed as placeholders for functions, tracking their entry instruction. These "function" descriptors do not convey any assembly information other than the name of the function. There is no limit to the number of peanuts that can be placed one after another; in this case each one is logically linked to the first valid instruction that follows in the chain.
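
The rule for consecutive peanuts can be illustrated with a short sketch. The node type and its `is_peanut` flag are hypothetical; the example only assumes that a peanut can be told apart from a real instruction while walking the chain forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain node: either a peanut or a real instruction. */
typedef struct chain_node {
    struct chain_node *next;
    bool is_peanut;       /* true for a function placeholder */
    const char *label;    /* function name for peanuts, mnemonic otherwise */
} chain_node;

/* Return the first non-peanut node at or after n, or NULL if there is none. */
static chain_node *first_valid_instruction(chain_node *n) {
    while (n != NULL && n->is_peanut)
        n = n->next;
    return n;
}
```

With two peanuts in a row, both resolve to the same entry instruction, which matches the case of several function names sharing one entry point.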

image

This naive instruction chain inherits the natural order in which objects are met during the initial parsing, and it captures the intrinsic relations among instructions and between instructions and data. This structure follows the byte order, also called instruction order. Logically, though, these relations form a graph, which can therefore be interpreted in several ways. By looking at the relationships among the objects it is possible to build several logical graphs, according to the actual needs. The Hijacker’s IBR retains two orders at the same time:

  • Byte order, or instruction order
  • Control flow order
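
A minimal sketch of how one set of nodes can carry both orders at once is given below. The `insn` type, its `target` and `falls_through` fields, and the helper function are assumptions made for the example: `next` encodes the byte order, while the control flow successors are derived from the fall-through flag and the branch target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction node holding both orders. */
typedef struct insn {
    struct insn *next;     /* successor in byte (instruction) order */
    struct insn *target;   /* branch target, NULL when there is none */
    bool falls_through;    /* false for an unconditional jump */
} insn;

/* Write the control flow successors of i into out; return how many. */
static size_t control_flow_successors(const insn *i, const insn *out[2]) {
    size_t n = 0;
    assert(i != NULL && out != NULL);
    if (i->falls_through && i->next != NULL)
        out[n++] = i->next;
    if (i->target != NULL)
        out[n++] = i->target;
    return n;
}
```

For an unconditional jump the byte order successor is still reachable through `next`, but it is not a control flow successor.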

Data descriptor

The data chain is a simple chain of nodes that point to data in the original input file. A data descriptor is quite plain and contains the following metadata:

  • field 1
  • field 2
  • ....

Function descriptor

The function chain is a doubly linked list of nodes which point to the entry instructions of functions, namely the peanuts.

Views

Beyond the physical representation of the IBR, it is possible to interpret it in several ways according to the level of focus we want to achieve. This is the concept of the view. A view is an abstraction layer over the physical representation of the IBR, which makes it possible to express virtual relationships among nodes and to leverage them according to the specific task to accomplish.

Depending on the inspection level and the relationship order in which the nodes are analyzed, it is therefore possible to produce a number of different views.

[Diagram: possible combinations of views]

The purpose of a view is to provide a virtual abstraction layer over the raw data physically stored in the IBR. Each view is ephemeral by definition: its lifetime is strictly bound to a specific rule, which is in charge of managing it. The basic structure of the IBR is the only persistent object during the whole instrumentation process, whereas each view is created on demand as needed. To this end, there is a set of low-level APIs for building a new view. The higher-level APIs are designed to work on views. This design is meant to force the user to write instrumentation rules consciously. Working at different layers of abstraction makes it possible, on the one hand, to simplify the instrumentation process and, on the other, to modify the IBR itself with high precision.
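
The lifetime rule can be made concrete with a small sketch in which a view owns only its own nodes and never the persistent objects it points to. The names `entity`, `view`, `view_create` and `view_destroy` are hypothetical and are not part of Hijacker's API.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ENT_INSTRUCTION, ENT_DATA } entity_kind;

/* Persistent object, standing in for a descriptor of the naive IBR. */
typedef struct entity {
    entity_kind kind;
    int id;
} entity;

/* Ephemeral view: owns its node array, not the entities it points to. */
typedef struct view {
    entity **nodes;
    size_t count;
} view;

/* Build a view over the entities of one kind; NULL on allocation failure. */
static view *view_create(entity *all, size_t n, entity_kind kind) {
    view *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->nodes = malloc((n ? n : 1) * sizeof *v->nodes);
    if (v->nodes == NULL) {
        free(v);
        return NULL;
    }
    v->count = 0;
    for (size_t i = 0; i < n; i++)
        if (all[i].kind == kind)
            v->nodes[v->count++] = &all[i];
    return v;
}

/* Destroy the view only; the underlying entities are left untouched. */
static void view_destroy(view *v) {
    if (v == NULL)
        return;
    free(v->nodes);
    free(v);
}
```

A rule would create the view it needs, work on it, and destroy it before returning, while the entities outlive every view.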

Each API works with these views and requires the user to be aware of what they are asking it to do.

Ranges

Alongside views, ranges are a means of handling portions of "contiguous" entities with respect to a specific logical view, namely an interpretation of the IBR. Pragmatically, a range is a linear sub-portion of a view (i.e. a sub-graph) which is used to focus on specific areas of the input representation.
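
As a sketch, assuming a linear view whose nodes can be addressed by index, a range reduces to a pair of inclusive bounds checked against the length of the view. The `range` type and its helpers are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical range over a linear view, with inclusive bounds. */
typedef struct range {
    size_t first;   /* index of the first node in the view */
    size_t last;    /* index of the last node, inclusive */
} range;

/* Initialize a range; fails if the bounds do not fit a view of view_len nodes. */
static bool range_make(size_t view_len, size_t first, size_t last, range *out) {
    assert(out != NULL);
    if (first > last || last >= view_len)
        return false;
    out->first = first;
    out->last = last;
    return true;
}

/* Number of nodes covered by the range. */
static size_t range_length(const range *r) {
    return r->last - r->first + 1;
}

/* True when node index i falls inside the range. */
static bool range_contains(const range *r, size_t i) {
    return i >= r->first && i <= r->last;
}
```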


APIs

In order to handle the IBR, we provide a set of APIs grouped into two main categories, according to the level at which they act:

Group        Description
Naive level  The set working on the physical naive representation
Rule level   The set working on the logical abstraction of the views

The main difference between the two groups is the abstraction level they actually work on and modify, and therefore the responsibilities these APIs place on the caller.

The main assumption behind Hijacker's API is to defer the whole complexity to the rule level; hence callers of the lower-level APIs are in charge of properly managing metadata in order to maintain consistency where the effects span multiple layers of abstraction.

There are two dimensions along which to create views:

         Instruction        Basic Blocks        Functions
Graph    Instruction Graph  Control Flow Graph  Function Call Graph
Linear   Instruction Chain  --                  --
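
The table can be encoded as a lookup over the two dimensions. This is a sketch only: the enum names and the functions `view_name` and `view_is_defined` are hypothetical, and the combinations marked "--" are reported as undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two dimensions of the table. */
typedef enum { SHAPE_GRAPH, SHAPE_LINEAR } view_shape;
typedef enum { GRAN_INSTRUCTION, GRAN_BASIC_BLOCK, GRAN_FUNCTION } view_granularity;

/* Name of the view for a combination, or NULL where the table has "--". */
static const char *view_name(view_shape s, view_granularity g) {
    static const char *const names[2][3] = {
        { "Instruction Graph", "Control Flow Graph", "Function Call Graph" },
        { "Instruction Chain", NULL, NULL }
    };
    return names[s][g];
}

/* True when the table defines a view for this combination. */
static bool view_is_defined(view_shape s, view_granularity g) {
    return view_name(s, g) != NULL;
}
```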

Low level

  graph * view_instruction_graph();

High level

  instruction_add();
