IncrementalUpdateForDataFeature

Vinay Augustine edited this page Oct 18, 2012 · 1 revision

Definition

Feature Overview

ABB.SrcML.Data is a library for producing program-related data from srcML. It focuses on producing maps and relationships between program elements. Some of the maps it currently produces are:

  • Type maps: Given a type usage (such as in a variable declaration), where is the type definition?
  • Variable maps: Given a variable usage, where is the variable declaration?
    • If the variable is attached to an object, what object? (i.e. resolve b in the statement a.b — this relies on the type map)
  • Function maps: Given a function call, what is the definition for the function? Because we may have to resolve method calls on objects and argument variables, this depends on both the variable map and the type map.

SrcML.Data also provides a full function call graph on top of these maps.

An Example

The basic idea behind SrcML.Data is that we can infer much of the information about programming statements by looking at the structure of the srcML document. For instance, given the following C++ code:

int MyObject::PrintTheString(string theString)
{
	if(theString.length() > 0)
		cout << theString;
	return theString.length();
}

We get the following srcML:

<function><type><name>int</name></type> <name><name>MyObject</name><op:operator>::</op:operator><name>PrintTheString</name></name><parameter_list>(<param><decl><type><name>string</name></type> <name>theString</name></decl></param>)</parameter_list>
<block>{
	<if>if<condition>(<expr><name>theString</name><op:operator>.</op:operator><call><name>length</name><argument_list>()</argument_list></call> <op:operator>&gt;</op:operator> <lit:literal type="number">0</lit:literal></expr>)</condition><then>
		<expr_stmt><expr><name>cout</name> <op:operator>&lt;&lt;</op:operator> <name>theString</name></expr>;</expr_stmt></then></if>
	<return>return <expr><name>theString</name><op:operator>.</op:operator><call><name>length</name><argument_list>()</argument_list></call></expr>;</return>
}</block></function>

Without having to compile the function, we can learn a number of things from the XML. First, what do we know about the function itself?

  • The function is named PrintTheString
  • It is a member of the MyObject class
  • It returns an integer
  • It takes one argument (a string)

From this information, we can easily construct a signature for this method.
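
The four facts above can be read directly from the XML tree. The following is a minimal Python sketch of that idea, not the SrcML.Data implementation. The helper names (`text_of`, `read_signature`, `render`) are hypothetical, the function body is omitted because only the header matters here, and the `urn:example:op` namespace URI is a placeholder declared only so that the `op:` prefix parses. Real srcML output declares its own namespaces, so element lookups would need to be adjusted accordingly.

```python
import xml.etree.ElementTree as ET

# The srcML header from the example above, wrapped in a <unit> element.
# The namespace URI is a placeholder (assumption), not the real srcML one.
SRCML = """<unit xmlns:op="urn:example:op">
<function><type><name>int</name></type> <name><name>MyObject</name><op:operator>::</op:operator><name>PrintTheString</name></name><parameter_list>(<param><decl><type><name>string</name></type> <name>theString</name></decl></param>)</parameter_list>
<block>{ }</block></function>
</unit>"""


def text_of(element):
    """Concatenate all text inside an element, ignoring the markup."""
    return "".join(element.itertext()).strip()


def read_signature(function):
    """Collect name, owning class, return type and parameters of a <function>."""
    name = function.find("name")
    # A qualified name such as MyObject::PrintTheString nests <name> elements;
    # an unqualified name carries its text directly.
    parts = [part.text for part in name.findall("name")] or [name.text]
    parameters = [
        (text_of(decl.find("type")), decl.findtext("name"))
        for decl in function.findall("parameter_list/param/decl")
    ]
    return {
        "name": parts[-1],
        "owner": "::".join(parts[:-1]) or None,
        "returns": text_of(function.find("type")),
        "parameters": parameters,
    }


def render(sig):
    """Format the collected facts as a C++-style signature string."""
    qualified = f"{sig['owner']}::{sig['name']}" if sig["owner"] else sig["name"]
    args = ", ".join(f"{type_name} {arg}" for type_name, arg in sig["parameters"])
    return f"{sig['returns']} {qualified}({args})"


function = ET.fromstring(SRCML).find("function")
sig = read_signature(function)
print(render(sig))  # int MyObject::PrintTheString(string theString)
```

The dictionary returned by `read_signature` holds the same four facts as the list above; `render` only turns them back into a readable signature.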

Additionally, we can look at the use of a variable in the function body (theString). theString is used three times in the body of the method. Where is it declared? We can answer this question by searching outward through the enclosing scopes:

  1. Look at the current block: Is theString declared? No
  2. Look at the function: Is theString declared? Yes — it is an argument to the method

Now we have the declaration for theString and its type.
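
The scope walk described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SrcML.Data implementation: `find_declaration` is a hypothetical helper, the `urn:example:*` namespace URIs are placeholders declared only so that the `op:` and `lit:` prefixes parse, and the sketch assumes local declarations appear as `decl_stmt`/`decl` elements directly inside a block. It also ignores whether a declaration comes before or after the usage.

```python
import xml.etree.ElementTree as ET

# The srcML from the example above, wrapped in a <unit> element.
# The namespace URIs are placeholders (assumption), not the real srcML ones.
SRCML = """<unit xmlns:op="urn:example:op" xmlns:lit="urn:example:lit">
<function><type><name>int</name></type> <name><name>MyObject</name><op:operator>::</op:operator><name>PrintTheString</name></name><parameter_list>(<param><decl><type><name>string</name></type> <name>theString</name></decl></param>)</parameter_list>
<block>{
	<if>if<condition>(<expr><name>theString</name><op:operator>.</op:operator><call><name>length</name><argument_list>()</argument_list></call> <op:operator>&gt;</op:operator> <lit:literal type="number">0</lit:literal></expr>)</condition><then>
		<expr_stmt><expr><name>cout</name> <op:operator>&lt;&lt;</op:operator> <name>theString</name></expr>;</expr_stmt></then></if>
	<return>return <expr><name>theString</name><op:operator>.</op:operator><call><name>length</name><argument_list>()</argument_list></call></expr>;</return>
}</block></function>
</unit>"""


def find_declaration(root, usage):
    """Walk outward from a <name> usage and return (scope tag, <decl>) or None."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    scope = parents.get(usage)
    while scope is not None:
        if scope.tag == "block":
            # Step 1: declarations made directly in this block (assumed layout).
            candidates = scope.findall("decl_stmt/decl")
        elif scope.tag == "function":
            # Step 2: the parameters of the enclosing function.
            candidates = scope.findall("parameter_list/param/decl")
        else:
            candidates = []
        for decl in candidates:
            if decl.findtext("name") == usage.text:
                return scope.tag, decl
        scope = parents.get(scope)
    return None


root = ET.fromstring(SRCML)
body = root.find("function/block")
usages = [name for name in body.iter("name") if name.text == "theString"]

scope_tag, decl = find_declaration(root, usages[0])
declared_type = "".join(decl.find("type").itertext()).strip()
print(len(usages), scope_tag, declared_type)  # 3 function string
```

The walk finds nothing in the block, moves up to the function, and matches the parameter, which also yields the declared type (string).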

The current state

The current SrcML.Data implementation parses a multi-file srcML archive and dumps all of the relevant data (variable declarations, type definitions, method definitions, etc.) into a SQL Server database. Even for relatively small programs (for example, Notepad++), this can take 5-10 minutes on a Core i5 laptop.

Another deficiency is updating: the current implementation requires the entire dataset to be thrown away and regenerated when any source file changes. This is not sustainable.

Users & Use cases

Users

List users of the feature here

Use cases

Describe use-cases for this feature here

Dependencies

Design

Describe the design of the feature here
