IncrementalUpdateFeature
The Incremental Update feature adds a new type of srcML archive that can be updated incrementally as source files change. The two previous types of srcML archive are individual srcML files (where one srcML file represents one source file) and multi-file archives (where one srcML file represents multiple source files).
A srcML file that represents an individual source file is structured like this:
<unit filename="myfile.c"><!-- srcML for myfile.c --></unit>
There is a single unit element that contains the srcML version of the source code. This is primarily useful when you want to analyze a single file. Because it represents a single source file, it is very easy to regenerate when the file changes.
A multi-file archive is structured like this:
<unit>
  <unit filename="myfile.c"><!-- srcML for myfile.c --></unit>
  <unit filename="myheader.h"><!-- srcML for myheader.h --></unit>
</unit>
There is a single root unit that contains other sub-units. Each of these sub-units represents a complete source file. This structure is very useful for querying the project or doing a large-scale transformation. Because the "filename" attribute is stored with each unit, it is also very easy to write all of the source code out to disk.
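As a rough illustration of that structure, the sketch below lists the files recorded in a multi-file archive. It uses plain System.Xml.Linq rather than the ABB.SrcML API, the class and method names (ArchiveSketch, ListFileNames) are hypothetical, and it assumes un-namespaced unit elements as in the simplified sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ArchiveSketch
{
    // Returns the "filename" attribute of every sub-unit directly under the root unit.
    // Assumption: elements carry no XML namespace, matching the simplified sample.
    public static List<string> ListFileNames(string archiveXml)
    {
        XElement root = XElement.Parse(archiveXml);
        return root.Elements("unit")
                   .Select(u => (string)u.Attribute("filename"))
                   .Where(name => name != null)
                   .ToList();
    }
}
```

The same loop is where an export step could write each sub-unit back out to the path named by its filename attribute.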
A common pattern when using the ABB.SrcML framework is a query like this:
// Collect every use of the "new" operator across all files in the archive.
var archive = new SrcMLFile("pathToXml");
var newOps = from fileUnit in archive.FileUnits
             from op in fileUnit.Descendants(OP.Operator)
             where op.Value == "new"
             select op;
This iterates over all of the files in the archive and looks for instances of the new operator. Queries like this make it easy to search or change an entire development project at once instead of working file by file.
The incrementally updating srcML archive combines these two features: single-file storage of srcML and iteration over an entire project. This srcML archive also responds to file updates (addition, deletion, modification).
The intended users are clients that need up-to-date srcML representations of source code.
Here are several use cases that inform the incremental update feature.
Sando is a Visual Studio plugin that updates its index whenever it detects that a source file has changed. The ABB.SrcML framework should be able to take over the monitoring of source code and the generation of srcML.
Once srcML is generated, Sando should be notified that a new srcML file is available so that it can update its index.
A service that monitors a directory for changes to source code files needs to be able to detect changes to source code in the directory and then generate srcML. This may be used by a 3rd-party text editor / IDE that does not have srcML integrated in it.
Pat is developing a new tool on top of ABB.SrcML and would like to use the new srcML archive without the directory monitoring component. Pat would like to create an archive composed of individual srcML files and run experiments on it as if it were a multi-file srcML archive. Pat does not expect the source code to change and therefore does not need to monitor the files for updates.
This feature has no dependencies.
The incrementally updating archive combines the best of both archive types (individual srcML files and multi-file archives). Individual source files are represented by individual srcML files. However, the srcML files are grouped together in a directory that allows code built on top of ABB.SrcML to iterate over them via the FileUnits property.
The project interface (IProject) sits between the representation of the "project" and the SrcMLArchive. It must provide a few key members:
GetListOfFiles gets a list of files from the client. This list of files is what we are monitoring for changes.
In the simplest case, the list of files is just all of the source files in a directory. A more complicated case is a Visual Studio solution or project. In this case, we would parse the project file or query Visual Studio for the list of source files.
GetListOfFiles should always return the latest collection of files as reported by the client.
The client may cache the list of files for use in monitoring.
The project should raise the SourceChangedEvent whenever it detects a change to the source code being monitored. Changes include:
- Creation
- Deletion
- Modification
StartMonitoring tells the project to start monitoring the list of source files for changes. It can do this either by:
- Subscribing to events (for example: FileSystemWatcher)
- Occasionally crawling the directory and comparing the contents to the contents of the archive directory
When a change is encountered, the project should raise the SourceChangedEvent.
When StopMonitoring is called, the project object should stop monitoring the list of source files. This means it should either unsubscribe from the events it is listening to or stop crawling the directory.
After StopMonitoring is called, the SourceChangedEvent should no longer be raised.
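The members described above could be sketched as follows. This is a minimal illustration rather than the framework's actual code: the member names come from the text, but the signatures, the SourceChangedEventArgs type, and the SourceChangeKind enum are assumptions. The DirectoryProject shown here takes the event-subscription route using FileSystemWatcher and does no filtering by file extension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical payload describing one change; the design text does not specify it.
public enum SourceChangeKind { Created, Deleted, Modified }

public class SourceChangedEventArgs : EventArgs
{
    public string SourcePath { get; }
    public SourceChangeKind Kind { get; }

    public SourceChangedEventArgs(string sourcePath, SourceChangeKind kind)
    {
        SourcePath = sourcePath;
        Kind = kind;
    }
}

// The four members described above; the exact signatures are assumptions.
public interface IProject
{
    IEnumerable<string> GetListOfFiles();
    event EventHandler<SourceChangedEventArgs> SourceChangedEvent;
    void StartMonitoring();
    void StopMonitoring();
}

// Simplest case: every file under one directory, watched with FileSystemWatcher.
public class DirectoryProject : IProject
{
    private readonly string root;
    private readonly FileSystemWatcher watcher;

    public event EventHandler<SourceChangedEventArgs> SourceChangedEvent;

    public DirectoryProject(string root)
    {
        this.root = root;
        watcher = new FileSystemWatcher(root) { IncludeSubdirectories = true };
        watcher.Created += (s, e) => Raise(e.FullPath, SourceChangeKind.Created);
        watcher.Deleted += (s, e) => Raise(e.FullPath, SourceChangeKind.Deleted);
        watcher.Changed += (s, e) => Raise(e.FullPath, SourceChangeKind.Modified);
        // Report a rename as a deletion of the old path plus a creation of the new one.
        watcher.Renamed += (s, e) =>
        {
            Raise(e.OldFullPath, SourceChangeKind.Deleted);
            Raise(e.FullPath, SourceChangeKind.Created);
        };
    }

    // Re-enumerates on every call, so the result reflects the current directory contents.
    public IEnumerable<string> GetListOfFiles()
    {
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories);
    }

    // Turns the watcher's notifications on.
    public void StartMonitoring() { watcher.EnableRaisingEvents = true; }

    // Turns the watcher's notifications off, so no new changes are reported.
    public void StopMonitoring() { watcher.EnableRaisingEvents = false; }

    private void Raise(string path, SourceChangeKind kind)
    {
        SourceChangedEvent?.Invoke(this, new SourceChangedEventArgs(path, kind));
    }
}
```

A crawling implementation would keep the same interface and replace the watcher with a timer that compares the directory contents against the archive directory.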
The archive class is the primary interface for the incrementally updating archive. It supports common archive operations such as:
- iterating over the source files
- exporting the archive to source code
- getting information about the archive (root attributes, etc.)
The primary feature of this class, though, is that it implements the IProject interface. This means that it can monitor a list of source files for changes. It also wraps an IProject object that is used to do the actual monitoring of the source code.
A common usage looks like this:
IProject directory = new DirectoryProject("/path/to/source/code");
SrcMLArchive archive = new SrcMLArchive(directory);
archive.StartMonitoring(); // causes directory.StartMonitoring() to execute
When the archive is notified of a change, it creates, updates, or removes the related srcML file and then fires its own SourceChangedEvent.
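The wrap-and-forward behavior can be sketched as below. Everything here is illustrative: SrcMLArchiveSketch stands in for the real SrcMLArchive, UpdateSrcML is a hypothetical placeholder for the srcML generation step, ManualProject is a hand-driven stand-in for a real project, and the IProject and event-argument declarations are assumed shapes repeated only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Assumed shapes, declared here so the sketch is self-contained.
public enum SourceChangeKind { Created, Deleted, Modified }

public class SourceChangedEventArgs : EventArgs
{
    public string SourcePath { get; }
    public SourceChangeKind Kind { get; }

    public SourceChangedEventArgs(string sourcePath, SourceChangeKind kind)
    {
        SourcePath = sourcePath;
        Kind = kind;
    }
}

public interface IProject
{
    IEnumerable<string> GetListOfFiles();
    event EventHandler<SourceChangedEventArgs> SourceChangedEvent;
    void StartMonitoring();
    void StopMonitoring();
}

// Wraps another IProject and re-raises its change events after updating srcML.
public class SrcMLArchiveSketch : IProject
{
    private readonly IProject project;

    public event EventHandler<SourceChangedEventArgs> SourceChangedEvent;

    public SrcMLArchiveSketch(IProject project)
    {
        this.project = project;
        this.project.SourceChangedEvent += OnProjectSourceChanged;
    }

    // All three calls are delegated to the wrapped project.
    public IEnumerable<string> GetListOfFiles() { return project.GetListOfFiles(); }
    public void StartMonitoring() { project.StartMonitoring(); }
    public void StopMonitoring() { project.StopMonitoring(); }

    // Hypothetical placeholder: a real archive would create, update, or remove srcML here.
    protected virtual void UpdateSrcML(SourceChangedEventArgs change) { }

    private void OnProjectSourceChanged(object sender, SourceChangedEventArgs e)
    {
        UpdateSrcML(e);                       // 1. bring the srcML up to date
        SourceChangedEvent?.Invoke(this, e);  // 2. only then notify clients
    }
}

// Stand-in project whose changes are raised by hand, for demonstration only.
public class ManualProject : IProject
{
    public bool Monitoring { get; private set; }

    public event EventHandler<SourceChangedEventArgs> SourceChangedEvent;

    public IEnumerable<string> GetListOfFiles() { return new[] { "main.cpp" }; }
    public void StartMonitoring() { Monitoring = true; }
    public void StopMonitoring() { Monitoring = false; }

    // Raises a change only while monitoring is active.
    public void Raise(string path, SourceChangeKind kind)
    {
        if (Monitoring)
        {
            SourceChangedEvent?.Invoke(this, new SourceChangedEventArgs(path, kind));
        }
    }
}
```

Because the archive raises its own event only after UpdateSrcML returns, a client such as Sando that subscribes to the archive can assume the srcML file is current by the time its handler runs.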
Given the following directory tree:
+ myCppProject/
|-- main.cpp
|-- component/
|   |-- component.cpp
|   +-- component.h
+-- mainHeader.h
We should get the following structure in the archive directory:
+ .srcml
|-- <hash of main.cpp path>.xml
|-- <hash of component/component.cpp>.xml
|-- <hash of component/component.h>.xml
+-- <hash of mainHeader.h>.xml
There are a number of ways to hash the file paths. The key requirement here is that the "hash" be reversible. If the path to my source file is c:\path\to\me.cpp and the relevant path in the archive is c:\path\to\archive\hash_of_me.cpp, I should be able to convert the source path into the archive path and vice versa.
Some options for implementing this "hash" include:
The reason for doing this is that development projects (such as Visual Studio projects) may include files from unrelated folders. Rather than recreate the entire path on disk in the archive directory, we encode each path using one of the above options.
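To make the reversibility requirement concrete, here is one possible encoding, not necessarily one the project considered: the source path's UTF-8 bytes written as lowercase hexadecimal. The PathCodec class and its method names are hypothetical. Hex output contains only characters that are valid in file names and is unaffected by case-insensitive file systems, at the cost of doubling the name length, which may matter where path lengths are limited.

```csharp
using System;
using System.IO;
using System.Text;

public static class PathCodec
{
    // Encodes a source path as a file-system-safe name: lowercase hex of its UTF-8 bytes.
    public static string Encode(string sourcePath)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(sourcePath))
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    // Reverses Encode: reads two hex digits per byte and decodes the bytes as UTF-8.
    public static string Decode(string encoded)
    {
        var bytes = new byte[encoded.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(encoded.Substring(2 * i, 2), 16);
        }
        return Encoding.UTF8.GetString(bytes);
    }

    // Source path -> path of the srcML file inside the archive directory.
    public static string ToArchivePath(string archiveDir, string sourcePath)
    {
        return Path.Combine(archiveDir, Encode(sourcePath) + ".xml");
    }

    // Path of a srcML file in the archive -> original source path.
    public static string ToSourcePath(string xmlPath)
    {
        return Decode(Path.GetFileNameWithoutExtension(xmlPath));
    }
}
```

Since the full source path is recoverable from the file name alone, files pulled in from unrelated folders can all sit side by side in one flat archive directory.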
- Project: The archive is a wrapper around an IProject (called project, here). The archive monitors the source files in project.
- ArchivePath: The full path on disk to the directory where this archive stores its srcML files.
- SourceChangedEvent (from IProject): SrcMLArchive fires SourceChangedEvent only after it has updated the related srcML file.
- From IProject:
  - GetListOfFiles: Query Project for the list of files
  - StartMonitoring: Execute Project.StartMonitoring()
  - StopMonitoring: Execute Project.StopMonitoring()
- GetXmlPathForSourcePath: Gets the full path to the srcML file for the given source file
- GetSourcePathForXmlPath: Gets the full path to the source file for the given srcML file