Analyzer
This document was generated from 'src/documentation/wiki-analyzer.ts' on 2025-12-01, 16:14:34 UTC presenting an overview of flowR's analyzer (v2.6.3, using R v4.5.0). Please do not edit this file/wiki page directly.
No matter whether you want to analyze a single R script, a couple of R notebooks, a complete project, or an R package,
your journey starts with the FlowrAnalyzerBuilder (further described in Builder Configuration below).
This builder allows you to configure the analysis in many different ways, for example, by specifying which plugins to use or
what engine to use for the analysis.
When building the FlowrAnalyzer instance, the builder will take care to
- load the requested plugins
- set up an initial context
- create a cache for speeding up future analyses
- initialize the engine (e.g., TreeSitter) if needed
The builder provides two methods for building the analyzer:
- FlowrAnalyzerBuilder::build for an asynchronous build process that also initializes the engine if needed
- FlowrAnalyzerBuilder::buildSync for a synchronous build process, which requires that the engine (e.g., TreeSitter) has already been initialized before calling this method. Yet, as engines only have to be initialized once per process, this method is often more convenient to use.
For more information on how to configure the builder, please refer to the Builder Configuration section below.
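The difference between the two build paths can be illustrated with a small self-contained sketch. ToyEngine and ToyBuilder are hypothetical stand-ins and not flowR's classes; they only mimic the "initialize once per process, then build synchronously" pattern described above.

```typescript
// Hypothetical stand-ins (not flowR's API): an engine that needs one async init per process.
class ToyEngine {
    private static ready = false;

    static async init(): Promise<void> {
        // pretend to load a grammar or a WebAssembly module
        await Promise.resolve();
        ToyEngine.ready = true;
    }

    static isReady(): boolean {
        return ToyEngine.ready;
    }
}

class ToyBuilder {
    /** asynchronous build: initializes the engine on demand */
    async build(): Promise<string> {
        if(!ToyEngine.isReady()) {
            await ToyEngine.init();
        }
        return 'analyzer';
    }

    /** synchronous build: only valid once the engine is initialized */
    buildSync(): string {
        if(!ToyEngine.isReady()) {
            throw new Error('engine not initialized, use build() or init() first');
        }
        return 'analyzer';
    }
}

// buildSync fails before any initialization ...
let failedBeforeInit = false;
try {
    new ToyBuilder().buildSync();
} catch {
    failedBeforeInit = true;
}

// ... but works once the engine has been initialized (here by an asynchronous build)
const afterInit: Promise<string> = new ToyBuilder().build().then(() => new ToyBuilder().buildSync());
```

In this sketch, every builder created after the first asynchronous build can use the synchronous path, which is the convenience the text above refers to.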
Once you have created an analyzer instance, you can add R files, folders, or even entire projects for analysis using the
FlowrAnalyzer::addRequest method.
All loaded plugins will be applied fully automatically during the analysis.
Please note that adding new files after you have already requested analysis results may invalidate larger parts of the cached results and trigger a re-analysis of previously analyzed files.
With the files context, you can also add virtual files to the analysis to consider, or overwrite existing files with modified content.
For this, have a look at the
FlowrAnalyzer::addFile method.
Note
If you want to quickly try out the analyzer, you can use the following code snippet that analyzes a simple R expression:

    const analyzer = await new FlowrAnalyzerBuilder()
        .setEngine('tree-sitter')
        .build();
    // register a simple inline text-file for analysis
    analyzer.addRequest('x <- 1; print(x)');
    // get the dataflow
    const df = await analyzer.dataflow();
    // obtain the identified loading order
    console.log(analyzer.inspectContext().files.loadingOrder.getLoadingOrder());
    // run a dependency query
    const results = await analyzer.query([{ type: 'dependencies' }]);

To reset the analysis (e.g., to provide new requests) you can use FlowrAnalyzer::reset.
If you need to pre-compute analysis results (e.g., to speed up future queries), you can use FlowrAnalyzer::runFull.
Please make sure to add all of the files, folders, and projects you want to analyze using the
FlowrAnalyzer::addRequest method (or FlowrAnalyzer::addFile for virtual files).
Afterwards, you can request different kinds of analysis results, such as:
- FlowrAnalyzer::parse to get the parsed information by the respective engine.
  You can also use FlowrAnalyzer::peekParse to inspect the parse information if it was already computed (but without triggering a computation). With FlowrAnalyzer::parserInformation, you get additional information on the parser used for the analysis.
- FlowrAnalyzer::normalize to compute the Normalized AST.
  Likewise, FlowrAnalyzer::peekNormalize returns the normalized AST if it was already computed but without triggering a computation.
- FlowrAnalyzer::dataflow to compute the Dataflow Graph.
  Again, FlowrAnalyzer::peekDataflow allows you to inspect the dataflow graph if it was already computed (but without triggering a computation).
- FlowrAnalyzer::controlflow to compute the Control Flow Graph.
  Also, FlowrAnalyzer::peekControlflow returns the control flow graph if it was already computed but without triggering a computation.
- FlowrAnalyzer::query to run queries on the analyzed code.
- FlowrAnalyzer::runSearch to run a search query on the analyzed code using the search API.
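The relation between the computing methods and their peek counterparts can be sketched as follows. LazyStep is a hypothetical stand-in written for this illustration; it is not how flowR implements these methods, it only shows the "compute on demand, peek without computing" contract.

```typescript
// Hypothetical stand-in (not flowR's implementation): a lazily computed, cached analysis step.
class LazyStep<T> {
    private cached: T | undefined = undefined;
    private readonly compute: () => T;
    computations = 0;

    constructor(compute: () => T) {
        this.compute = compute;
    }

    /** returns the result and computes it on first use */
    get(): T {
        if(this.cached === undefined) {
            this.cached = this.compute();
            this.computations++;
        }
        return this.cached;
    }

    /** returns the result only if it is already available, never triggers a computation */
    peek(): T | undefined {
        return this.cached;
    }
}

const normalizeStep = new LazyStep(() => ({ kind: 'toy-ast', nodes: 3 }));
const before = normalizeStep.peek();  // nothing computed yet
const ast = normalizeStep.get();      // triggers the computation
const after = normalizeStep.peek();   // the very same object as `ast`
normalizeStep.get();                  // served from the cached value
```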
We are working on providing a set of example repositories that demonstrate how to use the analyzer in different scenarios:
- flowr-analysis/sample-analyzer-project-query for an example project that runs queries on an R project
- flowr-analysis/sample-analyzer-df-diff for an example project that compares dataflow graphs
If you are interested in all available options, have a look at the Builder Reference below. The following sections highlight some of the most important configuration options:
- How to configure flowR
- How to configure the engine
- How to register plugins
You can fundamentally change the behavior of flowR using the config file,
embedded in the interface FlowrConfigOptions.
With the builder you can either provide a complete configuration or amend the default configuration using:
- FlowrAnalyzerBuilder::setConfig to set a complete configuration
- FlowrAnalyzerBuilder::amendConfig to amend the default configuration
By default, the builder uses flowR's standard configuration obtained with defaultConfigOptions.
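To make the difference between setting and amending a configuration concrete, here is a minimal sketch. ToyConfig, its fields, and the callback shape of amendConfig are assumptions made for this example; flowR's FlowrConfigOptions and the real builder signatures may differ.

```typescript
// Hypothetical, heavily simplified configuration shape (not flowR's FlowrConfigOptions).
interface ToyConfig {
    readonly verbose: boolean;
    readonly limits: { readonly maxFiles: number };
}

const toyDefaults: ToyConfig = { verbose: false, limits: { maxFiles: 100 } };

class ToyConfigBuilder {
    private config: ToyConfig = toyDefaults;

    /** replace the configuration completely */
    setConfig(config: ToyConfig): this {
        this.config = config;
        return this;
    }

    /** derive a new configuration from the one the builder currently holds */
    amendConfig(amend: (current: ToyConfig) => ToyConfig): this {
        this.config = amend(this.config);
        return this;
    }

    current(): ToyConfig {
        return this.config;
    }
}

// amending keeps every default that is not explicitly changed
const amended = new ToyConfigBuilder()
    .amendConfig(c => ({ ...c, verbose: true }))
    .current();

// setting discards the defaults entirely
const replaced = new ToyConfigBuilder()
    .setConfig({ verbose: true, limits: { maxFiles: 5 } })
    .current();
```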
Note
During the analysis with the FlowrAnalyzer, you can also access the configuration with
the FlowrAnalyzerContext.
FlowR supports multiple engines for parsing and analyzing R code. With the builder, you can select the engine to use with:
- FlowrAnalyzerBuilder::setEngine to set the desired engine.
- FlowrAnalyzerBuilder::setParser to set a specific parser implementation.
By default, the builder uses the TreeSitter engine with the TreeSitter parser.
The builder also takes care to initialize the engine if needed during the asynchronous build process
with FlowrAnalyzerBuilder::build.
If you want to use the synchronous build process with FlowrAnalyzerBuilder::buildSync,
please ensure that the engine has already been initialized before calling this method.
There are various ways for you to register plugins with the builder, exemplified by the following snippet
relying on the FlowrAnalyzerBuilder::registerPlugins method:

    const analyzer = await new FlowrAnalyzerBuilder(false)
        .registerPlugins(
            'file:description',
            new FlowrAnalyzerQmdFilePlugin(),
            ['file:rmd', [/.*.rmd/i]]
        )
        .build();

This indicates three ways to add a new plugin:
- By using a predefined name (e.g., file:description for the FlowrAnalyzerDescriptionFilePlugin).
  These mappings are controlled by the registerPluginMaker function in the PluginRegistry. Under the hood, this relies on makePlugin to create the plugin instance from the name.
- By providing an already instantiated plugin (e.g., the new FlowrAnalyzerQmdFilePlugin instance).
  You can pass these by reference, instantiating any class that conforms to the plugin specification.
- By providing a tuple of the plugin name and its constructor arguments (e.g., ['file:rmd', [/.*.rmd/i]] for the FlowrAnalyzerRmdFilePlugin).
  This will also use the makePlugin function under the hood to create the plugin instance.
Please note that by passing false to the builder constructor, no default plugins (see FlowrAnalyzerPluginDefaults) are registered (otherwise, all of the plugins in the example above would be registered by default).
If you want to unregister specific plugins, you can use the FlowrAnalyzerBuilder::unregisterPlugins method.
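How the three registration forms can be resolved to plugin instances is sketched below. All names (ToyPlugin, toyRegistry, resolvePlugin, makeToyPlugin) are hypothetical and only imitate the idea of a name-based registry; flowR's PluginRegistry, registerPluginMaker, and makePlugin may work differently in detail.

```typescript
// Hypothetical stand-ins: flowR's real plugin classes and registry are richer than this.
interface ToyPlugin {
    readonly name: string;
}

class ToyDescriptionPlugin implements ToyPlugin {
    readonly name = 'file:description';
}

class ToyRmdPlugin implements ToyPlugin {
    readonly name = 'file:rmd';
    readonly pattern: RegExp;

    constructor(pattern: RegExp = /\.rmd$/i) {
        this.pattern = pattern;
    }
}

// maps a registered name to a factory that receives the constructor arguments
const toyRegistry = new Map<string, (args: unknown[]) => ToyPlugin>();
toyRegistry.set('file:description', () => new ToyDescriptionPlugin());
toyRegistry.set('file:rmd', args => new ToyRmdPlugin(args[0] as RegExp | undefined));

type PluginSpec = string | ToyPlugin | [string, unknown[]];

function makeToyPlugin(name: string, args: unknown[]): ToyPlugin {
    const maker = toyRegistry.get(name);
    if(maker === undefined) {
        throw new Error(`unknown plugin name: ${name}`);
    }
    return maker(args);
}

function resolvePlugin(spec: PluginSpec): ToyPlugin {
    if(typeof spec === 'string') {
        return makeToyPlugin(spec, []);          // 1) predefined name
    } else if(Array.isArray(spec)) {
        return makeToyPlugin(spec[0], spec[1]);  // 3) name plus constructor arguments
    }
    return spec;                                 // 2) already an instance
}

const specs: PluginSpec[] = [
    'file:description',
    new ToyRmdPlugin(/\.Rmd$/),
    ['file:rmd', [/.*\.rmd/i]]
];
const resolved = specs.map(resolvePlugin);
```

Passing instances directly (the second form) skips the registry lookup altogether, which matches the recommendation to prefer instantiation when you use the API from code.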
Note
If you directly access the API, please prefer creating the objects yourself by instantiating the respective classes instead of relying on the plugin registry. This avoids the indirection and potential issues with naming collisions in the registry. Moreover, this allows you to directly provide custom configuration to the plugin constructors in a readable fashion, and to re-use plugin instances. Instantiation by text is mostly for serialized communications (e.g., via a CLI or config format).
For more information on the different plugin types and how to create new plugins, please refer to the Plugins section below.
The builder provides a plethora of methods to configure the resulting analyzer instance:
- FlowrAnalyzerBuilder::amendConfig
  Apply an amendment to the configuration the builder currently holds. Per default, the defaultConfigOptions are used.
- FlowrAnalyzerBuilder::registerPlugins
  Register one or multiple additional plugins. For the default plugin set, please refer to FlowrAnalyzerPluginDefaults; they can be registered by passing true to the FlowrAnalyzerBuilder constructor.
- FlowrAnalyzerBuilder::setConfig
  Overwrite the configuration used by the resulting analyzer.
- FlowrAnalyzerBuilder::setEngine
  Set the engine and hence the parser that will be used by the analyzer. This is an alternative to FlowrAnalyzerBuilder#setParser if you do not have a parser instance at hand.
- FlowrAnalyzerBuilder::setInput
  Additional parameters for the analyses.
- FlowrAnalyzerBuilder::setParser
  Set the parser instance used by the analyzer. This is an alternative to FlowrAnalyzerBuilder#setEngine if you already have a parser instance. Please be aware that if you want to parallelize multiple analyzers, there should be separate parser instances.
- FlowrAnalyzerBuilder::unregisterPlugins
  Remove one or multiple plugins.
To build the analyzer after you have configured the builder, you can use one of the following:
- FlowrAnalyzerBuilder::build
  Create the FlowrAnalyzer instance using the given information. Please note that the only reason this is async is that if no parser is set, we need to retrieve the default engine instance, which is an async operation. If you have already initialized the engine (e.g., with TreeSitterExecutor#initTreeSitter), you can use the synchronous version FlowrAnalyzerBuilder#buildSync instead.
- FlowrAnalyzerBuilder::buildSync
  Synchronous version of FlowrAnalyzerBuilder#build. Please only use this if you have set the parser using FlowrAnalyzerBuilder#setParser before, otherwise an error will be thrown.
Plugins allow you to extend the capabilities of the analyzer in many different ways.
For example, they can be used to support other file formats, or to provide new algorithms to determine the loading order of files in a project.
All plugins have to extend the FlowrAnalyzerPlugin base class and specify their PluginType.
During the analysis, the analyzer will apply all registered plugins of the different types at the appropriate stages of the analysis.
If you just want to use these plugins, you can usually ignore their type and just register them with the builder as described
in the Builder Configuration section above.
However, if you want to create new plugins, you should be aware of the different plugin types and when they are applied during the analysis.
Currently, flowR ships with the following built-in plugins:
| Name | Class | Type | Description |
|---|---|---|---|
| file:description | FlowrAnalyzerDescriptionFilePlugin | file-load | This plugin provides support for R DESCRIPTION files. |
| file:ipynb | FlowrAnalyzerJupyterFilePlugin | file-load | The plugin provides support for Jupyter (.ipynb) files |
| file:qmd | FlowrAnalyzerQmdFilePlugin | file-load | The plugin provides support for Quarto R Markdown (.qmd) files |
| file:rmd | FlowrAnalyzerRmdFilePlugin | file-load | The plugin provides support for R Markdown (.rmd) files |
| loading-order:description | FlowrAnalyzerLoadingOrderDescriptionFilePlugin | loading-order | This plugin extracts loading order information from R DESCRIPTION files. It looks at the Collate field to determine the order in which files should be loaded. If no Collate field is present, it does nothing. |
| versions:description | FlowrAnalyzerPackageVersionsDescriptionFilePlugin | package-versions | This plugin extracts package versions from R DESCRIPTION files. It looks at the Depends and Imports fields to find package names and their version constraints. |
During the construction of a new FlowrAnalyzer, plugins of different types are applied at different stages of the analysis.
These plugins are grouped by their PluginType and are applied in the following order (as shown in the documentation of the PluginType):

    ┌───────────┐   ┌───────────────────┐   ┌─────────────┐   ┌───────────────┐   ┌───────┐
    │           │   │                   │   │             │   │               │   │       │
    │ *Builder* ├──>│ Project Discovery ├──>│ File Loader ├──>│ Dependencies  ├──>│ *DFA* │
    │           │   │  (if necessary)   │   │             │   │   (static)    │   │       │
    └───────────┘   └───────────────────┘   └──────┬──────┘   └───────────────┘   └───────┘
                                                   │                                  ▲
                                                   │          ┌───────────────┐       │
                                                   │          │               │       │
                                                   └─────────>│ Loading Order ├───────┘
                                                              │               │
                                                              └───────────────┘

Please note that every plugin type has a default implementation (e.g., see defaultPlugin)
that is always active.
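The staged application of plugins can be pictured with the following sketch. The type names file-load, loading-order, and package-versions are taken from the table above; the name project-discovery, the relative order of the two stages that follow file loading, and all Toy identifiers are assumptions made for this example.

```typescript
// Hypothetical stage order following the diagram above; dependencies and loading order both
// come after file loading, their relative order here is an arbitrary choice of this sketch.
const stageOrder = ['project-discovery', 'file-load', 'package-versions', 'loading-order'] as const;
type ToyPluginType = typeof stageOrder[number];

interface ToyStagePlugin {
    readonly name: string;
    readonly type: ToyPluginType;
}

/** sort plugins by stage; plugins of the same type keep their registration order (stable sort) */
function inApplicationOrder(plugins: readonly ToyStagePlugin[]): ToyStagePlugin[] {
    return [...plugins].sort((a, b) => stageOrder.indexOf(a.type) - stageOrder.indexOf(b.type));
}

const registered: ToyStagePlugin[] = [
    { name: 'versions:description', type: 'package-versions' },
    { name: 'file:rmd', type: 'file-load' },
    { name: 'loading-order:description', type: 'loading-order' },
    { name: 'file:description', type: 'file-load' }
];
const applied = inApplicationOrder(registered).map(p => p.name);
```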
We describe the different plugin types in more detail below.
These plugins trigger when confronted with a project analysis request (see RProjectAnalysisRequest).
Their job is to identify the files that belong to the project and add them to the analysis.
flowR provides the FlowrAnalyzerProjectDiscoveryPlugin with a
defaultPlugin as the default implementation that simply collects all R source files in the given folder.
Please note that all project discovery plugins should conform to the FlowrAnalyzerProjectDiscoveryPlugin base class.
These plugins register for every file encountered by the files context and determine whether and how they can process the file.
They are responsible for transforming the raw file content into a representation that flowR can work with during the analysis.
For example, the FlowrAnalyzerDescriptionFilePlugin adds support for R DESCRIPTION files by parsing their content into key-value pairs.
These can then be used by other plugins, e.g. the FlowrAnalyzerPackageVersionsDescriptionFilePlugin that extracts package version information from these files.
If multiple file plugins could apply (DefaultFlowrAnalyzerFilePlugin::applies) to the same file,
the loading order of these plugins determines which plugin gets to process the file.
Please ensure that no two file plugins apply to the same file,
as this could lead to unexpected behavior.
Also, make sure that all file plugins conform to the FlowrAnalyzerFilePlugin base class.
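The dispatch of a file to a file plugin can be sketched like this. ToyFilePlugin and loadFile are hypothetical, and the rule that the first applicable plugin in registration order wins is an assumption for this example; the DESCRIPTION handling is reduced to simple "Key: value" lines.

```typescript
// Hypothetical stand-ins for file-load plugins: each one states whether it applies
// to a path and how it transforms the raw content.
interface ToyFilePlugin {
    readonly name: string;
    applies(path: string): boolean;
    process(content: string): unknown;
}

const toyFilePlugins: ToyFilePlugin[] = [
    {
        name: 'toy:description',
        applies: path => /(^|\/)DESCRIPTION$/.test(path),
        // split "Key: value" lines into key-value pairs (continuation lines are ignored for brevity)
        process: content => new Map(
            content.split('\n')
                .filter(line => /^\w+:/.test(line))
                .map(line => {
                    const idx = line.indexOf(':');
                    return [line.slice(0, idx), line.slice(idx + 1).trim()] as [string, string];
                })
        )
    },
    {
        // fallback that accepts every file and keeps the content as-is
        name: 'toy:default',
        applies: () => true,
        process: content => content
    }
];

/** the first plugin (in registration order) that applies gets to process the file */
function loadFile(path: string, content: string): { plugin: string, result: unknown } {
    const plugin = toyFilePlugins.find(p => p.applies(path));
    if(plugin === undefined) {
        throw new Error(`no plugin applies to ${path}`);
    }
    return { plugin: plugin.name, result: plugin.process(content) };
}

const description = loadFile('pkg/DESCRIPTION', 'Package: demo\nVersion: 1.0');
const script = loadFile('pkg/R/main.R', 'x <- 1');
```

Because the fallback applies to everything, it has to come last in this sketch; this is one way to see why two plugins that apply to the same file are best avoided.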
These plugins should identify which R packages are required with which versions for the analysis.
This information is then used to set up the R environment for the analysis correctly.
For example, the FlowrAnalyzerPackageVersionsDescriptionFilePlugin extracts package version information from DESCRIPTION files
to identify the required packages and their versions.
All dependency identification plugins should conform to the FlowrAnalyzerPackageVersionsPlugin base class.
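As an illustration of what such a plugin has to extract, the following sketch reads the Depends and Imports fields of a DESCRIPTION file. It is a simplified reader written for this example, not flowR's implementation; ToyDependency, readField, and parseDependencies are hypothetical names.

```typescript
// Simplified reader for the Depends and Imports fields of a DESCRIPTION file
// (not flowR's implementation; only the basics of the field syntax are covered).
interface ToyDependency {
    name: string;
    constraint: string | undefined;
}

function readField(description: string, field: string): string {
    const parts: string[] = [];
    let inField = false;
    for(const line of description.split('\n')) {
        if(line.startsWith(field + ':')) {
            inField = true;
            parts.push(line.slice(field.length + 1));
        } else if(inField && /^\s/.test(line)) {
            parts.push(line); // continuation lines start with whitespace
        } else {
            inField = false;
        }
    }
    return parts.join(' ').trim();
}

function parseDependencies(description: string): ToyDependency[] {
    const entries = [readField(description, 'Depends'), readField(description, 'Imports')]
        .join(',')
        .split(',')
        .map(e => e.trim())
        .filter(e => e.length > 0);
    return entries.map(entry => {
        // e.g. "dplyr (>= 1.1.0)" or just "rlang"
        const match = /^([A-Za-z0-9.]+)\s*(?:\(([^)]*)\))?$/.exec(entry);
        if(match === null) {
            throw new Error(`cannot parse dependency entry: ${entry}`);
        }
        const [, name = '', constraint] = match;
        return { name, constraint: constraint?.trim() };
    });
}

const deps = parseDependencies([
    'Package: demo',
    'Depends: R (>= 4.0.0)',
    'Imports: dplyr (>= 1.1.0),',
    '    rlang'
].join('\n'));
```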
These plugins determine the order in which files are loaded and analyzed.
This is crucial for correctly understanding the dependencies between files and improved analyses, especially in larger projects.
For example, the FlowrAnalyzerLoadingOrderDescriptionFilePlugin provides a basic implementation that orders files based on
the specification in a DESCRIPTION file, if present.
All loading order plugins should conform to the FlowrAnalyzerLoadingOrderPlugin base class.
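A Collate-based ordering can be sketched as follows. This is not flowR's implementation: parseCollate and orderByCollate are hypothetical helpers, and appending files that are missing from the Collate value at the end is an assumption made for this example.

```typescript
// Simplified sketch: order files by the value of a DESCRIPTION Collate field.
function parseCollate(collate: string): string[] {
    // entries are separated by whitespace and may be wrapped in single or double quotes
    const entries = collate.match(/'[^']*'|"[^"]*"|\S+/g) ?? [];
    return entries.map(e => e.replace(/^['"]|['"]$/g, ''));
}

function orderByCollate(files: readonly string[], collate: string): string[] {
    const wanted = parseCollate(collate);
    // files not mentioned in Collate share the last rank and keep their relative order
    const rank = (file: string): number => {
        const idx = wanted.indexOf(file);
        return idx < 0 ? wanted.length : idx;
    };
    return [...files].sort((a, b) => rank(a) - rank(b));
}

const files = ['utils.R', 'zzz.R', 'aaa-setup.R', 'main.R'];
const ordered = orderByCollate(files, "'aaa-setup.R' utils.R\n    'main.R'");
// without a Collate value, the original order is kept
const untouched = orderByCollate(files, '');
```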
If you want to create a new plugin, you first have to decide which type of plugin you want to create (see Plugin Types above).
Then, you must create a new class that extends the corresponding base class (e.g., FlowrAnalyzerFilePlugin for file loading plugins).
In general, most plugins operate on the context information provided by the analyzer.
Usually it is a good idea to have a look at the existing plugins of the same type to get an idea of how to implement your own plugin.
Once you have your plugin, you should register it with a sensible name using the registerPluginMaker function.
This will allow users to register your plugin easily by name using the builder's FlowrAnalyzerBuilder::registerPlugins method.
Otherwise, users will have to provide an instance of your plugin class directly.
The FlowrAnalyzer provides various context information during the analysis.
You can access the context with FlowrAnalyzer::inspectContext
to receive a read-only view of the current analysis context.
Likewise, you can use FlowrAnalyzerContext::inspect to get a read-only view of a given context.
These read-only views prevent you from accidentally modifying the context during the analysis which may cause inconsistencies (this should be done either by
wrapping methods or by plugins).
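The idea of a read-only view can be sketched in a few lines of TypeScript. ToyFilesContext and ReadOnlyToyFilesContext are hypothetical stand-ins, not flowR's classes; the sketch assumes that the view is the same object, merely typed through a narrower interface.

```typescript
// Hypothetical stand-ins (not flowR's classes): a mutable context exposed through a read-only interface.
interface ReadOnlyToyFilesContext {
    hasFile(path: string): boolean;
    files(): readonly string[];
}

class ToyFilesContext implements ReadOnlyToyFilesContext {
    private readonly known = new Set<string>();

    addFile(path: string): void {
        this.known.add(path);
    }

    hasFile(path: string): boolean {
        return this.known.has(path);
    }

    files(): readonly string[] {
        return [...this.known];
    }

    /** hand out the same object, but typed so that callers cannot see the mutating methods */
    inspect(): ReadOnlyToyFilesContext {
        return this;
    }
}

const ctx = new ToyFilesContext();
ctx.addFile('main.R');       // plugins work with the full context
const view = ctx.inspect();  // consumers receive the read-only view
// view.addFile('other.R'); // would not compile: 'addFile' is not part of the read-only type
```

In this sketch the protection is purely a compile-time guarantee of the type system, so the view still reflects later changes made through the full context.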
The context is divided into multiple sub-contexts, each responsible for a specific aspect of the analysis.
These sub-contexts are described in more detail below.
For the general structure from an implementation perspective, please have a look at FlowrAnalyzerContext.
Tip
If you need a context for testing or to create analyses with lower-level components, you can use
either contextFromInput to create a context from input data (which lifts the old requestFromInput) or
contextFromSources to create a context from source files (e.g., if you need a virtual file system).
If for whatever reason you need to reset the context during an analysis, you can use
FlowrAnalyzerContext::reset.
To pre-compute all possible information in the context before starting the main analysis, you can use
FlowrAnalyzerContext::resolvePreAnalysis.
First, let's have a look at the FlowrAnalyzerFilesContext class that provides access to the files to be analyzed and their loading order:
- FlowrAnalyzerFilesContext
  This is the analyzer file context to be modified by all plugins that affect the files. If you are interested in inspecting these files, refer to ReadOnlyFlowrAnalyzerFilesContext. Plugins, however, can use this context directly to modify files.
  (Defined at ./src/project/context/flowr-analyzer-files-context.ts#L112)
  - AbstractFlowrAnalyzerContext
    Abstract class representing the context, a context may be modified and enriched by plugins (see FlowrAnalyzerPlugin). Please use the specialized contexts like FlowrAnalyzerFilesContext or FlowrAnalyzerLoadingOrderContext to work with flowR and in general, use the FlowrAnalyzerContext to access the full project context.
    (Defined at ./src/project/context/abstract-flowr-analyzer-context.ts#L11)
  - ReadOnlyFlowrAnalyzerFilesContext
    This is the read-only interface for the files context, which is used to manage all files known to the FlowrAnalyzer. It prevents you from modifying the available files, but allows you to inspect them (which is probably what you want when using the FlowrAnalyzer). If you are a FlowrAnalyzerProjectDiscoveryPlugin and want to modify the available files, you can use the FlowrAnalyzerFilesContext directly.
    (Defined at ./src/project/context/flowr-analyzer-files-context.ts#L61)
Using the available plugins,
the files context categorizes files by their FileRole (e.g., source files or DESCRIPTION files)
and makes them accessible by these roles (e.g., via FlowrAnalyzerFilesContext::getFilesByRole).
It also provides methods to check whether a file exists (e.g., FlowrAnalyzerFilesContext::hasFile,
FlowrAnalyzerFilesContext::exists)
and to translate requests so they respect the context (e.g., FlowrAnalyzerFilesContext::resolveRequest).
For legacy reasons it also provides the list of files considered by the dataflow analysis via
FlowrAnalyzerFilesContext::consideredFilesList.
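A role-based file index along these lines could look as follows. The role names, the roleOf heuristic, and the ToyRoleIndex class are hypothetical; they do not reflect flowR's FileRole values or how its plugins assign roles.

```typescript
// Hypothetical stand-ins: role names and the classification heuristic are illustrative only.
type ToyFileRole = 'source' | 'description' | 'other';

function roleOf(path: string): ToyFileRole {
    if(/(^|\/)DESCRIPTION$/.test(path)) {
        return 'description';
    }
    return /\.[rR]$/.test(path) ? 'source' : 'other';
}

class ToyRoleIndex {
    private readonly byRole = new Map<ToyFileRole, string[]>();

    add(path: string): void {
        const role = roleOf(path);
        const bucket = this.byRole.get(role) ?? [];
        bucket.push(path);
        this.byRole.set(role, bucket);
    }

    getFilesByRole(role: ToyFileRole): readonly string[] {
        return this.byRole.get(role) ?? [];
    }

    hasFile(path: string): boolean {
        return this.getFilesByRole(roleOf(path)).includes(path);
    }
}

const index = new ToyRoleIndex();
for(const path of ['pkg/DESCRIPTION', 'pkg/R/a.R', 'pkg/R/b.r', 'pkg/README.md']) {
    index.add(path);
}
const sources = index.getFilesByRole('source');
```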
Note
Please be aware that the loading order is inherently tied to the files context (as it determines which files are available for ordering).
Hence, the FlowrAnalyzerLoadingOrderContext is accessible (only) via the FlowrAnalyzerFilesContext.
Here is the structure of the FlowrAnalyzerLoadingOrderContext that provides access to the identified loading order of files:
- FlowrAnalyzerLoadingOrderContext
  This context is responsible for managing the loading order of script files in a project, including guesses and known orders provided by FlowrAnalyzerLoadingOrderPlugins. If you are interested in inspecting these orders, refer to ReadOnlyFlowrAnalyzerLoadingOrderContext. Plugins, however, can use this context directly to modify order guesses.
  (Defined at ./src/project/context/flowr-analyzer-loading-order-context.ts#L50)
  - AbstractFlowrAnalyzerContext
    Abstract class representing the context, a context may be modified and enriched by plugins (see FlowrAnalyzerPlugin). Please use the specialized contexts like FlowrAnalyzerFilesContext or FlowrAnalyzerLoadingOrderContext to work with flowR and in general, use the FlowrAnalyzerContext to access the full project context.
    (Defined at ./src/project/context/abstract-flowr-analyzer-context.ts#L11)
  - ReadOnlyFlowrAnalyzerLoadingOrderContext
    Read-only interface for the loading order context, which is used to determine the order in which script files are loaded in a project. This interface prevents you from modifying the available files, but allows you to inspect them (which is probably what you want when using the FlowrAnalyzer). If you are a FlowrAnalyzerLoadingOrderPlugin and want to modify the available orders, you can use the FlowrAnalyzerLoadingOrderContext directly.
    (Defined at ./src/project/context/flowr-analyzer-loading-order-context.ts#L14)
Using the available plugins, the loading order context determines the order in which files are loaded and analyzed by flowR's analyzer.
You can inspect the identified loading order using
FlowrAnalyzerLoadingOrderContext::getLoadingOrder.
If there are multiple possible loading orders (e.g., due to circular dependencies),
you can use FlowrAnalyzerLoadingOrderContext::currentGuesses to inspect the current guesses.
Here is the structure of the FlowrAnalyzerDependenciesContext that provides access to the identified dependencies and their versions,
including the version of R:
- FlowrAnalyzerDependenciesContext
  This context is responsible for managing the dependencies of the project, including their versions and interplays with FlowrAnalyzerPackageVersionsPlugins. If you are interested in inspecting these dependencies, refer to ReadOnlyFlowrAnalyzerDependenciesContext.
  (Defined at ./src/project/context/flowr-analyzer-dependencies-context.ts#L33)
  - AbstractFlowrAnalyzerContext
    Abstract class representing the context, a context may be modified and enriched by plugins (see FlowrAnalyzerPlugin). Please use the specialized contexts like FlowrAnalyzerFilesContext or FlowrAnalyzerLoadingOrderContext to work with flowR and in general, use the FlowrAnalyzerContext to access the full project context.
    (Defined at ./src/project/context/abstract-flowr-analyzer-context.ts#L11)
  - ReadOnlyFlowrAnalyzerDependenciesContext
    This is a read-only interface to the FlowrAnalyzerDependenciesContext. It prevents you from modifying the dependencies, but allows you to inspect them (which is probably what you want when using the FlowrAnalyzer). If you are a FlowrAnalyzerPackageVersionsPlugin and want to modify the dependencies, you can use the FlowrAnalyzerDependenciesContext directly.
    (Defined at ./src/project/context/flowr-analyzer-dependencies-context.ts#L13)
Probably the most important method is
FlowrAnalyzerDependenciesContext::getDependency
that allows you to query for a specific dependency by name.
Here is the structure of the FlowrAnalyzerEnvironmentContext that provides access to the built-in environment:
- FlowrAnalyzerEnvironmentContext
  This context is responsible for providing the built-in environment. It creates the built-in environment based on the configuration provided in the FlowrAnalyzerContext.
  (Defined at ./src/project/context/flowr-analyzer-environment-context.ts#L45)
  - ReadOnlyFlowrAnalyzerEnvironmentContext
    This is the read-only interface to the FlowrAnalyzerEnvironmentContext, which provides access to the built-in environment used during analysis.
    (Defined at ./src/project/context/flowr-analyzer-environment-context.ts#L13)
The environment context provides access to the built-in environment via
FlowrAnalyzerEnvironmentContext::makeCleanEnv.
It also provides the empty built-in environment, which only contains primitives, via
FlowrAnalyzerEnvironmentContext::makeCleanEnvWithEmptyBuiltIns.
To speed up analyses, flowR provides a caching mechanism that stores intermediate results of the analysis.
The cache is maintained by the FlowrAnalyzerCache class and is used automatically by the analyzer during the analysis.
Under the hood, it relies on the PipelineExecutor to cache results of different pipeline stages.
Usually, you do not have to worry about the cache, as it is managed automatically by the analyzer.
If you want to overwrite cache information, the analysis methods in FlowrAnalyzer (see Conducting Analyses above)
usually provide an optional force parameter to control whether to use the cache or recompute the results.
Currently maintained by Florian Sihler and Oliver Gerstl at Ulm University