
Analyzer

github-actions[bot] edited this page Dec 2, 2025 · 1 revision

This document was generated from 'src/documentation/wiki-analyzer.ts' on 2025-12-01, 16:14:34 UTC presenting an overview of flowR's analyzer (v2.6.3, using R v4.5.0). Please do not edit this file/wiki page directly.

Overview

No matter whether you want to analyze a single R script, a couple of R notebooks, a complete project, or an R package, your journey starts with the FlowrAnalyzerBuilder (further described in Builder Configuration below). This builder allows you to configure the analysis in many different ways, for example, by specifying which plugins to use or which engine to use for the analysis.

When building the FlowrAnalyzer instance, the builder will take care to

The builder provides two methods for building the analyzer:

  • FlowrAnalyzerBuilder::build
    for an asynchronous build process that also initializes the engine if needed

  • FlowrAnalyzerBuilder::buildSync
    for a synchronous build process, which requires that the engine (e.g., TreeSitter) has already been initialized before calling this method. However, as engines only need to be initialized once per process, this method is often more convenient to use.

For more information on how to configure the builder, please refer to the Builder Configuration section below.

[Figure: Overview of the Analyzer]
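The split between build and buildSync follows a common pattern: an expensive, asynchronous engine setup that runs at most once per process, plus a synchronous factory that refuses to run without it. The following is a self-contained sketch of that pattern only; DemoEngine, DemoBuilder, and DemoAnalyzer are hypothetical names and do not reproduce flowR's API.

```typescript
// Hypothetical engine whose one-time setup is asynchronous
// (standing in for, e.g., loading a parser into memory).
class DemoEngine {
  private static initialization: Promise<void> | undefined;
  private static ready = false;

  // Starts the setup on the first call; later calls reuse the same promise.
  static init(): Promise<void> {
    if (this.initialization === undefined) {
      this.initialization = Promise.resolve().then(() => {
        this.ready = true;
      });
    }
    return this.initialization;
  }

  static isReady(): boolean {
    return this.ready;
  }
}

// Hypothetical result type of the builder in this sketch.
interface DemoAnalyzer {
  readonly engine: string;
}

class DemoBuilder {
  // Async variant: ensures the engine is initialized, then builds synchronously.
  async build(): Promise<DemoAnalyzer> {
    await DemoEngine.init();
    return this.buildSync();
  }

  // Sync variant: refuses to run before the engine has been initialized.
  buildSync(): DemoAnalyzer {
    if (!DemoEngine.isReady()) {
      throw new Error('engine not initialized: use build() or initialize the engine first');
    }
    return { engine: 'demo' };
  }
}
```

With this split, a long-running process would typically await the engine initialization once at startup and use the synchronous variant afterwards.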

Once you have created an analyzer instance, you can add R files, folders, or even entire projects for analysis using the FlowrAnalyzer::addRequest method. All loaded plugins are applied automatically during the analysis. Please note that adding new files after you have already requested analysis results may cause larger invalidations and trigger a re-analysis of previously analyzed files. With the files context, you can also add virtual files for the analysis to consider, or overwrite existing files with modified content. For this, have a look at the FlowrAnalyzer::addFile method.

Note

If you want to quickly try out the analyzer, you can use the following code snippet that analyzes a simple R expression:

const analyzer = await new FlowrAnalyzerBuilder()
    .setEngine('tree-sitter')
    .build();
// register a simple inline text-file for analysis
analyzer.addRequest('x <- 1; print(x)');
// get the dataflow
const df = await analyzer.dataflow();
// obtain the identified loading order
console.log(analyzer.inspectContext().files.loadingOrder.getLoadingOrder());
// run a dependency query
const results = await analyzer.query([{ type: 'dependencies' }]);

To reset the analysis (e.g., to provide new requests) you can use FlowrAnalyzer::reset. If you need to pre-compute analysis results (e.g., to speed up future queries), you can use FlowrAnalyzer::runFull.
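To make the interplay of late requests, reset, and pre-computation concrete, here is a minimal sketch of request-driven invalidation: results are computed lazily and cached, every new request drops the cached result, reset clears everything, and runFull computes eagerly. LazyAnalyzerSketch and its members are hypothetical; the "analysis" merely counts lines, and flowR's actual invalidation logic may differ.

```typescript
// Hypothetical stand-in for an analyzer; the "analysis" just counts lines across requests.
class LazyAnalyzerSketch {
  private requests: string[] = [];
  private cached: number | undefined;
  computations = 0; // how often the (expensive) analysis actually ran

  addRequest(code: string): void {
    this.requests.push(code);
    this.cached = undefined; // a new file invalidates previously computed results
  }

  // Lazily computes (and caches) the result on first access.
  result(): number {
    if (this.cached === undefined) {
      this.computations++;
      this.cached = this.requests.reduce((sum, r) => sum + r.split('\n').length, 0);
    }
    return this.cached;
  }

  // Pre-computes the result so that later queries hit the cache.
  runFull(): void {
    this.result();
  }

  // Drops all requests and results, e.g., to start over with new requests.
  reset(): void {
    this.requests = [];
    this.cached = undefined;
  }
}
```

Adding all requests before the first query keeps the number of recomputations to one in this sketch.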

Conducting Analyses

Please make sure to add all of the files, folders, and projects you want to analyze using the FlowrAnalyzer::addRequest method (or FlowrAnalyzer::addFile for virtual files). Afterwards, you can request different kinds of analysis results, such as:

We work on providing a set of example repositories that demonstrate how to use the analyzer in different scenarios:

Builder Configuration

If you are interested in all available options, have a look at the Builder Reference below. The following sections highlight some of the most important configuration options:

  1. How to configure flowR
  2. How to configure the engine
  3. How to register plugins

Configuring flowR

You can fundamentally change the behavior of flowR using the config file, embedded in the interface FlowrConfigOptions. With the builder you can either provide a complete configuration or amend the default configuration using:

By default, the builder uses flowR's standard configuration obtained with defaultConfigOptions.

Note

During the analysis with the FlowrAnalyzer, you can also access the configuration with the FlowrAnalyzerContext.

Configuring the Engine

FlowR supports multiple engines for parsing and analyzing R code. With the builder, you can select the engine to use with:

By default, the builder uses the TreeSitter engine with the TreeSitter parser. The builder also takes care to initialize the engine if needed during the asynchronous build process with FlowrAnalyzerBuilder::build. If you want to use the synchronous build process with FlowrAnalyzerBuilder::buildSync, please ensure that the engine has already been initialized before calling this method.

Configuring Plugins

There are various ways for you to register plugins with the builder, exemplified by the following snippet relying on the FlowrAnalyzerBuilder::registerPlugins method:

const analyzer = await new FlowrAnalyzerBuilder(false)
    .registerPlugins(
        'file:description',
        new FlowrAnalyzerQmdFilePlugin(),
        ['file:rmd', [/.*\.rmd/i]]
    )
    .build();

This illustrates three ways to register a plugin:

  1. By using a predefined name (e.g., file:description for the FlowrAnalyzerDescriptionFilePlugin)
    These mappings are controlled by the registerPluginMaker function in the PluginRegistry. Under the hood, this relies on makePlugin to create the plugin instance from the name.
  2. By providing an already instantiated plugin (e.g., the new FlowrAnalyzerQmdFilePlugin instance).
    You can pass these by reference, instantiating any class that conforms to the plugin specification.
  3. By providing a tuple of the plugin name and its constructor arguments (e.g., ['file:rmd', [/.*\.rmd/i]] for the FlowrAnalyzerRmdFilePlugin).
    This will also use the makePlugin function under the hood to create the plugin instance.

Please note that by passing false to the builder constructor, no default plugins (see FlowrAnalyzerPluginDefaults) are registered (otherwise, all of the plugins in the example above would be registered by default). If you want to unregister specific plugins, you can use the FlowrAnalyzerBuilder::unregisterPlugins method.
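The three registration forms can be resolved with a small name-to-factory registry. The sketch below is self-contained and hypothetical: PluginLike, registerMaker, resolvePlugin, and RmdLikePlugin only mimic the roles that registerPluginMaker and makePlugin play in flowR and do not reproduce their actual signatures. It also rejects duplicate names, one way to avoid the naming collisions mentioned in the note below.

```typescript
// Minimal plugin shape for this sketch (not flowR's FlowrAnalyzerPlugin).
interface PluginLike {
  readonly name: string;
}

// The three accepted forms: a name, an instance, or a [name, constructorArgs] tuple.
type PluginSpec = string | PluginLike | [string, unknown[]];

// Hypothetical registry mapping names to factories.
const makers = new Map<string, (...args: unknown[]) => PluginLike>();

function registerMaker(name: string, maker: (...args: unknown[]) => PluginLike): void {
  if (makers.has(name)) {
    throw new Error(`plugin name '${name}' is already registered`);
  }
  makers.set(name, maker);
}

function instantiate(name: string, args: unknown[]): PluginLike {
  const maker = makers.get(name);
  if (maker === undefined) {
    throw new Error(`unknown plugin '${name}'`);
  }
  return maker(...args);
}

function resolvePlugin(spec: PluginSpec): PluginLike {
  if (typeof spec === 'string') {
    return instantiate(spec, []); // 1. by name, without arguments
  }
  if (Array.isArray(spec)) {
    return instantiate(spec[0], spec[1]); // 3. by name, with constructor arguments
  }
  return spec; // 2. an existing instance is used as-is
}

// Hypothetical example plugin matching R Markdown files via a configurable pattern.
class RmdLikePlugin implements PluginLike {
  readonly name = 'file:rmd';
  constructor(readonly pattern: RegExp = /\.rmd$/i) {}
}

registerMaker('file:rmd', (...args) => new RmdLikePlugin(args[0] as RegExp | undefined));
```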

Note

If you directly access the API, please prefer creating the objects yourself by instantiating the respective classes instead of relying on the plugin registry. This avoids the indirection and potential issues with naming collisions in the registry. Moreover, this allows you to directly provide custom configuration to the plugin constructors in a readable fashion, and to re-use plugin instances. Instantiation by text is mostly for serialized communications (e.g., via a CLI or config format).

For more information on the different plugin types and how to create new plugins, please refer to the Plugins section below.

Builder Reference

The builder provides a plethora of methods to configure the resulting analyzer instance:

To build the analyzer after you have configured the builder, you can use one of the following:

  • FlowrAnalyzerBuilder::build
    Create the FlowrAnalyzer instance using the given information. Please note that the only reason this is async is that, if no parser is set, the default engine instance has to be retrieved, which is an async operation. If you have already initialized the engine (e.g., with TreeSitterExecutor#initTreeSitter), you can use the synchronous version FlowrAnalyzerBuilder#buildSync instead.
  • FlowrAnalyzerBuilder::buildSync
    Synchronous version of FlowrAnalyzerBuilder#build. Please only use this if you have set the parser using FlowrAnalyzerBuilder#setParser before; otherwise, an error will be thrown.

Plugins

Plugins allow you to extend the capabilities of the analyzer in many different ways. For example, they can be used to support other file formats, or to provide new algorithms to determine the loading order of files in a project. All plugins have to extend the FlowrAnalyzerPlugin base class and specify their PluginType. During the analysis, the analyzer will apply all registered plugins of the different types at the appropriate stages of the analysis. If you just want to use these plugins, you can usually ignore their type and just register them with the builder as described in the Builder Configuration section above. However, if you want to create new plugins, you should be aware of the different plugin types and when they are applied during the analysis.

Currently, flowR ships with the following built-in plugins:

| Name | Class | Type | Description |
|------|-------|------|-------------|
| file:description | FlowrAnalyzerDescriptionFilePlugin | file-load | This plugin provides support for R DESCRIPTION files. |
| file:ipynb | FlowrAnalyzerJupyterFilePlugin | file-load | The plugin provides support for Jupyter (.ipynb) files. |
| file:qmd | FlowrAnalyzerQmdFilePlugin | file-load | The plugin provides support for Quarto R Markdown (.qmd) files. |
| file:rmd | FlowrAnalyzerRmdFilePlugin | file-load | The plugin provides support for R Markdown (.rmd) files. |
| loading-order:description | FlowrAnalyzerLoadingOrderDescriptionFilePlugin | loading-order | This plugin extracts loading order information from R DESCRIPTION files. It looks at the Collate field to determine the order in which files should be loaded. If no Collate field is present, it does nothing. |
| versions:description | FlowrAnalyzerPackageVersionsDescriptionFilePlugin | package-versions | This plugin extracts package versions from R DESCRIPTION files. It looks at the Depends and Imports fields to find package names and their version constraints. |

Plugin Types

During the construction of a new FlowrAnalyzer, plugins of different types are applied at different stages of the analysis. These plugins are grouped by their PluginType and are applied in the following order (as shown in the documentation of the PluginType):

┌───────────┐   ┌───────────────────┐   ┌─────────────┐   ┌───────────────┐   ┌───────┐
│           │   │                   │   │             │   │               │   │       │
│ *Builder* ├──>│ Project Discovery ├──>│ File Loader ├──>│ Dependencies  ├──>│ *DFA* │
│           │   │  (if necessary)   │   │             │   │   (static)    │   │       │
└───────────┘   └───────────────────┘   └──────┬──────┘   └───────────────┘   └───────┘
                                               │                                  ▲
                                               │          ┌───────────────┐       │
                                               │          │               │       │
                                               └─────────>│ Loading Order ├───────┘
                                                          │               │
                                                          └───────────────┘

Please note that every plugin type has a default implementation (e.g., see defaultPlugin) that is always active. We describe the different plugin types in more detail below.
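As a rough mental model of the diagram, plugins can be grouped by type and applied stage by stage, with each stage's default implementation always active. The following hypothetical sketch shows this grouping; the stage identifier 'project-discovery' is assumed, the linear order flattens the two branches of the diagram, and running the default first is a choice of this sketch rather than documented flowR behavior.

```typescript
// 'file-load', 'package-versions', and 'loading-order' appear as plugin types in the table
// above; 'project-discovery' is an assumed identifier for this sketch.
type Stage = 'project-discovery' | 'file-load' | 'package-versions' | 'loading-order';

// Simplified, linear stage order (the diagram runs dependencies and loading order
// as separate branches after file loading).
const stageOrder: Stage[] = ['project-discovery', 'file-load', 'package-versions', 'loading-order'];

// Hypothetical plugin shape for this sketch.
interface StagePlugin {
  readonly type: Stage;
  readonly name: string;
  apply(log: string[]): void;
}

// A plugin that only records that it was applied.
function recordingPlugin(type: Stage, name: string): StagePlugin {
  return { type, name, apply: log => { log.push(`${type}:${name}`); } };
}

// Applies all plugins stage by stage; in this sketch, each stage's default
// implementation always runs first, followed by the registered plugins in order.
function applyPlugins(registered: readonly StagePlugin[]): string[] {
  const log: string[] = [];
  for (const stage of stageOrder) {
    recordingPlugin(stage, 'default').apply(log);
    for (const plugin of registered) {
      if (plugin.type === stage) {
        plugin.apply(log);
      }
    }
  }
  return log;
}
```

Registration order across stages does not matter here: a loading-order plugin registered first still runs after all file-load plugins.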

Project Discovery

These plugins trigger when confronted with a project analysis request (see RProjectAnalysisRequest). Their job is to identify the files that belong to the project and add them to the analysis. flowR provides the FlowrAnalyzerProjectDiscoveryPlugin with a defaultPlugin as the default implementation that simply collects all R source files in the given folder.

Please note that all project discovery plugins should conform to the FlowrAnalyzerProjectDiscoveryPlugin base class.
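A minimal sketch of what such a default discovery step does: walk the project folder and keep the files whose extension marks them as R sources. The in-memory Entry tree stands in for the file system, and discoverRSources as well as the case-insensitive .R match are assumptions of this sketch, not flowR's implementation.

```typescript
// A tiny in-memory folder tree standing in for the file system (hypothetical).
type Entry =
  | { kind: 'file'; name: string }
  | { kind: 'dir'; name: string; children: Entry[] };

// Recursively collects the paths of all R source files (".R" or ".r") below an entry.
function discoverRSources(entry: Entry, prefix = ''): string[] {
  if (entry.kind === 'file') {
    return /\.r$/i.test(entry.name) ? [prefix + entry.name] : [];
  }
  const found: string[] = [];
  for (const child of entry.children) {
    found.push(...discoverRSources(child, `${prefix}${entry.name}/`));
  }
  return found;
}
```

Other files, such as the DESCRIPTION file, are left to more specialized plugins in this sketch.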

File Loading

These plugins register for every file encountered by the files context and determine whether and how they can process the file. They are responsible for transforming the raw file content into a representation that flowR can work with during the analysis. For example, the FlowrAnalyzerDescriptionFilePlugin adds support for R DESCRIPTION files by parsing their content into key-value pairs. These can then be used by other plugins, e.g., the FlowrAnalyzerPackageVersionsDescriptionFilePlugin that extracts package version information from these files.
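R DESCRIPTION files use a Debian-control-file (DCF) layout: each field starts at the beginning of a line as "Field: value", and indented lines continue the previous field. A minimal sketch of turning such content into key-value pairs follows; parseDescription is a hypothetical helper, not flowR's parser, and it ignores DCF details such as multiple records separated by blank lines.

```typescript
// Hypothetical helper: parses DESCRIPTION-style (DCF) content into a field -> value map.
// Continuation lines (starting with whitespace) are appended to the previous field.
function parseDescription(content: string): Map<string, string> {
  const fields = new Map<string, string>();
  let current: string | undefined;
  for (const line of content.split(/\r?\n/)) {
    if (line.trim() === '') {
      continue; // this sketch skips blank lines instead of treating them as record separators
    }
    if (/^\s/.test(line)) {
      if (current !== undefined) {
        const previous = fields.get(current) ?? '';
        fields.set(current, `${previous} ${line.trim()}`.trim());
      }
      continue;
    }
    const colon = line.indexOf(':');
    if (colon > 0) {
      current = line.slice(0, colon).trim();
      fields.set(current, line.slice(colon + 1).trim());
    }
  }
  return fields;
}
```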

If multiple file plugins could apply (DefaultFlowrAnalyzerFilePlugin::applies) to the same file, the loading order of these plugins determines which plugin gets to process the file. Please ensure that no two file plugins apply to the same file, as this could lead to unexpected behavior. Also, make sure that all file plugins conform to the FlowrAnalyzerFilePlugin base class.
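Selecting the plugin for a file can be pictured as asking each file plugin, in order, whether it applies and handing the file to the first one that does. This sketch uses a hypothetical FilePluginSketch interface whose applies predicate mirrors the role of DefaultFlowrAnalyzerFilePlugin::applies; it additionally reports any further matching plugins, since overlaps are exactly what the paragraph above warns against.

```typescript
// Hypothetical file plugin shape for this sketch.
interface FilePluginSketch {
  readonly name: string;
  applies(path: string): boolean;
}

// Returns the first applicable plugin (in order) and the names of all other plugins
// that also matched, so that overlaps can be reported instead of silently ignored.
function pickFilePlugin(
  plugins: readonly FilePluginSketch[],
  path: string
): { chosen: FilePluginSketch | undefined; overlapping: string[] } {
  const matching = plugins.filter(p => p.applies(path));
  return { chosen: matching[0], overlapping: matching.slice(1).map(p => p.name) };
}

// Illustrative plugins with non-overlapping patterns.
const filePlugins: FilePluginSketch[] = [
  { name: 'file:description', applies: p => /(^|\/)DESCRIPTION$/.test(p) },
  { name: 'file:rmd', applies: p => /\.rmd$/i.test(p) },
  { name: 'file:qmd', applies: p => /\.qmd$/i.test(p) },
];
```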

Dependency Identification

These plugins should identify which R packages are required with which versions for the analysis. This information is then used to set up the R environment for the analysis correctly. For example, the FlowrAnalyzerPackageVersionsDescriptionFilePlugin extracts package version information from DESCRIPTION files to identify the required packages and their versions.

All dependency identification plugins should conform to the FlowrAnalyzerPackageVersionsPlugin base class.
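Fields such as Depends and Imports hold comma-separated package names, each optionally followed by a parenthesized version constraint like (>= 1.0.0). A minimal sketch of splitting such a field into names and constraints; parseDependencyField and Dependency are hypothetical names, and real fields may contain details this sketch does not handle.

```typescript
// Hypothetical representation of one dependency entry.
interface Dependency {
  name: string;
  // e.g. { op: '>=', version: '1.0.0' }; absent if no constraint is given
  constraint?: { op: string; version: string };
}

// Splits a Depends/Imports value like "R (>= 4.1.0), dplyr (>= 1.0.0), utils".
function parseDependencyField(value: string): Dependency[] {
  const result: Dependency[] = [];
  for (const entry of value.split(',')) {
    const match = /^\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*([<>=!]+)\s*([^)\s]+)\s*\))?\s*$/.exec(entry);
    if (!match) {
      continue; // skip empty or malformed entries
    }
    const [, name, op, version] = match;
    result.push(op && version ? { name, constraint: { op, version } } : { name });
  }
  return result;
}
```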

Loading Order

These plugins determine the order in which files are loaded and analyzed. This is crucial for correctly capturing the dependencies between files and, thus, for the quality of the analysis, especially in larger projects. For example, the FlowrAnalyzerLoadingOrderDescriptionFilePlugin provides a basic implementation that orders files based on the specification in a DESCRIPTION file, if present.

All loading order plugins should conform to the FlowrAnalyzerLoadingOrderPlugin base class.
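For the DESCRIPTION-based case, the Collate field lists the package's R files (relative to its R folder, separated by whitespace, quoted if they contain spaces) in the order they should be loaded. A minimal sketch of deriving an order from it, assuming the field value has already been extracted; splitCollate and orderByCollate are hypothetical, and appending unlisted files alphabetically is a choice of this sketch. Mirroring the described plugin, it returns nothing when there is no Collate field.

```typescript
// Hypothetical helper: splits a Collate value into file names; entries may be quoted.
function splitCollate(value: string): string[] {
  const entries: string[] = [];
  const pattern = /'([^']*)'|"([^"]*)"|(\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(value)) !== null) {
    entries.push(m[1] ?? m[2] ?? m[3]);
  }
  return entries;
}

// Orders the given files by the Collate field, or returns undefined if there is none.
function orderByCollate(files: readonly string[], collate: string | undefined): string[] | undefined {
  if (collate === undefined) {
    return undefined; // no Collate field: leave the order to other plugins
  }
  const listed = splitCollate(collate).filter(f => files.indexOf(f) >= 0);
  // This sketch appends files missing from Collate in alphabetical order.
  const rest = files.filter(f => listed.indexOf(f) < 0).sort();
  return listed.concat(rest);
}
```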

How to add a new plugin

If you want to create a new plugin, you first have to decide which type of plugin you want to create (see Plugin Types above). Then, you must create a new class that extends the corresponding base class (e.g., FlowrAnalyzerFilePlugin for file loading plugins). In general, most plugins operate on the context information provided by the analyzer. Usually, it is a good idea to have a look at the existing plugins of the same type to get an idea of how to implement your own plugin.

Once you have your plugin, you should register it with a sensible name using the registerPluginMaker function. This allows users to register your plugin easily by name using the builder's FlowrAnalyzerBuilder::registerPlugins method. Otherwise, users will have to provide an instance of your plugin class directly.
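As a rough template, a new plugin boils down to a subclass of the base class for its type plus an optional named factory. The sketch below uses hypothetical stand-ins (SketchPluginBase, AlphabeticalOrderPlugin, and a local pluginMakers map) instead of flowR's FlowrAnalyzerPlugin, FlowrAnalyzerLoadingOrderPlugin, and registerPluginMaker, whose exact signatures are not reproduced here.

```typescript
// Hypothetical stand-in for a plugin base class: every plugin declares its type and name.
abstract class SketchPluginBase<Input, Output> {
  abstract readonly type: string;
  abstract readonly name: string;
  abstract process(input: Input): Output;
}

// Hypothetical loading-order plugin: orders files alphabetically as a naive guess.
class AlphabeticalOrderPlugin extends SketchPluginBase<readonly string[], string[]> {
  readonly type = 'loading-order';
  readonly name = 'loading-order:alphabetical';

  process(files: readonly string[]): string[] {
    return files.slice().sort(); // copy first so the input stays untouched
  }
}

// Hypothetical name -> factory registry, so that users can request the plugin by name.
const pluginMakers = new Map<string, () => SketchPluginBase<readonly string[], string[]>>();
pluginMakers.set('loading-order:alphabetical', () => new AlphabeticalOrderPlugin());
```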

Context Information

The FlowrAnalyzer provides various context information during the analysis. You can access the context with FlowrAnalyzer::inspectContext to receive a read-only view of the current analysis context. Likewise, you can use FlowrAnalyzerContext::inspect to get a read-only view of a given context. These read-only views prevent you from accidentally modifying the context during the analysis which may cause inconsistencies (this should be done either by wrapping methods or by plugins). The context is divided into multiple sub-contexts, each responsible for a specific aspect of the analysis. These sub-contexts are described in more detail below.

For the general structure from an implementation perspective, please have a look at FlowrAnalyzerContext.
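The read-only views follow a familiar TypeScript pattern: the mutable class implements a narrower interface that only exposes inspection methods, and an inspect-style accessor returns the same object typed as that interface. A minimal sketch with hypothetical names (ReadOnlyFilesView, FilesContextSketch); in this sketch, the protection is enforced by the type checker, not at run time.

```typescript
// Hypothetical read-only view: only inspection methods are visible.
interface ReadOnlyFilesView {
  hasFile(path: string): boolean;
  fileCount(): number;
}

// Hypothetical mutable context: code holding this type (e.g., plugins) may add files.
class FilesContextSketch implements ReadOnlyFilesView {
  private readonly files = new Map<string, string>();

  addFile(path: string, content: string): void {
    this.files.set(path, content);
  }

  hasFile(path: string): boolean {
    return this.files.has(path);
  }

  fileCount(): number {
    return this.files.size;
  }

  // Hands out the same object, typed so that callers cannot call addFile.
  inspect(): ReadOnlyFilesView {
    return this;
  }
}
```

Because the view is the live object, it reflects later modifications; calling addFile on the view would be a compile-time error.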

Tip

If you need a context for testing or to create analyses with lower-level components, you can use either contextFromInput to create a context from input data (which lifts the old requestFromInput) or contextFromSources to create a context from source files (e.g., if you need a virtual file system).

If for whatever reason you need to reset the context during an analysis, you can use FlowrAnalyzerContext::reset. To pre-compute all possible information in the context before starting the main analysis, you can use FlowrAnalyzerContext::resolvePreAnalysis.

Files Context

First, let's have a look at the FlowrAnalyzerFilesContext class that provides access to the files to be analyzed and their loading order:

  • FlowrAnalyzerFilesContext
    This is the analyzer file context to be modified by all plugins that affect the files. If you are interested in inspecting these files, refer to ReadOnlyFlowrAnalyzerFilesContext. Plugins, however, can use this context directly to modify files.
    (Defined at ./src/project/context/flowr-analyzer-files-context.ts#L112)

    • AbstractFlowrAnalyzerContext
      Abstract class representing the context; a context may be modified and enriched by plugins (see FlowrAnalyzerPlugin). Please use the specialized contexts like FlowrAnalyzerFilesContext or FlowrAnalyzerLoadingOrderContext to work with flowR and, in general, use the FlowrAnalyzerContext to access the full project context.
      (Defined at ./src/project/context/abstract-flowr-analyzer-context.ts#L11)

    • ReadOnlyFlowrAnalyzerFilesContext
      This is the read-only interface for the files context, which is used to manage all files known to the FlowrAnalyzer. It prevents you from modifying the available files, but allows you to inspect them (which is probably what you want when using the FlowrAnalyzer). If you are a FlowrAnalyzerProjectDiscoveryPlugin and want to modify the available files, you can use the FlowrAnalyzerFilesContext directly.
      (Defined at ./src/project/context/flowr-analyzer-files-context.ts#L61)

Using the available plugins, the files context categorizes files by their FileRole (e.g., source files or DESCRIPTION files) and makes them accessible by these roles (e.g., via FlowrAnalyzerFilesContext::getFilesByRole). It also provides methods to check whether a file exists (e.g., FlowrAnalyzerFilesContext::hasFile, FlowrAnalyzerFilesContext::exists) and to translate requests so they respect the context (e.g., FlowrAnalyzerFilesContext::resolveRequest).

For legacy reasons it also provides the list of files considered by the dataflow analysis via FlowrAnalyzerFilesContext::consideredFilesList.
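Categorizing files by role amounts to maintaining an index from role to files. The sketch below illustrates the idea with an assumed role set ('source', 'description', 'other') and a hypothetical classification by file name; the method name getFilesByRole is borrowed for readability only, and flowR's actual FileRole values and role assignment by plugins are not reproduced here.

```typescript
// Assumed roles for this sketch only (not flowR's FileRole values).
type RoleSketch = 'source' | 'description' | 'other';

// Hypothetical classification by file name.
function classify(path: string): RoleSketch {
  if (/(^|\/)DESCRIPTION$/.test(path)) {
    return 'description';
  }
  return /\.r$/i.test(path) ? 'source' : 'other';
}

// Hypothetical role index keeping the files of each role in insertion order.
class RoleIndexSketch {
  private readonly byRole = new Map<RoleSketch, string[]>();

  add(path: string): void {
    const role = classify(path);
    const list = this.byRole.get(role) ?? [];
    list.push(path);
    this.byRole.set(role, list);
  }

  getFilesByRole(role: RoleSketch): readonly string[] {
    return this.byRole.get(role) ?? [];
  }

  hasFile(path: string): boolean {
    return this.getFilesByRole(classify(path)).indexOf(path) >= 0;
  }
}
```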

Loading Order Context

Note

Please be aware that the loading order is inherently tied to the files context (as it determines which files are available for ordering). Hence, the FlowrAnalyzerLoadingOrderContext is accessible (only) via the FlowrAnalyzerFilesContext.

Here is the structure of the FlowrAnalyzerLoadingOrderContext that provides access to the identified loading order of files:

  • FlowrAnalyzerLoadingOrderContext
    This context is responsible for managing the loading order of script files in a project, including guesses and known orders provided by FlowrAnalyzerLoadingOrderPlugins. If you are interested in inspecting these orders, refer to ReadOnlyFlowrAnalyzerLoadingOrderContext. Plugins, however, can use this context directly to modify order guesses.
    (Defined at ./src/project/context/flowr-analyzer-loading-order-context.ts#L50)

    • AbstractFlowrAnalyzerContext
      Abstract class representing the context; a context may be modified and enriched by plugins (see FlowrAnalyzerPlugin). Please use the specialized contexts like FlowrAnalyzerFilesContext or FlowrAnalyzerLoadingOrderContext to work with flowR and, in general, use the FlowrAnalyzerContext to access the full project context.
      (Defined at ./src/project/context/abstract-flowr-analyzer-context.ts#L11)

    • ReadOnlyFlowrAnalyzerLoadingOrderContext
      Read-only interface for the loading order context, which is used to determine the order in which script files are loaded in a project. This interface prevents you from modifying the loading order, but allows you to inspect it (which is probably what you want when using the FlowrAnalyzer). If you are a FlowrAnalyzerLoadingOrderPlugin and want to modify the available orders, you can use the FlowrAnalyzerLoadingOrderContext directly.
      (Defined at ./src/project/context/flowr-analyzer-loading-order-context.ts#L14)

Using the available plugins, the loading order context determines the order in which files are loaded and analyzed by flowR's analyzer. You can inspect the identified loading order using FlowrAnalyzerLoadingOrderContext::getLoadingOrder. If there are multiple possible loading orders (e.g., due to circular dependencies), you can inspect the current candidates with FlowrAnalyzerLoadingOrderContext::currentGuesses.
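One way to picture the distinction between a known order and guesses: plugins contribute either a definitive order (e.g., from a Collate field) or candidate guesses, and the current loading order prefers the known one. This is a hypothetical sketch; LoadingOrderSketch and its members are illustrative and not the logic of FlowrAnalyzerLoadingOrderContext.

```typescript
// Hypothetical loading order store distinguishing known orders from guesses.
class LoadingOrderSketch {
  private known: string[] | undefined;
  private readonly guesses: string[][] = [];

  // For plugins that are certain about the order (e.g., from a Collate field).
  setKnownOrder(order: string[]): void {
    this.known = order;
  }

  // For plugins that can only propose a plausible order.
  addGuess(order: string[]): void {
    this.guesses.push(order);
  }

  // Prefers the known order, otherwise falls back to the first guess (if any).
  getLoadingOrder(): readonly string[] {
    return this.known ?? this.guesses[0] ?? [];
  }

  currentGuesses(): readonly (readonly string[])[] {
    return this.guesses;
  }
}
```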

Dependencies Context

Here is the structure of the FlowrAnalyzerDependenciesContext that provides access to the identified dependencies and their versions, including the version of R:

Probably the most important method is FlowrAnalyzerDependenciesContext::getDependency that allows you to query for a specific dependency by name.

Environment Context

Here is the structure of the FlowrAnalyzerEnvironmentContext that provides access to the built-in environment:

The environment context provides access to the built-in environment via FlowrAnalyzerEnvironmentContext::makeCleanEnv. It also provides the empty built-in environment, which only contains primitives, via FlowrAnalyzerEnvironmentContext::makeCleanEnvWithEmptyBuiltIns.
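The difference between the two clean environments can be pictured as two scope chains that differ only in their root: each call yields a fresh, empty top-level environment whose parent is either a full built-in environment or one containing primitives only. This is a hypothetical sketch; EnvSketch, makeCleanEnvSketch, and the example bindings are illustrative and not flowR's environment representation.

```typescript
// Hypothetical scope in a chain of environments; lookups walk up to the parent.
class EnvSketch {
  constructor(
    private readonly bindings: ReadonlyMap<string, string>,
    readonly parent?: EnvSketch
  ) {}

  lookup(name: string): string | undefined {
    const own = this.bindings.get(name);
    if (own !== undefined) {
      return own;
    }
    return this.parent?.lookup(name);
  }
}

// Illustrative root environments (names and descriptions are examples only).
const primitivesOnly = new EnvSketch(new Map([['<-', 'assignment'], ['if', 'control flow']]));
const fullBuiltIns = new EnvSketch(new Map([
  ['<-', 'assignment'],
  ['if', 'control flow'],
  ['print', 'base function'],
  ['library', 'package loading'],
]));

// Each call yields a fresh, empty top-level environment on top of the chosen root.
function makeCleanEnvSketch(withFullBuiltIns: boolean): EnvSketch {
  return new EnvSketch(new Map(), withFullBuiltIns ? fullBuiltIns : primitivesOnly);
}
```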

Caching

To speed up analyses, flowR provides a caching mechanism that stores intermediate results of the analysis. The cache is maintained by the FlowrAnalyzerCache class and is used automatically by the analyzer during the analysis. Under the hood, it relies on the PipelineExecutor to cache results of different pipeline stages.

Usually, you do not have to worry about the cache, as it is managed automatically by the analyzer. If you want to bypass cached information, the analysis methods in FlowrAnalyzer (see Conducting Analyses above) usually provide an optional force parameter to control whether to use the cache or recompute the results.
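The force parameter follows a standard memoization pattern: each stage result is stored once computed and reused on later calls unless the caller forces a recomputation. A minimal sketch of such a per-stage cache; StageCacheSketch is hypothetical and is not FlowrAnalyzerCache or the PipelineExecutor.

```typescript
// Hypothetical per-stage result cache with an optional force flag.
class StageCacheSketch {
  private readonly results = new Map<string, unknown>();
  readonly computeCounts = new Map<string, number>();

  // Returns the cached result of a stage, computing it if missing or if force is set.
  get<T>(stage: string, compute: () => T, force = false): T {
    if (force || !this.results.has(stage)) {
      this.results.set(stage, compute());
      this.computeCounts.set(stage, (this.computeCounts.get(stage) ?? 0) + 1);
    }
    return this.results.get(stage) as T;
  }

  // Drops all cached results, e.g., after new files were added.
  clear(): void {
    this.results.clear();
  }
}
```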
