
@michael-johnston
Member

@michael-johnston michael-johnston commented Jan 22, 2026

Adds the ability to specify an actuator when importing data from a CSV. This allows data obtained elsewhere (possibly from ado itself) to appear as ado data, so it can be used as memoized data in a space with that experiment.

The actuator/experiment combination is validated against existing actuators and, if validation passes, the data is imported.
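
As a rough illustration of the validation step (a sketch only, not ado's actual implementation), the check could look something like the following. The registry, the function name `validate_actuator_experiment`, and the identifiers `my_actuator` and `benchmark_v1` are all hypothetical placeholders.

```python
# Hypothetical registry: actuator identifier -> experiment identifiers it provides.
# These names are illustrative assumptions and are not taken from ado.
KNOWN_ACTUATORS: dict[str, set[str]] = {
    "my_actuator": {"benchmark_v1", "benchmark_v2"},
    "other_actuator": {"latency_test"},
}


def validate_actuator_experiment(
    actuator_id: str,
    experiment_id: str,
    registry: dict[str, set[str]] = KNOWN_ACTUATORS,
) -> None:
    """Raise ValueError unless the actuator exists and provides the experiment."""
    if actuator_id not in registry:
        raise ValueError(f"Unknown actuator: {actuator_id!r}")
    if experiment_id not in registry[actuator_id]:
        raise ValueError(
            f"Actuator {actuator_id!r} has no experiment {experiment_id!r}"
        )


# Import would proceed only if this call does not raise.
validate_actuator_experiment("my_actuator", "benchmark_v1")
```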

Notes:

  • Changes the schema of CSVSampleStoreDescriptor to simplify the code; an upgrade path and warnings are provided.
  • Column names are no longer converted to lower case automatically. Users must do this in YAML if desired.
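
For anyone who relied on the old automatic lower-casing, a small helper like the one below could generate the explicit mapping to paste into the YAML. This is a minimal sketch: `lowercase_property_map` is a hypothetical helper, not part of ado, and it assumes the dictionary form maps a property name (key) to a CSV column name (value).

```python
import csv
import io


def lowercase_property_map(csv_text: str, columns: list[str]) -> dict[str, str]:
    """Map lower-cased property names to the original CSV column names.

    Hypothetical helper: assumes a {property_name: csv_column} mapping.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"Columns not found in CSV header: {missing}")
    mapping = {c.lower(): c for c in columns}
    # Two columns that differ only by case would collapse into one key.
    if len(mapping) != len(columns):
        raise ValueError("Lower-casing produces duplicate property names")
    return mapping


sample = "ID,CPU_Value,Memory_GB\ne1,4,16\n"
print(lowercase_property_map(sample, ["CPU_Value", "Memory_GB"]))
```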

@DRL-NextGen
Member

DRL-NextGen commented Jan 22, 2026

Checks Summary

Last run: 2026-01-26T20:34:03.488Z

Code Risk Analyzer vulnerability scan found 2 vulnerabilities:

| Severity | Identifier | Package | Details | Fix |
|---|---|---|---|---|
| ◻ Unknown | CVE-2025-53000 | nbconvert | nbconvert has an uncontrolled search path that leads to unauthorized code execution on Windows (GHSA-xm59-rqc7-hhvf); nbconvert:7.16.6->ado-core:1.3.3 | >7.16.6 |
| ◻ Unknown | CVE-2026-0994 | protobuf | protobuf affected by a JSON recursion depth bypass (GHSA-7gcm-g887-7qv7); protobuf:6.33.4->ado-core:1.3.3,protobuf:6.33.4,vllm:0.14.1 | >6.33.4 |

Mend Unified Agent vulnerability scan found 3 vulnerabilities:

| Severity | Identifier | Package | Details | Fix |
|---|---|---|---|---|
| ❗ Critical | CVE-2025-56005 | ply-3.11-py2.py3-none-any.whl | An undocumented and unsafe feature in the PLY (Python Lex-Yacc) library 3.11 allows Remote Code Execution (RCE) via the "picklefile" parameter in the "yacc()" function. This parameter accepts a ".pkl" file that is deserialized with "pickle.load()" without validation. Because "pickle" allows execution of embedded code via `__reduce__()`, an attacker can achieve code execution by passing a malicious pickle file. The parameter is not mentioned in official documentation or the GitHub repository, yet it is active in the PyPI version. This introduces a stealthy backdoor and persistence risk. | Not Available |
| 🔺 High | CVE-2025-53000 | nbconvert-7.16.6-py3-none-any.whl | The nbconvert tool, jupyter nbconvert, converts Jupyter notebooks to various other formats via Jinja templates. Versions of nbconvert up to and including 7.16.6 on Windows have a vulnerability in which converting a notebook containing SVG output to a PDF results in unauthorized code execution. Specifically, a third party can create an "inkscape.bat" file that defines a Windows batch script, capable of arbitrary code execution. When a user runs "jupyter nbconvert --to pdf" on a notebook containing SVG output to a PDF on a Windows platform from this directory, the "inkscape.bat" file is run unexpectedly. As of time of publication, no known patches exist. | Not Available |
| 🔺 High | CVE-2026-0994 | protobuf-6.33.4-cp39-abi3-manylinux2014_x86_64.whl | A denial-of-service (DoS) vulnerability exists in google.protobuf.json_format.ParseDict() in Python, where the max_recursion_depth limit can be bypassed when parsing nested google.protobuf.Any messages. Due to missing recursion depth accounting inside the internal Any-handling logic, an attacker can supply deeply nested Any structures that bypass the intended recursion limit, eventually exhausting Python’s recursion stack and causing a RecursionError. | Not Available |

Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
@michael-johnston michael-johnston marked this pull request as draft January 23, 2026 11:02
michael-johnston and others added 8 commits January 23, 2026 14:39
- Moved all logic related to experiments into ExperimentDescriptors, including constitutive properties
- Two subclasses: one for External (Replay) experiments, one for Internal
- This greatly simplifies CSVSampleStoreDescriptor
- This also simplifies CSVSampleStore
- Provide a warning and upgrade path for old formats.
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
@michael-johnston michael-johnston marked this pull request as ready for review January 23, 2026 15:19
@michael-johnston michael-johnston dismissed AlessandroPomponio’s stale review January 26, 2026 10:30

Code updated. Rereview required

@michael-johnston
Member Author

@AlessandroPomponio What is the process for resolving the vulnerability scan issues? I can't see anything reported in the pipeline.

Comment on lines +223 to +228
The key here is that you **must** define which columns in the CSV are observed properties
and which are constitutive properties.
If you want to use the column names directly as observed/constitutive property names
you can pass a list to the relevant field.
If you want to define new observed/constitutive property names for each column you
can pass a dictionary.
Member

This is what AI suggests rewriting this to, to make it clearer. I like this version better, with the example next to the explanation. Feel free to edit.

NOTE: the code block renders incorrectly. Please click on EDIT for this comment to see the real suggestion

You must specify which CSV columns contain observed properties (measurements/results) 
and which contain constitutive properties (input parameters/configurations).

**Two ways to map columns:**

1. **Use CSV column names as-is** - Pass a list:
   ```yaml
   constitutivePropertyMap:
     - cpu_value
     - memory_gb
   ```
   This uses `cpu_value` and `memory_gb` as both the column names AND property names.

2. **Rename columns** - Pass a dictionary:
   ```yaml
   observedPropertyMap:
     wallClockRuntime: 'wall-clock runtime'
     throughput: 'requests_per_sec'
   ```
   This reads from CSV columns `wall-clock runtime` and `requests_per_sec`,
   but names them `wallClockRuntime` and `throughput` in the experiment.
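
To make the list-versus-dictionary semantics concrete, here is a minimal sketch of how the two forms could be normalized into a single mapping. `normalize_property_map` and `rename_row` are hypothetical names used for illustration and are not ado's API; the sketch assumes, as in the dictionary example, that keys are property names and values are CSV column names.

```python
from typing import Union

# Either a list of names, or {property_name: csv_column} (assumed direction).
PropertyMap = Union[list[str], dict[str, str]]


def normalize_property_map(spec: PropertyMap) -> dict[str, str]:
    """Return {property_name: csv_column} for either accepted form."""
    if isinstance(spec, dict):
        return dict(spec)
    # List form: each column name doubles as the property name.
    return {name: name for name in spec}


def rename_row(row: dict[str, str], spec: PropertyMap) -> dict[str, str]:
    """Pick the mapped columns from one CSV row, keyed by property name."""
    mapping = normalize_property_map(spec)
    return {prop: row[col] for prop, col in mapping.items()}


row = {"wall-clock runtime": "12.5", "requests_per_sec": "340"}
observed = {
    "wallClockRuntime": "wall-clock runtime",
    "throughput": "requests_per_sec",
}
print(rename_row(row, observed))
```

Normalizing both forms up front means the rest of the import code only has to deal with one shape.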

idColumn: Column containing entity identifiers
generatorIdentifier: Optional identifier for the entity generator
experimentIdentifier: Experiment identifier
actuatorIdentifier: Actuator identifier (defaults to 'replay')
Member

This defaults to None in the signature

Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
