Skip to content

Conversation

@bpowers
Copy link
Owner

@bpowers bpowers commented Jan 28, 2026

Add three new functions to libsimlin for directly retrieving variable
details as JSON arrays: simlin_model_get_stocks_json,
simlin_model_get_flows_json, and simlin_model_get_auxs_json. These
functions allow callers to get stocks, flows, or auxiliaries without
serializing the entire project.

The new APIs follow the existing libsimlin memory ownership pattern -
the returned buffer is owned by the caller and must be freed with
simlin_free(). This provides a significant simplification for engine2
and pysimlin, which previously had to serialize the entire project to
JSON just to access variable information.

Changes include:

  • libsimlin: Three new FFI functions with proper documentation and tests
  • engine2: WASM bindings and Model class simplification using direct APIs
  • pysimlin: FFI bindings and Model class simplification using direct APIs

The simplification is particularly beneficial for large projects where
serializing the entire project is expensive when only variable info is
needed.

https://claude.ai/code/session_01PujQhTADskFsjJVY3GLuUQ

Add three new functions to libsimlin for directly retrieving variable
details as JSON arrays: simlin_model_get_stocks_json,
simlin_model_get_flows_json, and simlin_model_get_auxs_json. These
functions allow callers to get stocks, flows, or auxiliaries without
serializing the entire project.

The new APIs follow the existing libsimlin memory ownership pattern -
the returned buffer is owned by the caller and must be freed with
simlin_free(). This provides a significant simplification for engine2
and pysimlin, which previously had to serialize the entire project to
JSON just to access variable information.

Changes include:
- libsimlin: Three new FFI functions with proper documentation and tests
- engine2: WASM bindings and Model class simplification using direct APIs
- pysimlin: FFI bindings and Model class simplification using direct APIs

The simplification is particularly beneficial for large projects where
serializing the entire project is expensive when only variable info is
needed.

https://claude.ai/code/session_01PujQhTADskFsjJVY3GLuUQ
@bpowers
Copy link
Owner Author

bpowers commented Jan 28, 2026

@codex review

@claude
Copy link

claude bot commented Jan 28, 2026

Pull Request Review: libsimlin JSON APIs for Variable Retrieval

This PR introduces targeted JSON APIs for retrieving stocks, flows, and auxiliaries from models without requiring full project serialization. Overall, this is a well-designed performance optimization with good test coverage. Here are my findings:


✅ Strengths

  1. Clear Performance Win: The motivation is sound - avoiding full project serialization when only variable information is needed will provide significant performance benefits for large projects.

  2. Consistent API Design: All three functions (simlin_model_get_stocks_json, simlin_model_get_flows_json, simlin_model_get_auxs_json) follow identical patterns, making them predictable and maintainable.

  3. Good Memory Ownership Model: The APIs correctly follow the established libsimlin pattern where returned buffers are caller-owned and must be freed with simlin_free().

  4. Comprehensive Test Coverage: The Rust tests cover:

    • Happy path (models with variables)
    • Empty results (models without the requested variable type)
    • Error cases (null model, null output pointers)
    • Validation of returned JSON structure and content
  5. Proper Documentation: Both the C header and Rust implementation have clear safety and ownership documentation.

  6. Consistent Implementation Across Platforms: The changes are implemented consistently in engine2 (WASM/TypeScript) and pysimlin (Python), both simplifying their Model classes by eliminating the need for full project serialization.


🔍 Code Quality Observations

Rust Implementation (libsimlin/src/lib.rs)

Potential Issue - Lock Handling:

let project_locked = (*model_ref.project).project.lock().unwrap();

The code uses .unwrap() on a mutex lock, which will panic if the lock is poisoned. According to CLAUDE.md guidelines, .unwrap() is preferred over .unwrap_or_default() to catch assumption violations, but for a mutex lock in a public FFI function, this could crash the process. Consider:

  • Is this acceptable behavior for the FFI boundary?
  • Should poisoned locks be handled as errors returned to the caller?

Minor: Code Duplication
The three functions share 95% identical code structure. While this makes each function clear and independent, consider extracting a generic helper if this pattern needs to expand (e.g., for modules or other variable types):

fn get_variables_json<T, F>(
    model: *mut SimlinModel,
    out_buffer: *mut *mut u8,
    out_len: *mut usize,
    out_error: *mut *mut SimlinError,
    filter_map: F,
    error_context: &str,
) where
    F: Fn(&engine::datamodel::Variable) -> Option<T>,
    T: serde::Serialize,

However, given there are only 3 variable types and they're unlikely to expand, the current explicit approach is reasonable.

Memory Safety - Excellent:
The functions correctly:

  • Validate output pointers before use
  • Initialize output pointers to null/0 on entry
  • Use simlin_malloc and properly handle allocation failures
  • Use std::ptr::copy_nonoverlapping for safe memory copying

TypeScript Implementation (engine2)

Good Use of RAII Pattern:
The try/finally blocks ensure output pointers are always freed:

try {
  fn(model, outBufPtr, outLenPtr, outErrPtr);
  // ... error handling and result processing ...
  return data;
} finally {
  free(outBufPtr);
  free(outLenPtr);
  free(outErrPtr);
}

Proper Memory Management:
The code correctly:

  • Copies data from WASM using copyFromWasm() before freeing
  • Frees the buffer pointer returned by the FFI call
  • Uses TextDecoder for UTF-8 decoding

Minor - Repeated Code:
Similar to the Rust side, the three TypeScript wrapper functions are nearly identical. This is acceptable given:

  • Clear, simple functions
  • Type safety per function
  • Low likelihood of changes

Python Implementation (pysimlin)

Correct CFFI Usage:

output_ptr = ffi.new("uint8_t **")
output_len_ptr = ffi.new("uintptr_t *")
# ...
try:
    return bytes(ffi.buffer(output_ptr[0], output_len_ptr[0]))
finally:
    lib.simlin_free(output_ptr[0])

Properly uses CFFI's buffer protocol and ensures cleanup in finally block.

Good Error Handling:
Uses the existing check_out_error() helper for consistent error translation to Python exceptions.


🎯 Model Class Simplification

The changes successfully simplify the Model classes in both engine2 and pysimlin:

Before:

model = self._get_model_json()  # Serializes entire project
self._cached_stocks = ... for s in model.stocks ...

After:

json_bytes = get_stocks_json(self._ptr)  # Direct variable access
stocks_json = json.loads(json_bytes.decode("utf-8"))

This is cleaner and more efficient, especially for workflows that only query variable information.


🔒 Security

No Security Concerns Identified:

  • Proper bounds checking on buffer operations
  • Safe UTF-8 handling
  • No unsafe casting or pointer arithmetic beyond standard FFI patterns
  • Memory allocation failures handled correctly

⚡ Performance

Expected Benefits:

  1. Reduced serialization cost: For large projects, serializing only the needed variables (stocks, flows, or auxs) rather than the entire project will significantly reduce CPU and memory usage.
  2. Lower memory pressure: Smaller JSON payloads mean less allocation and GC pressure.
  3. Better cache locality: Operations that only need one variable type won't thrash the cache with unneeded data.

Potential Concern:
Each call still requires:

  • JSON serialization (via serde_json::to_vec)
  • UTF-8 decoding
  • JSON parsing on the client side

For extremely high-frequency calls, a more direct structured API might be faster, but for typical usage patterns this should be fine.


📝 Testing

Test Quality: Excellent

The tests verify:

  • ✅ Correct JSON structure and content
  • ✅ Empty results for models without variables
  • ✅ Null pointer error handling
  • ✅ Memory ownership (tests free returned buffers)

Missing Test Coverage:
Consider adding tests for:

  1. Large models: Verify performance characteristics with models containing hundreds/thousands of variables
  2. Unicode in variable names: Ensure UTF-8 encoding handles non-ASCII characters correctly
  3. Concurrent access: If the model can be accessed from multiple threads, verify thread safety

🎨 Style & Conventions

Commit Message: The commit follows the repository style perfectly:

  • Component prefix: libsimlin:
  • Lowercase imperative: add JSON APIs for model variable retrieval
  • Good body with context and reasoning
  • Appropriate length

Code Style: All code follows established patterns in the codebase. No style issues.


🤔 Questions/Suggestions

  1. Should these APIs be extended to modules? If users want to query variables by module, would a similar API be useful?

  2. Benchmark results? It would be valuable to add a comment documenting the performance improvement for a representative large model (e.g., "For the World3 model with 500+ variables, this reduces query time from 15ms to 2ms").

  3. API Completeness: Consider if there are other common query patterns that would benefit from similar optimizations (e.g., getting all variables of a specific dimension, or variables matching a name pattern).


✅ Verdict

LGTM with minor suggestions

This PR is well-implemented, properly tested, and provides a clear performance benefit. The code quality is high, follows repository conventions, and introduces no security concerns. The only substantive question is about mutex lock handling in FFI functions - the current behavior (panic on poisoned lock) may be acceptable, but worth confirming.

Recommended before merge:

  • Confirm the mutex .unwrap() panic behavior is acceptable for FFI
  • Consider adding a comment documenting expected performance wins
  • Consider adding tests for Unicode variable names

Nice work! 🚀

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4bdac1488

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1493 to +1500
// Extract stocks from the datamodel and convert to JSON types
let stocks: Vec<engine::json::Stock> = datamodel_model
.variables
.iter()
.filter_map(|var| match var {
engine::datamodel::Variable::Stock(stock) => {
Some(engine::json::Stock::from(stock.clone()))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve deterministic variable ordering

The new JSON APIs emit variables in the datamodel’s insertion order, but the existing JSON serialization path sorts stocks/flows/auxiliaries by canonical identifier for determinism (json::Model does sort_by_key(canonicalize) in src/simlin-engine/src/json.rs). Since the JS/Python Model.* accessors previously used project serialization, callers implicitly got a stable, canonical order. With this change, the same model can return a different order depending on import/edit history, which can break deterministic outputs and UI/tests that assume sorted lists. Consider sorting the collected vectors before serializing so these APIs match the prior behavior.

Useful? React with 👍 / 👎.

@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 71.28713% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.79%. Comparing base (e3daec4) to head (a4bdac1).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/libsimlin/src/lib.rs 71.28% 29 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #219      +/-   ##
==========================================
- Coverage   76.82%   76.79%   -0.03%     
==========================================
  Files          69       69              
  Lines       20523    20624     +101     
==========================================
+ Hits        15766    15839      +73     
- Misses       4757     4785      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bpowers bpowers closed this Jan 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants