Enhance Graph.update() and add whole-graph update tests by Andy-Jost · Pull Request #1843 · NVIDIA/cuda-python

Andy-Jost · 2026-03-31T18:25:13Z

Extend tests of the existing Graph.update function and refactor existing graph code in preparation for further work.

Summary

Extends Graph.update() to accept both GraphBuilder and GraphDef as sources, giving users flexibility to update instantiated graphs from either the stream-capture or explicit-graph API
Surfaces detailed CUgraphExecUpdateResultInfo on update failure (reason enum + docstring) instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE
Splits the monolithic _graphdef.pyx (2000+ lines) into a _graph_def/ subpackage with three focused modules for maintainability
Reorganizes graph test files into thematic groups with module docstrings
Adds new tests for whole-graph update covering happy paths and error cases

Changes

cuda/core/_graph/_graph_builder.pyx: Refactored Graph.update() to dispatch on GraphBuilder vs GraphDef, call cuGraphExecUpdate with a CUgraphExecUpdateResultInfo struct, and raise a descriptive CUDAError on failure
cuda/core/_graph/_graph_def/: Split _graphdef.pyx into _graph_def.pyx (Condition, GraphAllocOptions, GraphDef), _graph_node.pyx (GraphNode base class and builder methods with GN_* inline helpers), and _subclasses.pyx (all concrete node subclasses). Handle property annotations updated to use driver.* types consistently.
tests/graph/: Renamed test files to reflect their scope (test_graph_builder.py, test_graph_builder_conditional.py, test_graph_memory_resource.py, test_graph_update.py, test_graphdef*.py, test_device_launch.py); added module docstrings; moved tests to appropriate files
tests/graph/test_graph_update.py: Added parametrized test_graph_update_kernel_args (GraphBuilder + GraphDef), test_graph_update_conditional, test_graph_update_unfinished_builder, test_graph_update_topology_mismatch, test_graph_update_wrong_type

Test Coverage

Parametrized happy path: kernel-only graph updated with new pointer args, tested via both GraphBuilder and GraphDef
Conditional switch update: existing test (renamed) exercising topology-compatible conditional graph updates
Unfinished builder: ValueError when source GraphBuilder hasn't finished capturing
Topology mismatch: CUDAError with descriptive reason from CUgraphExecUpdateResultInfo
Wrong type: TypeError for invalid argument types
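The source-type dispatch and the two argument-validation errors listed above can be pictured with a small pure-Python sketch. Everything here is a stand-in: `FakeGraphBuilder`, `FakeGraphDef`, the `finished` and `handle` attributes, and `resolve_update_source` are hypothetical names chosen for illustration and are not the cuda.core implementation.

```python
class FakeGraphBuilder:
    """Stand-in for a stream-capture builder (not the real cuda.core class)."""

    def __init__(self, handle, finished):
        self.handle = handle      # hypothetical: the underlying graph handle
        self.finished = finished  # hypothetical: True once capture has ended


class FakeGraphDef:
    """Stand-in for an explicitly constructed graph definition."""

    def __init__(self, handle):
        self.handle = handle


def resolve_update_source(source):
    """Return the graph handle to update from, or raise on a bad source."""
    if isinstance(source, FakeGraphBuilder):
        # An unfinished builder has no complete graph to update from.
        if not source.finished:
            raise ValueError("source GraphBuilder has not finished building")
        return source.handle
    if isinstance(source, FakeGraphDef):
        return source.handle
    # Anything else is rejected up front, before any driver call.
    raise TypeError(
        f"expected GraphBuilder or GraphDef, got {type(source).__name__}"
    )
```

Validating the source before touching the driver keeps ValueError and TypeError distinct from a CUDAError, which is then reserved for failures reported by the update itself.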

Related Work

Closes phase 1 of CUDA graph phase N - graph updates #1330 (CUDA graph updates)
Phase 2–4 will add edge mutation, node property setters, and exec-level node updates

Rename test files to reflect what they actually test:
- test_basic -> test_graph_builder (stream capture tests)
- test_conditional -> test_graph_builder_conditional
- test_advanced -> test_graph_update (moved child_graph and stream_lifetime tests into test_graph_builder)
- test_capture_alloc -> test_graph_memory_resource
- test_explicit* -> test_graphdef*

Made-with: Cursor

- Extend Graph.update() to accept both GraphBuilder and GraphDef sources
- Surface CUgraphExecUpdateResultInfo details on update failure instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message
- Release the GIL during cuGraphExecUpdate via nogil block
- Add parametrized happy-path test covering both GraphBuilder and GraphDef
- Add error-case tests: unfinished builder, topology mismatch, wrong type

Made-with: Cursor

github-actions · 2026-03-31T18:41:47Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1843/
https://nvidia.github.io/cuda-python/pr-preview/pr-1843/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1843/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1843/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

- Chain GraphDef kernel launches sequentially (n.launch instead of g.launch) to avoid concurrent writes to the same memory location
- Update GraphDef.handle and GraphNode.handle annotations to reflect that as_py returns driver types (CUgraph, CUgraphNode), not int

Made-with: Cursor
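To see why n.launch orders the kernels while g.launch does not, here is a toy dependency model. `ToyGraphDef`, `ToyNode`, and `edges` are hypothetical and only mimic the shape of the calls named in the commit message. The sketch assumes that launching from the graph creates a root node and launching from a node creates a dependent node.

```python
class ToyNode:
    """Hypothetical node that records which nodes it depends on."""

    def __init__(self, graph, name, deps):
        self.graph = graph
        self.name = name
        self.deps = tuple(deps)
        graph.nodes.append(self)

    def launch(self, name):
        # Launching from a node adds a successor that runs after it.
        return ToyNode(self.graph, name, [self])


class ToyGraphDef:
    """Hypothetical graph definition holding a flat list of nodes."""

    def __init__(self):
        self.nodes = []

    def launch(self, name):
        # Launching from the graph adds a root node with no dependencies.
        return ToyNode(self, name, [])


def edges(graph):
    """List (predecessor, successor) name pairs for every dependency."""
    return [(d.name, n.name) for n in graph.nodes for d in n.deps]


# Two roots: nothing orders k1 and k2, so both may write concurrently.
g = ToyGraphDef()
g.launch("k1")
g.launch("k2")
parallel_edges = edges(g)

# Chained: k2 depends on k1, so the two writes are ordered.
h = ToyGraphDef()
n = h.launch("k1")
n.launch("k2")
chained_edges = edges(h)
```

In this model the first graph has no edges at all, while the second has the single edge k1 -> k2 that serializes the writes.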

The monolithic _graphdef.pyx (2000+ lines) is split into three focused modules under _graph_def/: _graph_def.pyx (Condition, GraphAllocOptions, GraphDef), _graph_node.pyx (GraphNode base class and builder methods), and _subclasses.pyx (all concrete node subclasses). Long method bodies in GraphNode are factored into cdef inline GN_* helpers following existing codebase conventions. Handle property annotations updated to use driver.* types consistently. Made-with: Cursor

Andy-Jost · 2026-03-31T21:02:16Z

_graphdef.pyx was broken into 3 parts under _graph_def/. No need to review those in detail.

Update two references that still used _graphdef instead of _graph_def after the subpackage split. Made-with: Cursor

rwgk

I'm assuming we don't have to worry about the import/cimport export surface, is that a valid assumption?

rwgk · 2026-04-01T16:51:24Z

cuda_core/cuda/core/_graph/_graph_builder.pyx

+        cdef cydriver.CUresult err
+        with nogil:
+            err = cydriver.cuGraphExecUpdate(cu_exec, cu_graph, &result_info)
+        if err != cydriver.CUresult.CUDA_SUCCESS:


I used Cursor GPT-5.4 1M High to "comb" through this "very complex and very large" PR. It only found this one "High" item:

I think this would be a bit safer if it distinguished the graph-update failure case from ordinary driver errors, e.g.

    cdef cydriver.CUgraphExecUpdateResultInfo result_info
    cdef cydriver.CUresult err
    with nogil:
        err = cydriver.cuGraphExecUpdate(cu_exec, cu_graph, &result_info)
    if err == cydriver.CUresult.CUDA_SUCCESS:
        return
    if err == cydriver.CUresult.CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE:
        reason = driver.CUgraphExecUpdateResult(result_info.result)
        msg = f"Graph update failed: {reason.__doc__.strip()} ({reason.name})"
        raise CUDAError(msg)
    raise CUDAError(err)

Rationale:

Using cydriver.cuGraphExecUpdate(...) directly here makes sense, since the higher-level binding drops resultInfo on non-success and would lose the detailed update reason entirely.

But resultInfo appears to be the structured explanation for the specific CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE path, not necessarily for every possible non-success CUresult.

Even when result_info.result == CU_GRAPH_EXEC_UPDATE_ERROR, the enum docs say the actual explanation is described by the function return value. The current code discards err, so it may collapse distinct driver failures into the same generic resultInfo-based message.

This shape preserves the nice detailed message for graph-update incompatibilities while still surfacing ordinary driver errors accurately.
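For readers who want to run the control flow without a GPU, the same three-way split can be modeled in plain Python. This is only a sketch. `Result`, `UpdateReason`, `REASON_TEXT`, `UpdateError`, and `check_update` are hypothetical stand-ins, and the numeric values and reason descriptions are illustrative rather than copied from the driver headers.

```python
from enum import IntEnum


class Result(IntEnum):
    """Stand-in for a few CUresult codes; values are illustrative."""
    SUCCESS = 0
    INVALID_VALUE = 1
    GRAPH_EXEC_UPDATE_FAILURE = 2


class UpdateReason(IntEnum):
    """Stand-in for CUgraphExecUpdateResult; values are illustrative."""
    ERROR = 1
    TOPOLOGY_CHANGED = 2


# Hypothetical descriptions, playing the role of the enum docstrings.
REASON_TEXT = {
    UpdateReason.ERROR: "The update failed for an unexpected reason",
    UpdateReason.TOPOLOGY_CHANGED: "The update failed because the topology changed",
}


class UpdateError(RuntimeError):
    """Stand-in for CUDAError."""


def check_update(err, reason):
    """Map a (return code, update reason) pair to success or an exception."""
    if err == Result.SUCCESS:
        return
    if err == Result.GRAPH_EXEC_UPDATE_FAILURE:
        # Only this return code makes the result info meaningful.
        raise UpdateError(
            f"Graph update failed: {REASON_TEXT[reason]} ({reason.name})"
        )
    # Any other driver error is reported by its own code, not the reason.
    raise UpdateError(f"driver error: {err.name}")
```

The point of the ordering is that the reason is consulted only on the update-failure path, so an unrelated driver error is never described with a stale or unset reason.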

According to the docs, cuGraphExecUpdate only returns CUDA_SUCCESS or CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE.

> "very complex and very large"

To be clear, nearly all of this change is refactoring and code movement. The graph tests were regrouped slightly and renamed. The huge _graphdef module was split into three parts. There are not many functional changes here.

> According to the docs, cuGraphExecUpdate only returns CUDA_SUCCESS or CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE.

Documentation tends to be imprecise, or become imprecise over time without anyone noticing.

The suggested change improves the quality of implementation at a very small cost.

I checked the driver code and the docs are indeed incorrect.

Andy-Jost · 2026-04-01T19:00:04Z

> I'm assuming we don't have to worry about the import/cimport export surface, is that a valid assumption?

This is correct. We do not make any guarantees whatsoever about Cython interface stability. The public Python API consists of what we expose at cuda.core. All of these submodules are private and have underscore-prefixed names.

Check for CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE first to provide the rich error message with the update result reason, then fall through to HANDLE_RETURN for any other error code (CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED, etc.) or success. Made-with: Cursor

Andy-Jost added 2 commits March 31, 2026 10:05

Andy-Jost added this to the cuda.core v1.0.0 milestone Mar 31, 2026

Andy-Jost added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Mar 31, 2026

Andy-Jost self-assigned this Mar 31, 2026

Andy-Jost requested review from cpcloud, leofang, mdboom, rparolin and rwgk and removed request for leofang March 31, 2026 18:25

Andy-Jost added 2 commits March 31, 2026 12:16

Andy-Jost added 2 commits March 31, 2026 15:37

Merge remote-tracking branch 'origin/main' into graph-updates

17715f2

Fix stale _graphdef import paths after subpackage rename

60964fd

rwgk reviewed Apr 1, 2026

Andy-Jost added 2 commits April 1, 2026 17:38

Assert specific error code from cuGraphExecUpdate

e924fde

Made-with: Cursor

Andy-Jost wants to merge 8 commits into NVIDIA:main from Andy-Jost:graph-updates
