[POC] Metal via tvm-ffi #18634

oraluben · 2026-01-05T03:41:47Z

No description provided.

- Added support for processing container types like Array that may contain Vars, Buffers, Exprs, and Stmts within the IRConvertSSA class.
- Implemented logic to rewrite elements in the container, ensuring proper remapping of variables and buffers.
- Improved the mutator's ability to detect changes in the container, updating the value accordingly.

…ammatic Dependent Launch and cuLaunchCooperativeKernel (apache#18)
* [CUDA][FFI] Add support for Programmatic Dependent Kernel Launch (PDL) in TVM CUDA FFI
* tir: add launch param tag for programmatic dependent launch
* tir: add param tag for cuLaunchCooperativeKernel
---------
Co-authored-by: senhtry <[email protected]>

- Introduced an `annotations` field in the `CallNode` class to store additional metadata for lowering passes.
- Updated the `Call` constructor and related methods to accept and handle the new `annotations` parameter.
- Modified existing calls to `Call` to include the `annotations` argument where applicable, ensuring backward compatibility.
- Enhanced the Python interface for the `Call` class to support annotations, improving usability for users needing to pass extra information during function calls.

gemini-code-assist · 2026-01-05T03:44:17Z

Summary of Changes

Hello @oraluben, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances TVM's arithmetic analysis capabilities by integrating the Z3 SMT solver, allowing for more sophisticated symbolic proofs and constraint satisfaction. It also improves the Analyzer's state management through deep cloning and expands Python compatibility to version 3.8. Core TIR expressions like Call now support annotations for richer metadata, and scheduling primitives like ReIndex offer more control over simplification. Additionally, the PR includes improvements to C host code generation for better debugging, and introduces advanced features for Metal and CUDA runtime environments.

Highlights

Z3 SMT Solver Integration: Introduced a new Z3Prover class and CMake options to integrate the Z3 SMT solver into the arithmetic analyzer, enabling more powerful symbolic proof capabilities for expressions.
Analyzer Deep Cloning: The Analyzer class now supports deep cloning, allowing its entire internal state, including all sub-analyzers, to be copied, which is crucial for advanced analysis scenarios.
Python 3.8 Compatibility: The minimum Python requirement for TVM has been lowered from 3.9 to 3.8, broadening the supported development environments.
TIR Call Annotations: The tir.Call expression now includes an annotations field, allowing arbitrary metadata to be attached to calls, which can be leveraged by various lowering passes.
ReIndex Simplification Control: The ReIndex schedule primitive gains a new skip_simplify parameter, providing finer control over whether indices are simplified during reindexing operations.
Enhanced C Host Assert Messages: Generated C host code now includes actual left-hand side and right-hand side values in assert messages for equality checks, significantly improving debugging clarity.
Metal Stream Integration: New CBStream and SetMetalStream functionalities have been added to improve integration with Metal command buffers, potentially benefiting frameworks like PyTorch MPS.
Advanced CUDA Launch Configurations: The CUDA kernel launch mechanism has been updated to support programmatic dependent launch and cooperative launch, alongside more robust handling of dynamic shared memory.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces significant new functionality, most notably the integration of the Z3 SMT solver for more powerful expression proving within the arithmetic analyzer. It also includes changes to support Metal integration via tvm-ffi, likely for interoperability with frameworks like PyTorch on Apple Silicon. Beyond these major features, there are numerous improvements and fixes across the codebase, including enhanced boolean simplification, better handling of dynamic shared memory in CUDA, and Python 3.8 compatibility fixes. My review focuses on a few potentially problematic changes that could affect correctness or cross-platform compatibility, such as the removal of validation checks in TIR nodes and platform-specific code that isn't properly guarded.

I am having trouble creating individual review comments, so my feedback is listed below by file.

python/tvm/base.py (45)

The flag os.RTLD_LAZY is not available on Windows. This line will raise an AttributeError when running on Windows. This change should be guarded by a platform check, for example if sys.platform != 'win32':.

    if sys.platform.startswith("win32"):
        lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
    else:
        lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL | os.RTLD_LAZY)
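The guard suggested above can also be factored into a small helper. This is a minimal sketch, not TVM code: the name `dlopen_flags` is hypothetical, and it only computes the mode that would be passed to `ctypes.CDLL`. It assumes that adding no extra flag is acceptable when `os.RTLD_LAZY` is missing.

```python
import ctypes
import os
import sys


def dlopen_flags(platform: str = sys.platform) -> int:
    """Return the mode to pass to ctypes.CDLL for the given platform string.

    Hypothetical helper for illustration; not part of TVM.
    """
    flags = ctypes.RTLD_GLOBAL
    if not platform.startswith("win32"):
        # os.RTLD_LAZY only exists on Unix-like systems, so fall back to 0
        # (no extra flag) if the attribute is missing.
        flags |= getattr(os, "RTLD_LAZY", 0)
    return flags
```

With such a helper, the call site would reduce to a single `ctypes.CDLL(lib_path[0], dlopen_flags())` on every platform.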

include/tvm/topi/transform.h (1294-1296)

The out-of-bounds value is hardcoded to std::numeric_limits<float>::quiet_NaN(). This will cause issues if the tensor's data type (a->dtype) is not a float (e.g., int, bfloat16). It would be safer to use a type-aware NaN, for example by creating a helper similar to tvm::nan(dtype) that can be used here.
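To illustrate the concern in plain Python: NaN exists only for floating-point types, so a type-aware helper has to pick something else for integers. The sketch below is an assumption-laden illustration, not the topi API. The function name `out_of_bounds_fill`, the dtype strings, and the choice of a caller-supplied integer sentinel are all hypothetical.

```python
import math


def out_of_bounds_fill(dtype: str, integer_fallback: int = 0):
    """Pick a fill value for out-of-bounds reads based on a dtype string.

    Hypothetical helper: the fallback policy for integers is an assumption.
    """
    if dtype.startswith(("float", "bfloat")):
        # Floating-point dtypes can represent NaN directly.
        return math.nan
    if dtype.startswith(("int", "uint")):
        # Integers cannot encode NaN, so the caller chooses a sentinel.
        return integer_fallback
    raise ValueError(f"unsupported dtype: {dtype}")
```

Whether a sentinel or an outright error is the right behavior for integer tensors is a design decision the review comment leaves open.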

src/tir/ir/expr.cc (796-799)

The check that ensures only the last index of a buffer access can be a vector has been commented out. This seems like a significant change in validation logic. If this is intentional to support more general vectorized access patterns, it could have broad implications for downstream passes that might not be prepared to handle this. Could you clarify the reasoning for this change?
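For readers unfamiliar with the rule, the invariant that the commented-out check enforced can be sketched as follows. This is an illustration only: indices are modeled as a list of lane counts (1 for a scalar index), and the function name `check_vector_indices` is hypothetical rather than taken from `expr.cc`.

```python
from typing import Sequence


def check_vector_indices(index_lanes: Sequence[int]) -> None:
    """Raise if any index other than the last one has more than one lane.

    Hypothetical model: each entry is the lane count of one buffer index.
    """
    for position, lanes in enumerate(index_lanes[:-1]):
        if lanes != 1:
            raise ValueError(
                f"index {position} has {lanes} lanes; "
                "only the last index may be a vector"
            )
```

Under this model `[1, 1, 4]` is accepted and `[4, 1]` is rejected, which is the behavior downstream passes may have been relying on.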

src/tir/ir/stmt.cc (248-253)

The check IsPointerType(buffer_var->type_annotation, dtype) has been removed from the Allocate constructor. This check verified that the data type of the allocation matches the element type of the pointer variable. Removing it could potentially lead to type mismatches that are harder to debug later. What was the motivation for removing this check?

src/target/source/codegen_c.cc (936-948)

This change comments out a deep_equal_ check that prevents a Var from being rebound to a different value within a LetNode. Removing this check might hide potential bugs or lead to incorrect code generation if a variable is indeed redefined with a different expression. Could you clarify the reason for removing this check? If it's no longer needed, a comment explaining why would be helpful. Otherwise, it seems safer to keep this assertion.
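The kind of guard being discussed can be sketched in a few lines. This is a simplified model, not the code generator itself: the class name `LetBindings` is hypothetical, variables are plain strings, and Python's `!=` stands in for the structural comparison that `deep_equal_` performs.

```python
class LetBindings:
    """Track let-bound variables and reject rebinding to a different value.

    Hypothetical illustration; plain equality stands in for deep equality.
    """

    def __init__(self) -> None:
        self._values = {}

    def bind(self, var: str, value) -> None:
        # Rebinding to the same value is harmless; a different value is an error.
        if var in self._values and self._values[var] != value:
            raise ValueError(f"{var} is already bound to {self._values[var]!r}")
        self._values[var] = value
```

Without such a guard, a second binding silently replaces the first, which is the failure mode the comment warns about.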

CMakeLists.txt (796)

This change comments out the FILE_PREFIX_MAP_FLAG. This flag is useful for creating reproducible builds and for debugging by mapping relative source paths to absolute paths in the debug info. Was removing this intentional? If it was for temporary debugging, it should probably be restored.

src/target/z3/z3_prover_on.cc (111)

The rlimit is set using a float literal 1e4. While this will likely be converted correctly to an integer, it's clearer and safer to use an integer literal 10000 for an unsigned integer parameter.

    SetRLimit(10000);

Hzfengsy and others added 30 commits June 24, 2025 10:42

TVM Patch for TileLang

f77817b

Update CMakeLists.txt to include Python include directory and clean u…

3427445

…p setup.py by removing unused import

phaseout ck dependency

d230129

phaseout flashinfer

2139f47

phase out vta

9249de3

support T.address_of(B[i, j])

39d113b

Fix CMakeLists.txt to remove unnecessary '-I' flag from Python build …

3c72b8f

…command for tvm_cython target

Merge branch 'main' of https://github.com/apache/tvm into upstream-dev

ce08d9c

c api fix

9611cc7

[FFI] Remove unused Grid constant and add HANDLE_TO_REFERENCE conversion

493f937

preserve unit loop for reindex scheduling.

9a00cd6

Add skip_simplify option to reindex method for improved index handling

fc29e7b

fix

5cc56c9

Update LetFrameNode to allow mutable value and register reflection ac…

763f196

…cordingly

Refactor argument extraction in ExprEvaluator to streamline handling …

ab733d1

…of BoolOp nodes, improving code clarity.

Enhance error reporting in IndexMapInverseImpl by including index map…

ccc68f5

… details in the error message for better debugging context.

Remove redundant type check in Allocate constructor for improved clar…

555cc71

…ity and maintainability.

Change annotations type in Allocate constructor from Map<String, Obje…

d39953f

…ctRef> to Map<String, Any> for improved flexibility.

Update minimum Python version requirement from 3.9 to 3.8 for compati…

9574805

…bility.

Revert "Update minimum Python version requirement from 3.9 to 3.8 for…

a08b7c3

… compatibility." This reverts commit 9574805.

Refactor stride naming in Namer to use name_hint when defined, improv…

cb0fd6d

…ing variable naming consistency.

Refactor MergeAnnotations function to accept Map<Any, Any> instead of…

e11521e

… Map<String, Any> for enhanced flexibility in handling annotations.

Merge branch 'tilelang_main' of https://github.com/TileLang/tvm into …

e5558ac

…tilelang_main

phaseout legacy components

5a433cc

Add support for 'tir.exp2' operation and register 'hip' target kind w…

a64a592

…ith various attributes for enhanced GPU compatibility (apache#7) Co-authored-by: xinyxiao <[email protected]>

Add tilelang assume attribute to support custom assumption (apache#9)

1a07fda

Add tl.assume attr in tvm (apache#10)

ee6d522

* Add tilelang assume attribute to support custom assumption
* Add constraint guard in IRMutator

kurisu add assume attr patch 1 (apache#11)

1fc7578

* Add tilelang assume attribute to support custom assumption
* Add constraint guard in IRMutator
* Fix typo in IR mutator

Refactor buffer allocation logic in IRBuilder to use GetLastFrame for…

eddefbd

… improved clarity and efficiency.

Refactor BlockReadWriteDetector analysis on BlockRealizeNode

87b845f

LeiWang1999 and others added 27 commits December 14, 2025 16:12

Merge branch 'tilelang_main' of https://github.com/TileLang/tvm into …

68aa846

…tilelang_main

fix bool bug in z3

20a5922

remove z3

d730446

simplify z3 integration

050815c

delete z3 include in z3_prover_off.cc

7514242

fix z3 for macos (apache#15)

78b4caf

* fix z3 for macos
* upd

patch z3 when building tvm

1dde5c8

fix typo

d9ccc03

add comment to print z3 soname

c43fd9b

Merge branch 'z3' into tilelang_main

4d3ec92

Analyzer: require loop extent > 0 when entering loop

0a7a6ea

fix floordiv & floormod converting in z3 prover

8f4da61

fix when patchelf not found (apache#16)

88778fa

use static Z3 context

6dc8b76

Update Z3 context to be thread-local for improved thread safety

79ed747

Update library loading to use lazy loading

03ad7cc

Add cyclic dependency detection in IntervalSetEvaluator

1eeadc6

- Introduced a mechanism to track visiting variables using an unordered set to prevent infinite loops during evaluation.
- Added comments to clarify the purpose of the new logic for detecting cycles in variable dependencies.
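The visiting-set technique described in this commit message can be sketched generically. The code below is not the IntervalSetEvaluator implementation: the function name `resolve`, the binding map of names to either integers or other names, and the choice to return `None` on a cycle are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Union


def resolve(
    var: str,
    bindings: Dict[str, Union[int, str]],
    visiting: Optional[Set[str]] = None,
) -> Optional[int]:
    """Follow variable bindings to a constant, or return None if impossible.

    Hypothetical sketch: a value is either an int or the name of another variable.
    """
    if visiting is None:
        visiting = set()
    if var in visiting:
        return None  # cycle detected: this variable is already being evaluated
    if var not in bindings:
        return None  # unbound variable
    visiting.add(var)
    try:
        value = bindings[var]
        if isinstance(value, int):
            return value
        return resolve(value, bindings, visiting)
    finally:
        # Remove the variable once its evaluation finishes.
        visiting.discard(var)
```

For example, `resolve("a", {"a": "b", "b": 3})` follows the chain to a constant, while `resolve("a", {"a": "b", "b": "a"})` stops at the cycle instead of recursing forever.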

Merge branch 'tilelang_main' of https://github.com/TileLang/tvm into …

315036d

…tilelang_main

Remove Z3 subproject as it is no longer needed in the repository.

d9d3e9d

Add a rewrite pattern

62af333

[Z3] change z3 timeout to determinstic rlimit

ce96c60

Merge branch 'tilelang_main' of https://github.com/TileLang/tvm into …

b487ec4

…tilelang_main

Merge commit '8ae9be35a' into tilelang_main

23bce01

POC for metal w. tvm-ffi

e1d4a29

oraluben closed this Jan 5, 2026

gemini-code-assist bot reviewed Jan 5, 2026

9 participants
