Releases: tile-ai/tilelang
v0.1.2.post1
Why do we need this post release?
The v0.1.2 prebuilt package shipped a legacy Cython file, which may cause bugs.
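How v0.1.2.post1 relates to v0.1.2 can be illustrated with a small sketch. It assumes the tags follow the common Python packaging convention in which a `.postN` suffix marks a post-release that sorts after its base version. The helper name `parse_tag` is hypothetical and handles only the simple `vX.Y.Z[.postN]` shape of the tags on this page.

```python
import re

# Hypothetical helper: handles only tags shaped like "vX.Y.Z" or "vX.Y.Z.postN".
_TAG = re.compile(r"^v?(\d+(?:\.\d+)*)(?:\.post(\d+))?$")


def parse_tag(tag):
    """Return a sortable key: (release numbers, post-release number or -1)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # A plain release gets -1 so that any ".postN" of the same release sorts after it.
    post = int(match.group(2)) if match.group(2) is not None else -1
    return release, post


tags = ["v0.1.2.post1", "v0.1.0", "v0.1.2", "v0.0.1", "v0.1.1"]
ordered = sorted(tags, key=parse_tag)
print(ordered)
```

Under that assumption, a tool that prefers the highest version would pick v0.1.2.post1 over v0.1.2, which is how a fixed package can replace a faulty one without a new feature release.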
What's Changed
- [Docker] Add libstdcxx-ng-12 to Dockerfiles for CUDA versions by @LeiWang1999 in #160
- Add cpu jit with backend ctypes by @xs-keju in #154
- [Carver] Multi-Threads Compilation for Fast Auto Tuning by @SiriusNEO in #156
- [Refactor] Replace T.If with native Python if statement for mla paged kernel by @LeiWang1999 in #162
- [Enhancement] Improve CUDA path detection by @xwhzz in #157
- [Refactor] Replace `T.thread_binding` with `T.get_thread_binding` in examples and test cases by @LeiWang1999 in #163
- [Bugfix] Cast bool dtype into int8 in blocksparse examples by @LeiWang1999 in #167
- [Example] Implement NSA Decode tilelang examples by @LeiWang1999 in #168
New Contributors
Full Changelog: v0.1.2...v0.1.2.post1
v0.1.2
What's Changed
- [Dev] Add MLA and GQA decode examples by @chengyupku in #109
- [Example] Add Split-K and Stream-K Examples and move MLA from fld to mla by @LeiWang1999 in #110
- [Typo] Fix a typo in gemm splitk examples by @LeiWang1999 in #111
- [Typo] Fix links in installation instructions in README.md by @xwhzz in #112
- [Typo] Fix formatting in installation instructions in README.md by @xwhzz in #113
- [Benchmark] Add benchmark scripts for block sparse attention by @LeiWang1999 in #114
- [Dev] Support vectorized value pack and atomicAdd for BFloat16 DType by @LeiWang1999 in #116
- [Bugfix] Bugfix of pass order for hopper by @chengyupku in #117
- [Dev] Update MLA decode kernel by @chengyupku in #120
- [Example] Add GQA Example by @LeiWang1999 in #118
- [Example] Implement TileLang Native Sparse Attention Kernel by @LeiWang1999 in #121
- [Doc] Update README.md with new example links for Flash MLA Decoding and Native Sparse Attention by @chengyupku in #122
- [Example] Update GEMM FP8 Example by @LeiWang1999 in #123
- [Dev] Add RetNet Linear Attention example by @chengyupku in #124
- [JIT] Enhance cython/ctypes wrapper for tma descriptor by @LeiWang1999 in #126
- [Dev][Bugfix] Fix bug in ThreadTagChecker; Add WgmmaSync rewriter and add MHA WGMMA pipelined example by @chengyupku in #128
- [Dev] Remove buffer flatten when debug print a shared buffer by @LeiWang1999 in #129
- [Debug] Support `T.print` for `fragment` scope by @LeiWang1999 in #130
- [Example] Implement FMHA Varlen Example by @LeiWang1999 in #131
- [Refactor] Set default log level from warning to info by @LeiWang1999 in #132
- [Kernel] Implement different SEQ Q/KV examples with block sparse by @LeiWang1999 in #133
- [Dev][Doc] Add DeepSeek MLA Decode Example with Documentation and Performance Benchmarks by @chengyupku in #134
- [Doc] Update MLA Documentation by @chengyupku in #135
- [Debug] Improve Memory Layout Plot by @LeiWang1999 in #136
- [Doc] Add MLA Decoding Performance Benchmarks and Documentation by @chengyupku in #137
- [Bugfix] Add missing definition for AtomicAdd by @LeiWang1999 in #138
- [Dev][Doc] Enhance Flash Attention Implementation in GQA Decoding Example and Fix Typo by @chengyupku in #139
- [Dev] Adjust computation logic to avoid precision loss when casting acc_s from float to float16 by @chengyupku in #141
- [Refactor] Rename gemm fp8 example as we currently lack `T.gemm` support for fp8 by @LeiWang1999 in #144
- [Enhancement] Support debug print for unsigned char datatype by @LeiWang1999 in #145
- [Enhancement] Enable runtime tensor data type validation by @LeiWang1999 in #146
- [Refactor] Adapt Carver to benchmark by @LeiWang1999 in #148
- [Refactor] Remove BitBLAS Import Check in Benchmark by @SiriusNEO in #150
- [Enhancement] Optimize TileLang install scripts with Dynamic CPU Cores by @LeiWang1999 in #152
- [Carver] Enhance Carver Adaptation for MatMul Benchmarking by @LeiWang1999 in #153
- [Dev][Benchmark] Add MLA paged decoding example and benchmark script by @chengyupku in #158
- [Release] Bump Version to v0.1.2 by @LeiWang1999 in #155
New Contributors
- @SiriusNEO made their first contribution in #150
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- [Doc] Update release news by @LeiWang1999 in #80
- [Doc] Convert docs from rst format to Markdown format. by @xwhzz in #82
- [Bugfix] Bugfix of installing with develop mode by @LeiWang1999 in #81
- [WHL] Support whl building for different python versions via tox by @LeiWang1999 in #83
- [Refactor] Separate tilelang Pass Thread Sync (with Hopper support) from tvm by @LeiWang1999 in #85
- [Backend][WebGPU] Support WebGPU WGSL code generation by @LeiWang1999 in #86
- [Wheel] Support pypi build scripts for different python via tox by @LeiWang1999 in #93
- [Wrap] Use a ctypes-based kernel wrapper instead of dlpack for runtime efficiency by @LeiWang1999 in #95
- [Bugfix] Update Dockerfile.cu120 by @LeiWang1999 in #98
- [Bugfix] Put `InjectPtxAsyncCopy` Pass behind `ThreadSync` Pass by @LeiWang1999 in #97
- [Feature] Add CTypes JIT kernel support by @LeiWang1999 in #100
- [Docker] Add Dockerfiles for multiple CUDA versions by @LeiWang1999 in #103
- [JIT] Support Cython jit and make cython a default execution backend by @LeiWang1999 in #102
- [Refactor] Phase out torch cpp extension backend by @LeiWang1999 in #104
- [Wheel] Provide bare docker scripts to help build wheels for manylinux by @LeiWang1999 in #105
- [Example] Implement simple block sparse kernel by @LeiWang1999 in #106
- [Release] Bump version to v0.1.1 by @LeiWang1999 in #107
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- [LICENSE] Add LICENSE for flashinfer by @LeiWang1999 in #19
- [Doc] Fix installation scripts and docs for dequantize gemm by @LeiWang1999 in #20
- [Doc] Use sphinx to generate docs. by @xwhzz in #21
- [Doc] update installation.md and readme by @Cunxiao2002 in #22
- [Doc] fix a typo in installation.rst by @Cunxiao2002 in #24
- [Doc] Remove legacy files and update reference by @LeiWang1999 in #25
- [CI][Test] Add test cases for tilelang transform `AnnotateDeviceRegions` and `MakePackedAPI` by @LeiWang1999 in #26
- [Doc] Create a workflow to host docs using GitHub Pages. by @xwhzz in #28
- [CI][Test] Add test cases for tilelang transform InjectSoftwarePipeline and FrontendLegalize by @Cunxiao2002 in #30
- [Bugfix] Replace thread binding detector in LayoutInference Pass by @LeiWang1999 in #31
- [CI] Comprehensive Test cases Implementation of Matmul Dequantize by @LeiWang1999 in #32
- [Doc] Update GitHub Actions workflow for documentation deployment and add CNAME file. by @xwhzz in #33
- [Refactor] Simplify interface via replacing argument thread binding of intrinsics with `KernelFrame.Current` by @LeiWang1999 in #34
- [Bugfix] Reorder Passes: Place Vectorize Loop Before StorageFlatten and FlattenBuffer to Prevent Redundant Allocations by @LeiWang1999 in #37
- [Doc] Update documentation structure and content by @LeiWang1999 in #39
- [Doc][CI] Update GitHub Actions workflow for documentation build and deployment. by @xwhzz in #42
- [CI] Allow manual triggering of documentation workflow in addition to… by @xwhzz in #43
- [CI][Test] Add test cases for tilelang transform PipelinePlanning by @Cunxiao2002 in #44
- [CI][Test] Add test cases for tilelang transform `LayoutInference` and `LowerTileOp` on loop tail split functionality by @tzj-fxz in #29
- [Debug] Introduce `T.print` for buffer and variables logging on frontend by @LeiWang1999 in #45
- [CI] Change pull request trigger to `pull_request_target` for documen… by @xwhzz in #48
- [Dev] Add FlashDecoding example by @chengyupku in #46
- [Doc] update README that tilelang has been used in AttentionEngine by @smallscientist1 in #50
- [Doc] Remove unnecessary layout annotation by @LeiWang1999 in #49
- [CI][Test] Add test cases for tilelang kernel convolution by @chengyupku in #51
- [Dev] Implement test case for tilelang transformations by @LeiWang1999 in #53
- [CI][Test] Add test cases for tilelang kernel FlashAttention by @chengyupku in #54
- [CI][Test] Add test cases for element_add by @Cunxiao2002 in #47
- [CI] Clean up target repository before publishing documentation. by @xwhzz in #55
- [CI][Test] Add test cases for tilelang transform ClusterPlanning by @chengyupku in #57
- [Doc] Append debug relevant testing and documentations by @LeiWang1999 in #58
- [CI][Test] Add test cases for tilelang transform LowerHopperIntrin by @chengyupku in #59
- [Doc] Add matmul kernel tutorial with tile library by @LeiWang1999 in #60
- [Dev] Separate `LoopVectorize` Pass from upstream tvm by @LeiWang1999 in #62
- [Dev] Support FP8 Codegen for cuda backend by @LeiWang1999 in #64
- [Dev] Add test case for bfloat16 and int4 gemm with mma by @LeiWang1999 in #65
- [CI][Test] Add test cases for tilelang transform InjectFenceProxy by @chengyupku in #66
- [Tools] Introduce `plot_layout` to visualize the fragment layout by @LeiWang1999 in #68
- [Dev] Remove unnecessary python dependencies by @LeiWang1999 in #69
- [Carver] Introduce a tile-structure based cost model for auto tuning by @LeiWang1999 in #70
- [Bugfix] bug fix for bitblas dependency by @LeiWang1999 in #71
- [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized by @chengyupku in #72
- [CostModel][Carver] Support Hint Recommend for Shared memory Kernel Fusion by @LeiWang1999 in #73
- [Carver] Remove legacy todo items in carver's readme by @LeiWang1999 in #74
- [Dev] Add mha backward example by @chengyupku in #77
- [Release] Bump version into v0.1.0 by @LeiWang1999 in #76
New Contributors
- @xwhzz made their first contribution in #21
- @Cunxiao2002 made their first contribution in #22
- @tzj-fxz made their first contribution in #29
- @chengyupku made their first contribution in #46
- @smallscientist1 made their first contribution in #50
Full Changelog: v0.0.1...v0.1.0
TileLang v0.0.1 Pre-release
Pre-release of v0.0.1. It is still under testing, and only CUDA prebuilt packages are provided.
What's Changed
- [Doc] Update the example figures in README by @LeiWang1999 in #3
- [Doc] Replace SVG Figures with PNG due to some format issues by @LeiWang1999 in #4
- [Dev][Language] Separate Base AST with Sugar Syntax by @LeiWang1999 in #9
- [Dev] Enhance examples on README by @LeiWang1999 in #10
- [Doc] Revert repo link by @LeiWang1999 in #11
- [Dev][jit] Introduce jit for kernel functions by @LeiWang1999 in #12
- Update README.md by @rkinas in #14
- [CI] Remove Code QL workflow by @LeiWang1999 in #16
- [Doc] Add benchmark link in README by @LeiWang1999 in #17
- [Release] Bump Version into 0.0.1 by @LeiWang1999 in #18
New Contributors
- @LeiWang1999 made their first contribution in #3
- @rkinas made their first contribution in #14
Full Changelog: https://github.com/tile-ai/tilelang/commits/v0.0.1