HF Kernel for CPU: AMX, AVX2, AVX512 optimized #2232
Qubitium merged 12 commits into ModelCloud:main
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Kernel PR huggingface/kernels-community#81 has been merged. Please mark this PR as ready for review once the kernel binary has propagated, @jiqing-feng
Hi @Qubitium. This PR is ready for review.
@jiqing-feng Awesome. We are now approaching a threshold where we have more mature CPU kernels than GPU ones, thanks to Intel. =) Please add And maybe change the
@jiqing-feng One thing. Please add
Hi @Qubitium. I have addressed your comments. Please verify the changes, because I cannot load the test model.
Ok. Thanks. I will run the unit test and merge after it passes.
CI tests passed.
Hi @Qubitium. Please let me know what needs to be changed before merging. Thanks.
Add an HF Kernel for CPU, which can give a significant speed-up in TTFT (time to first token) compared to torch_fused.
Requires review after kernels-community is ready.
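For anyone who wants to check the TTFT claim locally, here is a minimal sketch of how the comparison could be timed. It is not the benchmark used for this PR. The names `time_to_first_token`, `fake_stream`, and `speedup_ratio` are hypothetical helpers, and `fake_stream` is only a stand-in for a real streaming generation call on either backend.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_stream(first_token_delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming generate call (hypothetical, not a GPTQModel API).

    The sleep models prefill latency before the first token is emitted.
    """
    time.sleep(first_token_delay_s)
    yield "tok0"
    yield "tok1"


def time_to_first_token(stream_fn: Callable[[], Iterable[str]], repeats: int = 3) -> float:
    """Return the best (minimum) wall-clock seconds until the first token arrives."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        iterator = iter(stream_fn())
        next(iterator)  # blocks until the first token is produced
        best = min(best, time.perf_counter() - start)
    return best


def speedup_ratio(baseline_s: float, candidate_s: float) -> float:
    """Speed-up of the candidate over the baseline (values above 1.0 mean faster)."""
    return baseline_s / candidate_s
```

To use it, replace `fake_stream` with two callables that run the same prompt through the torch_fused path and the HF Kernel path, then pass the two measured times to `speedup_ratio`. Taking the minimum over a few repeats reduces noise from warm-up and scheduling.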