grepdemos/ImageSharp
Description
As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS instructions (emitted for `Avx2.PermuteVar8x32`) in the following code:
ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs
Lines 104 to 112 in e2211c3

```csharp
result256_0 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
    result256_0);
result256_1 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
    result256_1);
```
If FMA is detected, we should allocate a 4x larger buffer and do the duplication in `ResizeKernelMap.Calculate`, which should be much cheaper than doing it in every convolution:
ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs
Lines 115 to 120 in e2211c3

```csharp
public static ResizeKernelMap Calculate<TResampler>(
    in TResampler sampler,
    int destinationSize,
    int sourceSize,
    MemoryAllocator memoryAllocator)
    where TResampler : struct, IResampler
```
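
To make the proposal concrete, here is a rough scalar sketch of the duplication step. It assumes that the permute mask in the snippet above broadcasts each weight across the four lanes of one `Vector4` pixel (the mask value is not shown in this issue), and that the expansion would run once while the kernel map is built. `KernelWeightExpansion`, `ExpandWeights`, and `Convolve` are hypothetical names used for illustration only; they are not ImageSharp APIs.

```csharp
using System;

// Hypothetical helper, not part of ImageSharp.
public static class KernelWeightExpansion
{
    // Assumption: one weight per source pixel, four float channels per pixel.
    private const int Channels = 4;

    // Repeats every weight four times: [w0, w1] -> [w0, w0, w0, w0, w1, w1, w1, w1].
    // Meant to run once per kernel at map-calculation time,
    // instead of permuting inside every convolution.
    public static float[] ExpandWeights(ReadOnlySpan<float> weights)
    {
        var expanded = new float[weights.Length * Channels];
        for (int i = 0; i < weights.Length; i++)
        {
            expanded.AsSpan(i * Channels, Channels).Fill(weights[i]);
        }

        return expanded;
    }

    // Scalar reference convolution over pre-expanded weights.
    // 'row' holds pixels.Length * 4 floats; 'expanded' has the same length,
    // so weights and pixel data line up element by element, which is the
    // layout an 8-wide FMA could consume with plain loads and no permute.
    public static float[] Convolve(ReadOnlySpan<float> row, ReadOnlySpan<float> expanded)
    {
        var result = new float[Channels];
        for (int i = 0; i < expanded.Length; i++)
        {
            result[i % Channels] += row[i] * expanded[i];
        }

        return result;
    }
}
```

With this layout the second FMA operand would no longer need `Avx2.PermuteVar8x32`; a direct 256-bit load from the expanded buffer should suffice. The trade-off is four times the kernel memory, which is why the issue suggests taking this path only when FMA is detected.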