Inquiry about the implementation feasibility of a FFT-based algorithm using this lib #46

Vandermode · 2025-10-20T13:55:37Z

Dear Developer @markjolah,

Thank you very much for developing this fantastic lib!

I am wondering whether I can leverage this lib to implement an FFT-based algorithm for scientific computing.

The algorithm of interest can be viewed as an iterative sequence of FFT-based 2D convolutions, as follows:

import cupy as cp

# This assumes only two media are involved (doe and air).
def multislice_propagate_vkfft(prop_cfg, vkfft_app_conv, field, aperture, z_coords, surface_height, H_source, H_dest):
    pad_y, pad_x = prop_cfg.pad_yx

    for z in z_coords:
        # 1 where z is below the local surface height, 0 elsewhere.
        source_slice = (z < surface_height).astype(field.dtype)

        field_source = cp.pad(field, pad_width=prop_cfg.pad_width, mode="constant", constant_values=0)
        field_dest = field_source.copy()
        # Each call below is an FFT-based convolution rather than a single FFT:
        # 2D FFT -> elementwise multiplication with the kernel -> 2D IFFT.
        field_source = vkfft_app_conv.fft(field_source, convolve_kernel=H_source)
        field_dest = vkfft_app_conv.fft(field_dest, convolve_kernel=H_dest)
        # Crop the padding again (this slicing assumes pad_y and pad_x are nonzero).
        field_source = field_source[..., pad_y:-pad_y, pad_x:-pad_x]
        field_dest = field_dest[..., pad_y:-pad_y, pad_x:-pad_x]
        # Blend the two results per pixel, then apply the aperture.
        field = field_source * source_slice + field_dest * (1 - source_slice)
        field = field * aperture

    return field

In each iteration, we perform two FFT-based convolutions (with two fixed kernels) on the input and combine the results with a per-pixel weighting to get the input for the next iteration.
For real-world tasks, hundreds of thousands of iterations might be needed to finish one round, so implementation efficiency matters a great deal.
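
For checking a fused GPU path against something simple, one iteration of the loop described above can be written as a plain NumPy reference. This is only a sketch under assumptions: the names `fft_convolve2d` and `multislice_step_reference` are hypothetical, the field is a single 2D array, the padding is symmetric, and the kernels are assumed to be frequency-domain transfer functions of the padded shape.

```python
import numpy as np

def fft_convolve2d(field, H):
    """Circular 2D convolution: FFT -> multiply by transfer function H -> IFFT."""
    return np.fft.ifft2(np.fft.fft2(field) * H)

def multislice_step_reference(field, aperture, source_slice, H_source, H_dest, pad_yx):
    """One iteration: pad, convolve with both kernels, crop, blend, apply aperture."""
    pad_y, pad_x = pad_yx
    # Zero-pad symmetrically (assumption: same padding on both sides of each axis).
    padded = np.pad(field, ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    out_source = fft_convolve2d(padded, H_source)
    out_dest = fft_convolve2d(padded, H_dest)
    # Crop with explicit stop indices so that zero padding also works.
    ny, nx = field.shape
    crop = (slice(pad_y, pad_y + ny), slice(pad_x, pad_x + nx))
    blended = out_source[crop] * source_slice + out_dest[crop] * (1 - source_slice)
    return blended * aperture
```

With both kernels set to all ones (an identity transfer function) and an all-ones aperture, the step should return the input field unchanged, which makes a quick sanity check before comparing against the GPU result.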

Currently, I am using the fused convolution kernel of the high-performance VkFFT library (https://github.com/DTolm/VkFFT), which is more efficient than separate cuFFT calls.

I am curious whether we can implement the whole iterative algorithm as a single kernel using this lib (rather than just fusing the convolution op). In other words, is there any room to further accelerate this implementation?
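
One possible saving that does not depend on any particular library: both convolutions act on the same padded input, so the forward FFT could be computed once and shared, leaving one forward and two inverse transforms per iteration instead of two of each. The sketch below illustrates this in NumPy (CuPy's `cupy.fft.fft2` and `cupy.fft.ifft2` mirror these calls). The function names are hypothetical, and whether a fused convolution kernel can reuse a forward transform in this way is an assumption I have not verified.

```python
import numpy as np

def step_two_convolutions(padded, H_source, H_dest):
    """Baseline: each convolution runs its own forward and inverse FFT (4 FFTs)."""
    out_source = np.fft.ifft2(np.fft.fft2(padded) * H_source)
    out_dest = np.fft.ifft2(np.fft.fft2(padded) * H_dest)
    return out_source, out_dest

def step_shared_forward(padded, H_source, H_dest):
    """Variant: one forward FFT shared by both kernels (3 FFTs)."""
    spectrum = np.fft.fft2(padded)
    # Stack both products so one batched inverse FFT over the last two axes handles them.
    stacked = np.stack([spectrum * H_source, spectrum * H_dest])
    out_source, out_dest = np.fft.ifft2(stacked, axes=(-2, -1))
    return out_source, out_dest
```

The two inverse transforms cannot be merged into one in general, because the blend weights vary per pixel and are applied in the spatial domain after cropping.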

Thanks a lot!

samaid · 2025-10-22T16:25:57Z

Maintainer

Hi @Vandermode,
Thank you for sharing your use case with us. Let us take a closer look into this. We will get back to you.

1 reply

Vandermode Oct 22, 2025
Author

Thanks! Please let me know if any algorithm details are unclear :)
