@maddyscientist
Member

(This isn't ready to be merged; I'm creating this PR as a placeholder.)

…oduce new CMake-set types: real_t (QUDA_SCALAR_TYPE), the host-side scalar precision; complex_t, the complex version of this (replaces Complex); and device_reduce_t (QUDA_REDUCTION_TYPE). Eventually we will be able to set these to non-double types, but we're not there yet.
…educe_t are different types, e.g., double vs doubledouble
… a different type (needed when copying from deviation_t<doubledouble> to deviation_t<double>, for example)
…ble (need to split into 64-bit words) and small generic cleanup
…so updates the coalesced write to sysmem to work with large reduce_t types, i.e., those where sizeof(device_reduce_t) / sizeof(atomic_type<device_reduce_t>) > warp_size. This was previously unsupported; we now use a warp-stride loop to do the coalesced write to sysmem
…MP at present and just a simple gather method for now
…to real_t done after the multi-process reduction
…r direct comparisons, use max error not error sum when multiple norms are used to check correctness, print out the deviation when verbosity >= QUDA_VERBOSE
…itations reflecting that this is WIP (bin bounds LUT repeatedly recomputed on the host, bin bounds LUT presently in explicit constant, CG reduction not supported, warp reductions are rather register heavy, etc.)
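
To illustrate the warp-stride write mentioned in the commit notes above, here is a minimal host-side sketch in plain C++. It is not the code from this PR: `wide_reduce_t`, `warp_stride_write`, and `write_with_warp` are hypothetical names, the warp width of 32 is an assumption, and the "warp" is emulated by a serial loop over lanes. The point is only that when a value has more 64-bit words than there are lanes, each lane handles words `lane`, `lane + warp_size`, and so on, so neighbouring lanes always touch neighbouring words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int warp_size = 32; // assumption: 32 lanes per warp

// Hypothetical reduction result spanning more 64-bit words than warp_size.
struct wide_reduce_t {
  double v[40];
};

// Work done by a single lane: copy every warp_size-th 64-bit word of value,
// starting at word index `lane`. In each pass of the loop, lanes 0..31 write
// 32 consecutive words, which is the coalesced access pattern.
template <typename T>
void warp_stride_write(std::uint64_t *dst, const T &value, int lane)
{
  static_assert(sizeof(T) % sizeof(std::uint64_t) == 0, "T must be a whole number of 64-bit words");
  constexpr std::size_t n_word = sizeof(T) / sizeof(std::uint64_t);
  assert(lane >= 0 && lane < warp_size);

  std::uint64_t words[n_word];
  std::memcpy(words, &value, sizeof(T)); // view the value as raw 64-bit words

  for (std::size_t i = static_cast<std::size_t>(lane); i < n_word; i += warp_size) dst[i] = words[i];
}

// Serial stand-in for a warp: on a GPU the lanes would run concurrently.
template <typename T>
std::vector<std::uint64_t> write_with_warp(const T &value)
{
  std::vector<std::uint64_t> dst(sizeof(T) / sizeof(std::uint64_t), 0);
  for (int lane = 0; lane < warp_size; lane++) warp_stride_write(dst.data(), value, lane);
  return dst;
}
```

With the 40-word `wide_reduce_t` above, lanes 0 to 7 each write two words and the remaining lanes write one, so no upper bound on the number of words per value is needed.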
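
The commit notes above distinguish the reduction type from the host scalar type and mention converting to real_t only after the reduction. The following sketch shows why that ordering matters, under loudly stated assumptions: `dd_t`, `dd_add`, `to_real`, `sum_wide`, and `sum_naive` are hypothetical, the local `real_t` and `device_reduce_t` aliases are stand-ins rather than the QUDA definitions, and the double-double arithmetic is the textbook TwoSum construction, not necessarily what QUDA's doubledouble does.

```cpp
#include <vector>

// Hypothetical stand-ins for the types named in the PR description.
using real_t = double;
struct dd_t {
  double hi; // leading part
  double lo; // trailing correction, small relative to hi
};
using device_reduce_t = dd_t;

// Add a double to a double-double accumulator.
// Note: this relies on strict IEEE arithmetic (no -ffast-math).
inline dd_t dd_add(dd_t a, double b)
{
  // Knuth TwoSum: s + e equals a.hi + b exactly
  double s = a.hi + b;
  double bb = s - a.hi;
  double e = (a.hi - (s - bb)) + (b - bb);
  e += a.lo;
  // Renormalise with Fast2Sum, assuming |s| >= |e|
  double hi = s + e;
  double lo = e - (hi - s);
  return {hi, lo};
}

// Narrow to the host scalar type; done once, after accumulation.
inline real_t to_real(dd_t a) { return a.hi + a.lo; }

// Accumulate in the wide type, convert at the very end.
inline real_t sum_wide(const std::vector<double> &x)
{
  device_reduce_t acc {0.0, 0.0};
  for (double xi : x) acc = dd_add(acc, xi);
  return to_real(acc);
}

// Accumulate directly in real_t, for comparison.
inline real_t sum_naive(const std::vector<double> &x)
{
  real_t acc = 0.0;
  for (double xi : x) acc += xi;
  return acc;
}
```

For an input of 1.0 followed by many terms of size 1e-17, `sum_naive` stays at exactly 1.0 because each term is below half a unit in the last place, while `sum_wide` retains their combined contribution. Converting to `real_t` before the reduction finished would discard that contribution in the same way.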
@maddyscientist maddyscientist requested review from a team as code owners March 18, 2024 23:20