Skip to content

[nimbus] Reduce use of allocas #578

@mratsim

Description

@mratsim

Nimbus EL is compiling with the flag -Werror=stack-usage= and this errors on allocas.

In particular this one https://github.com/status-im/nimbus-eth1/actions/runs/15843672213/job/44660951226?pr=3389#step:12:102 from status-im/nimbus-eth1#3389 is about this batch curve coordinates conversion:

Image

In practice, batchAffine in the EVM would only be used for KZG_POINT_PRECOMPILE which is always compiled in (no dead-code elimination) as once you import the eth_evm_precompile file, everything is tagged exportc. And in-practice it is bounded to 16 blobs (or 64 with PeerDAS).

In any case, an optimization, using a destination buffer as temporary storage, is mentioned in the comment to remove the stack allocation which is not materialized in the code:

func batchAffine*[N: static int, F, G](
affs: var array[N, EC_ShortW_Aff[F, G]],
projs: array[N, EC_ShortW_Prj[F, G]]) {.inline.} =
batchAffine(affs.asUnchecked(), projs.asUnchecked(), N)
func batchAffine*[F, G](
affs: ptr UncheckedArray[EC_ShortW_Aff[F, G]],
jacs: ptr UncheckedArray[EC_ShortW_Jac[F, G]],
N: int) {.noInline, tags:[Alloca], meter.} =
# Algorithm: Montgomery's batch inversion
# - Speeding the Pollard and Elliptic Curve Methods of Factorization
# Section 10.3.1
# Peter L. Montgomery
# https://www.ams.org/journals/mcom/1987-48-177/S0025-5718-1987-0866113-7/S0025-5718-1987-0866113-7.pdf
# - Modern Computer Arithmetic
# Section 2.5.1 Several inversions at once
# Richard P. Brent and Paul Zimmermann
# https://members.loria.fr/PZimmermann/mca/mca-cup-0.5.9.pdf
# To avoid temporaries, we store partial accumulations
# in affs[i].x and whether z == 0 in affs[i].y
var zeroes = allocStackArray(SecretBool, N)
affs[0].x = jacs[0].z
zeroes[0] = affs[0].x.isZero()
affs[0].x.csetOne(zeroes[0])

Note that while we can solve the batchAffine mentioned, we cannot do the same for batchInversion:

func batchInv*[F](
dst: ptr UncheckedArray[F],
elements: ptr UncheckedArray[F],
N: int) {.noInline.} =
## Batch inversion
## If an element is 0, the inverse stored will be 0.
var zeros = allocStackArray(SecretBool, N)
zeroMem(zeros, N)
var acc: F
acc.setOne()
for i in 0 ..< N:
# Skip zeros
zeros[i] = elements[i].isZero()
var z = elements[i]
z.csetOne(zeros[i])
dst[i] = acc
if i != N-1:
acc.prod(acc, z, lazyReduce = true)
else:
acc.prod(acc, z, lazyReduce = false)
acc.inv()
for i in countdown(N-1, 0):
# Extract 1/elemᵢ
dst[i] *= acc
dst[i].csetZero(zeros[i])
# next iteration
var eli = elements[i]
eli.csetOne(zeros[i])
acc.prod(acc, eli, lazyReduce = true)

but I don't think it's called for any EVM precompile.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions