13 changes: 12 additions & 1 deletion beacon_chain/nimbus_beacon_node.nim
@@ -1713,6 +1713,16 @@ proc reconstructDataColumns(node: BeaconNode, slot: Slot) =
warn "Failed to get the current slot head"
return

let
currentCgc = node.dataColumnQuarantine.custodyColumns.lenu64
nonSupernode =
currentCgc > node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS div 2 and
Contributor:

This boundary condition seems to create an odd hole for node.dataColumnQuarantine.custodyColumns.lenu64 == node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS div 2. It won't be rejected by the earlier

  if node.dataColumnQuarantine.custodyColumns.lenu64 <
      node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS div 2:
    return

and it won't trigger nonSupernode status here (with 64, the expression 64 > 64 and 64 < 128 evaluates to false).

Maybe >= was meant? But see comment below for a slightly broader point, related to this.
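Concretely, the >= variant would read something like this (same identifiers as the diff; a sketch only, not a tested change):

  # Sketch: >= closes the hole at exactly NUMBER_OF_CUSTODY_GROUPS div 2,
  # so that custody count is also treated as nonSupernode.
  nonSupernode =
    currentCgc >= node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS div 2 and
    currentCgc < node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS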

currentCgc < node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS

if not(node.config.peerdasSupernode) or
nonSupernode:
Contributor:

>>> len('  if not(node.config.peerdasSupernode) or nonSupernode:')
55

so I'm not really sure why it's two lines.

But, beyond the most superficial cosmetics, earlier in the function there's already a

  if node.dataColumnQuarantine.custodyColumns.lenu64 <
      node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS div 2:
    return

which is effectively part of this logic too. I'm not sure why these two chunks of custody column checking should exist separately, and it's simpler if they're combined: to achieve a similar effect, check once for not(node.config.peerdasSupernode) or node.dataColumnQuarantine.custodyColumns.lenu64 < node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS. Nothing about this function's current behavior changes between a custody count of 63 and 65 (why 64? is that intentional?); it's just ruling everything under 128 out.

I'm not really sure in what circumstances node.config.peerdasSupernode could hold while also having node.dataColumnQuarantine.custodyColumns.lenu64 < node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS.
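For reference, that single combined early-exit might look roughly like this (identifiers taken from the diff; exact placement near the top of reconstructDataColumns is assumed, not verified):

  # Sketch: one guard replaces both the earlier div-2 check and nonSupernode,
  # ruling out everything below a full NUMBER_OF_CUSTODY_GROUPS custody.
  if not(node.config.peerdasSupernode) or
      node.dataColumnQuarantine.custodyColumns.lenu64 <
        node.dag.cfg.NUMBER_OF_CUSTODY_GROUPS:
    return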

return

withBlck(blck):
when consensusFork >= ConsensusFork.Fulu:
let maxColCount = node.dag.cfg.NUMBER_OF_COLUMNS
@@ -1741,7 +1751,8 @@ proc reconstructDataColumns(node: BeaconNode, slot: Slot) =

# Reconstruct columns
let recovered = recover_cells_and_proofs_parallel(
node.batchVerifier[].taskpool, columns).valueOr:
node.batchVerifier[].taskpool, columns,
node.dag.cfg.time.SECONDS_PER_SLOT.int64).valueOr:
error "Data column reconstruction incomplete"
return
let rowCount = recovered.len
22 changes: 15 additions & 7 deletions beacon_chain/spec/peerdas_helpers.nim
@@ -9,7 +9,7 @@

# Uncategorized helper functions from the spec
import
chronos, chronicles, results, taskpools,
chronos, chronicles, results, taskpools, times,
Contributor:

times is a stdlib module; it should be on a separate line and prefixed by std/:

import
  std/times,
  chronos, chronicles, results, taskpools,

eth/p2p/discoveryv5/node,
kzg4844/kzg,
ssz_serialization/[
@@ -151,7 +151,8 @@ proc recoverCellsAndKzgProofsTask(cellIndices: seq[CellIndex],

proc recover_cells_and_proofs_parallel*(
tp: Taskpool,
dataColumns: seq[ref fulu.DataColumnSidecar]):
dataColumns: seq[ref fulu.DataColumnSidecar],
slotDuration: int64):
Result[seq[CellsAndProofs], cstring] =
## This helper recovers blobs from the data column sidecars parallelly
if dataColumns.len == 0:
@@ -170,13 +171,15 @@
res = newSeq[CellsAndProofs](blobCount)

let startTime = Moment.now()
const reconstructionTimeout = 2.seconds
let reconstructionTimeout =
(initDuration(nanoseconds = slotDuration * 100_000_000)).inNanoseconds()
Contributor:

100_000_000 is off by a factor of 10.

More generally, initDuration supports milliseconds too, so there's no particular reason to use nanoseconds here.

inNanoseconds just converts all this back into an int64 anyway, so this whole line is a roundabout method of multiplying by what's supposed to be 1_000_000_000, invoking initDuration/inNanoseconds for no obvious reason.

That is, one should be able to do something like

  let reconstructionTimeout = slotDuration * 1000

or similar, rather than round-tripping through these std/times types. Also, since SLOT_DURATION_MS (in milliseconds) will come in, this will have to support milliseconds, so that's probably the correct granularity. Seconds have insufficient granularity (already, in some cases), microseconds are pointlessly fine-grained to handle, and the times library does like to operate in integers.

While the two

    if (now - startTime).nanoseconds > reconstructionTimeout

lines use nanoseconds, they could use milliseconds just as easily without the evidently easy-to-typo large numbers.
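As a small self-contained illustration of the millisecond approach with chronos (the slotDuration value and the elapsedExceeds helper are made up for the example):

  import chronos

  proc elapsedExceeds(start: Moment, timeoutMs: int64): bool =
    # Compare elapsed wall time, in milliseconds, against a plain int64
    # timeout, without round-tripping through std/times.
    (Moment.now() - start).milliseconds > timeoutMs

  let
    slotDuration = 12'i64                        # seconds, illustrative only
    reconstructionTimeout = slotDuration * 1000  # milliseconds
    startTime = Moment.now()
  doAssert not elapsedExceeds(startTime, reconstructionTimeout)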



# ---- Spawn phase with time limit ----
for blobIdx in 0 ..< blobCount:
let now = Moment.now()
if (now - startTime) > reconstructionTimeout:
debug "PeerDAS reconstruction timed out while preparing columns",
if (now - startTime).nanoseconds > reconstructionTimeout:
debug "PeerDAS column reconstruction timed out while preparing columns",
spawned = pendingFuts.len, total = blobCount
break # Stop spawning new tasks

@@ -191,8 +194,8 @@
# ---- Sync phase ----
for i in 0 ..< pendingFuts.len:
let now = Moment.now()
if (now - startTime) > reconstructionTimeout:
debug "PeerDAS reconstruction timed out",
if (now - startTime).nanoseconds > reconstructionTimeout:
debug "PeerDAS column reconstruction timed out while preparing columns",
completed = i, totalSpawned = pendingFuts.len
return err("Data column reconstruction timed out")

@@ -207,6 +210,11 @@

ok(res)

proc recover_cells_and_proofs_parallel*(
tp: Taskpool,
dataColumns: seq[ref fulu.DataColumnSidecar]):
Result[seq[CellsAndProofs], cstring] =
recover_cells_and_proofs_parallel(tp, dataColumns, 10000'i64)

proc assemble_data_column_sidecars*(
signed_beacon_block: fulu.SignedBeaconBlock | gloas.SignedBeaconBlock,