
sigurpol
Contributor

@sigurpol sigurpol commented Oct 10, 2025

Pre-migration:

  • RC snapshot: taken at block with state transition to AccountsMigrationInit
  • AH snapshot: taken at block with state transition to DataMigrationOngoing + 1

Post-migration:

  • RC snapshot: taken at block with state transition to CoolOff
  • AH snapshot: taken at block with state transition to CoolOff

The monitor process only tracks the RC and AH migration stages and records the relevant block hashes. Snapshots are taken once both AH and RC have reached the CoolOff migration stage.
Starting from a34356e, instead of continuously monitoring over WebSocket, we poll via RPC periodically (every 5 min) until the migration is done, and then take the 4 snapshots.
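A minimal sketch of a poll-then-locate flow, assuming a hypothetical `ChainView` interface (`finalizedNumber`, `readStage`) that wraps whatever RPC calls the monitor actually makes and returns the stage's variant name. It is not the ahm-dryrun implementation. Because a 5 min poll can miss the exact transition block, this sketch (as one possible approach) binary-searches historical state afterwards, which only works if stages advance monotonically.

```typescript
// Sketch only: ChainView and its methods are hypothetical, not the ahm-dryrun API.
type StageName = string; // variant name, e.g. "DataMigrationOngoing" or "CoolOff"

interface ChainView {
  // Latest finalized block number (assumed to wrap an RPC call).
  finalizedNumber(): Promise<number>;
  // Migration stage stored at the given block (assumed to wrap a historical storage query).
  readStage(blockNumber: number): Promise<StageName>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Poll the finalized head every `intervalMs` until `isDone` holds; return that head.
async function pollUntilDone(
  chain: ChainView,
  isDone: (stage: StageName) => boolean,
  intervalMs: number = 5 * 60 * 1000, // every 5 min, as described above
): Promise<number> {
  for (;;) {
    const head = await chain.finalizedNumber();
    if (isDone(await chain.readStage(head))) return head;
    await sleep(intervalMs);
  }
}

// Binary-search [lo, hi] for the first block whose stage is at or past `target`
// in `order`. Assumes stages only move forward during the migration.
async function firstBlockAtOrAfter(
  chain: ChainView,
  order: StageName[],
  target: StageName,
  lo: number,
  hi: number,
): Promise<number | undefined> {
  const want = order.indexOf(target);
  let found: number | undefined;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (order.indexOf(await chain.readStage(mid)) >= want) {
      found = mid; // candidate; keep searching to the left
      hi = mid - 1;
    } else {
      lo = mid + 1;
    }
  }
  return found;
}
```

Following the description above, the pre-migration RC snapshot block would be the first block at or after "AccountsMigrationInit", and the AH one the "DataMigrationOngoing" transition block plus one. If the block returned for "CoolOff" actually holds a later stage, the chain skipped CoolOff and there is no post-migration snapshot point.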

Driven-by: update the runtime to include CoolOff as a state transition for AH, coming from polkadot-fellows/runtimes#952 (still to be merged!). UPDATE: the runtime now points to polkadot-fellows/runtimes#956 (current commit: 3039b0c88ace9471e75a6da4efd378817ee56e34).

TODO: update to latest and greatest runtime once 956 is merged

How to test

CI

Test on CI via the usual AHM flow (once we solve the issue that currently prevents running the AHM flow on CI!).

How to test manually

  1. Take the pre-db from a CI AHM flow run built with the proper runtime.
  2. From terminal 1: just zb spawn polkadot-bite and wait for the network to be up.
  3. From terminal 2: just zb monitor-snapshots polkadot-bite polkadot
  4. From terminal 3: just zb start-migration polkadot-bite
    (or, of course, run the three commands above from a single terminal, backgrounding them with &, if you prefer)

Wait for the migration to end; the monitor script should then take the 4 snapshots sequentially.

@sigurpol sigurpol marked this pull request as draft October 10, 2025 17:50
@sigurpol sigurpol marked this pull request as ready for review October 10, 2025 22:02
@sigurpol sigurpol requested review from muharem and pepoviola October 10, 2025 22:12
@sigurpol
Contributor Author

sigurpol commented Oct 14, 2025

let's see: https://github.com/paritytech/ahm-dryrun/actions/runs/18419444332 🍿

   2025-10-10 22:42:37 [debug]: RC Finalized Block #30471701: 0xea0c8876856d9782c96fd208f6077949cf483844f1b2f0a7561026c8a1d6ca56 {
    "service": "ahm"
  }
  2025-10-10 22:42:37 [info]: RC migration finished {
    "service": "ahm",
    "blockNumber": 30471701
  }
  2025-10-10 22:42:37 [debug]: AH Finalized Block #11193966: 0x877687cda9411ed3149375effe8879b34aa9b00e81924a32b1f360f240d788e1 {
    "service": "ahm"
  }
  2025-10-10 22:42:37        RPC-CORE: getStorage(key: StorageKey, at?: BlockHash): StorageData:: Unable to decode storage ahMigrator.ahMigrationStage:: createType(PalletAhMigratorMigrationStage):: {"_enum":{"Pending":"Null","DataMigrationOngoing":"Null","CoolOff":"{\"endAt\":\"u32\"}","MigrationDone":"Null"}}:: Decoded input doesn't match input, received 0x02 (1 bytes), created 0x0200000000 (5 bytes)
  file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:407
              throw new Error(`Unable to decode storage ${key.section || 'unknown'}.${key.method || 'unknown'}:${entryNum}: ${error.message}`);
                    ^
  
  Error: Unable to decode storage ahMigrator.ahMigrationStage:: createType(PalletAhMigratorMigrationStage):: {"_enum":{"Pending":"Null","DataMigrationOngoing":"Null","CoolOff":"{\"endAt\":\"u32\"}","MigrationDone":"Null"}}:: Decoded input doesn't match input, received 0x02 (1 bytes), created 0x0200000000 (5 bytes)
      at RpcCore._newType (file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:407:19)
      at RpcCore._formatStorageData (file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:341:21)
      at RpcCore._formatOutput (file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:306:25)
      at RpcCore._formatResult (file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:168:20)
      at callWithRegistry (file:///__w/ahm-dryrun/ahm-dryrun/node_modules/@polkadot/rpc-core/bundle.js:189:25)
      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)

I believe the error in decoding the AH migration stage is expected when testing against Kusama and won't happen when we test against Polkadot.
Looking at .github/workflows/zombie-bite-common.yml:158:

zombie-bite bite -r $NETWORK --rc-override $RC_OVERRIDE --ah-override $AH_OVERRIDE -d $ZOMBIE_BITE_BASE_PATH

This command:

  1. Connects to the live Kusama network at the time of CI execution
  2. Forks the chain state from the current finalized block
  3. Applies the new runtime WASM (--rc-override and --ah-override)

Kusama currently has AhMigrationStage::MigrationDone in storage (encoded as 0x02), but the new runtime (as per polkadot-fellows/runtimes#952) decodes variant index 2 as CoolOff { end_at } (5 bytes: the variant index plus a 4-byte u32), hence the decode error.
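To make the byte-level mismatch concrete, here is a minimal SCALE-style decoder for just these two layouts (a simplified illustration, not polkadot.js's codec). The new layout is taken from the type string in the error above; the old layout (Pending, DataMigrationOngoing, MigrationDone) is an assumption consistent with MigrationDone being stored as 0x02.

```typescript
// Simplified SCALE-style decoding of an enum whose variants carry fixed-size data (illustration only).
interface Variant {
  name: string;
  payloadBytes: number; // fixed size of the variant's data, 0 for unit variants
}

// Assumed pre-#952 layout: MigrationDone sits at index 2 (stored as 0x02).
const OLD_LAYOUT: Variant[] = [
  { name: "Pending", payloadBytes: 0 },
  { name: "DataMigrationOngoing", payloadBytes: 0 },
  { name: "MigrationDone", payloadBytes: 0 },
];

// Layout from the error message: CoolOff { endAt: u32 } now sits at index 2.
const NEW_LAYOUT: Variant[] = [
  { name: "Pending", payloadBytes: 0 },
  { name: "DataMigrationOngoing", payloadBytes: 0 },
  { name: "CoolOff", payloadBytes: 4 },
  { name: "MigrationDone", payloadBytes: 0 },
];

function decodeStage(bytes: Uint8Array, layout: Variant[]): string {
  if (bytes.length === 0) throw new Error("empty input");
  const variant = layout[bytes[0]]; // first byte is the variant index
  if (variant === undefined) throw new Error(`unknown variant index ${bytes[0]}`);
  const expected = 1 + variant.payloadBytes;
  if (bytes.length !== expected) {
    throw new Error(`${variant.name}: received ${bytes.length} bytes, expected ${expected}`);
  }
  if (variant.name === "CoolOff") {
    // u32 is little-endian in SCALE; `>>> 0` keeps the result unsigned.
    const endAt = (bytes[1] | (bytes[2] << 8) | (bytes[3] << 16) | (bytes[4] << 24)) >>> 0;
    return `CoolOff{endAt:${endAt}}`;
  }
  return variant.name;
}
```

Decoding the stored `0x02` succeeds with the old layout but fails with the new one, mirroring the "received 1 bytes, created 5 bytes" error. A genuine CoolOff value carries its little-endian `endAt`, e.g. `0x02 51 68 ae 01` decodes to `endAt = 28207185`.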

Since we no longer care about testing the migration on Kusama, there is no point in adding a storage migration for the AH migration stage there. I also expect the issue not to occur on Polkadot, since Asset Hub will not hold any AhMigrationStage value in the old format when we bite the network.

@muharem @pepoviola @ggwpez @kianenigma please confirm that my assumption above makes sense.

That said, I am still unable to test the AHM flow on Polkadot in CI, e.g. see https://github.com/paritytech/ahm-dryrun/actions/runs/18478441568/job/52648138711 where my job got stuck.

@pepoviola
Contributor

Hi @sigurpol, I think you are correct about the decoding error. I will check why the migration got stuck on Polkadot and ping you to verify.
Thx!

@muharem

muharem commented Oct 14, 2025

@sigurpol yes, your understanding is right

@sigurpol
Contributor Author

Once polkadot-fellows/runtimes#952 is merged (hopefully very soon), we can update this PR to point to the latest main runtime and hopefully merge it.

@sigurpol
Contributor Author

@muharem maybe one thing worth mentioning.
I was testing a migration and trying to capture snapshots via the monitoring scripts, using the muharem-ahm-move-finish-stage branch (a3d3563c2c907d5ab4d4495d67bc44ec562d923d) as runtime, but as you can see below, AH went directly from DataMigrationOngoing to MigrationDone without passing through the CoolOff stage, unlike RC. I ran the migration via just zb perform-migration polkadot-bite, which relies on the default values for RC scheduleMigration, i.e. cool-off { after: 2 } on RC; AH should be told by RC to enter the CoolOff stage and leave it once the RC block number reaches end_at.
But as you can see from the logs below, AH doesn't seem to enter the CoolOff phase.
It's probably worth trying with a longer CoolOff period, but have you tried the migration yourself? Did you see AH entering the CoolOff stage? I expect to take the snapshot when AH is in the CoolOff stage, so if that doesn't happen, we end up with no post-migration snapshot...

2025-10-16 01:56:45 [debug]: RC migration in progress {
  "service": "ahm",
  "stage": {
    "CoolOff": {
      "endAt": "28,207,185"
    }
  }
}
2025-10-16 01:56:45 [debug]: AH migration in progress {
  "service": "ahm",
  "stage": "DataMigrationOngoing"
}
2025-10-16 01:56:53 [debug]: RC Finalized Block #28207184: 0x407194cc70112de50d7696f029f7786518c4927f69fe1eb2dc3a7d9c43f274c7 {
  "service": "ahm"
}
2025-10-16 01:56:53 [debug]: AH Finalized Block #9981488: 0xd69d2f110de2707044d5ef010de11c66944701e0762934b146e109d1e0f7d266 {
  "service": "ahm"
}
2025-10-16 01:56:53 [debug]: RC migration in progress {
  "service": "ahm",
  "stage": {
    "CoolOff": {
      "endAt": "28,207,185"
    }
  }
}
2025-10-16 01:56:53 [debug]: AH migration in progress {
  "service": "ahm",
  "stage": "DataMigrationOngoing"
}
2025-10-16 01:56:57 [debug]: RC Finalized Block #28207185: 0xe29f141971649c3dbbc6749224b49636725a579bcc1b5fbdfe2189d377feed12 {
  "service": "ahm"
}
2025-10-16 01:56:57 [debug]: RC migration in progress {
  "service": "ahm",
  "stage": "SignalMigrationFinish"
}
2025-10-16 01:56:57 [debug]: AH Finalized Block #9981489: 0x29c5cf9e2a8d550340ed150d70c93883bd1e36d969fd46fa261b137d6ec8acd0 {
  "service": "ahm"
}
2025-10-16 01:56:57 [debug]: AH migration in progress {
  "service": "ahm",
  "stage": "DataMigrationOngoing"
}
2025-10-16 01:57:05 [debug]: RC Finalized Block #28207186: 0xa7eccd4e86417acfc042f364650c272904eb108d4fe70f84939f849bdaa8d0a2 {
  "service": "ahm"
}
2025-10-16 01:57:05 [info]: RC migration finished {
  "service": "ahm",
  "blockNumber": 28207186
}
2025-10-16 01:57:05 [debug]: AH Finalized Block #9981490: 0xf05c2f9dce81c85e67259445696da98b5c55142ff603dd8d2bd7228a89272141 {
  "service": "ahm"
}
2025-10-16 01:57:05 [debug]: AH migration in progress {
  "service": "ahm",
  "stage": "DataMigrationOngoing"
}
2025-10-16 01:57:09 [debug]: AH Finalized Block #9981491: 0x573a54ecd60b404d5cdd3b0e85fdaa3bda23e274279bd5e4614d44eb723334a3 {
  "service": "ahm"
}
2025-10-16 01:57:09 [debug]: AH migration in progress {
  "service": "ahm",
  "stage": "DataMigrationOngoing"
}
2025-10-16 01:57:17 [debug]: AH Finalized Block #9981492: 0x81b54198c97abc8f3c839fe202d8f0389a2f354742ad1b49935685888d7bdc04 {
  "service": "ahm"
}
2025-10-16 01:57:17 [info]: AH migration finished {
  "service": "ahm",
  "blockNumber": 9981492
}
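The logs above can be checked mechanically. A small sketch, assuming per-block (block number, stage name) samples like those in these logs; `StageSample` and `firstBlocks` are hypothetical names. Fed with the AH samples from this run (taking the final stage to be MigrationDone, as stated above), it finds no CoolOff entry, i.e. no block at which to take the post-migration AH snapshot.

```typescript
// Hypothetical per-block sample recorded by a monitor (not the ahm-dryrun data model).
interface StageSample {
  block: number; // finalized block number
  stage: string; // migration stage variant name at that block
}

// For each stage, return the first block at which it was observed.
// Assumes samples are sorted by block number and cover every finalized block.
function firstBlocks(samples: StageSample[]): Map<string, number> {
  const first = new Map<string, number>();
  for (const { block, stage } of samples) {
    if (!first.has(stage)) first.set(stage, block);
  }
  return first;
}

// AH samples from the logs above; the last stage name is assumed to be MigrationDone.
const ahSamples: StageSample[] = [
  { block: 9981488, stage: "DataMigrationOngoing" },
  { block: 9981489, stage: "DataMigrationOngoing" },
  { block: 9981490, stage: "DataMigrationOngoing" },
  { block: 9981491, stage: "DataMigrationOngoing" },
  { block: 9981492, stage: "MigrationDone" },
];

const ahCoolOffBlock = firstBlocks(ahSamples).get("CoolOff");
if (ahCoolOffBlock === undefined) {
  console.warn("AH never entered CoolOff: no block for the post-migration AH snapshot");
}
```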

@sigurpol
Contributor Author

sigurpol commented Oct 16, 2025

[DISCARDED - see later on]
one idea: the issue might happen around here https://github.com/polkadot-fellows/runtimes/blob/a3d3563c2c907d5ab4d4495d67bc44ec562d923d/pallets/ah-migrator/src/lib.rs#L1150-L1156

since, when I performed the migration locally:

  1. cool_off_end_at is calculated on RC at block 28207183
  2. the default cool-off is { after: 2 }, i.e. 2 blocks, so end_at = 28207185
  3. RC informs AH via XCM, but by the time the XCM reaches AH and is executed, a few RC blocks might have passed
  4. if the RC block number checked in on_initialize is already ≥ 28207185, AH immediately transitions from CoolOff to MigrationDone without spending any actual time in CoolOff

Increasing the cool-off period would, I believe, solve the issue (and it's how we will run the migration for real...)
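The arithmetic behind this (later discarded) hypothesis can be sketched as follows. `xcmDelay` (how many RC blocks pass before AH processes the XCM) is a hypothetical parameter, and the rule "AH leaves CoolOff once the RC block number it sees is ≥ end_at" is the assumption from step 4, not a reading of the pallet code.

```typescript
// Hypothetical model of the race described in steps 1-4 above.
function coolOffEndAt(computedAt: number, after: number): number {
  return computedAt + after;
}

// True if AH first observes an RC block number still below end_at,
// i.e. it actually spends at least one block in CoolOff.
function ahSpendsTimeInCoolOff(computedAt: number, after: number, xcmDelay: number): boolean {
  const rcBlockSeenByAh = computedAt + xcmDelay;
  return rcBlockSeenByAh < coolOffEndAt(computedAt, after);
}

// With the default { after: 2 } computed at RC block 28207183,
// any delay of 2 or more RC blocks makes AH skip CoolOff entirely.
for (const xcmDelay of [1, 2, 3]) {
  console.log(xcmDelay, ahSpendsTimeInCoolOff(28207183, 2, xcmDelay));
}
```

Under this model a larger `after` keeps `rcBlockSeenByAh < end_at` true for any realistic delay, which is why a longer cool-off would sidestep the race if it existed.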

@muharem
Copy link

muharem commented Oct 16, 2025

@sigurpol which branch are the runtimes on?

@sigurpol
Contributor Author

@sigurpol which branch are the runtimes on?

As I mentioned above: the muharem-ahm-move-finish-stage branch (a3d3563c2c907d5ab4d4495d67bc44ec562d923d)

@sigurpol
Contributor Author

OK, as discussed with @muharem (@pepoviola please confirm 🙏): it is expected that AH didn't go through the CoolOff phase, since the bite step came from CI using a runtime without the polkadot-fellows/runtimes#952 changes that my local runtime includes. What I did was take the pre-db from CI and spawn from there, while locally I had the proper runtime with Muharem's changes.
The correct flow instead is to bite the live network locally and/or use a pre-db from CI built against 952 (or better, 956), and then spawn, run the migration and monitor to get the snapshots. Case closed 😄

@sigurpol
Contributor Author

Now running the AHM flow in CI with the new runtime (aligned to the latest polkadot-fellows/runtimes#956, current commit: 3039b0c88ace9471e75a6da4efd378817ee56e34). I expect CI to fail due to the issue with the collator getting stuck in pre-migration that @pepoviola is investigating, but at least I hope to get a valid pre-db state so that I can then manually spawn + migrate + take snapshots.

3 participants