
Rpc consistency proposal: Wait for up to a number of blocks specified in configuration before failing #2666

Merged

merged 29 commits into master from ac@rpc-consistency-proposal-wait-for-block

Feb 17, 2025

Conversation

acerone85
Contributor

@acerone85 acerone85 commented Feb 4, 2025

I am using a rebase-heavy workflow to handle stacked PRs. If you want to push updates to this PR and keep the whole PR consistent with the target branch, please do the following instead of `git merge 1897-rpc-consistency-proposal`:

git checkout ac@rpc-consistency-proposal-wait-for-block
git fetch origin
git rebase --onto origin/1897-rpc-consistency-proposal 1897-rpc-consistency-proposal
# Fix conflicts if necessary
git branch -f 1897-rpc-consistency-proposal origin/1897-rpc-consistency-proposal
git push -f origin

Linked Issues/PRs

Follow up on #1897

This PR implements a mechanism that allows a node operator to configure a tolerance level, in number of blocks, for the required fuel_block_height extension. When checking that the `current_fuel_block_height` is at least the `required_fuel_block_height`, the node first checks whether the `current_fuel_block_height` is at most `required_fuel_block_height_tolerance` blocks behind the `required_fuel_block_height`. If so, the node polls every second until the precondition is met before fulfilling the request. Otherwise, the request fails immediately because the precondition has not been met.

Because the node's HTTP server has a timeout layer, requests to a node that has lost synchronisation with the network will eventually fail.
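The tolerance check described above amounts to a three-way decision. A minimal sketch of that logic (hypothetical names and types, not the actual fuel-core API; block heights simplified to `u32`):

```rust
/// Outcome of comparing the node's height against a request's requirement.
#[derive(Debug, PartialEq)]
enum HeightCheck {
    /// Precondition already met: serve the request.
    Ready,
    /// Behind, but within tolerance: wait for the node to catch up.
    Wait,
    /// Too far behind: fail the request immediately.
    Fail,
}

/// Illustrative sketch of the check, not the fuel-core implementation.
fn check_required_height(current: u32, required: u32, tolerance: u32) -> HeightCheck {
    if current >= required {
        HeightCheck::Ready
    } else if required - current <= tolerance {
        HeightCheck::Wait
    } else {
        HeightCheck::Fail
    }
}
```

With the default tolerance of 10, a node at height 95 would wait for a request requiring height 100, but immediately reject one requiring height 200.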

Description

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]

@acerone85 acerone85 force-pushed the ac@rpc-consistency-proposal-wait-for-block branch from cf942ca to ebe0ffa on February 4, 2025 15:12
Comment on lines 130 to 132
while current_block_height < *required_block_height {
tokio::time::sleep(SLEEP_DURATION).await;
match view.latest_block_height() {
Collaborator

It would be nice if we could subscribe to the height that we need and notify all dependent tasks.

Collaborator

BTW, without #2646 this is a very expensive operation, and doing it for each query can add up. Before releasing RPC consistency, we need to merge this optimization.

Contributor Author

If we subscribe to a channel that broadcasts the height when it is updated in the offchain worker, would that still require #2646?

Ok(block_height) => {
current_block_height = block_height;
}
}
}
}
}
Collaborator

After you wait for the required height, the view that you have inside the context is old. We need to update it.

I think it would be better if we combined `ViewExtension` and `RequiredFuelBlockHeightInner`, creating a view only once the required height is reached.

Contributor Author

I tried a different approach where we subscribe to changes to the block height in the extension. It does not require merging `ViewExtension` and `RequiredFuelBlockHeightInner`; curious to hear your opinion.

Collaborator

You need to update the `ReadView` that was created by the `ViewExtension`. If the required height is 9 blocks higher than the current height, the `ReadView` will be 9 blocks old.

You need to bring the view up to date at `required_block_height`. This is where the view would be created a second time, and where merging `ViewExtension` and `RequiredFuelBlockHeightInner` can help avoid creating two views.

Contributor Author

yep, thanks for explaining.
I had a look, and merging the two views will totally help. Will amend

CHANGELOG.md Outdated
@@ -9,6 +9,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
### Changed
- [2473](https://github.com/FuelLabs/fuel-core/pull/2473): Graphql requests and responses make use of a new `extensions` object to specify request/response metadata. A request `extensions` object can contain an integer-valued `required_fuel_block_height` field. When specified, the request will return an error unless the node's current fuel block height is at least the value specified in the `required_fuel_block_height` field. All graphql responses now contain an integer-valued `current_fuel_block_height` field in the `extensions` object, which contains the block height of the last block processed by the node.

### Added
- [2666](https://github.com/FuelLabs/fuel-core/pull/2666) Added `--required-fuel-block-height-tolerance` CLI arg (default: `10`). Requests with `required_fuel_block_height` set will wait if within tolerance of `current_fuel_block_height` instead of failing.

## [Version 0.41.5]
Collaborator

It seems you need to merge with master, because the latest version is 0.41.6=)

Contributor Author

rebased

@@ -66,6 +66,13 @@ pub struct GraphQLArgs {
#[clap(long = "api-request-timeout", default_value = "30s", env)]
pub api_request_timeout: humantime::Duration,

#[clap(
long = "required-fuel-block-height-tolerance",
Collaborator

Suggested change
long = "required-fuel-block-height-tolerance",
long = "graphql_required-fuel-block-height-tolerance",

Collaborator

Also it would be nice to have a comment to describe the functionality behind this CLI argument.


Comment on lines 152 to 150
self.block_height_subscription_handle
.clone()
.wait_for_block_height(*required_block_height)
Collaborator

You can make `wait_for_block_height` take `&self` to avoid cloning here.

// If the node is lagging behind w.r.t. the required fuel block height,
// but within an acceptable interval, wait until the current block height
// matches the required fuel block height.
// TODO: Add a timeout to prevent the node from waiting indefinitely.
Collaborator

I think you can add a timeout of 1 minute plus the number of blocks (in seconds) that you need to wait.

height: BlockHeight,
) -> anyhow::Result<()> {
loop {
let current_height = *self.rx_block_height.borrow_and_update();
Collaborator

You can use just `borrow`; it should still work.

Contributor Author

Happy to do that, but that would introduce busy waiting, as we cannot await the value being changed on the sender side. Is that okay?

&mut self,
height: BlockHeight,
) -> anyhow::Result<()> {
loop {
Collaborator

While it is very simple to implement with `tokio::watch`, it is still expensive from an execution perspective.

Let's imagine the current height is 10 and the required height for 10k requests is 20.

Each task related to a request will wake up 10 times; that is 100k wakeups, each of which checks the state of the watch (a read of the RwLock) and waits again.

If you keep a list of subscribers for each block height, we can avoid unnecessary wakeups. It should be very easy to maintain this list via one mutex: you can have a `BTreeMap<BlockHeight, Vec<tokio::sync::oneshot::Sender<()>>>` where you register all listeners, and on each new block you just process the leftmost entries.
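The registry suggested here can be sketched as follows. This is an illustrative version, not the actual fuel-core implementation: std `mpsc` channels stand in for `tokio::sync::oneshot` so the example runs without external crates, and `BlockHeight` is simplified to a `u32` alias.

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, Mutex};

type BlockHeight = u32;

/// Listeners keyed by the height they are waiting for.
struct HeightSubscribers {
    inner: Mutex<BTreeMap<BlockHeight, Vec<mpsc::Sender<()>>>>,
}

impl HeightSubscribers {
    fn new() -> Self {
        Self { inner: Mutex::new(BTreeMap::new()) }
    }

    /// Register a listener that is woken once `height` is reached.
    fn subscribe(&self, height: BlockHeight) -> mpsc::Receiver<()> {
        let (tx, rx) = mpsc::channel();
        self.inner.lock().unwrap().entry(height).or_default().push(tx);
        rx
    }

    /// On each new block, wake only the listeners whose height has been
    /// reached: the leftmost entries, with keys <= `new_height`.
    fn on_new_block(&self, new_height: BlockHeight) {
        let mut map = self.inner.lock().unwrap();
        // Entries with keys >= new_height + 1 are still waiting.
        let still_waiting = map.split_off(&(new_height + 1));
        let ready = std::mem::replace(&mut *map, still_waiting);
        for (_, senders) in ready {
            for tx in senders {
                let _ = tx.send(()); // listener may have gone away; ignore errors
            }
        }
    }
}
```

`BTreeMap::split_off` keeps the registry ordered by height, so each new block touches only the listeners that are now satisfied instead of waking every waiter.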

pub async fn wait_for_block_height(
&mut self,
height: BlockHeight,
) -> anyhow::Result<()> {
Collaborator

You can return the new block height from `wait_for_block_height`. In this case you don't need `latest_seen_block_height`.

@@ -745,5 +779,7 @@ where
da_compression_config,
continue_on_error,
base_asset_id: *consensus_parameters.base_asset_id(),
// TODO: Should we recover the block height from the DB?
Collaborator

Yes, we need to get the block height from the off_chain_database

@acerone85 acerone85 force-pushed the ac@rpc-consistency-proposal-wait-for-block branch 2 times, most recently from a3890ba to 7496006 on February 11, 2025 13:54
Base automatically changed from 1897-rpc-consistency-proposal to master February 12, 2025 20:43
@acerone85 acerone85 force-pushed the ac@rpc-consistency-proposal-wait-for-block branch from 9821876 to 331976b on February 13, 2025 13:10
Comment on lines 79 to 87
/// The minimum time that the node will wait to catch up to the required block height
/// of a graphql request. The actual time may be longer and proportional to how
/// much the node is behind the required block height.
#[clap(
long = "graphql-required-block-height-min-timeout-seconds",
default_value = "10",
env
)]
pub required_fuel_block_height_min_timeout: u32,
Collaborator

I think this CLI argument is overkill; it is very hard to understand what it does without looking at the code. And even then, it is very hard to trigger this timeout.

Comment on lines 79 to 87
/// The minimum time that the node will wait to catch up to the required block height
/// of a graphql request. The actual time may be longer and proportional to how
/// much the node is behind the required block height.
#[clap(
long = "graphql-required-block-height-min-timeout-seconds",
default_value = "10",
env
)]
pub required_fuel_block_height_min_timeout: u32,
Collaborator

Suggested change
/// The minimum time that the node will wait to catch up to the required block height
/// of a graphql request. The actual time may be longer and proportional to how
/// much the node is behind the required block height.
#[clap(
long = "graphql-required-block-height-min-timeout-seconds",
default_value = "10",
env
)]
pub required_fuel_block_height_min_timeout: u32,
/// The minimum time that the node will wait to catch up to the required block height
/// of a graphql request. The actual time may be longer and proportional to how
/// much the node is behind the required block height.
#[clap(
long = "graphql-required-block-height-min-timeout-seconds",
default_value = "30s",
env
)]
pub required_fuel_block_height_min_timeout: humantime::Duration,

Comment on lines 52 to 55
/// Minimum time to wait before dropping the request if the node is lagging behind the required
/// fuel block height. The actual timeout in seconds is `required_fuel_block_height_min_timeout + lag`,
/// where `lag` is the number of blocks the node is lagging behind the required fuel block height.
pub required_fuel_block_height_min_timeout: Duration,
Collaborator

I proposed adding the lag to avoid making `required_fuel_block_height_min_timeout` configurable via the CLI =D If you plan to have a CLI argument, then we should remove the lag for simplicity of the logic. If you remove the CLI argument, then we need the lag plus some fixed timeout, like 1 minute for example.

Contributor Author

Happy to have lag + fixed timeout; wondering if 1 minute is too high (the default timeout for all RPC requests is 30 seconds, IIRC).

Collaborator

I think we can stick with the CLI argument without lag; it allows us to test the timeout logic (which is what you did, and I think it makes sense) :)

Contributor Author

OK, just to confirm: we remove the lag altogether and always wait for the timeout?

Collaborator

Yes

@xgreenx xgreenx closed this Feb 14, 2025
@xgreenx xgreenx deleted the ac@rpc-consistency-proposal-wait-for-block branch February 14, 2025 05:16
@xgreenx xgreenx restored the ac@rpc-consistency-proposal-wait-for-block branch February 14, 2025 05:42
@xgreenx xgreenx reopened this Feb 14, 2025
@xgreenx
Collaborator

xgreenx commented Feb 14, 2025

Sorry, clicked the wrong button =D Didn't plan to close it =)

xgreenx
xgreenx previously approved these changes Feb 14, 2025
Collaborator

@xgreenx xgreenx left a comment


Thank you! The change looks very nice

Contributor

@rafal-ch rafal-ch left a comment


Partial review, will continue later.
No issues so far.

CHANGELOG.md Outdated
@@ -12,12 +12,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

### Added
- [2150](https://github.com/FuelLabs/fuel-core/pull/2150): Upgraded `libp2p` to `0.54.1` and introduced `ConnectionLimiter` to limit pending incoming/outgoing connections.
- [2666](https://github.com/FuelLabs/fuel-core/pull/2666) Added `--graphql-block-height-tolerance` (default: `10`) and `--graphql-block-height-min-timeout-secs` (default: `30`) CLI args. Requests with `required_fuel_block_height` set will wait if within tolerance of `current_fuel_block_height` instead of failing. The number of seconds that the node will wait is proportional to the number of blocks `lag-behind-blocks` that the node is lagging behind the required fuel block height, and equal to `graphql-block-height-min-timeout-secs + lag-behind-blocks`.
Contributor

Maybe we can be more concise, because we don't need to treat changelog as a feature documentation :)

Suggested change
- [2666](https://github.com/FuelLabs/fuel-core/pull/2666) Added `--graphql-block-height-tolerance` (default: `10`) and `--graphql-block-height-min-timeout-secs` (default: `30`) CLI args. Requests with `required_fuel_block_height` set will wait if within tolerance of `current_fuel_block_height` instead of failing. The number of seconds that the node will wait is proportional to the number of blocks `lag-behind-blocks` that the node is lagging behind the required fuel block height, and equal to `graphql-block-height-min-timeout-secs + lag-behind-blocks.
Added two new CLI arguments to control the GraphQL queries consistency: `--graphql-block-height-tolerance` (default: `10`) and `--graphql-block-height-min-timeout-secs` (default: `30`). If a request requires a specific block height and the node is slightly behind, it will wait instead of failing. The wait time increases with the block lag, starting at the minimum timeout.
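The wait-time formula this changelog entry describes, sketched as a standalone function (the name and signature are illustrative, not the fuel-core API; the formula adds one second per block of lag on top of the minimum timeout):

```rust
use std::time::Duration;

/// Time to wait before failing: the configured minimum timeout plus
/// one second per block the node is lagging behind the requirement.
fn wait_timeout(min_timeout_secs: u64, current_height: u32, required_height: u32) -> Duration {
    let lag_blocks = u64::from(required_height.saturating_sub(current_height));
    Duration::from_secs(min_timeout_secs + lag_blocks)
}
```

For example, with the default 30-second minimum, a node 10 blocks behind the required height would wait up to 40 seconds before failing the request.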

use tokio::sync::oneshot;

#[derive(Default)]

Contributor

Suggested change

@acerone85 acerone85 force-pushed the ac@rpc-consistency-proposal-wait-for-block branch from 98b0ca7 to 6329c4f on February 14, 2025 17:05
Contributor

@netrome netrome left a comment


Looks generally good to me. I think the block height subscription can be simplified.

@acerone85 acerone85 merged commit c0975a0 into master Feb 17, 2025
33 checks passed
@acerone85 acerone85 deleted the ac@rpc-consistency-proposal-wait-for-block branch February 17, 2025 00:16
MitchTurner pushed a commit that referenced this pull request Feb 18, 2025
… in configuration before failing (#2666)
