Skip to content

Better safeguard against incompatible builtin handles between dialects #15961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

clonker
Copy link
Member

@clonker clonker commented Mar 20, 2025

Inspired from the remarks by @cameel in #15952 (comment), I have refactored EVMDialect a bit so that there is a class for generating the builtin vector that will automatically insert blanks if conditions aren't met and not give direct access to the underlying raw datastructure.
Also, I have added a unit test which takes the default dialect (without EOF) as well as the latest dialect with EOF and tests whether the populated builtin functions with their handles contained in the respective dialects are preserved when compared to all other dialects with and without object access.

Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a step in a good direction, but if we're refactoring these builtins, there's a way to do it in a way that will more completely eliminate the possibility of introducing these kinds of mistakes. We should simply separate they declarations with the logic that defines them. See comments below for details.

Comment on lines 313 to 316
if (!_eofVersion.has_value())
{
builtins.emplace_back(createIfObjectAccess("datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
builtins.addIfObjectAccess(_objectAccess, "datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
FunctionCall const& _call,
Copy link
Member

@cameel cameel Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that now some conditions are handled here and some in Builtins still bothers me a bit. I think it would be best if we made the whole thing more declarative. I.e. have one place that declares the builtins along with their scopes (i.e. EVM/EOF version range and state of object access). Then have another place that loops over all builtins that decides whether to actually define them based on their scopes.

The definition loop would then be the single place that decides whether the opcode should be skipped and whether to leave an empty spot in m_functions.

This would be a more comprehensive solution for the problem, because we would no longer touch that definition loop when adding new opcodes. We would only touch declarations and scopes without the risk of accidentally breaking the logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, not sure "scope" is the best way to call it, but I could not think of anything better so far. Could be just "conditions", but that sounds very non-specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it for a while, I think I do like Scope!

Comment on lines 217 to 232
!(_opcode >= evmasm::Instruction::DUP1 && _opcode <= evmasm::Instruction::DUP16) &&
!(_opcode >= evmasm::Instruction::SWAP1 && _opcode <= evmasm::Instruction::SWAP16) &&
!isPushInstruction(_opcode) &&
_opcode != evmasm::Instruction::JUMP &&
_opcode != evmasm::Instruction::JUMPI &&
_opcode != evmasm::Instruction::JUMPDEST &&
_opcode != evmasm::Instruction::DATALOADN &&
_opcode != evmasm::Instruction::EOFCREATE &&
_opcode != evmasm::Instruction::RETURNCONTRACT &&
_opcode != evmasm::Instruction::RJUMP &&
_opcode != evmasm::Instruction::RJUMPI &&
_opcode != evmasm::Instruction::CALLF &&
_opcode != evmasm::Instruction::JUMPF &&
_opcode != evmasm::Instruction::DUPN &&
_opcode != evmasm::Instruction::SWAPN &&
_opcode != evmasm::Instruction::RETF &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For builtins based on opcodes, this part of the condition is the declarative part. I'd move it to a separate function that returns a boolean. Then when adding a new opcode we'd only really modify that function.

In fact, I'd actually create more than one function. I think we should make it more semantic by having smaller functions expressing specific properties the excluded opcodes satisfy:

  • All the DUP/SWAP/PUSH variants: low-level stack manipulation opcodes.
    • We should also use SemanticInformation::isSwapInstruction()/SemanticInformation::isDupInstruction() here.
  • All the JUMP variants and RETF/CALLF: low-level control flow opcodes
  • DATALOADN, EOFCREATE, RETURNCONTRACT: these are actually exposed, but not directly. We have builtins replacing them. We can define a separate function for them, but a nicer alternative would be to be able to declare for a builtin that it replaces an instruction and have the definition logic exclude that instruction automatically.

Executing these functions and EVMVersion::hasOpcode() would then be a fixed part of the definition logic and would not have to be adjusted. It would be the equivalent of checking the scope of a builtin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have separated out the builtin function definitions into its own class which then can serve as reference database for all EVM dialects. They can iterate over it and declare that they have a specific function by simply referring to it.
That should also reduce memory pressure a bit, as there's only ever one copy of a builtin function (ignoring these no-output builtins).
So in principle the dialect is then simply a bool field (or implicitly convertible to one at least) over all possible builtins.


/// Make sure to only add builtins in a way that is consistent over EVM versions. If the order depends on the
/// EVM version - which can easily happen using conditionals -, different dialects' builtin handles
/// become inherently incompatible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, the compatibility between dialects with and without object access should be more prominently documented in the header. I'd mention it for strictAssemblyForEVMObjects() and maybe also for the whole EVMDialect class.

Copy link
Member

@cameel cameel Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH we guarantee so much compatibility between them that I'm not sure we should even be calling these distinct dialects. Move like flavors of the same dialect. It seems like an orthogonal property that could apply to any dialect.

Comment on lines 54 to 56
builtin_function_handle_compatibility_eof,
bdata::monomorphic::grid(
bdata::make(generateEVMDialectConfigurationsToTest(std::nullopt)) + bdata::make(generateEVMDialectConfigurationsToTest(1)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to actually be testing both EOF and non-EOF despite the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a much more relevant difference is that it's using the latest rather than the default EVM version. In not so distant future both current and latest will have EOF, but the fact that current is not always the latest will remain. If you replace withEOF check with withEOF && evmVersion.supportsEOF() it will be more future-proof. You can then unify the tests and make the source EVM version just another dimension.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we should have test case where we cover the "inline" -> "with objects" conversion for all dialects. Converting between distinct dialects is nice to have but these pairs are the most relevant, because for them we actually do perform such a conversion in practice.

@clonker clonker force-pushed the safeguard_builtin_handles branch from 106775f to 3e3258d Compare April 4, 2025 05:27
Copy link

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 18, 2025
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 18, 2025
@clonker clonker force-pushed the safeguard_builtin_handles branch 10 times, most recently from 2a5e7f8 to f721181 Compare April 28, 2025 08:48
@clonker clonker force-pushed the safeguard_builtin_handles branch 7 times, most recently from 8479ac4 to 70e2ab0 Compare April 28, 2025 15:33
@clonker clonker force-pushed the safeguard_builtin_handles branch from 70e2ab0 to 5128ef7 Compare April 29, 2025 08:19
@clonker clonker force-pushed the safeguard_builtin_handles branch from 5128ef7 to 3ed3248 Compare April 29, 2025 08:26
@clonker clonker requested a review from cameel April 29, 2025 08:38
clonker added 4 commits April 29, 2025 10:47
This makes for a better separation between the declarations and the logic that defines builtins.

# Conflicts:
#	libyul/backends/evm/EVMDialect.cpp
Between current default dialect as well as latest dialect and all other dialects.
@clonker clonker force-pushed the safeguard_builtin_handles branch from 3ed3248 to 8a7d619 Compare April 29, 2025 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants