Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable and composable storage #40

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

ironcev
Copy link
Member

@ironcev ironcev commented Jun 10, 2024

Rendered

This RFC introduces a concept of a Storage trait, as well as the StorageBox and StorageEncodedBox structs. Those types are cornerstones upon which we build a fully flexible, feature rich, and robust access to storage. These concepts simplify defining dynamic storage types, like e.g., StorageVec. They also provide a dedicated way of configuring (in the storage declaration) and initializing (in the code) arbitrary storage types composed of existing storage types.

Additionally, the RFC provides API design guidelines for storage type's APIs. Those guidelines ensure that various aspects of a storage access like, e.g., accessing uninitialized storage, are alway treated in the same way, across various storage types.

Closes #34.

@ironcev ironcev added the enhancement New feature or request label Jun 10, 2024
@ironcev ironcev self-assigned this Jun 10, 2024
@ironcev ironcev requested review from dmihal, SwayStar123, bitzoic and a team June 10, 2024 22:08
Comment on lines +276 to +287
1. The API clearly communicates that the underlying storage might be uninitialized which can cause certain operations, like e.g. `StorageBox::read()` to fail. _All_ operations that can fail must have the `try_` prefix and return an `Option`.
1. Every `try_` operation has its "non-try" counterpart that returns the value and reverts if the operation fails internally. The counterpart is always named the same as the `try_`, but without the `try_` prefix. E.g., `read()` for `try_read()`.
1. Uninitialized storage instances are always safe to use. This way we benefit of the possibility to not pay for default initialization and still safely use the storage instance. Every storage type will provide its own semantics for uninitialized storage state. E.g. an uninitialized [`StorageVec`](../files/0013-configurable-and-composable-storage/api-design/storage_vec.sw) has the same semantics as an empty `StorageVec`.
1. Thus, the failing of `try_` methods can always be seen as a semantic failure. E.g., `StorageVec::try_get()` can fail if the `StorageVec` is not initialized or because it requested element index is out of bounds. Since in the first case we consider the vector to be empty, it is also a semantic error.
1. Certain operations might have a version with a "special behavior". E.g., in the case of the `StorageVec`, we want to provide a possibility to `get()` an element without the expensive length check. In such cases, the method will be named by the original name, and the suffix indicating the "special behavior". E.g., `StorageVec::get_unchecked()`.
1. When the special behavior consists of doing deep clear via `DeepClearStorage` or deep read via `DeepReadStorage`, the suffix will always be `_deep_clear` and `_deep_read`, respectively.

Thus, every storage type operation might come in all or some of these flavours:
- `fn <operation>() -> T`: reverts in case of uninitialized storage or other error.
- `fn try_<operation>() -> Option<T>`
- `fn <operation>_<special_behavior>() -> T`: reverts in case of uninitialized storage or other error.
- `fn try_<operation>_<special_behavior>() -> Option<T>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed changing the naming in the API previously here: FuelLabs/sway#4740

I think we should go with the opposite of what's said in this paragraph.

Most of the time, you want an operation to return Option. I'd much rather we standardize on read+read_unchecked than try_read + read. Reverts should be the exception not the rule and it should be clear when an operation potentially reverts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL; DR

The proposal (for the next discussion round 😄) is:

  • fn <operation>() -> Option<T>: None in case of uninitialized storage or other, semantic, error.
  • fn <operation>_or_revert() -> T
  • fn <operation>_<special_behavior>() -> Option<T>: None in case of uninitialized storage or other, semantic, error.
  • fn <operation>_<special_behavior>_or_revert() -> T
  • fn <operation>() -> T: operation never fails.
  • fn <operation>_<special_behavior>() -> T: operation with special behavior never fails.

Looking at the storage types we have, there will never be an explosion of those variants per operation, and we still have a clear guidelines that capture the overall complexity (explained below).

Detailed Analysis

After spending a good portion of the last week walking in circles and pondering how to capture the complexity of the storage access semantics into consistent API guidelines, here I am again, doing the same 😄

I am completely fine with promoting the Option returning operations to be the default ones.

But we still need to capture the overall matrix of possible combinations. Since FuelLabs/sway/#4740 is not going into details there, let's do it here.

Before starting, just to emphasise the difference, the term "unchecked" in the #4740 and in the RFC paragraph, means two different things. In the #4740 it is a synonim for unwrap while in the RFC it represents a "special behavior". In the concrete example, running the StorageVec::get() without checking the vector length and out-of-bounds access.

To the matrix of possible combinations and what we want to capture in the API.

Every operation can potentially:

  • fail if the storage is not initialized. E.g., StorageBox::read() on uninitialized StorageBox.
  • fail because of a semantically wrong usage. E.g., out of bounds StorageVec::get() on initialized StorageVec.
  • have one or more "special behaviors" that affect what the operation do. E.g., StorageVec::get() without lenght checks. E.g., StorageVec::pop() with deep clear.
    • we expect a small number of special behaviors per operation, very often none or mostly one.
    • special behaviors might require additional trait constraints. E.g. DeepClearStorage.
    • special behavior operations can also fail because of the first two points.
  • have none of the above; neither it can fail nor it has any special behaviors. E.g., StorageMap::get().

The guidelines should reflect the above matrix.

What RFC proposes, to simplify the matrix and the API, and assuming that we are not losing any valuable information there, is to treat the access to the uninitialized storage as a semantically wrong usage and not as a special technical error.

Means, every storage type assigns a meaningful default semantics to the uninitialized state. E.g, an uninitialized StorageVec behaves the same as an empty StorageVec. For all the storage types we currently support, this is possible in an intuitive way.

It also has a benefit of just creating an instance with new() without the need for initialization. StorageVec::new() behaves the same as StorageVec::init([]) but does not produce any storage writes until used. For storage types I've played with and the sample StoragePair having this meaningful default behavior was always possible.

If we do not join those two points and consistently want to distinguish between the failing because of the storage being uninitialized and because of a semantically wrong usage, we will necessarily end up in the cumbersome API like, e.g., StorageVec::get() returning Option<Option<T>> where the first option means failure on uninitialized and the second out-of-bounds. I doubt that developers will ever want to distinguish those two cases.

One more thing to consider in the guidelines is:

it should be clear when an operation potentially reverts

How to achieve this via naming convention only? (We are talking here about the computational effects. We had this discussion internally triggered by the #[storage] attributes not being a part of the function type.)

One possibility would be:

  • <operation>() -> Option<T>: does not revert.
  • <operation>_<unchecked or some better name>() -> T: potentially reverts.
  • <operation>() -> T: does not revert. E.g., StorageMap::get() -> T.

And all this should also be aplicable to "special behavior" variants.

Using the term "unchecked" as a suffix I see as misleading. Because, in the case of, e.g., StorageBox::read_unchecked() it actually does check and it will revert if that check fails.

"unchecked" as a word implies that we are actually skipping checks and who knows what can happen. Developers responsibility. E.g., in the RFC, the StorageVec::get_unchecked() cannot fail because it literally does not check the bounds and returns the calculated element. Accessing that element later on can fail, but the get_unchecked() itself cannot.

I am arguing here to adopt some other terminology for the <operation>_<give me the result or revert if it is not there>(), and not not to have that variant.

Perhaps <operation>_or_revert()?

To wrap it up, the proposal (for the next discussion round 😄) is:

  • fn <operation>() -> Option<T>: None in case of uninitialized storage or other, semantic, error.
  • fn <operation>_or_revert() -> T
  • fn <operation>_<special_behavior>() -> Option<T>: None in case of uninitialized storage or other, semantic, error.
  • fn <operation>_<special_behavior>_or_revert() -> T
  • fn <operation>() -> T: operation never fails.
  • fn <operation>_<special_behavior>() -> T: operation with special behavior never fails.

Looking at the storage types we have, there will never be an explosion of those variants per operation, and we still have a clear guidelines that capture the overall complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question that immediately comes to my mind is why _or_revert()/unchecked() at all, when we can simply call unwrap()?

The <operation>() <-> try_<operation>() pairs are there from the beginning if that's a valid reason. <operation>().unwrap() results in a potentially much bigger code then <operation>_or_revert() (see FuelLabs/sway/#5982), but this should be handled in optimizations.

Also, other APIs, aside from the storage API, does not have such pairs. E.g. Vec has only get() -> Option<T>.

So why to have _or_revert()/unchecked() so prominently in the storage API?

Comment on lines +295 to +347
## Storage references

[storage-references]: #storage-references

Sometimes we want to be able to store in the storage a "reference" to another storage instance. Storage API and the compiler will provide support for such use cases. On the API level, the `StorageRef` type (defined in the [`storage.sw`](../files/0013-configurable-and-composable-storage/sway-libs/storage.sw)) will contain a type-safe "reference" to a storage element. This "reference" will internally just be the `StorageKey` of the referenced storage element.

```Sway
pub enum StorageRef<TStorage> where TStorage: Storage {
Null: (),
Ref: StorageKey,
}
```

`StorageRef`s will be stored in [`StorageRefBox`es](../files/0013-configurable-and-composable-storage/sway-libs/storage/storage_ref_box.sw) and once retrieved from the storage it will be possible to dereference them to access the referenced storage.

Every storage type automatically provides the `StorageRef` that references it, via the `Storage::as_ref` method.

Thus, storage references provide a convenient, type-safe way to express referencing storage elements, but with a price that might be considerable in some cases. Namely, the `StorageRef` requires two storage slots to store a reference. This means that developers might consider other, less storage consuming, manual approaches to "reference" a storage entity. E.g., if referenced elements are stored in a `StorageVec`, "referencing" them by storing their vector indices as "references" might be less storage-costly. In this case, we are trading type-safety and clear built-in API for the performance.

Storage reference will be supported in the `storage` declarations, by allowing the `storage` keyword to be used on the RHS of the _configure with_ operator:

```Sway
type StVecOfU64 = StorageVec<StorageBox<u64>>;
type StPairOfStVecOfU64 = StoragePair<StVecOfU64, StVecOfU64>;

storage {
vec_1: StVecOfU64 := [11, 22, 33],
vec_2: StVecOfU64 := [44, 55, 66],

pair_1: StPairOfStVecOfU64 := ([0, 0, 0], [1, 1, 1]),
pair_2: StPairOfStVecOfU64 := ([], []),

map_1: StorageMap<str[3], StVecOfU64> := [
("abc", [1, 2, 3]),
("def", [11, 22, 33]),
("ghi", [111, 222, 333]),
]

//--
// Storage references can be used in storage configurations.
// They can reference other storage elements, or their parts.
//--
vec_of_refs: StorageVec<StorageRefBox<StVecOfU64>> := [
StorageRef::Null,
storage.vec_1.as_ref(), // <<-- Note using the `storage` keyword here, as well as `as_ref`.
storage.pair_1.first().as_ref(),
storage.map_1.get("abc").as_ref(),
StorageRef::Ref(some_const_fn_that_returns_the_storage_key_of_the_referenced_storage()),
],
}
```

For more examples on using storage references, see the [`demo_contract_storage_references.sw`](../files/0013-configurable-and-composable-storage/contracts/demo_contract_storage_references.sw).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think StorageRef is an unnecessary complication.

I don't see what this adds over Option<StorageKey> which would already be supported. If anything it is worse because it reintroduces a potential implicit null.

Unless there's a compelling reason to have a special type, I say we should remove this and recommend storing StorageKeys and Option<StorageKey>s for references to other parts of the storage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the enum and the Null and using Option instead I like a lot!

Actually, the first version of the StorageRef was a struct type-safely wrapping the StorageKey but then I ended up thinking how to support not having a reference and introduced Null motivated by the similar programming language concept. The result was putting the burden of checking for Null when dereferencing always, even in the situations where the reference will always be there. Using Option<StorageRef> where the StorageRef is a struct is definitely the right way!

To the question of having the dedicated StorageRef, or using the general StorageKey directly, the benefit of having the struct StorageRef<TStorage> is there, because of these two reasons:

  1. Type-safety.
  2. Having easy-to-use, zero-(additional)-cost abstraction that clearly communicates the intent and provides a standardized usage pattern.

Ad 1) StorageRef<TStorage> referrs to a storage of a particular type and that is type-checked by the compiler. It also clearly communicates that it referrs to a storage of a particular type. Using StorageKey would be more like having a raw pointer into the storage.

Ad 2) Using the naked StorageKey will require from developers to implement their own deref equivalents every time they use it for referencing. Those will be cumbersome and error-prone, because they will need to use the proper way to create the proper storage type etc. In the worse case, we will end up having that code in contracts, and in the best case, having it packed in various libraries. Having the StorageRef provided by the standard library gives a simple type that clearly communicates the intent and gives all convenience in a standardized and zero-cost way.

Comment on lines +466 to +470
```
<Config> := StorageConfig<TValue>
| [<Config>]
| (<Config>, ...);
```
Copy link
Contributor

@IGI-111 IGI-111 Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mechanism is sound, but we need better naming than "config". It should be clearer that StorageConfig represents a single address initializer for consecutive slots and that Config is a whole, potentially composed, initializer.

Maybe StorageInit and StorageAddressInit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposal! I am not happy with the term Config either. While the Value as the name of the associated type is intuitive, Config does not carry such clarity.

Here we need to be cautios of the overall "initialization" vs "configuration" confusion. Since configurables and storage are actually configured in code and that configuration can be changed at deployment time, the term "configuration" is consistently used everywhere in the RFC. Sticking to that term will help to remove existing confusion, where the choice of the syntax and wording implies "initialization" and "assignment" to something that is rather a "configuration".

E.g., the Storage::init constructor on the other side, is an "initialization" because it will create a storge type instance at the runtime and actually stores the provided values into the storage.

To the origin of the current, non-optimal name Config, which I tried to change (my thinking below). Since both the associated type Config and the StorageConfig are used only by the compiler during the configuration, I've kept the name Config although I don't like it.

Alternative names I've played with for the Config associated type were:

  • ValueLayout: because it provides the information on how the value (described by the Value type) is layed out in the storage. The reason I didn't choose it is, that storage types can store additional information as well, e.g., StorageVec its length, and not only what is in the Value.
  • StorageLayout: it clashed with the StorageLayout enum.
  • StorageContent: because it describes which content is stored in the storage and where exactly.

Using "Init" in the name, as explained about, can lead back to the current confusion.

Perhaps SlotsConfig because it provides the information on how to configure the slots, or even better SlotsContent?

Additional proposals are welcome 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, regardless of the terminology we chose at the end, I like the proposed formulation (but to change "initializing" to "configuring"):

StorageConfig represents a single address initializer for consecutive slots and that Config is a whole, potentially composed, initializer

Hmmm, perhaps:

Storage(Address)Config/Content represents a configuration for storing a single value/content in consecutive slots starting at the address specified by the storage_key. The SlotsConfig/Content is a whole, potentially composed, configuration, for storing the entire value contained in the Storage.

Whatever the final terminology will be, it should sound well in that sentence 😄

Copy link
Contributor

@IGI-111 IGI-111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalized STD is confusing because the standard library is already made up of std and core, either use those specifically or just say "the standard library".

rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved
rfcs/0013-configurable-and-composable-storage.md Outdated Show resolved Hide resolved

The STD will still provide the low-level storage `read`, `write`, and `clear` functions. However, unlike now when they are actually sometimes needed in contract code, e.g., to store arrays, in the new storage implementation there should never be a need to use them in contracts. Instead, the atomic `StorageBox` and `StorageEncodedBox` should be used. They provide the same low-level functionality while offering safe storage access.

The low-level API will be used only when implementing storage types, and even in those cases not always. Thus, the proposal is to move them to the module named [`internal`](../files/0013-configurable-and-composable-storage/sway-libs/storage/internal.sw) to emphasize that they are, similar to `Storage::internal_` functions, meant to be used only when developing custom storage types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct programmatic manipulation of storage should not be encouraged but it should not be this discouraged either. There are cases where it's a better more straightforward pattern to use to implement what you want.

I do think having it in a separate module is fine, but I don't like the implication that it is "internal", it's merly raw or low level. But it's perfectly valid for the end user and not specific to storage type implementors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!

Encouraging and/or discouraging usage is a separate topic, similar to the one of using raw pointers in contract code. It is to be reflected in the way how we convey and document the library/language features, patterns, etc. But in both cases, even discouraging the usage does not justify the name "internal".

The current module name is storage_api which we can keep. But, I am more for raw_api or some other classifier which will better convay that this is the low-level API, rarely needed, if at all, in the contract code.

@bitzoic Do you have a view or proposal here?


[drawbacks]: #drawbacks

The only drawback I can think of is the time and effort needed to implement the proposal and to deal with the [breaking changes](#breaking-changes). However, the impact of _not_ improving the current storage handling will over time surely be higher. It would mean carrying on the issues mentioned in the [Motivation](#motivation) and, worst of all, living with an API that is error-prone by-design.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should have some discussion of the performance implications of this change. Can storage get slower as a result of this, if employed correctly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say - no 😄 (to the negative implications, not to the discussion 😄)

Or at least I don't see any negative impact on the performance, compared to the approach we have now. On the contrary:

  • Performance was in focus and considered carefully. It resulted, e.g, in the StorageLayout enum being introduced.
  • Just using the StorageLayout information consistently will help Storage implementors to provide implementations that optimally pack and store data across the whole hierarchy of the stored data.
  • Having the convention for "special behavior" methods should encourage providing optimized versions of methods where high-performance is needed and safety guaranteed by surrounding code.

But it can be that I am missing something, so yes we can surely discuss it further. And of course reviewing the proposal from the performance standpoint is highly encouraged.

- naming of proposed code elements like, e.g., method/function names, type names, etc. Since those names will become the part of the storage API it is of highest importance to come up with good and expressive names for abstractions.

Also, I expect the current storage attributes, `#[storage(read, write)]`, to be discussed. In the sample code, the existing attributes are used. However, there are two questions to be decided on:
- if the meaning of `write` remains "read and write" as it is now, the attributes should be renamed to `readonly` and `readwrite`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are important reasons (for standards, notably) that we cannot be explicitly bound to read and write and must have read only/read write semantics.

Maybe we can just have the smallest change:

#[storage(read)]
#[storage(read_write)]


const fn new(self_key: &StorageKey) -> Self;

const fn internal_get_config(self_key: &StorageKey, value: &Self::Value) -> Self::Config;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with StorageInit instead of Config, maybe this can be internal_make_initializer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the verb make! I didn't like get and was thinking about alternatives, but didn't like them, e.g. create, either. They all sounded very "runtimeysh". make somehow does not have that apeal. Here is your self key and the value to store, make me the <config (good name still to find 😄)>.

As mentioned, initializer can mislead and in the and, the final name behind make will depend on how we name the Config associated type.

This is where the `Storage::Config` associated type comes in play, together with the `const fn` function `internal_get_config()`.

Before generating the slots, the compiler already knows:
- the `StorageKey` at which a particular `storage` element will be stored. This key is calculated based on the element name and namespace, or it is given using the `in` keyword. This storage key is, as mentioned below, called the _self key_ of the storage element.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of "self key" is an artifact of our thought process and it's littered all over this proposal, but I think we should try to come up with a better one since it's no longer involved with self as a keyword.

Maybe "field key"? Since it's the StorageKey for a given storage field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, self_key is the best name 😄 IMO, of course.

It very precisely communicates what it is, and that's why it is introduced and consistently used in the RFC:

An important term to establish will be the self key. Self key is the StorageKey at which an instance of a storage type is stored.

Why I see it having properties of a perfectly good name:

  • In the implementations of Storage types, the term self_key immediately communicates that "this is my storage key, the storage key at which I am stored".
  • The public self_key() method is equally clear. It gives a clear answer to the question "what is your key?". (We need this method for, e.g., referencing or any other scenario where the knowledge of the slot location of the storage type instance is needed.)
  • In the documentation and communication, being reflexive it immediately points to the storage instance we are talking about.

Any other word that does not imply reflexive relationship (like "self", or "me") will be to broad and possibly applicable to several instances and the question "whose, of what" will not be immediately incorporated in the word.

E.g., having the field_key everywhere rises the question "which field? whose field?". The sentence "Field key is the StorageKey at which the instance of a storage type is stored" calls for explaining why the word "field". We have to keep in mind that storage type instances can be used independent of the storage which makes the term "field" even less understandable and hard to justify.

Also, some storage types could have clashing terminology with the word we choose. E.g., if we chose "element key", it will clash with what an element in the StorageVec is.

That's why I am for the name that implies reflexive relationship and see "self" as the perfect candidate. It also corresponds to what "self" is in Sway types. Something that points to, or denotes, myself.

Copy link
Member

@bitzoic bitzoic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first look, I am unsure how I feel about including StorageBox in the storage definition from a user perspective. It makes sense however with the current storage implementation the StorageKey is behind the scenes and makes it much more readable:

storage {
     box_1: StorageBox<u64> := 0,
}

vs

storage {
     box_1: u64 = 0, // This is actually a StorageKey<u64>
}

It is unclear for me whether this is still achievable on the compiler side.

As AbiEncode is implemented for primitive types such as u64, couldn't StorageEncodedBox be used? How do we know we are using using "the correct one"? A lot of the confusion in storage that we have seen from external developers is around having separate types for storage i.e. StorageVec vs Vec. Requiring that developers need to make the choice between StorageBox and StorageEncodedBox could further that confusion.

I am also wondering what the difference is here between StorageVec and Vec:

vec_of_encoded_val_1: StorageVec<StorageEncodedBox<Vec<bool>> := [Vec::from([true, false, true]), Vec::default(), Vec::from([true])],

As Vec is a heap type and is supported, wouldn't this obsolete StorageVec? In what case would I use one over the other? How would we ensure that developers know which to use?

I am very much so liking the use of write_deep_clear and it's associated functions as well as StorageLayout

@IGI-111
Copy link
Contributor

IGI-111 commented Jun 11, 2024

@bitzoic A big part of this change is trying to separate and give a proper interface between the structure of what is being stored and the storage method itself.

This is why we can't stay with implicit u64 fields: you should tell the compiler how you want things stored, and indeed you should actually have to make that decision. Encoding has benefits but it also has a cost you may not want to pay, etc.

Being explicit about the storage types also has the nice benefit that storage.a is actually of the type of field a in storage, which means you can easily mentally model what's happening with no magic.

It also means that we do not need to be opinionated as to how people store things, and they can build their own storage data structures as libraries (as is already being done by some users).

Co-authored-by: Cameron Carstens <[email protected]>
Co-authored-by: IGI-111 <[email protected]>
@ironcev
Copy link
Member Author

ironcev commented Jun 12, 2024

@bitzoic Why we insist on StorageBox<u64> is already explained and the clarity of the intention is the main reason. Let me just add another one reason, although being just a consequence of the composability, why inferring StorageBox usage from the compiler in a general case would be impossible.

Currently, whatever the type in the storage field declaration is, it is wrapped in the StorageKey. With the composable storage, that's not the case. Consider, e.g., StorageMap<str[3], StorageBox<u64>>. If we allow for StorageMap<str[3], u64> and expect the compiler to infer StorageBoxes, how can the compiler know, in a general case, which type arguments to wrap within the StorageBox and which not.

The questions regarding the choices between the types are great! I expect them to be well addressed in the documentation. Actually, discussing them was also a part of the initial RFC, but I've removed them to shorten the overall length of already lengthy RFC. But I think now I'll put them back, at least a short explanation.

As AbiEncode is implemented for primitive types such as u64, couldn't StorageEncodedBox be used? How do we know we are using using "the correct one"?

Currently, the negative impl in the StorageEncodedBox forbids its usage with the types that can be stored without encoding. There is a TODO-DISCUSSION comment above asking if this should be a hard error or a compiler warning.

In both cases, developers will be properly informed and guided by the compiler if they tried to do StorageEncodedBox<u64>.

I am also wondering what the difference is here between StorageVec and Vec
As Vec is a heap type and is supported, wouldn't this obsolete StorageVec? In what case would I use one over the other? How would we ensure that developers know which to use?

StorageVec provides access to individual vector elements, thus alowing pushing and reading individual elements in a manner that minimizes the number of storage reads and writes.

StorageEncodedBox<Vec> on the other side always reads and writes back the entire vector. This is very likely not what developers want and we will explain the difference.

The intended usage of the StorageEncodedBox is with custom dynamic Sway types that thus must be encoded to be stored but are expected to be read entirely, like the Struct used in the example that contains a Vec and a String.

So, I expect having something like StorageVec<StorageEncodedBox<MyTypeThatNeedsEncoding>> to be seen.

The interesting consequence of this is that the StorageString will be obsolete or just become a type alias for the StorageEncodedBox<String>. Because in this case we want to store and read the entire string and not having each character be stored and accessable separately within the storage.

I'll address this in the RFC. We will definitely provide sufficient guidelines and examples in the documentation so that developers know which storage type to use.

@SwayStar123
Copy link
Member

@ironcev Is there any situation in which i can use StorageBox and would want to use StorageEncodedBox instead?

@esdrubal
Copy link

esdrubal commented Jun 27, 2024

The RFC looks good to me and is a noticeable improvement on what we have now.

I noticed that StorageBox<T> requires T to implement Serializable and StorageEncodedBox<T> requires T to implement AbiEncode + AbiDecode.
The RFC also mentions that StorageBox is more gas efficient than StorageEncodedBox, the question I have Is why is it more efficient? How would Serializable trait work? We cannot copy the contiguous memory if we assume the memory representation of types can change in the future.

@ironcev
Copy link
Member Author

ironcev commented Jul 1, 2024

Is there any situation in which i can use StorageBox and would want to use StorageEncodedBox instead?

@SwayStar123 I am not aware of any. In the proposal, this line:

impl<T> !Storage for StorageEncodedBox<T> where T: Serializable { }

forbids using StorageEncodedBox if the type can be stored using StorageBox.

However, I've left the discussion question asking the same as you, is there a valid reason to allow that and treat it as a warning rather then forbidding it:

// TODO-DISCUSSION: Shell we forbid encoded-boxing serializable types and thus force
//                  them to be boxed in `StorageBox` or should this only be a compiler warning?
//                  Essentially, if a type is `Serializable` encoding it unnecessarily
//                  is a huge waste of computational and storage resources.

Note that we can always start as proposed, having a compiler error if a serializable type and thus StorageBoxable type is attempted to be stored in EncodedStorageBox and later on remove the restriction and replace it with a warning, without having a breaking change.

@ironcev
Copy link
Member Author

ironcev commented Jul 1, 2024

The RFC also mentions that StorageBox is more gas efficient than StorageEncodedBox, the question I have Is why is it more efficient? How would Serializable trait work? We cannot copy the contiguous memory if we assume the memory representation of types can change in the future.

@esdrubal Currently, this is exactly what we do when storing types into storage, copying memory representation as is. At compile time storage configuration uses the Sway memory layout information to calculate the slots. The Serializable marker trait puts the safety around copying memory, ensuring that we do not copy types that, e.g., contain pointers. The similar guard is there when we autoimpl AbiEncode.

If the memory representation changes in the future, the issue might indeed arise if we e.g. have code accessing storage that gets LDCed into a contract that used old memory layout. This is a valid concern. I see it being in the same category like some others where we traded undefined behavior for performance. Paying for decoding/encoding at every storage access would definitely be a very high cost to pay for a change of memory layout that we will potentially never introduce. Also, note that we would have a similar issue with the StorageEncodedBox if the encoding changes from v1 to vNext.

That's why I am for addressing this concern as a breaking change in existing contracts if it ever happens.

As a side note, what we might introduce in the future is the configurable memory layout equivalent to #[repr(...)] in Rust: FuelLabs/sway/#5286

In that case, we would still be copying the memory. The compiler would be aware of the #[repr] attribute and during configuration calculate the slots accordingly. The above concern would remain or be even more prominent. E.g. having the same struct but with two different #[repr]s in different contracts accessing the same storage.

CC: @IGI-111

@SwayStar123
Copy link
Member

Is there any situation in which i can use StorageBox and would want to use StorageEncodedBox instead?

@SwayStar123 I am not aware of any. In the proposal, this line:

impl<T> !Storage for StorageEncodedBox<T> where T: Serializable { }

forbids using StorageEncodedBox if the type can be stored using StorageBox.

However, I've left the discussion question asking the same as you, is there a valid reason to allow that and treat it as a warning rather then forbidding it:

// TODO-DISCUSSION: Shell we forbid encoded-boxing serializable types and thus force
//                  them to be boxed in `StorageBox` or should this only be a compiler warning?
//                  Essentially, if a type is `Serializable` encoding it unnecessarily
//                  is a huge waste of computational and storage resources.

Note that we can always start as proposed, having a compiler error if a serializable type and thus StorageBoxable type is attempted to be stored in EncodedStorageBox and later on remove the restriction and replace it with a warning, without having a breaking change.

Im confused then, the stated reason for having explicit storagebox and storageencodedbox is that the developer should make a conscious decision to choose between them, but apparently theres no choice at all?

@ironcev
Copy link
Member Author

ironcev commented Jul 1, 2024

Here are some additional remarks and concerns coming from discussion with @SwayStar123.

  1. StorageBox and EncodedStorageBox. Unnecessary for the developer to have to think about this. Can't the compiler just figure out which one is necessary?

    The reasoning for explicitly using StorageBox and EncodedStorageBox are already discussed in detail here:
    Configurable and composable storage #40 (comment)
    Configurable and composable storage #40 (comment)

    I would like to additionally point out, that in general, developers will actually not need to think about the existance of two stroage boxes. In vast majority of cases, it is simply going to be the StroageBox. We will need to properly convey the mental model of our storage approach. Storage types offer structural view over the storage creating hierarchies ending up in atomic storage boxes. Those will, for the majority of cases be the StorageBox.

    If the EncodedStorageBox is needed, and a developer still habitually picks StorageBox, the compiler error will clearly explain why the encoded version is needed. Expressive diagnostics enable us to help developers learn on their own code.

    My point is, in general, there will be no real need to think much upfront about the choice between two storage boxes.

  2. The fact that there are non interchangable vec and storagevec. Strongly against this design. Not only will this be hard to communicate to all devs, it's also adding complexity as compared to solidity, is it not supposed to be the opposite?

    This is a direct consequence of two design decisions in Sway as a language and our approach to storage, established with Storage v2.0.

    • First, unlike dynamic arrays in Solidity, Sway's vectors are not a language construct but rather a library type. From the compilers perspective, there is no any difference between the Vec and any other ordinary Sway struct. Vec is just a struct that internally does memory allocation.
    • Second, Sway transparently presents the nature of the contract storage. It has the explicit storage declaration to declare storage "variables" and insist on reading and writing to storage via methods that clearly convey that they might fail. The storage access is intentionally conceptually presented as IO access unlike accessing variables in memory.

    The combination of these two design choices makes StorageVec and Vec different in the API and not interchangeable. Even if we would built-in special support for Vec into compiler (not something I would advocate) and compile it differently when in memory and in storage, Vec's API as such wouldn't fit to storage API.

    The perceived complexity, which I fully agree is there, needs to be communicated through the lenses of the benefits of the Sway approach to storage. Here I like how @IGI-111 put it, "no magic" + "flexibility":

    Being explicit about the storage types also has the nice benefit that storage.a is actually of the type of field a in storage, which means you can easily mentally model what's happening with no magic.
    It also means that we do not need to be opinionated as to how people store things, and they can build their own storage data structures as libraries

  3. I can imagine many devs not even knowing storagevec existing and using horribly inefficient code where they read and write the entire vecs everytime they want to update an element

    With a strong background in a managed language simple to use in an unperformant way, I also see this as a problem.

    The RFC pays a lot of attention to providing the infrastructure and the API that will be difficult to use in a false way, both when defining stroage types and using them. This is one example where, purely throught the API it is not possible to guide the developers to the right choice. On the other side, treating Vec differently on the compiler side, or forbidding storing dynamic structs is also not an option.

    The question is what to do to mitigate the potential unintended misuse.

    I see two complementary approaches here. The documentation will have StorageVec as the default narrative, as it is now. Also the StorageBox. The EncodedStorageBox<Vec> should be on the margins and explained with red flag and as potential optimization and very likely not the regular case. In addition, we've discussed standardized static analysis tool similar to Clippy for Sway. E.g., Sway Analyzer can give a warning when using encoded Vec in storage.

    But to come back to the flexibility point, storing EncodedStorageBox<Vec> can be the more performant option in certain scenarios. E.g., small vectors of u8 that need to be loaded and stored completely, etc.

@ironcev
Copy link
Member Author

ironcev commented Jul 2, 2024

Im confused then, the stated reason for having explicit storagebox and storageencodedbox is that the developer should make a conscious decision to choose between them, but apparently theres no choice at all?

Fair point 😄 We've discussed the topic of choice and the need for both the StorageBox and the StorageEncodedBox in the team meeting. The discussion addresses this question and solves as well the open TODO-DISCUSSION point that was asking if we should forbid storing serializable types in the StorageEncodedBox.

Please see the Explain difference between StorageBox and the StorageEncodedBox commit for the conclusion.

CC: @SwayStar123, @esdrubal, @IGI-111

@ironcev
Copy link
Member Author

ironcev commented Jul 2, 2024

Hmm, the two dead links in the collection_context RFC actually exist, both the old 2019 ones, and the newer, but the link verification still fails.

[✖] https://papl.cs.brown.edu/2020/types.html#(part._.Recursive_.Datatype_.Definitions) → Status: 0
[✖] https://papl.cs.brown.edu/2020/Type_Inference.html → Status: 0

@ironcev ironcev requested a review from Nazeeh21 July 4, 2024 11:36
@Nazeeh21
Copy link
Member

Nazeeh21 commented Jul 9, 2024

By extending the convo with @ironcev from Slack I would like to propose a concept/implementation for the StorageMap concept. I along with one of the SRs from the attackathon, have put all the info regarding that in the gist here along with the example implementation. It might be a bit clumsy right now, but at least it presents the idea. I reworked a bit the original StorageKey concept and combined it with the existing StorageMap, the outcome is a single struct.

I think it should also support nested maps, but need to test it.

The API of this SecureStorageMap thing is a bit tricky to use as methods and constructors return Result, so maybe that could be somehow improved. Let me know what you think about it.

@ironcev ironcev mentioned this pull request Aug 25, 2024
8 tasks
IGI-111 added a commit to FuelLabs/sway that referenced this pull request Nov 15, 2024
## Description

This PR fixes #6317 by implementing `storage_domains` experimental
feature.

This feature introduces two _storage domains_, one for the storage
fields defined inside of the `storage` declaration, and a different one
for storage slots generated inside of the `StorageMap`.

The PR strictly fixes the issue described in #6317 by applying the
recommendation proposed in the issue.
A general approach to storage key domains will be discussed as a part of
the [Configurable and composable storage
RFC](FuelLabs/sway-rfcs#40).

Additionally, the PR:
- adds expressive diagnostics for the duplicated storage keys warning.
- removes the misleading internal compiler error that always followed
the storage key type mismatch error (see demo below).

Closes #6317, #6701.

## Breaking Changes

The PR changes the way how storage keys are generated for `storage`
fields. Instead of `sha256("storage::ns1::ns2.field_name")` we now use
`sha256((0u8, "storage::ns1::ns2.field_name"))`.

This is a breaking change for those who relied on the storage key
calculation.

## Demo

Before, every type-mismatch was always followed by an ICE, because the
compilation continued despite the type-mismatch.

![Mismatched types and ICE -
Before](https://github.com/user-attachments/assets/ac7915f7-3458-409e-a2bb-118dd4925234)

This is solved now, and the type-mismatch has a dedicated help message.

![Mismatched types and ICE -
After](https://github.com/user-attachments/assets/570aedd3-4c9c-4945-bfd0-5f12d68dbead)

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand
areas.
- [ ] I have updated the documentation where relevant (API docs, the
reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have
[requested support from the DevRel
team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [x] I have added (or requested a maintainer to add) the necessary
`Breaking*` or `New Feature` labels where relevant.
- [x] I have done my best to ensure that my PR adheres to [the Fuel Labs
Code Review
Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [x] I have requested a review from the relevant team or maintainers.

---------

Co-authored-by: Sophie Dankel <[email protected]>
Co-authored-by: IGI-111 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Storage v3 RFC
6 participants