require "domains.k"
WASM-DATA
module
module WASM-DATA
imports INT
imports BOOL
imports STRING
imports LIST
imports MAP
imports FLOAT
imports BYTES
syntax IdentifierToken [token]
syntax WasmIntToken [token]
syntax #Layout [token]
syntax WasmStringToken [token]
// ------------------------------
In KWASM
rules, we use #freshId ( Int )
to generate a fresh identifier based on the element index in the current module.
And we use OptionalId
to handle the case where an identifier could be omitted.
syntax Identifier ::= IdentifierToken
| #freshId ( Int )
syntax OptionalId ::= "" [klabel(.Identifier)]
| Identifier
// --------------------------------
In KWasm we store identifiers in maps from Identifier
to Int
, the Int
being an index.
This rule handles adding an OptionalId
as a map key, but only when it is a proper identifier.
syntax Map ::= #saveId (Map, OptionalId, Int) [function]
// -------------------------------------------------------
rule #saveId (MAP, ID:OptionalId, _) => MAP requires notBool isIdentifier(ID)
rule #saveId (MAP, ID:Identifier, IDX) => MAP [ID <- IDX]
Many constructions in Wasm, such a functions, labels, locals, types, etc., are referred to by their index.
In the abstract syntax of Wasm, indicies are 32 bit unsigned integers.
However, we extend the Index
sort with another subsort, Identifier
, in the text format.
syntax Index ::= WasmInt
// ------------------------
Int
is the basic form of index, and indices always need to resolve to integers.
In the text format, Index
is extended with the Identifier
subsort, and there needs to be a way to resolve it to an Int
.
In Wasm, the current "context" contains a mapping from identifiers to indices.
The #ContextLookup
function provides an extensible way to get an Int
from an index.
Any extension of the Index
type requires that the function #ContextLookup
is suitably extended.
For Int
, however, a the context is irrelevant and the index always just resolves to the integer.
syntax Int ::= #ContextLookup ( Map , Index ) [function]
// --------------------------------------------------------
rule #ContextLookup(IDS:Map, I:Int) => I
Element Segment is a list of indices.
It is used when initializing a WebAssembly table, or used as the parameter of the br_table
function.
syntax ElemSegment ::= List{Index, ""} [klabel(listIndex)]
// ----------------------------------------------------------
syntax Int ::= #lenElemSegment (ElemSegment) [function]
syntax Index ::= #getElemSegment (ElemSegment, Int) [function]
// --------------------------------------------------------------
rule #lenElemSegment(.ElemSegment) => 0
rule #lenElemSegment(TFIDX ES) => 1 +Int #lenElemSegment(ES)
rule #getElemSegment(E ES, 0) => E
rule #getElemSegment(E ES, I) => #getElemSegment(ES, I -Int 1) requires I >Int 0
In some cases, an integer is optional, such as when either giving or omitting the max bound when defining a table or memory.
The sort OptionalInt
provides this potentially "undefined" Int
.
syntax OptionalInt ::= Int | ".Int"
// -----------------------------------
WebAssembly has four basic types, for 32 and 64 bit integers and floats.
Here we define AValType
which stands for anonymous valtype
, representing values that doesn't have an identifier associated to it.
Also we define NValType
which stands for named valtype
, representing values that has an identifier associated to it.
ValType
is the aggregation of AValType
and NValType
.
syntax IValType ::= "i32" | "i64"
syntax FValType ::= "f32" | "f64"
syntax AValType ::= IValType | FValType
syntax NValType ::= "{" Identifier AValType "}"
syntax ValType ::= AValType | NValType
// ---------------------------------------
There are two basic type-constructors: sequencing ([_]
) and function spaces (_->_
).
syntax ValTypes ::= List{ValType, ""} [klabel(listValTypes)]
syntax VecType ::= "[" ValTypes "]"
// ------------------------------------
syntax FuncType ::= VecType "->" VecType
// ----------------------------------------
syntax Int ::= lengthValTypes ( ValTypes ) [function, functional]
// -----------------------------------------------------------------
rule lengthValTypes(.ValTypes) => 0
rule lengthValTypes(V VS) => 1 +Int lengthValTypes(VS)
We need helper functions to remove the identifiers from FuncType
.
syntax FuncType ::= unnameFuncType ( FuncType ) [function, functional]
// ----------------------------------------------------------------------
rule unnameFuncType ( [ V1 ]->[ V2 ] ) => [ unnameValTypes ( V1 ) ]->[ V2 ]
We need helper functions to remove all the identifiers from a ValTypes
.
syntax ValTypes ::= unnameValTypes ( ValTypes ) [function, functional]
// ----------------------------------------------------------------------
rule unnameValTypes ( .ValTypes ) => .ValTypes
rule unnameValTypes ( { ID V } VS ) => V unnameValTypes ( VS )
rule unnameValTypes ( V:AValType VS ) => V unnameValTypes ( VS )
All told, a Type
can be a value type, vector of types, or function type.
syntax Type ::= ".Type" | ValType | VecType | FuncType
// ------------------------------------------------------
We can append two ValTypes
s with the _+_
operator.
syntax ValTypes ::= ValTypes "+" ValTypes [function, functional]
// ----------------------------------------------------------------
rule .ValTypes + VTYPES' => VTYPES'
rule (VT VTYPES) + VTYPES' => VT (VTYPES + VTYPES')
Also we can reverse a ValTypes
with #revt
syntax ValTypes ::= #revt ( ValTypes ) [function, functional]
| #revt ( ValTypes , ValTypes ) [function, functional, klabel(#revtAux)]
// ------------------------------------------------------------------------------------------
rule #revt(VT) => #revt(VT, .ValTypes)
rule #revt(.ValTypes, VT') => VT'
rule #revt(V VT , VT') => #revt(VT, V VT')
The #width
function returns the bit-width of a given IValType
.
syntax Int ::= #width ( IValType ) [function, functional]
syntax Int ::= #numBytes ( IValType ) [function, functional, smtlib(numBytes)]
// ------------------------------------------------------------------------------
rule #width(i32) => 32
rule #width(i64) => 64
rule #numBytes(ITYPE) => #width(ITYPE) /Int 8 [concrete]
2 ^Int 32
and 2 ^Int 64
are used often enough to warrant providing helpers for them.
syntax Int ::= #pow ( IValType ) [function, functional, smtlib(pow )] /* 2 ^Int #width(T) */
| #pow1 ( IValType ) [function, functional, smtlib(pow1)] /* 2 ^Int (#width(T) -Int 1) */
// ------------------------------------------------------------------------------------------------------
rule #pow1(i32) => 2147483648
rule #pow (i32) => 4294967296
rule #pow1(i64) => 9223372036854775808
rule #pow (i64) => 18446744073709551616
Here we define the rules about what integer formats can be used in Wasm. For the core language, only regular integers are allowed.
syntax WasmInt ::= Int
// ----------------------
syntax Mut ::= ".Mut" | "const" | "var"
// ---------------------------------------
Tables and memories have limits, defined as either a sinlge Int
or two Int
s, representing min and max bounds.
syntax Limits ::= Int | Int Int
// -------------------------------
WebAssembly values are either integers or floating-point numbers, of 32- or 64-bit widths.
syntax Number ::= Int | Float
// -----------------------------
Proper values are numbers annotated with their types.
syntax IVal ::= "<" IValType ">" Int [klabel(<_>_)]
syntax FVal ::= "<" FValType ">" Float [klabel(<_>_)]
syntax Val ::= "<" AValType ">" Number [klabel(<_>_)]
| IVal | FVal
// ---------------------------
We also add undefined
as a value, which makes many partial functions in the semantics total.
syntax Val ::= "undefined"
// --------------------------
The #chop
function will ensure that an integer value is wrapped to the correct bit-width.
The #wrap
function wraps an integer to a given bit width.
syntax IVal ::= #chop ( IVal ) [function, functional]
// -----------------------------------------------------
rule #chop(< ITYPE > N) => < ITYPE > (N modInt #pow(ITYPE))
syntax Int ::= #wrap(Int, Int) [function, functional]
// ------------------------------------------------------
rule #wrap(WIDTH, N) => N &Int ((1 <<Int WIDTH) -Int 1) requires WIDTH >Int 0 [concrete]
rule #wrap(WIDTH, N) => 0 requires notBool WIDTH >Int 0
In K
all Float
numbers are of 64-bits width by default, so we need to downcast a f32
float to 32-bit manually.
The #round
function casts a f64
float to a f32
float.
f64
floats has 1 bit for the sign, 53 bits for the value and 11 bits for exponent.
f32
floats has 1 bit for the sign, 23 bits for the value and 8 bits for exponent.
syntax FVal ::= #round ( FValType , Number ) [function]
// -------------------------------------------------------
rule #round( f64 , N:Float) => < f64 > roundFloat(N, 53, 11) [concrete]
rule #round( f32 , N:Float) => < f32 > roundFloat(N, 24, 8) [concrete]
rule #round( f64 , N:Int ) => < f64 > Int2Float(N, 53, 11) [concrete]
rule #round( f32 , N:Int ) => < f32 > Int2Float(N, 24, 8) [concrete]
Functions #signed
and #unsigned
allow for easier operation on twos-complement numbers.
These functions assume that the argument integer is in the valid range of signed and unsigned values of the respective type, so they will not correctly map arbitrary integers into the corret range.
Some operations extend integers from 1, 2, or 4 bytes, so a special function with a bit width argument helps with the conversion.
syntax Int ::= #signed ( IValType , Int ) [function]
| #unsigned ( IValType , Int ) [function]
| #signedWidth ( Int , Int ) [function]
// ---------------------------------------------------------
rule #signed(ITYPE, N) => N requires 0 <=Int N andBool N <Int #pow1(ITYPE)
rule #signed(ITYPE, N) => N -Int #pow(ITYPE) requires #pow1(ITYPE) <=Int N andBool N <Int #pow (ITYPE)
rule #unsigned(ITYPE, N) => N +Int #pow(ITYPE) requires N <Int 0
rule #unsigned(ITYPE, N) => N requires 0 <=Int N
rule #signedWidth(1, N) => N requires 0 <=Int N andBool N <Int 128
rule #signedWidth(1, N) => N -Int 256 requires 128 <=Int N andBool N <Int 256
rule #signedWidth(2, N) => N requires 0 <=Int N andBool N <Int 32768
rule #signedWidth(2, N) => N -Int 65536 requires 32768 <=Int N andBool N <Int 65536
rule #signedWidth(4, N) => #signed(i32, N)
Function #bool
converts a Bool
into an Int
.
syntax Int ::= #bool ( Bool ) [function, functional]
// ----------------------------------------------------
rule #bool( B:Bool ) => 1 requires B
rule #bool( B:Bool ) => 0 requires notBool B
WebAssembly is a stack-machine, so here we provide the stack to operate over.
Operator _++_
implements an append operator for sort ValStack
.
syntax ValStack ::= ".ValStack"
| Val ":" ValStack
| ValStack "++" ValStack [function, functional]
// -----------------------------------------------------------------
rule .ValStack ++ VALSTACK' => VALSTACK'
rule (SI : VALSTACK) ++ VALSTACK' => SI : (VALSTACK ++ VALSTACK')
#zero
will create a specified stack of zero values in a given type.
#take
will take the prefix of a given stack.
#drop
will drop the prefix of a given stack.
#revs
will reverse a given stack.
NOTE: #take
and #drop
are total, so in case they could not take/drop enough values before running out, they just return the empty .ValStack
.
Each call site must ensure that this is desired behavior before using these functions.
syntax ValStack ::= #zero ( ValTypes ) [function, functional]
| #take ( Int , ValStack ) [function, functional]
| #drop ( Int , ValStack ) [function, functional]
| #revs ( ValStack ) [function, functional]
| #revs ( ValStack , ValStack ) [function, functional, klabel(#revsAux)]
// ------------------------------------------------------------------------------------------
rule #zero(.ValTypes) => .ValStack
rule #zero(ITYPE:IValType VTYPES) => < ITYPE > 0 : #zero(VTYPES)
rule #zero(FTYPE:FValType VTYPES) => < FTYPE > 0.0 : #zero(VTYPES)
rule #zero({ ID VT } VTYPES) => #zero(VT VTYPES)
rule #take(N, _) => .ValStack requires notBool N >Int 0
rule #take(N, .ValStack) => .ValStack requires N >Int 0
rule #take(N, V : VS) => V : #take(N -Int 1, VS) requires N >Int 0
rule #drop(N, VS) => VS requires notBool N >Int 0
rule #drop(N, .ValStack) => .ValStack requires N >Int 0
rule #drop(N, _ : VS) => #drop(N -Int 1, VS) requires N >Int 0
rule #revs(VS) => #revs(VS, .ValStack)
rule #revs(.ValStack, VS') => VS'
rule #revs(V : VS , VS') => #revs(VS, V : VS')
Wasm uses a different character escape rule with K, so we need to define the unescape
function ourselves.
syntax String ::= unescape(String) [function]
| unescape(String, Int, String) [function, klabel(unescapeAux)]
// -------------------------------------------------------------------------------
rule unescape(S ) => unescape(S, 1, "")
rule unescape(S, IDX, SB) => SB requires IDX ==Int lengthString(S) -Int 1
Unescaped characters just directly gets added to the output. The escaped character starts with a "" and followed by 2 hexdigits will be converted to a unescaped character before stored.
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 1, SB +String substrString(S, IDX, IDX +Int 1))
requires IDX <Int lengthString(S) -Int 1
andBool substrString(S, IDX, IDX +Int 1) =/=K "\\"
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 3, SB +String chrChar(String2Base(substrString(S, IDX +Int 1, IDX +Int 3), 16)))
requires IDX <Int lengthString(S) -Int 3
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool (findChar("0123456789abcdefABCDEF", substrString(S, IDX +Int 1, IDX +Int 2), 0) =/=Int -1 )
The characters "\t", "\n", "\r", """, "'", and "" are interpreted as regular escaped ascii symbols.
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("09", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "t"
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("0A", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "n"
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("0D", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "r"
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("22", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "\""
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("27", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "'"
rule unescape(S, IDX, SB) => unescape(S, IDX +Int 2, SB +String chrChar(String2Base("5C", 16)))
requires IDX <Int lengthString(S) -Int 2
andBool substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "\\"
Longer byte sequences can be encoded as escaped "Unicode", with \u{<hexdigit>+}
.
The implementation is not correct for now because the UTF-8 encoding is not implemented.
syntax Int ::= #idxCloseBracket ( String, Int ) [function]
// ----------------------------------------------------------
rule #idxCloseBracket ( S, I ) => I requires substrString(S, I, I +Int 1) ==String "}"
rule #idxCloseBracket ( S, I ) => #idxCloseBracket ( S, I +Int 1 ) requires substrString(S, I, I +Int 1) =/=String "}"
syntax Bytes ::= #encodeUTF8 ( Int, Endianness ) [function]
// -----------------------------------------------------------
rule #encodeUTF8 (I, E) => Int2Bytes(I, BE, Unsigned) requires I <=Int 127
rule #encodeUTF8 (I, E) => Int2Bytes(((((I &Int 1984) >>Int 6) +Int 192) <<Int 8) +Int ((I &Int 63) +Int 128), BE, Unsigned)
requires I >=Int 128 andBool I <=Int 2047
rule #encodeUTF8 (I, E) => Int2Bytes(((((I &Int 61440) >>Int 12) +Int 224) <<Int 16) +Int ((((I &Int 4032) >>Int 6) +Int 128) <<Int 8) +Int ((I &Int 63) +Int 128), BE, Unsigned)
requires I >=Int 2048 andBool I <=Int 65535
rule #encodeUTF8 (I, E) => Int2Bytes(((((I &Int 1835008) >>Int 18) +Int 240) <<Int 24) +Int ((((I &Int 258048) >>Int 12) +Int 128) <<Int 16) +Int ((((I &Int 4032) >>Int6) +Int 128) <<Int 8) +Int ((I &Int 63) +Int 128), BE, Unsigned)
requires I >=Int 65536 andBool I <=Int 1114111
rule unescape(S, IDX, SB) => unescape(S, #idxCloseBracket(S, IDX) +Int 1, SB +String Bytes2String(#encodeUTF8(String2Base(substrString(S, IDX +Int 3, #idxCloseBracket(S, IDX +Int 3)), 16), BE)))
requires substrString(S, IDX, IDX +Int 1) ==K "\\"
andBool substrString(S, IDX +Int 1, IDX +Int 2) ==K "u"
Wasm binary data can sometimes be specified as a string. The string considered to represent the sequence of UTF-8 bytes that encode it. The exception is for characters that are explicitly escaped which can represent bytes in hexadecimal form.
syntax WasmString ::= ".WasmString"
| WasmStringToken
// -------------------------------------
syntax String ::= #parseWasmString ( WasmStringToken ) [function, functional, hook(STRING.token2string)]
syntax WasmStringToken ::= #unparseWasmString ( String ) [function, functional, hook(STRING.string2token)]
syntax DataString ::= List{WasmString, ""} [klabel(listWasmString)]
// ------------------------------------------------------------------------------------------
DataString
, as is defined in the wasm semantics, is a list of WasmString
s.
#concatDS
concatenates them together into a single string.
The strings to connect needs to be unescaped before concatenated, because the unescape
function removes the quote sign "
before and after each substring.
syntax String ::= #concatDS ( DataString ) [function]
| #concatDS ( DataString, String ) [function, klabel(#concatDSAux)]
// -----------------------------------------------------------------------------------
rule #concatDS ( DS ) => #concatDS ( DS, "" )
rule #concatDS ( .DataString , S ) => S
rule #concatDS ( WS:WasmStringToken DS , S ) => #concatDS ( DS , S +String unescape(#parseWasmString(WS)) )
#DS2Bytes
converts a DataString
to a K builtin Bytes
.
syntax Bytes ::= #DS2Bytes (DataString) [function]
// --------------------------------------------------
rule #DS2Bytes(DS) => String2Bytes(#concatDS(DS))
Wasm memories are byte arrays, sized in pages of 65536 bytes, initialized to be all zero bytes.
To avoid storing many zeros in what may be sparse memory, we implement memory as maps, and store only non-zero bytes.
The maps store integers, but maintains the invariant that each stored integer is between 1 and 255, inclusive.
We make a ByteMap
sort so that the following functions only make sense over this sort.
However, ByteMap
is just a wrapper around regular Map
s.
syntax ByteMap ::= "ByteMap" "<|" Map "|>"
// ------------------------------------------
#setRange(BM, START, VAL, WIDTH)
writes the integer I
to memory as bytes (little-endian), starting at index N
.
syntax ByteMap ::= #setRange(ByteMap, Int, Int, Int) [function, functional, smtlib(setRange)]
// ---------------------------------------------------------------------------------------------
rule #setRange(BM, _, _, WIDTH) => BM
requires notBool (WIDTH >Int 0)
rule #setRange(BM, IDX, VAL, WIDTH) => #setRange(#set(BM, IDX, VAL modInt 256), IDX +Int 1, VAL /Int 256, WIDTH -Int 1)
requires WIDTH >Int 0
[concrete]
#getRange(BM, START, WIDTH)
reads off WIDTH
elements from BM
beginning at position START
, and converts it into an unsigned integer.
The function interprets the range of bytes as little-endian.
syntax Int ::= #getRange (ByteMap, Int , Int) [function, functional, smtlib(getRange)]
// --------------------------------------------------------------------------------------
rule #getRange( _, _, WIDTH) => 0
requires notBool (WIDTH >Int 0)
rule #getRange(BM, START, WIDTH) => #get(BM, START) +Int (#getRange(BM, START +Int 1, WIDTH -Int 1) *Int 256)
requires WIDTH >Int 0
[concrete]
#get
looks up a key in a map, defaulting to 0 if the map does not contain the key.
#set
sets a key in a map, removing the key if the value is 0.
syntax Int ::= #get (ByteMap , Int ) [function, functional, smtlib(mapGet)]
syntax ByteMap ::= #set (ByteMap , Int, Int) [function, functional, smtlib(mapSet)]
// -----------------------------------------------------------------------------------
rule #get( ByteMap <| M |>, KEY ) => {M [KEY]}:>Int requires KEY in_keys(M) [concrete]
rule #get( ByteMap <| M |>, KEY ) => 0 requires notBool KEY in_keys(M)
rule #set( ByteMap <| M |>, KEY, VAL ) => ByteMap <| M [KEY <- undef] |> requires VAL ==Int 0
rule #set( ByteMap <| M |>, KEY, VAL ) => ByteMap <| M [KEY <- VAL ] |> requires notBool (VAL ==Int 0) [concrete]
An external value
is the runtime representation of an entity that can be imported
or exported
.
It is an address
denoting either a function instance
, table instance
, memory instance
, or global instances
in the shared store.
syntax AllocatedKind ::= "func" | "table" | "memory" | "global"
syntax Externval ::= AllocatedKind Index
// --------------------------------------------
endmodule