This document describes the bytecode virtual machine used by `aver run`, `aver verify`, and `aver replay`.
It is a design note, not a frozen spec. The opcode set and internal representation may still change while the VM matures.
The Aver VM is the sole execution backend for Aver programs.
It is intentionally language-shaped, not generic:
- opcodes model Aver concepts directly
- pattern matching is compiled to explicit match/destructure instructions
- tail calls are part of the ISA
- records, variants, wrappers, lists, and tuples are first-class runtime values
This is not a mini-JVM or a universal IR. It is a runtime designed around the constraints of Aver.
The VM compiles resolved Aver AST into bytecode function chunks:
- `src/vm/compiler.rs` lowers AST to bytecode
- `src/vm/execute.rs` runs the stack machine
- `src/vm/opcode.rs` defines the ISA
- `src/vm/runtime.rs` handles builtins, effects, and record/replay at the host boundary
Execution is stack-based:
- locals live in the current frame
- operands are pushed onto the VM stack
- calls create or reuse frames
- returns leave one value on the caller stack
The VM now also marks conservatively-classified thin functions and parent-thin functions.
These are small helpers that do not use tail-call frame reuse, do not write globals, and do not emit obvious aggregate-construction opcodes such as `RECORD_UPDATE`, `WRAP`, `LIST_*`, `TUPLE_NEW`, or `VARIANT_NEW`.
When a thin function returns and the runtime can confirm that its local young / yard / handoff marks never moved, the VM skips the normal boundary relocation path entirely.
In practice this means many tiny Aver helpers now behave like:
- keep normal stack locals while running
- but do not pay full survivor/stable bookkeeping on return
- unless they actually created local heap state after all
`parent-thin` is narrower and more Aver-specific:
- it is meant for wrapper-like helpers, not for general small functions
- it borrows the caller `young` lane directly
- it avoids ordinary-return `handoff` as long as it never touches `yard`/`handoff`
- its local `young` scratch dies later at the caller boundary instead of forcing a helper-local relocation step
The classifier is deliberately less strict than a pure "single expression only" rule:
- small `match` helpers with local bindings can still be `parent-thin`
- field/tuple extraction and other tiny control-flow opcodes are allowed
- nullary variant constructors (`Status.Todo`) are treated as inline results, so they can still stay on the thin / parent-thin fast path
- list destructuring and obviously aggregate-building builtins still stay out of `parent-thin`
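The conservative check described above can be sketched as a simple scan over a function body. Everything here is illustrative: the `Op` variants and `is_thin` are invented names for this note, and the real classifier works on actual bytecode in the compiler.

```rust
// Invented opcode subset; the real ISA lives in src/vm/opcode.rs.
#[derive(Debug)]
enum Op {
    MatchTag,
    ExtractField,
    Return,
    TailCallSelf,
    SetGlobal,
    VariantNew,
}

/// Thin means: no tail-call frame reuse, no global writes, and no obvious
/// aggregate-construction opcodes anywhere in the body.
fn is_thin(body: &[Op]) -> bool {
    body.iter()
        .all(|op| !matches!(op, Op::TailCallSelf | Op::SetGlobal | Op::VariantNew))
}
```

The point of keeping the predicate this dumb is that it only has to be safe, never complete: a helper wrongly classified as non-thin just pays the normal boundary cost.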
Execution-backed commands run through the VM:
- `aver run app.av`
- `aver verify app.av`
- `aver replay recordings/`

The VM runs on `NanValue`, not on the higher-level `Value` enum.
The current layout is best thought of as semantic tags first, storage second.
Floats still use the plain IEEE path. Everything else is a tagged quiet-NaN:
```text
 63        50 49   46 45                    0
┌───────────┬───────┬───────────────────────┐
│  0x7FFC   │  tag  │        payload        │
│  14 bits  │ 4 bits│        46 bits        │
└───────────┴───────┴───────────────────────┘
```
Current tags:
| Tag | Meaning | Payload shape |
|---|---|---|
| 0 | Immediate | `false` / `true` / `Unit` |
| 1 | Symbol | fn / builtin / namespace / nullary-variant handle |
| 2 | Int | inline signed int or arena big-int |
| 3 | String | inline small string or arena string |
| 4 | Some | inline payload or boxed arena payload |
| 5 | None | singleton |
| 6 | Ok | inline payload or boxed arena payload |
| 7 | Err | inline payload or boxed arena payload |
| 8 | List | empty list or arena list |
| 9 | Tuple | arena tuple |
| 10 | Map | empty map or arena map |
| 11 | Record | arena record |
| 12 | Variant | arena payload variant |
The important convention in v2 is that bit 45 is now mostly the “does this value carry an arena reference?” discriminator:
- `Int`: inline int vs arena big-int
- `String`: inline small string vs arena string
- `Some`/`Ok`/`Err`: inline payload vs boxed arena payload
- `List`/`Map`: empty singleton vs arena aggregate
- `Tuple`/`Record`/`Variant`: always arena-backed
That makes the representation much more regular than the older wrapper-heavy scheme.
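The layout above can be made concrete with a few bit manipulations. This is a sketch, not the real `NanValue` code: the constant and helper names are invented, and only the field positions (header pattern, 4-bit tag at bit 46, 46-bit payload, bit 45 as the arena discriminator) come from the text.

```rust
// 0x7FFC as the top 16 bits of the word when the tag bits are zero;
// the header field itself occupies bits 63..50.
const QNAN_HEADER: u64 = 0x7FFC << 48;
const TAG_SHIFT: u32 = 46;               // tag occupies bits 49..46
const PAYLOAD_MASK: u64 = (1 << 46) - 1; // payload occupies bits 45..0
const ARENA_BIT: u64 = 1 << 45;          // "carries an arena reference" flag

const TAG_INT: u64 = 2; // from the tag table above

fn nan_box(tag: u64, payload: u64) -> u64 {
    QNAN_HEADER | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK)
}

fn tag_of(bits: u64) -> u64 {
    (bits >> TAG_SHIFT) & 0xF
}

fn payload_of(bits: u64) -> u64 {
    bits & PAYLOAD_MASK
}

fn is_arena_backed(bits: u64) -> bool {
    bits & ARENA_BIT != 0
}
```

Because every heap-or-not decision funnels through one bit test (`is_arena_backed`), helpers that only shuffle inline values never have to inspect per-tag encodings.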
The point of v2 is not only compactness. It is to keep the common Aver shapes cheap:
- `Bool`, `Unit`, and `None` are pure inline singletons
- `Some(true)`, `Ok(Unit)`, `Err(None)` stay inline
- `Some(42)`, `Ok(-7)`, `Err(0)` stay inline as long as the int fits the wrapper-inline range
- `[]` and `Map.empty()` are real values under their normal collection tags, not exceptions hidden in `Immediate`
- strings up to 5 UTF-8 bytes stay inline under `TAG_STRING`
- nullary variants such as `Status.Todo` or `Color.Red` travel as `Symbol` handles instead of arena entries
That keeps Result / Option pipelines, empty collections, and short-string-heavy code from manufacturing arena churn just to move tiny values around.
The arena is still where the real aggregate payloads live:
- large `Int`
- long `String`
- non-empty `List`
- non-empty `Map`
- `Tuple`
- `Record`
- payload-carrying `Variant`
- boxed wrapper payloads when `Some`/`Ok`/`Err` cannot stay inline
This is the main reason the VM can stay small without dragging a bigger object model through every helper call.
The VM no longer uses one “grow forever” arena.
Instead it splits heap-backed values into four runtime spaces:
- `young` for short-lived temporaries created while evaluating the current step
- `yard` as a tail-position construction lane
- `handoff` as an ordinary-return construction lane
- `stable` as the canonical long-lived space
Each call frame records marks for the local young, yard, and handoff suffixes it owns.
That means the VM knows exactly which heap entries were created “during this frame” and can reclaim them in bulk.
Conceptually:
- `young` means “local scratch work”
- `yard` means “this value is being built for a tail-call path”
- `handoff` means “this value is being built for an ordinary return path”
- `stable` means “this value is safe to keep beyond the current frame boundary”
Implementation-wise, the current VM now splits boundary behavior by control-flow shape:
- values can still be allocated into `yard` or `handoff` in obvious tail/return positions
- at `TAIL_CALL_*` boundaries, live roots are kept in `yard`, so loop-carried state stays out of `stable`
- at ordinary `RETURN` boundaries to another Aver frame, live roots stay on the handoff path instead of being forced into `stable`
- parent-thin wrappers are the exception: they borrow caller `young` and skip ordinary-return handoff entirely unless they spill into `yard`/`handoff`
- pure-`handoff`, pure-`young`, and single-result mixed helper returns use fast ordinary-return paths
- larger mixed `young + handoff` graphs still fall back to full evacuation, because correctness matters more than over-eager survivor cleverness
- only globals, host-facing escapes, and top-level completion are canonicalized into `stable`
- then the frame-local `young`/`yard`/`handoff` suffixes are truncated or compacted as appropriate
This matters because it gives the VM a real survivor lane for TCO-heavy programs and for ordinary helper chains, without forcing every “survives one more call boundary” value through stable.
So the current VM is:
- region-style for local scratch memory
- yard-based for tail-call survivors
- handoff-based for ordinary helper returns, with a conservative fallback for larger mixed graphs
- stable-space based for globals, host-facing escapes, and top-level canonicalization
- explicit about which lanes are used during construction
That already gives us the most important property: frame-local garbage dies in bulk, and long-lived values stop pretending to live in temporary memory.
The easiest way to think about the VM is:
- New local work starts in `young`.
- In obvious tail-position construction, aggregates may be built in `yard`.
- In obvious ordinary-return construction, aggregates may be built in `handoff`.
- On `TAIL_CALL_*`, live roots are evacuated into `yard`.
- On ordinary `RETURN` to another Aver frame, live roots stay on the handoff path.
- On top-level completion or real escape boundaries, live roots are canonicalized into `stable`.
- The frame-local `young`/`yard`/`handoff` suffixes are then truncated in one shot.
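The marks-and-truncation mechanics can be sketched in a few lines. All names here (`Lanes`, `FrameMarks`, the methods) are invented for this note; the real lanes hold arena entries, not bare `u64`s, and relocation is more involved than a `truncate`.

```rust
struct Lanes {
    young: Vec<u64>,
    yard: Vec<u64>,
    handoff: Vec<u64>,
    stable: Vec<u64>,
}

/// Lane lengths recorded at frame entry: everything past a mark was
/// allocated "during this frame" and can be reclaimed in bulk.
struct FrameMarks {
    young: usize,
    yard: usize,
    handoff: usize,
}

impl Lanes {
    fn enter_frame(&self) -> FrameMarks {
        FrameMarks {
            young: self.young.len(),
            yard: self.yard.len(),
            handoff: self.handoff.len(),
        }
    }

    /// Ordinary return: the handoff suffix survives into the caller,
    /// while frame-local young/yard scratch dies in one shot.
    fn ordinary_return(&mut self, marks: &FrameMarks) {
        self.young.truncate(marks.young);
        self.yard.truncate(marks.yard);
    }

    /// Top-level completion / real escape: canonicalize live roots into
    /// stable, then truncate every frame-local suffix.
    fn canonicalize(&mut self, marks: &FrameMarks, live_roots: &[u64]) {
        self.stable.extend_from_slice(live_roots);
        self.young.truncate(marks.young);
        self.yard.truncate(marks.yard);
        self.handoff.truncate(marks.handoff);
    }
}
```

The thin-function fast path falls out of this shape for free: if the three marks are unchanged at return time, there is nothing to truncate and nothing to relocate.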
For helper-sized functions there are now two extra fast paths:
- If a frame returns with unchanged local marks, the VM skips boundary promotion/truncation work for that frame and resumes the caller directly.
- If a `parent-thin` frame only touched borrowed `young`, it returns directly to the caller without building ordinary-return handoff state at all.
That means the VM still distinguishes:
- local scratch work
- tail-position construction
- caller-facing return construction
- truly long-lived values
The important distinction now is:
- `yard` survives the next tail-call boundary
- `handoff` survives the next ordinary call/return boundary
- borrowed parent-`young` is the cheapest path of all, but only for very narrow wrapper-like helpers
- `stable` is for values that really outlive the current Aver call chain
Typical examples:
- `tmp = (x, y)` inside a function body: lives in `young`
- `List.prepend(n, acc)` used as the next argument of a tail-recursive call: can be built in `yard`, and stays in `yard` when the tail-call boundary is finalized
- `Result.Ok(value)` built just before returning from a helper: can be built in `handoff`, and stays in `handoff` while the caller continues
- a helper that built both local temporaries and one final returned aggregate: can still stay on the fast ordinary-return path when that returned aggregate is the only fresh handoff root; larger mixed graphs fall back to full evacuation
- storing a value into globals, returning from top-level, or passing a value across a host boundary: goes to `stable`
The point is not only speed. The point is that the runtime distinguishes “temporary while computing” from “safe to keep after this frame ends”.
The VM still does not need a classical "GC everywhere" story:
- `young`, `yard`, and `handoff` are reclaimed by explicit boundary truncation
- `stable` is compacted from live roots at top-level completion or explicit escape boundaries
So there is still tracing and relocation, but not as one global always-on collector. Most memory dies because control flow tells us it can die, and only stable needs long-lived root-driven maintenance.
Lists in the VM are not just flat Vec payloads.
The current arena list storage supports four shapes:
- `Flat` for compact literal / materialized lists
- `Prepend` for cheap `List.prepend` and `LIST_CONS`
- `Concat` for cheap structural concatenation
- `Segments` for concat-tail views produced by repeated destructuring
Repeated `List.append` does not keep building a one-element-deep concat chain forever. The VM grows the right edge in flat chunks, so append-heavy code stays structural without turning indexed access into a totally degenerate tree walk.
This matters because the VM can now keep list construction aligned with Aver semantics instead of flattening on every prepend.
Pattern matching and destructuring (`MATCH_CONS`, `LIST_HEAD_TAIL`) use list helpers that understand these shapes directly. In particular, destructuring a `Concat` tail no longer rebuilds a fresh concat suffix on every step; it can carry a cheap segment view instead.
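The four shapes can be sketched as one enum with structural `len`/`get`. This is a shape sketch only: the real arena storage uses indices rather than `Box`, elements are `NanValue`s rather than `i64`s, and the real `Segments` form carries flat-chunk views rather than a single offset.

```rust
enum ListRepr {
    /// compact literal / materialized lists
    Flat(Vec<i64>),
    /// cheap List.prepend / LIST_CONS: one element in front of a list
    Prepend(i64, Box<ListRepr>),
    /// cheap structural concatenation
    Concat(Box<ListRepr>, Box<ListRepr>),
    /// concat-tail view from repeated destructuring: skip `offset` elements
    Segments { base: Box<ListRepr>, offset: usize },
}

impl ListRepr {
    fn len(&self) -> usize {
        match self {
            ListRepr::Flat(v) => v.len(),
            ListRepr::Prepend(_, rest) => 1 + rest.len(),
            ListRepr::Concat(a, b) => a.len() + b.len(),
            ListRepr::Segments { base, offset } => base.len().saturating_sub(*offset),
        }
    }

    fn get(&self, i: usize) -> Option<i64> {
        match self {
            ListRepr::Flat(v) => v.get(i).copied(),
            ListRepr::Prepend(h, rest) => {
                if i == 0 { Some(*h) } else { rest.get(i - 1) }
            }
            ListRepr::Concat(a, b) => {
                let left = a.len();
                if i < left { a.get(i) } else { b.get(i - left) }
            }
            ListRepr::Segments { base, offset } => base.get(i + offset),
        }
    }
}
```

The key property is visible in `Segments`: taking the tail of a concat-backed list is an O(1) view, not a rebuilt suffix.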
Core list operations also have dedicated bytecode paths:
- `LIST_LEN`
- `LIST_GET`
- `LIST_APPEND`
- `LIST_PREPEND`
- `LIST_GET_MATCH`
That avoids paying full generic builtin-dispatch overhead for the most common list operations in real Aver programs.
In obvious tail-call positions, the VM can allocate new aggregate values directly into the frame yard instead of forcing an immediate young-to-yard copy on the next TAIL_CALL_*.
In obvious ordinary return positions, the VM can allocate new aggregate values directly into the frame handoff lane, so helper returns can survive into the caller without first pretending to be temporaries or globally-stable values.
The VM now keeps a single interned table of compile-time-known names:
- function names
- builtin/service members
- declared effect names
- type / field / variant names discovered while compiling
Each entry gets a stable symbol_id.
That lets the VM stop carrying string-ish dispatch state through hot paths:
- function values travel as inline `Int(symbol_id)`
- `CALL_VALUE` resolves `symbol_id -> function`
- `CALL_BUILTIN` carries `symbol_id` instead of a builtin name or arena string
- builtin effect checks compare interned effect ids instead of runtime strings
This is intentionally simple and Aver-shaped:
- one symbol table
- one inline handle format
- metadata attached to the symbol entry
Not every runtime value is a symbol. User data is still just data. But anything the compiler already knows by name no longer needs string dispatch during execution.
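The one-table shape is easy to state in code. A minimal interner sketch, assuming only what the text says; `SymbolTable`, `intern`, and `name` are illustrative names, and the real table also attaches metadata (arity, effect ids, …) to each entry.

```rust
use std::collections::HashMap;

type SymbolId = u32;

#[derive(Default)]
struct SymbolTable {
    ids: HashMap<String, SymbolId>,
    names: Vec<String>,
}

impl SymbolTable {
    /// Intern a compile-time-known name; repeated calls return the same id,
    /// so hot paths can compare ids instead of strings.
    fn intern(&mut self, name: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as SymbolId;
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        id
    }

    /// Reverse lookup, used only for diagnostics and error messages.
    fn name(&self, id: SymbolId) -> &str {
        &self.names[id as usize]
    }
}
```

Interning happens at compile time, so the string-to-id map is never consulted in the execute loop; only the dense id-indexed side is.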
One of the more unusual choices is that the VM separates:
- runtime symbolic values in `NanValue` via `TAG_SYMBOL`
- VM-known callable ids in bytecode and call dispatch via inline `symbol_id`
That means the runtime still has one shared symbolic handle class for things like:
- `Fn`
- `Builtin`
- `Namespace`
- nullary variants
while the hottest VM paths can still dispatch directly on interned symbol ids instead of names.
That means:
- a known top-level function can be passed around as a first-class value
- `CALL_VALUE` can dispatch without a separate closure object model
- the current VM does not need upvalues or captured environments
- the same inline handle model also works for other compile-time-known symbols such as builtins and effect names
This is an internal encoding choice, not a surface-language feature. At the language level, functions are still just Aver functions.
The opcode set is deliberately semantic rather than minimal.
Examples:
- `TAIL_CALL_SELF`
- `TAIL_CALL_KNOWN`
- `MATCH_UNWRAP`
- `MATCH_CONS`
- `MATCH_TUPLE`
- `EXTRACT_FIELD`
- `EXTRACT_TUPLE_ITEM`
- `TUPLE_NEW`
- `LIST_LEN`
- `LIST_GET`
- `LIST_APPEND`
- `LIST_PREPEND`
- `LIST_GET_MATCH`
These opcodes exist because Aver already has strong opinions:
- `match` is the only branching construct
- `Result` and `Option` are explicit and common
- recursion and TCO matter more than loop machinery
- records, variants, and tuples are core language shapes
So instead of lowering everything into overly generic bytecode, the VM keeps those concepts visible.
Pattern matching is compiled into a short sequence of checks and destructuring steps.
Typical pieces are:
- tag checks (`MATCH_TAG`)
- wrapper checks/unwrapping (`MATCH_UNWRAP`)
- list shape checks (`MATCH_NIL`, `MATCH_CONS`, `LIST_HEAD_TAIL`)
- tuple shape checks (`MATCH_TUPLE`, `EXTRACT_TUPLE_ITEM`)
- variant checks (`MATCH_VARIANT`)
- field extraction (`EXTRACT_FIELD`)
This keeps the execute loop simple while preserving the structure of Aver patterns.
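The check-then-destructure flavor of these opcodes can be mimicked in plain Rust over a tiny stand-in value type. Nothing here is VM code: `V`, `match_unwrap_ok`, and `match_cons` are invented to show the shape of one `MATCH_UNWRAP`-style step and one `MATCH_CONS`-style step.

```rust
#[derive(Clone, Debug, PartialEq)]
enum V {
    Unit,
    Int(i64),
    List(Vec<V>),
    Ok(Box<V>),
    Err(Box<V>),
}

/// MATCH_UNWRAP-style step: succeed with the payload if the value is Ok,
/// otherwise signal "this arm does not match".
fn match_unwrap_ok(v: &V) -> Option<V> {
    match v {
        V::Ok(inner) => Some((**inner).clone()),
        _ => None,
    }
}

/// MATCH_CONS-style step: succeed with (head, tail) if the list is non-empty.
fn match_cons(v: &V) -> Option<(V, V)> {
    match v {
        V::List(xs) if !xs.is_empty() => {
            Some((xs[0].clone(), V::List(xs[1..].to_vec())))
        }
        _ => None,
    }
}
```

A compiled Aver arm is essentially a straight-line chain of steps like these, where a failed step jumps to the next arm instead of returning `None`.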
The current VM no longer uses arm-local match-region opcodes. In practice they were adding machinery at the wrong granularity for Aver: most functions are tiny, and the bigger wins came from better list/value placement and more semantic bytecode around common patterns such as `match List.get(xs, i)`.
Two recent fixes are worth calling out because they affected real example programs:
- mutual tail calls to a target with a larger `local_count` now resize the VM stack before clearing new locals, which removed a crash in large verify suites such as `examples/data/json.av`
- ordered string comparison was corrected, so examples like `examples/data/date.av` behave correctly under `verify`
Those are not design shifts, but they matter because they closed the last obvious correctness gaps in the VM path.
The VM enforces declared effects at runtime.
That logic does not live in the main execute loop. Instead:
- `src/vm/execute.rs` is the core machine
- `src/vm/runtime.rs` is the host/runtime bridge
`VmRuntime` is responsible for:
- builtin dispatch
- effect checking from interned effect ids attached to VM symbols
- record/replay integration
- CLI argument access
This split is intentional: the VM core should mostly be “bytecode mechanics”, while effectful services stay at the boundary.
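Effect checking over interned ids reduces to an integer membership test at the boundary. A minimal sketch, assuming the split described above; `EffectId` and `check_effect` are invented names.

```rust
type EffectId = u32;

/// Returns Ok(()) if the required effect is covered by the caller's
/// declared effects, comparing interned ids rather than runtime strings.
fn check_effect(declared: &[EffectId], required: EffectId) -> Result<(), String> {
    if declared.contains(&required) {
        Ok(())
    } else {
        Err(format!("undeclared effect id {}", required))
    }
}
```

Because declared-effect sets are tiny in practice, a linear scan over a small id slice is cheaper than any string comparison would be.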
`HttpServer.listen` and `HttpServer.listenWith` are special because they need to call back into Aver from host code.
Today that bridge works by:
- converting callback args into VM values
- resolving the callback's inline `symbol_id` back to a VM function
- calling that VM function
- converting the result back into host `Value`
This boundary is more complex than normal builtin calls and is one of the few places where the VM still has explicit host-runtime plumbing.
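The round trip can be shown in miniature, with everything invented: real code converts between host `Value` and `NanValue`, and VM functions are bytecode chunks, not `fn` pointers.

```rust
type SymbolId = u32;

#[derive(Debug, PartialEq)]
enum HostValue {
    Int(i64),
}

/// Stand-in for the VM: a function table indexed by interned symbol id.
struct Vm {
    functions: Vec<fn(i64) -> i64>,
}

impl Vm {
    /// Host -> VM -> host round trip: convert the host arg to a VM value,
    /// resolve the callback's inline symbol_id to a VM function, call it,
    /// and convert the result back into a host value.
    fn call_back(&self, callback: SymbolId, host_arg: i64) -> HostValue {
        let vm_arg = host_arg; // conversion is trivial in this sketch
        let f = self.functions[callback as usize];
        HostValue::Int(f(vm_arg))
    }
}
```

The important part is that host code never holds a VM function directly, only the interned id, so the bridge stays valid across the VM's own internal moves.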
Tail calls are not an afterthought.
The compiler emits:
- `TAIL_CALL_SELF`
- `TAIL_CALL_KNOWN`
So recursive and mutual-recursive tail calls can reuse frames directly in the VM.
That matches the rest of Aver, where recursion is the normal control-flow mechanism instead of loops.
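Frame reuse itself is simple to sketch: on a self tail call the VM overwrites the current frame's locals and resets the instruction pointer instead of pushing a new frame. `Frame` and `tail_call_self` are illustrative names, and locals are bare `i64`s here rather than `NanValue`s.

```rust
struct Frame {
    locals: Vec<i64>,
    ip: usize,
}

impl Frame {
    /// Reuse this frame for a self tail call: install the new arguments,
    /// clear the remaining local slots, and jump back to the entry point.
    fn tail_call_self(&mut self, args: &[i64]) {
        // Resize first, so a target with a larger local count cannot
        // write past the current stack region.
        if self.locals.len() < args.len() {
            self.locals.resize(args.len(), 0);
        }
        self.locals[..args.len()].copy_from_slice(args);
        for slot in self.locals[args.len()..].iter_mut() {
            *slot = 0;
        }
        self.ip = 0;
    }
}
```

Because the frame never leaves the stack, a tail-recursive Aver function runs in constant frame space regardless of recursion depth.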
What is still true today:
- the bytecode format is internal and not stable yet
- function values are modeled around top-level Aver functions, which matches the language today
- builtin calls are primarily compiled as direct builtin operations, not passed around as first-class VM values
- some host-service edges, especially callback-heavy ones like `HttpServer`, still need more runtime plumbing than the pure VM core
These are mostly implementation boundaries, not evidence that the VM is “toy” or “partial”. The VM should be thought of as a real runtime path whose internals are still settling.
The VM is small partly because Aver itself is narrow and explicit:
- one branching construct
- explicit effects
- no exceptions
- no hidden mutation model
- no closure-heavy execution model
That lets the VM stay simple in the good sense:
- fewer opcodes than a generic language VM
- more semantic opcodes than a minimal stack toy
- a direct correspondence between surface-language constructs and runtime behavior
That is the design goal: not “generic bytecode purity”, but a runtime that matches how Aver already wants programs to look.