
Commit 68d726a

Merge #152: Human-readable encoding 1: serialization and README
0150e7c simpcli: No error on help (Christian Lewe)
fb6e924 ci: add basic simpcli test (Andrew Poelstra)
4de7e10 types: add backticks to error serialization (Andrew Poelstra)
3ac34c6 simplang: introduce crate with one command ("disassemble") (Andrew Poelstra)
5cb2eb8 human_encoding: add README describing the encoding (Andrew Poelstra)
73e071f human_encoding: add serialization support (Andrew Poelstra)
9cd1c84 human_encoding: add module, new `NamedCommitNode` node type (Andrew Poelstra)
9adc564 node: fix typo in `Node::from_parts` (Andrew Poelstra)

Pull request description:

This PR introduces the human-readable serialization of Simplicity programs. Right now it will serialize disconnect nodes as having only one child (and the next PR will parse such programs). Later, when we introduce typed holes, we will correct this so that disconnect nodes always have two children (the rightmost one being a named hole). But for now disconnect is more-or-less unusable. To avoid too much complexity at once I think this is a reasonable order of operations.

It also slightly changes the syntax from the last PR to allow `'` in symbol names. This is so I can do stuff like `x' := pair x unit` or whatever, where I use the prime symbol to indicate a slightly tweaked version of a different expression.

ACKs for top commit:
  uncomputable:
    ACK 0150e7c

Tree-SHA512: ca2abb2c9da2167c6568bc6de01694c7e7a3f13f1e0e38e069ff41baee682b94c58c837ca718c7276c2fd85f55e2c24ac2c4b318f7baaf4aca4b2147b590a074
2 parents: cf5f1e9 + 0150e7c


11 files changed (+742, -4 lines)


.github/workflows/main.yml (+21)

@@ -26,6 +26,27 @@ jobs:
           command: fmt
           args: --all -- --check
 
+  simpcli_test:
+    name: SimpCLI Tests
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        rust:
+          - stable
+    steps:
+      - name: Checkout Crate
+        uses: actions/checkout@v2
+      - name: Checkout Toolchain
+        uses: actions-rs/toolchain@v1
+        with:
+          profile: minimal
+          toolchain: ${{ matrix.rust }}
+          override: true
+      - name: Running cargo test
+        run: |
+          cd simpcli
+          cargo test
+
   bench_test:
     name: Jets-Bench Tests
     runs-on: ubuntu-latest

Cargo.toml (+1, -1)

@@ -28,7 +28,7 @@ actual-serde = { package = "serde", version = "1.0.103", features = ["derive"],
 simplicity-sys = { version = "0.1.0", path = "./simplicity-sys", features = ["test-utils"] }
 
 [workspace]
-members = ["simplicity-sys"]
+members = ["simpcli", "simplicity-sys"]
 # Should be manually/separately tested since it has a massive dep tree
 # and not follow MSRV
 # FIXME we also need to include 'fuzz' in here because it currently uses

simpcli/Cargo.toml (new file, +16)

@@ -0,0 +1,16 @@
[package]
name = "simpcli"
version = "0.1.0"
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
base64 = "0.21"
# todo add lexopt for command line parsing
simplicity = { version = "0.1", path = "..", features = [ "serde", "elements" ] }

[[bin]]
name = "simpcli"
path = "src/main.rs"

simpcli/src/main.rs (new file, +120)

@@ -0,0 +1,120 @@
// Simplicity "Human-Readable" Language
// Written in 2023 by
// Andrew Poelstra <[email protected]>
//
// To the extent possible under law, the author(s) have dedicated all
// copyright and related and neighboring rights to this software to
// the public domain worldwide. This software is distributed without
// any warranty.
//
// You should have received a copy of the CC0 Public Domain Dedication
// along with this software.
// If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
//

use simplicity::human_encoding::Forest;
use simplicity::node::CommitNode;
use simplicity::{self, BitIter};

use base64::engine::general_purpose::STANDARD;
use std::env;
use std::str::FromStr;

/// What set of jets to use in the program.
// FIXME this should probably be configurable.
type DefaultJet = simplicity::jet::Elements;

fn usage(process_name: &str) {
    eprintln!("Usage:");
    eprintln!(" {} disassemble <base64>", process_name);
    eprintln!();
    eprintln!("For commands which take an optional expression, the default value is \"main\".");
    eprintln!();
    eprintln!("Run `{} help` to display this message.", process_name);
}

fn invalid_usage(process_name: &str) -> Result<(), String> {
    usage(process_name);
    Err("invalid usage".into())
}

enum Command {
    Disassemble,
    Help,
}

impl FromStr for Command {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, String> {
        match s {
            "disassemble" => Ok(Command::Disassemble),
            "help" => Ok(Command::Help),
            x => Err(format!("unknown command {}", x)),
        }
    }
}

impl Command {
    fn takes_optional_exprname(&self) -> bool {
        match *self {
            Command::Disassemble => false,
            Command::Help => false,
        }
    }
}

fn main() -> Result<(), String> {
    let mut args = env::args();
    let process_name = args.next().unwrap();
    let process_name = match process_name.rfind('/') {
        Some(idx) => &process_name[idx + 1..],
        None => &process_name[..],
    };

    // Parse command-line args into (command, first_arg, expression)
    let command = match args.next() {
        Some(cmd) => match Command::from_str(&cmd) {
            Ok(cmd) => cmd,
            Err(e) => {
                eprintln!("Error: {}.", e);
                eprintln!();
                return invalid_usage(&process_name);
            }
        },
        None => return invalid_usage(&process_name),
    };

    if let Command::Help = command {
        usage(&process_name);
        return Ok(());
    }

    let first_arg = match args.next() {
        Some(s) => s,
        None => return invalid_usage(&process_name),
    };
    let _expression = if command.takes_optional_exprname() {
        args.next().unwrap_or("main".to_owned())
    } else {
        String::new()
    };
    if args.next().is_some() {
        invalid_usage(&process_name)?;
    }

    // Execute command
    match command {
        Command::Disassemble => {
            let v = base64::Engine::decode(&STANDARD, first_arg.as_bytes())
                .map_err(|e| format!("failed to parse base64: {}", e))?;
            let mut iter = BitIter::from(v.into_iter());
            let commit = CommitNode::decode(&mut iter)
                .map_err(|e| format!("failed to decode program: {}", e))?;
            let prog = Forest::<DefaultJet>::from_program(commit);
            println!("{}", prog.string_serialize());
        }
        Command::Help => unreachable!(),
    }

    Ok(())
}
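Because `main` reads `env::args()` directly, the argument checks above are hard to unit-test, even though CI now runs `cargo test` in `simpcli`. As a sketch only, the same order of checks (command, then the `help` short-circuit, then exactly one required argument, then no extras) could be factored into a pure function. `Parsed` and `parse_args` are hypothetical names that do not exist in the crate, and unlike `main` this version returns the "unknown command" message instead of printing it.

```rust
/// Hypothetical result of argument parsing (not part of simpcli).
#[derive(Debug, PartialEq)]
enum Parsed {
    /// `help`: print usage and exit successfully.
    Help,
    /// `disassemble <base64>` with its single required argument.
    Disassemble { base64: String },
}

/// Hypothetical pure version of the checks performed in `main`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Parsed, String> {
    let mut args = args.into_iter();
    let _process_name = args.next(); // argv[0], ignored here
    let command = args.next().ok_or_else(|| "invalid usage".to_string())?;
    match command.as_str() {
        // Like `main`, `help` returns before any further arguments are checked.
        "help" => Ok(Parsed::Help),
        "disassemble" => {
            let base64 = args.next().ok_or_else(|| "invalid usage".to_string())?;
            // Any argument beyond the first is rejected.
            if args.next().is_some() {
                return Err("invalid usage".into());
            }
            Ok(Parsed::Disassemble { base64 })
        }
        // Returned rather than printed, so a test can inspect it.
        other => Err(format!("unknown command {}", other)),
    }
}
```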

src/human_encoding/README.md (new file, +211)

@@ -0,0 +1,211 @@
# Simplicity Human-Readable Encoding

This module defines a human-readable encoding for Simplicity programs. This encoding
is intended to be the encoding used for storage and interchange of "commitment-time"
Simplicity programs, i.e. programs which are unpruned and have no witnesses.

The following parts of the encoding are incomplete/undesigned:

1. It does not support witness data; in future we would like to support partial/full
   witness population, as well as population of disconnected expressions.
2. `2^n` when it appears in types is a single lexer token and you cannot put spaces
   into it. I'm not sure if I want to fix this or not.

With that said, the rest of the document defines the encoding.

## Syntax

The syntax is defined in `src/human_encoding/parse/ast.rs`. It currently uses the
`santiago` parser generator, but we would like to move away from this, probably to
an ad-hoc parser, to avoid poor asymptotic behavior and to get better error messages.

Comments are started by `--` and end at the next newline. This is the only aspect
in which whitespace is significant.

Simplicity expressions are composed as a series of **definitions** of the form:

    NAME := EXPRESSION

and **type bounds** of the form

    NAME : TYPE -> TYPE

where these may be combined into the single form `NAME := EXPRESSION : TYPE -> TYPE`.
Apart from comments, whitespace is not significant. Each definition or type bound is
self-delimiting, so there are no semicolons or other separators, but by convention each
one should be separated by at least one newline.

Here NAME is

* any sequence matching the regex `[a-zA-Z_\-.'][0-9a-zA-Z_\-.']*`, i.e. any combination
  of alphanumerics, `-`, `_`, `'`, and `.` that does not start with a numeral;
* EXCEPT for the following reserved symbols, which may not be used:
  * `_`;
  * `assertl`, `assertr`, `case`, `comp`, `const`, `disconnect`, `drop`, `fail`, `iden`, `injl`, `injr`, `pair`, `take`, `unit`, `witness`;
  * anything beginning with `prim`; and
  * anything beginning with `jet_`.

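The NAME rule can be checked without a regex engine. The function below is a hypothetical sketch of that check (it is not taken from `ast.rs` or the `santiago` grammar): the first character may be an ASCII letter or one of `_ - . '`, later characters may also be digits, and the reserved symbols and `prim`/`jet_` prefixes are rejected.

```rust
/// Hypothetical NAME validator following the README's regex and reserved list.
fn is_valid_name(s: &str) -> bool {
    // Non-alphanumeric characters the regex allows anywhere in a NAME.
    fn is_symbol_char(c: char) -> bool {
        c == '_' || c == '-' || c == '.' || c == '\''
    }
    let mut chars = s.chars();
    // First character: ASCII letter or symbol character, never a digit.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || is_symbol_char(c) => {}
        _ => return false,
    }
    // Remaining characters may additionally be ASCII digits.
    if !chars.all(|c| c.is_ascii_alphanumeric() || is_symbol_char(c)) {
        return false;
    }
    const RESERVED: [&str; 16] = [
        "_", "assertl", "assertr", "case", "comp", "const", "disconnect", "drop",
        "fail", "iden", "injl", "injr", "pair", "take", "unit", "witness",
    ];
    !RESERVED.contains(&s) && !s.starts_with("prim") && !s.starts_with("jet_")
}
```

With this rule, the primed name `x'` from the PR description is accepted, while `1x`, `unit`, `primitive`, and `jet_add_32` are not.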
and EXPRESSION is

* a NAME;
* a HOLE (defined below);
* `unit`, `iden`, or `witness`;
* `injl`, `injr`, `take`, or `drop` followed by another EXPRESSION;
* `case`, `comp`, or `pair` followed by two EXPRESSIONs;
* `assertl` followed by an EXPRESSION, a literal `#`, and another EXPRESSION;
* `assertr` followed by a literal `#` and two EXPRESSIONs;
* a jet, which begins with `jet_` and must belong to the list of jets (FIXME define this list);
* `const` followed by a VALUE (defined below);
* `fail` followed by an ENTROPY (defined below); or
* `(` followed by another EXPRESSION followed by `)`.

Note that while we allow parentheses to help group parts of expressions for human
understanding, they are never needed for disambiguation and are essentially
ignored by the parser.
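Parentheses are never needed because every combinator has a fixed number of children, so a prefix-notation token stream has only one parse. The recursive-descent sketch below illustrates that point under simplifying assumptions: tokens are whitespace-separated, `(` and `)` are dropped outright, `assertl`/`assertr`/`const`/`fail` are not handled, and any other token (a NAME, a `?hole`, or a `jet_...`) is treated as a leaf. `Expr`, `parse_expr`, and `parse_at` are illustrative names, not the crate's parser.

```rust
/// Illustrative expression tree: a leaf token or a combinator with children.
#[derive(Debug, PartialEq)]
enum Expr {
    Leaf(String),
    Node(String, Vec<Expr>),
}

fn parse_expr(src: &str) -> Result<Expr, String> {
    // Split on whitespace, then strip parentheses, which carry no meaning here.
    let tokens: Vec<&str> = src
        .split_whitespace()
        .flat_map(|t| t.split(|c: char| c == '(' || c == ')'))
        .filter(|t| !t.is_empty())
        .collect();
    let mut pos = 0;
    let expr = parse_at(&tokens, &mut pos)?;
    if pos != tokens.len() {
        return Err(format!("trailing tokens starting at `{}`", tokens[pos]));
    }
    Ok(expr)
}

fn parse_at(tokens: &[&str], pos: &mut usize) -> Result<Expr, String> {
    let tok = match tokens.get(*pos) {
        Some(t) => *t,
        None => return Err("unexpected end of input".to_string()),
    };
    *pos += 1;
    // The arity of the head token alone decides how many children follow.
    let arity = match tok {
        "unit" | "iden" | "witness" => 0,
        "injl" | "injr" | "take" | "drop" => 1,
        "case" | "comp" | "pair" => 2,
        // NAMEs, `?holes`, and jets are leaves.
        _ => return Ok(Expr::Leaf(tok.to_string())),
    };
    let mut children = Vec::with_capacity(arity);
    for _ in 0..arity {
        children.push(parse_at(tokens, pos)?);
    }
    Ok(Expr::Node(tok.to_string(), children))
}
```

Under this sketch, `comp (pair iden unit) (injl unit)` and `comp pair iden unit injl unit` produce the same tree.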

A HOLE is the literal `?` followed by a NAME. It indicates an expression that has
yet to be defined. Holes have a different namespace from other names.

A VALUE is one of
* the literal `_`, which is interpreted as the empty value;
* a binary literal `0b[01]+`, which is interpreted as a sequence of bits; or
* a hex literal `0x[0-9a-f]+`, which is interpreted as a sequence of 4-bit nybbles.

An ENTROPY is a VALUE whose size is between 128 and 512 bits inclusive. Internally
it is 0-padded out to 512 bits.
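The VALUE and ENTROPY rules amount to a bit-length calculation: each binary digit contributes one bit and each hex digit four. The sketch below is hypothetical (not the crate's code). It accepts only lowercase hex, as the regex `0x[0-9a-f]+` implies, and it assumes the 0-padding is appended after the given bits, which the README does not specify.

```rust
/// Hypothetical VALUE parser: returns the bits, most significant first.
fn parse_value(lit: &str) -> Result<Vec<bool>, String> {
    if lit == "_" {
        return Ok(Vec::new()); // the empty value
    }
    if let Some(bin) = lit.strip_prefix("0b") {
        if bin.is_empty() {
            return Err("empty binary literal".to_string());
        }
        // One bit per binary digit.
        return bin
            .chars()
            .map(|c| match c {
                '0' => Ok(false),
                '1' => Ok(true),
                _ => Err(format!("bad binary digit `{}`", c)),
            })
            .collect();
    }
    if let Some(hex) = lit.strip_prefix("0x") {
        if hex.is_empty() {
            return Err("empty hex literal".to_string());
        }
        let mut bits = Vec::with_capacity(4 * hex.len());
        for c in hex.chars() {
            // Lowercase only, matching `0x[0-9a-f]+`.
            let nybble = match c {
                '0'..='9' | 'a'..='f' => c.to_digit(16).unwrap(),
                _ => return Err(format!("bad hex digit `{}`", c)),
            };
            // Four bits per nybble, most significant first.
            for shift in (0..4).rev() {
                bits.push((nybble >> shift) & 1 == 1);
            }
        }
        return Ok(bits);
    }
    Err(format!("not a VALUE: `{}`", lit))
}

/// Hypothetical ENTROPY parser: 128 to 512 bits, then zero-padded to 512
/// (padding on the right is an assumption).
fn parse_entropy(lit: &str) -> Result<Vec<bool>, String> {
    let mut bits = parse_value(lit)?;
    if !(128..=512).contains(&bits.len()) {
        return Err(format!("entropy must be 128 to 512 bits, got {}", bits.len()));
    }
    bits.resize(512, false);
    Ok(bits)
}
```

For example, a 32-digit hex literal is exactly 128 bits and is the shortest accepted ENTROPY, while `0xff` (8 bits) is rejected.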

Finally, TYPE is

* a literal `_`, indicating no type bound;
* a NAME;
* a literal `1`, indicating the unit type;
* a literal `2`, indicating the bit type;
* `2^n`, where `n` is any power of two, in decimal with no spaces or punctuation;
* `(` followed by another TYPE followed by `)`;
* another TYPE, followed by `+`, followed by another TYPE; or
* another TYPE, followed by `*`, followed by another TYPE.

Here `*` has higher precedence than `+`, and both `+` and `*` are left-associative.

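The precedence and associativity rules correspond to a standard two-level recursive-descent grammar, `sum := product ('+' product)*` and `product := atom ('*' atom)*`, where the loops produce left associativity. The sketch below is hypothetical (names such as `TypeParser` are not from the crate) and returns a fully parenthesized string so the grouping is visible. It treats `2^n` as a single token, as the README describes, so `2 ^ 8` with spaces fails to parse.

```rust
/// Hypothetical tokenizer: splits on whitespace and on `(`, `)`, `+`, `*`.
fn tokenize_type(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for c in src.chars() {
        if c.is_whitespace() || "()+*".contains(c) {
            if !cur.is_empty() {
                out.push(std::mem::take(&mut cur));
            }
            if !c.is_whitespace() {
                out.push(c.to_string());
            }
        } else {
            cur.push(c);
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

struct TypeParser {
    toks: Vec<String>,
    pos: usize,
}

impl TypeParser {
    fn peek(&self) -> Option<&str> {
        self.toks.get(self.pos).map(|s| s.as_str())
    }
    // sum := product ('+' product)*  (lowest precedence, left-associative)
    fn sum(&mut self) -> Result<String, String> {
        let mut lhs = self.product()?;
        while self.peek() == Some("+") {
            self.pos += 1;
            let rhs = self.product()?;
            lhs = format!("({} + {})", lhs, rhs);
        }
        Ok(lhs)
    }
    // product := atom ('*' atom)*  (binds tighter than `+`)
    fn product(&mut self) -> Result<String, String> {
        let mut lhs = self.atom()?;
        while self.peek() == Some("*") {
            self.pos += 1;
            let rhs = self.atom()?;
            lhs = format!("({} * {})", lhs, rhs);
        }
        Ok(lhs)
    }
    // atom := '(' sum ')' | `_` | NAME | `1` | `2` | `2^n`
    fn atom(&mut self) -> Result<String, String> {
        let tok = self.peek().ok_or("unexpected end of type")?.to_string();
        self.pos += 1;
        if tok == "(" {
            let inner = self.sum()?;
            if self.peek() != Some(")") {
                return Err("expected `)`".to_string());
            }
            self.pos += 1;
            return Ok(inner);
        }
        if let Some(n) = tok.strip_prefix("2^") {
            // `n` must be a power of two written in decimal.
            let n: u32 = n.parse().map_err(|_| format!("bad exponent in `{}`", tok))?;
            if !n.is_power_of_two() {
                return Err(format!("`{}`: exponent is not a power of two", tok));
            }
        } else if tok == ")" || tok == "+" || tok == "*" {
            return Err(format!("unexpected `{}`", tok));
        }
        Ok(tok)
    }
}

fn parse_type(src: &str) -> Result<String, String> {
    let mut p = TypeParser { toks: tokenize_type(src), pos: 0 };
    let t = p.sum()?;
    if p.pos != p.toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(t)
}
```

For instance, `1 + 2 * 2^8` groups as `(1 + (2 * 2^8))` and `a + b + c` as `((a + b) + c)`.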
## Namespaces

There are three independent namespaces: one for NAMEs, one for HOLEs, and one for
TYPEs. They all have the same rules for valid symbols, except that `_` is reserved
(may not be used) for NAMEs and `_` has a special meaning (no type bound) for
TYPEs.

## Semantics: Definitions

As above, Simplicity expressions are a series of **definitions** of the form

    NAME := EXPRESSION

or

    NAME := EXPRESSION : TYPE -> TYPE

and **type bounds** of the form

    NAME : TYPE -> TYPE

We refer to the `NAME` part as the **name**, `EXPRESSION` as the **expression**,
and `TYPE -> TYPE` as the **type ascription**. If such a named expression appears anywhere
in a Simplicity encoding, then wherever that name appears in the expression of
any other named expression, the name is replaced by its expression.

For a given name, it is permissible to have multiple type bounds, but any name
which appears must have exactly one definition. That is, it is not permitted to
have a type bound for a name which isn't defined elsewhere, and it is not permitted
to have multiple definitions for the same name.

This allows the user to provide any number of type bounds for a given name, each of
which may be helpful in clarifying a program.

For example, in the encoding

    node := unit : _ -> 1
    main := comp node node

the name `node` is replaced by `unit` in both places where it appears. We can see
that starting from the name `main`, by recursive substitution, we obtain a single
Simplicity expression. The type checker will ensure that the target type of `node`
is 1.

In general, we do not need to obtain a single expression. It is permissible to
encode a "DAG forest".

The name `main` is special for several reasons:
* An expression named `main` implicitly has the type ascription `1 -> 1`. That is, it must always be a program.
* To generate a commitment-time program from an expression, it must be called `main`.
* Type ascriptions for `main` and its children are enforced **after** type inference has completed, so they act as sanity checks but cannot change the output of type inference. For other expressions, type ascriptions are enforced **before** inference completes and may guide it.

No cycles are allowed; if a name occurs anywhere in the expansion of its expression,
this is an error.

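The definition rules (one definition per name, no type bound without a definition, no cycles) can be illustrated with a small expansion routine. This is a hypothetical sketch, not the crate's `Forest` logic: an expression is modeled as a whitespace-separated token string, any token that is a defined name is substituted recursively, and a name already being expanded signals a cycle. Substitutions are wrapped in parentheses only for readability, which the syntax section says is harmless.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical checker/expander for `(name, expression)` definitions plus
/// the names that carry type bounds. Expands from `main`.
fn expand_main(defs: &[(&str, &str)], bounds: &[&str]) -> Result<String, String> {
    let mut table: HashMap<&str, &str> = HashMap::new();
    for &(name, expr) in defs {
        // Exactly one definition per name.
        if table.insert(name, expr).is_some() {
            return Err(format!("multiple definitions of `{}`", name));
        }
    }
    for &name in bounds {
        // A type bound requires a definition somewhere.
        if !table.contains_key(name) {
            return Err(format!("type bound for undefined name `{}`", name));
        }
    }
    let mut in_progress = HashSet::new();
    expand("main", &table, &mut in_progress)
}

fn expand<'a>(
    name: &'a str,
    table: &HashMap<&'a str, &'a str>,
    in_progress: &mut HashSet<&'a str>,
) -> Result<String, String> {
    let body = *table.get(name).ok_or_else(|| format!("undefined name `{}`", name))?;
    // Meeting a name that is still being expanded means a cycle.
    if !in_progress.insert(name) {
        return Err(format!("cycle through `{}`", name));
    }
    let mut parts = Vec::new();
    for tok in body.split_whitespace() {
        if table.contains_key(tok) {
            parts.push(format!("({})", expand(tok, table, in_progress)?));
        } else {
            // Combinators and other tokens are copied through unchanged.
            parts.push(tok.to_string());
        }
    }
    // Finished: shared (DAG) references to this name are allowed again.
    in_progress.remove(name);
    Ok(parts.join(" "))
}
```

Applied to the example above, `node := unit` and `main := comp node node` expand to `comp (unit) (unit)`; sharing `node` twice is not a cycle.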
## Semantics: Expressions

Expressions may be

* a NAME, which simply refers to another expression;
* a HOLE, which is described in the next section;
* one of the core combinators `unit`, `iden`, `comp`, `injl`, `injr`, `case`, `take`, `drop`, `pair`, followed by subexpression(s) as needed;
* the `disconnect` combinator followed by an expression and a hole;
* the `witness` combinator, which currently allows no subexpressions;
* the assertions, `assertl` or `assertr`, which take two subexpressions, one of which will be hidden in the decoded program. The hidden subexpression should be prefixed by `#`, which indicates to the parser to take the CMR of that expression, not the expression itself;
* `fail` followed by a 128-to-512-bit entropy value, which should occur only in the pruned branch of an assertion, though this is not enforced; or
* `const` followed by a value, which is a "constant-word jet" and is equivalent to constructing the given value by a tree of `pair`s whose leaves are `injl unit` (0) or `injr unit` (1).

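The `const` equivalence can be made concrete by generating the `pair` tree for a bit string. The sketch below is hypothetical: it splits the bits in half recursively, which assumes the length is a power of two (as for the `2^n` word types), and that balanced shape is an assumption, since the README only says "a tree of `pair`s".

```rust
/// Hypothetical expansion of a `const` value into nested `pair`s.
fn const_to_pairs(bits: &[bool]) -> Result<String, String> {
    if !bits.len().is_power_of_two() {
        return Err(format!("{} bits is not a power of two", bits.len()));
    }
    Ok(build(bits))
}

fn build(bits: &[bool]) -> String {
    if bits.len() == 1 {
        // Leaves: 0 is `injl unit`, 1 is `injr unit`.
        let leaf = if bits[0] { "injr unit" } else { "injl unit" };
        return leaf.to_string();
    }
    // Left half holds the more significant bits.
    let (left, right) = bits.split_at(bits.len() / 2);
    format!("pair ({}) ({})", build(left), build(right))
}
```

For the two-bit value `0b01` this yields `pair (injl unit) (injr unit)`.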
Expressions have an associated **type arrow**, which is inferred by the type checker as
expressions are built up. If a combinator's children's types are incompatible for that
combinator, an error is raised.

After the type arrow for a named expression is fully inferred, any type ascriptions for
that name are applied, and an error is raised if this fails. In this way, a user can
provide type ascriptions which act as sanity checks and documentation for sub-parts of
a Simplicity expression.

## Semantics: Holes

Holes are of the form `?NAME`; there may be whitespace after the `?`, but by convention it is
omitted. Holes must have unique names, but live in a separate namespace from ordinary names,
so they cannot conflict with the names of expressions.

Holes have two meanings:
* When they occur as the right child of a `disconnect` combinator, they give a name to a disconnected expression. `disconnect` combinators are **required** to have holes for right children. Any other expression form is an error.
* In all other contexts, they indicate an incomplete part of the program which can be typechecked but not much else.

In all cases, holes are typechecked before any errors are reported, and the assembler will
report their types. This allows the use of holes as placeholders during development, where
users can learn the required type of the eventual expression by querying the typechecker.

When assembling or computing CMRs of incomplete programs (any program with a hole outside
of the right child of a `disconnect` node), errors will be reported for every hole.
These error messages will include the holes' type arrows.

## Semantics: Type Ascriptions

Type ascriptions are of the form `TYPE -> TYPE`. We refer to the first type as the **source
type** and the second as the **target type**. Here each TYPE is one of

* the literal `_`, which indicates that no additional checks should be done on the appropriate type;
* the literals `1`, `2` or `2^n`, which indicate that the appropriate type must be the unit, bit, or n-bit word type;
* an arbitrary NAME, which simply gives a name to the appropriate type. (If the same name is used in multiple places, the type-checker will check that the same type appears in each place.);
* any pair of TYPEs separated by `+` or `*`, which indicate a sum or product bound respectively; or
* any TYPE surrounded by `(` and `)`.

Note that the NAMEs used for types are in a separate namespace from the NAMEs used in
named expressions, and from HOLEs. These three uses of names do not interact.

Note also that unlike the case for EXPRESSIONs, parentheses may be meaningful. Absent parentheses,
`*` has higher precedence than `+` and both operators are left-associative.

The interpretation of type ascriptions depends on whether they appear within the `main` expression.
If so, type inference is completed and all free types are set to unit, and **then** all type ascriptions
are applied. In this case, the type ascriptions cannot change the final types, but if they are
incompatible with the final types, an error is raised.

Outside of `main`, type ascriptions are applied in parallel with type inference, and before free
types are set to unit. This means that type ascriptions can change types that would otherwise be
free, and the type of the resulting expression will not necessarily match its type if the
expression were to be pulled into `main`.

