
Add Wireshark Lua dissector backend#264

Open
AaronWebster wants to merge 2 commits into master from emboss-wireshark-lua-dissector

AaronWebster commented May 21, 2026

Summary

Adds a parallel backend at compiler/back_end/lua/ that turns an Emboss
.emb definition into a runnable Wireshark Lua dissector. It mirrors the
C++ backend's shape (driver, Starlark rule, golden tests), so the new
backend is invoked the same way as its C++ sibling:

emboss_lua_library(
    name = "myproto_lua",
    srcs = ["myproto.emb"],
)

How layered protocols work

Wireshark already dissects Ethernet → IP → UDP/TCP using its built-in
dissectors. The user only needs to define their payload in .emb;
declaring [(wireshark) register_on: "..."] plugs the generated
dissector into the correct Wireshark dissector table at load time.

[(wireshark) protocol: "myproto"]
[(wireshark) root: "Packet"]
[(wireshark) register_on: "udp.port == 12345 or tcp.port == 12345"]

The register_on value uses a restricted, Wireshark-display-filter-style
syntax: one or more <table> == <integer> terms joined by or / ||, where
each integer is written in decimal or 0x-prefixed hex. Each term becomes a
DissectorTable.get("<table>"):add(<pattern>, <proto>) call.
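
The term-by-term mapping described above can be sketched in Python. This is a minimal illustration, not the PR's parser: the helper names parse_register_on and emit_registrations, the regular expression, and the error message are all assumptions made for the example.

```python
import re

# Hypothetical grammar for one term: <table> == <decimal or 0x-hex integer>.
_TERM = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*==\s*(0[xX][0-9A-Fa-f]+|[0-9]+)\s*$")


def parse_register_on(value):
    """Split a register_on string into (table, integer) pairs."""
    terms = []
    # Terms are joined by the word `or` or by `||`.
    for chunk in re.split(r"\s+or\s+|\s*\|\|\s*", value.strip()):
        match = _TERM.match(chunk)
        if match is None:
            raise ValueError(f"bad register_on term: {chunk!r}")
        table, pattern = match.groups()
        base = 16 if pattern.lower().startswith("0x") else 10
        terms.append((table, int(pattern, base)))
    return terms


def emit_registrations(value, proto_var):
    """Render one Lua DissectorTable registration line per term."""
    return [
        f'DissectorTable.get("{table}"):add({pattern}, {proto_var})'
        for table, pattern in parse_register_on(value)
    ]


for line in emit_registrations("udp.port == 12345 or tcp.port == 0x10", "myproto"):
    print(line)
```

Normalizing hex patterns to decimal in the emitted Lua is a choice made here for simplicity; the actual generator may preserve the original spelling.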

Generator features

  • One Proto per .emb, one local function per struct/bits, one
    value-strings table per enum.
  • Nested structs dispatched via forward-declared locals so any
    reference order works.
  • Bit-addressable (bits) blocks emitted as masked ProtoFields
    against a single container read.
  • Conditional (if) fields are wrapped in if <condition> then … end, with the Emboss condition translated to Lua.
  • Variable-length arrays (T[n] where n is a sibling field) and
    dynamically-located fields, via an Emboss-expression → Lua
    translator. Sibling values referenced by a condition, array length, or
    offset are captured into local val_* reads ahead of use.
  • -- doc comments become each ProtoField's description; #
    comments are ignored.
  • Endianness respected (subtree:add vs subtree:add_le).
  • Module / struct / field level [(wireshark) filter: "..."]
    override the auto-generated filter-name segment.
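
As an illustration of the masked-ProtoField approach for bits blocks, the sketch below computes a mask from a field's bit offset and width and renders a ProtoField declaration. It assumes bit 0 is the least significant bit of the container and that the container is 8, 16, 32, or 64 bits wide; the function names and the emitted variable naming are hypothetical, not taken from the PR.

```python
def field_mask(bit_offset, bit_width, container_bits):
    """Mask selecting bit_width bits starting at bit_offset (bit 0 = LSB, assumed)."""
    if bit_width <= 0 or bit_offset < 0 or bit_offset + bit_width > container_bits:
        raise ValueError("field does not fit in the container")
    return ((1 << bit_width) - 1) << bit_offset


def emit_masked_field(var, filter_name, label, bit_offset, bit_width, container_bits):
    """Render a Lua ProtoField declaration that reads one field out of a container."""
    if container_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported container width in this sketch")
    mask = field_mask(bit_offset, bit_width, container_bits)
    hex_digits = container_bits // 4
    # ProtoField.uintN(abbr, name, base, valuestring, mask) is the Wireshark Lua form.
    return (
        f'local {var} = ProtoField.uint{container_bits}('
        f'"{filter_name}", "{label}", base.DEC, nil, 0x{mask:0{hex_digits}X})'
    )


print(emit_masked_field("f_flags_mode", "myproto.flags.mode", "mode", 4, 3, 8))
```

Because every field in the block shares one container read, each declaration differs only in its mask; Wireshark applies the mask and shift when it displays the value.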

Explicit non-goals (for this initial cut)

The generator emits valid Lua even when it can't fully describe a
field: it writes a -- skipped … comment and moves on. Future work
can extend coverage to:

  • Parameterized struct uses.
  • let / virtual fields.
  • Conditional or dynamically-sized fields inside bits blocks
    (byte-level structs are supported).
  • Expression operators outside arithmetic / comparison / logical /
    $max (e.g. ?:, $present).

Tests

  • compiler/back_end/lua/dissector_generator_test.py — 44 unit tests
    covering sanitization, integer-width mapping, register_on parsing,
    enum tables, filter composition, doc extraction, attribute validation,
    root-struct selection, nested-struct dispatch, and now the
    expression translator, conditional fields, value capture, and
    variable-length arrays.
  • lua_golden_test targets in compiler/back_end/lua/BUILD for
    enum.emb, nested_structure.emb, uint_sizes.emb, int_sizes.emb,
    the testdata/wireshark.emb smoke fixture, and the new
    testdata/wireshark_dynamic.emb fixture (conditional field +
    length-prefixed variable array).
  • compiler/back_end/lua/tshark_smoke_test.py — loads a generated
    dissector into a real TShark and asserts the decoded tree for both
    branches of a conditional, length-prefixed message. Auto-skips when
    tshark / text2pcap aren't installed (e.g. in CI).
  • scripts/regenerate_goldens.py also refreshes the Lua goldens.
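
One common way to get the auto-skip behavior described for the smoke test is to probe for the tools with shutil.which before running. The sketch below shows that pattern only; the class and helper names are hypothetical, and the real tshark_smoke_test.py may detect the tools differently.

```python
import shutil
import unittest

REQUIRED_TOOLS = ("tshark", "text2pcap")


def missing_tools(which=shutil.which):
    """Return the required tools that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


class SmokeTestSketch(unittest.TestCase):
    def setUp(self):
        missing = missing_tools()
        if missing:
            # Report a skip instead of a failure when the tools are absent.
            self.skipTest("missing tools: " + ", ".join(missing))

    def test_tools_are_on_path(self):
        for tool in REQUIRED_TOOLS:
            self.assertIsNotNone(shutil.which(tool))
```

Passing the lookup function as a parameter keeps the detection logic testable without requiring TShark on the machine running the unit tests.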

Test plan

  • bazel test //compiler/back_end/lua:dissector_generator_test
  • bazel test //compiler/back_end/lua/... (golden + smoke tests)
  • bazel build //testdata:wireshark_lua_emboss //testdata:wireshark_dynamic_lua_emboss
  • Generated dissector loads in TShark and correctly decodes
    synthetic packets — automated in tshark_smoke_test.py, and
    verified by hand for testdata/wireshark.emb and the dynamic
    fixture (enum value-strings, big-endian decode, nested structs,
    the conditional error_code, and the variable-length payload).

Adds a parallel back end at compiler/back_end/lua/ that turns an Emboss
.emb into a runnable Wireshark Lua dissector. Mirrors the C++ backend's
shape: a py_binary driver, a starlark rule (lua_emboss_library) exposed
from the root build_defs.bzl, a (wireshark)-qualified attribute set, and
golden tests parallel to cpp_golden_test.

Generator highlights:

* One Proto per .emb, one local function per struct/bits, one value
  strings table per enum.
* Nested structs dissected via forward-declared dispatch.
* Bit-addressable (`bits`) blocks emitted as masked ProtoFields against
  a single container read.
* `--` doc comments become the ProtoField description; `#` hash
  comments are ignored.
* Endianness honored via `subtree:add` vs `subtree:add_le`.

Module-level attributes:

* `[(wireshark) protocol: "name"]`     name of the generated Proto
* `[(wireshark) root: "Struct"]`       which struct is dissected at the top level
* `[(wireshark) register_on: "..."]`   Wireshark-display-filter-style
                                       string of `<table> == <pattern>`
                                       terms separated by `or` / `||`.
                                       Each term becomes a
                                       DissectorTable.get(...):add(...)
                                       call so Wireshark routes packets
                                       from Ethernet/IP/UDP/TCP layers
                                       into the generated dissector.

Struct- and field-level:

* `[(wireshark) filter: "name"]`       overrides the auto-generated
                                       Wireshark filter-name segment.

Plumbing:

* New `emboss_lua_library` macro + `lua_emboss_library` rule + aspect
  in the root build_defs.bzl, modelled on cc_emboss_library.
* `embossc --generate lua` (in addition to the existing `cc`).
* scripts/regenerate_goldens.py also refreshes the Lua goldens.

Tests:

* compiler/back_end/lua/dissector_generator_test.py — 27 unit tests
  covering identifier sanitization, integer-width mapping, register_on
  parsing, enum value-strings emission, filter composition, doc-text
  extraction, attribute validation, root-struct selection, and nested
  struct dispatch.
* lua_golden_test targets in compiler/back_end/lua/BUILD covering
  enum, nested_structure, uint_sizes, int_sizes, and the new
  wireshark.emb fixture.
@AaronWebster AaronWebster force-pushed the emboss-wireshark-lua-dissector branch from 514553c to c962481 Compare June 3, 2026 22:33
…k Lua backend

Add an Emboss-expression -> Lua translator so the dissector backend can
handle constructs it previously skipped:

* Conditional (`if`) fields are emitted as `if <cond> then ... end`.
* Variable-length arrays (`T[n]`, with `n` a sibling field) and
  dynamically-located fields are emitted with the length/offset
  expression translated to Lua.

Sibling field values referenced by a condition, array length, or offset are
captured into `local val_*` reads. Fields whose governing expression can't be
translated are still skipped with a comment, so the generator always emits
valid Lua. Constant-only structs produce byte-identical output to before.

Add testdata/wireshark_dynamic.emb (and its golden), expression/conditional/
array unit tests, and a TShark smoke test that loads a generated dissector
and checks both branches of a conditional, length-prefixed message (the test
skips itself when tshark is not installed).
@AaronWebster AaronWebster marked this pull request as ready for review June 4, 2026 00:49