PyRDP format #959

obilodeau · 2024-06-04T19:00:44Z

As discussed in GoSecure/fq-pyrdp#1.

The new test case is 2.9MB. Let me know if this is still too big.

For project history see: https://github.com/GoSecure/fq-pyrdp/

This PR is done as part of GeekWeek 9.

wader · 2024-06-04T20:18:49Z

Thanks! did a quick review, will look more tomorrow.

The new test case is 2.9MB. Let me know if this is still too big.

Is fine i think. Just noticed that the wasm testdata i 42MB 😬

For project history see: https://github.com/GoSecure/fq-pyrdp/

Could be added to documentation if you like

This PR is done as part of GeekWeek 9.

🥳

format/pyrdp/pdu/client_data.go

format/pyrdp/pyrdp.go

format/pyrdp/pdu/client_data.go

format/pyrdp/pyrdp.go

format/pyrdp/pdu/client_data.go

wader · 2024-06-08T19:08:53Z

Hey, are ok with me pushing to your PR branch? i'm on vacation atm but when i'm back i might try massage this PR to look more like the rest of fq:s code

obilodeau · 2024-06-13T17:48:20Z

Let me do another pass today and tomorrow then it'll be your turn.

obilodeau · 2024-06-13T20:07:58Z

Made all required changes and fixed all CI linter errors. From my perspective, this is complete and useful.

Let me know if you want other changes.

wader · 2024-06-14T14:00:46Z

Made all required changes and fixed all CI linter errors. From my perspective, this is complete and useful.

Let me know if you want other changes.

Great thanks! will have look during the weekend or week. Will request you to review if i done any substantial changes, but i guess i will mostly just do some styling changes for consistency.

BTW i have been working on a prototype to allow runtime definition of decoders using kaitai. Would something like that have been useful in this case? maybe some other format specification language could be interesting? i've dumped some ideas in #627 if your intersted.

wader · 2024-06-14T21:59:04Z

Rebased on master, fix some code style stuff, less name stutter and made some flag gaps into unused fields

format/pyrdp/pyrdp.go

For project history see: https://github.com/GoSecure/fq-pyrdp/ Co-authored-by: Lisandro Ubiedo <[email protected]>

One fieldname change clientDir to client_dir to align with everything else

Same as other formats that use timestamps and makes it easier to use them in query, number instead of a formatted string. Can still use todescription/0 to get formatted timestamp.

wader · 2024-06-16T20:45:52Z

format/pyrdp/pdu/clipboard_data.go

+}
+
+const (
+	// Message flags.


Could multiple bit be set? cbFlagsMap will atm only map if one bit is set

I believe multiple bits could be set, yes.

here's the spec: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpeclip/9e97ce45-0597-43dd-b116-9f62a5b34d54?redirectedfrom=MSDN

here's the pyrdp parser: https://github.com/GoSecure/pyrdp/blob/63d1b6e0f93a8e1db45b8818c9b0ec9c1374e9cc/pyrdp/parser/rdp/virtual_channel/clipboard.py#L32

It's vague and I understand the original mistake since two out of three messages are OK or FAIL which don't seem compatible.

I'll fix it. I think I have a good idea for an approach.

So my initial approach to achieve this was to replicate what you did in 69ec44a. Naively I wrote this:

func parseClipboardData(d *decode.D, length int64) { d.FieldStruct("clipboard_data", func(d *decode.D) { msgType := uint16(d.FieldU16("msg_type", cbTypesMap)) d.FieldStruct("msg_flags", func(d *decode.D) { d.FieldBool("cb_response_ok") d.FieldBool("cb_response_fail") d.FieldBool("cb_ascii_names") }) dataLength := d.FieldU32("data_len")

This is exactly how the protocol is documented and the UintMapSymStr we had before.

Enhancing the protocols' parsers would be straightforward. However, I realized that FieldStruct doesn't respect byte order and when I was wondering why, I realized well it can't since it doesn't know the "type" we are about to read...

So to work the parser now looks like this:

func parseClipboardData(d *decode.D, length int64) { d.FieldStruct("clipboard_data", func(d *decode.D) { msgType := uint16(d.FieldU16("msg_type", cbTypesMap)) d.FieldStruct("msg_flags", func(d *decode.D) { d.FieldRawLen("unused0", 5) d.FieldBool("cb_ascii_names") d.FieldBool("cb_response_fail") d.FieldBool("cb_response_ok") d.FieldRawLen("unused1", 8) }) dataLength := d.FieldU32("data_len")

In addition to being harder to write, it also has a UX impact with the unnecessary unused fields being exposed to the user:

This is not how the protocol is described on the wire.

Now, trying to improve this, is there a way we could make FieldStruct or a variant of it byte order aware?

func parseClipboardData(d *decode.D, length int64) { d.FieldStruct("clipboard_data", func(d *decode.D) { msgType := uint16(d.FieldU16("msg_type", cbTypesMap)) d.FieldU16Bitfield("msg_flags", func(d *decode.D) { d.FieldBool("cb_response_ok") d.FieldBool("cb_response_fail") d.FieldBool("cb_ascii_names") }) dataLength := d.FieldU32("data_len")

Using FieldU16Bitfield would under the hood parse the value as a uint16 in little-endian byte order and automatically jump after the 13 unused bits.

If this is too complicated to implement, I think the previous way (const list + UintMapSymStr) was better then. Both easier to write and easier to analyze as an analyst (no unnecessary unused fields). However, the bitfield instead of a map problem remains.

Otherwise, is there something else in the decode API I don't see or understand that would be better?

So to work the parser now looks like this:

func parseClipboardData(d *decode.D, length int64) { d.FieldStruct("clipboard_data", func(d *decode.D) { msgType := uint16(d.FieldU16("msg_type", cbTypesMap)) d.FieldStruct("msg_flags", func(d *decode.D) { d.FieldRawLen("unused0", 5) d.FieldBool("cb_ascii_names") d.FieldBool("cb_response_fail") d.FieldBool("cb_response_ok") d.FieldRawLen("unused1", 8) }) dataLength := d.FieldU32("data_len")

In addition to being harder to write, it also has a UX impact with the unnecessary unused fields being exposed to the user:

Yeap this gets a bit messy at times with how fq was designed to work, that is to not hide anything and also give "access" to all bits somehow even unused and unknown stuff.

So in general i've usually opted to let format decoders decode things as detailed as possible to have as few assumption what a end user would like to query and look at, e.g. maybe someone whats to extract or query the unused bits?

This is not how the protocol is described on the wire.

Now, trying to improve this, is there a way we could make FieldStruct or a variant of it byte order aware?

func parseClipboardData(d *decode.D, length int64) { d.FieldStruct("clipboard_data", func(d *decode.D) { msgType := uint16(d.FieldU16("msg_type", cbTypesMap)) d.FieldU16Bitfield("msg_flags", func(d *decode.D) { d.FieldBool("cb_response_ok") d.FieldBool("cb_response_fail") d.FieldBool("cb_ascii_names") }) dataLength := d.FieldU32("data_len")

Using FieldU16Bitfield would under the hood parse the value as a uint16 in little-endian byte order and automatically jump after the 13 unused bits.

If this is too complicated to implement, I think the previous way (const list + UintMapSymStr) was better then. Both easier to write and easier to analyze as an analyst (no unnecessary unused fields). However, the bitfield instead of a map problem remains.

Otherwise, is there something else in the decode API I don't see or understand that would be better?

All decoding in fq currently is on purpose on bit level made to "hide" byte boundaries, this is to allow decoding bitstream formats, common for media codecs etc, without much fuzz. And as you have noticed this put some burden on some format decoders to byte align and maybe have to deal with padding, unknown or unused bits that a byte-oriented decoder would not care about, fq even automatically adds "gap" fields for bit ranges a decode skips or for some reason can't know about (trailing bits in a format with explicit ending), this is so that no bits are hidden/unreachable.

So in general if there is a clear bit field it's probably best to decode it they way you did above: a struct with bool/raw fields for each bit even unused or unknown ones. Will be a bit verbose but now all of them are accessible via a jq query. Btw if there bit flag combinations etc that have some special meaning a decode can add "synthetic" fields of any type that will not be "backed" by a bit range but will be visible and queryable (ex samples count in the mp3 decoder https://github.com/wader/fq/blob/master/format/mpeg/mp3_frame.go#L195)

But yeap all this becomes a bit messy and uninvited when decoding little endian bit fields that are > 1 byte :( currently there is no helpers for dealing with it but i have tried to come up with some way of specifying things in the same order as in a spec etc and the let fq figure things out. Actually in the kaitai prototype there is some support for what kaitai calls "bit-endian" that does more or less this. Sadly things get even messier when decoding some bit range inside a little endian integer as now a range a field can "span" two or more bytes and be non-continuous in the bitstream :(. Not sure how to visualise that in the "dump" tree, atm i think i would just opt to let the fiend be the range of the first and the last bit backing the field.

Sorry for wall of text! and hope i managed to explain myself properly and all this is a bit of an unsolved issue with fq so i'm glad someone has input on it.

You have made this definitely clearer. Having access to bit-sequence unused at the time the parser was written but that could be used later is a strong argument too. User-friendly before plugin programmer-friendly.

Flags bits are in LE byte order Removed unsued info consts

wader · 2024-06-16T21:24:31Z

format/pyrdp/pdu/client_info.go

+			unicodeN  uint64 = 0
+		)
+		codePage := d.FieldU32("code_page")
+		d.FieldStruct("flags", func(d *decode.D) {


Could you verify the flags look more correct now in test.fqtest? now unicode flags is true at least and i hope i got the other ones correct

Right now I'm not focused on the correctness of the flags but on how a protocol (or filetype) analyst looks at them.

Having a different ordering makes it confusing. Ideally it should be aligned. Check my previous comment on different approaches that could be taken. Please advise, I'm willing to help.

Microsoft documentation about the client_info flags is here: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpbcgr/732394f5-e2b5-4ac5-8a0a-35345386b0d1

The comment above i hope explains why this looks a bit weird in fq: fields are in "bitstream"-order which might not be how spec define them and things gets even more weird with multi-byte little endian bitfields. I wonder how fq would visualise things if it would allow a format decoder to "force" some specific field order for structs? maybe something to indicate that things are now not in bitstream order?

Yes, I understand more the context now. I'm wondering how Wireshark does it?...

So Wireshark does it like this:

The boolean flags in the field are all individually defined, each by a struct with a mask value provided. The container of the flags is also defined (foo.flags and foo.flags.flag1 both exist) and in its structure, the endianness is provided. These are static declarations.

Then the dissector refers to these declarations in the order that they should be presented. It's quite verbose and there's a lot of duplication of effort but at the same time more control.

Ref: https://www.wireshark.org/docs/wsdg_html_chunked/ChDissectAdd.html and https://gitlab.com/wireshark/wireshark/-/blob/master/epan/dissectors/packet-rdp.c

That approach is not interesting but it gave me an idea. I feel like making the FieldStuct() function endian-aware could be an elegant compromise. To avoid breaking existing code, it could be a new function called FieldStructEndianAware() or something. If so then the fields would be properly sorted in the output.

Write this:

d.FieldStructEndianAware("msg_flags", func(d *decode.D) { d.FieldBool("cb_response_ok") d.FieldBool("cb_response_fail") d.FieldBool("cb_ascii_names") d.FieldRawLen("unused0", 5) d.FieldRawLen("unused1", 8) })

with FieldStructEndianAware() relying on the global d.Endian value.

Instead of this:

d.FieldStruct("msg_flags", func(d *decode.D) { d.FieldRawLen("unused0", 5) d.FieldBool("cb_ascii_names") d.FieldBool("cb_response_fail") d.FieldBool("cb_response_ok") d.FieldRawLen("unused1", 8) })

It would solve many problems:

unused fields are still accessible

programmer-friendly: fields are ordered like in the documentation

user-friendly: fq's output would match protocol documentation order

Is that something that would be easily feasible?

I tried to implement my proposal today. I looked at the decode.FieldStruct code and realized that everything is delegated to fieldDecoder... Maybe some state should be stored in the Compound struct? An additional member used to track that this struct requires byte endianness awareness? Then the AddChild could do things differently but I don't know how to keep it elegant.

It's too deep of a change for me. I give up. I'll verify the flags and use the current decoding infrastructure.

Hey! thanks for researching and digging into this! sorry for slow reply. I'm on a 2 week bicycle vacation but home soon again, will look closer at this then.

But i can dump some info how i started look into this for the kaitai prototype that will require "bit-endian" which i think i related/same as this? see https://doc.kaitai.io/user_guide.html#_bit_sized_integers (search for bit-endian) for kaitais definition of it.

So in the kaitai struct prototype branch the decode context looks like this to keep som state (similar to what you proposed) https://github.com/wader/fq/blob/kaitai2/pkg/decode/decode.go#L177 and usage would that a decoder can set d.BitEndian = decode.LittleEndian in a struct for example and the decode internals will start reading bits from bytes "in-reverse" and also keep track to fail if you try to read mixed bit-endian-ness from same byte etc. Later on other parts of the decode internals will take care of sorting the fields. Only weirdness with all of this is what to do with fields that ends up with non-continuous bit-ranges? current i'm think of just assign a bit-range of the first and last bit affecting the field.

wader reviewed Jun 4, 2024

View reviewed changes

obilodeau mentioned this pull request Jun 13, 2024

Upstream to fq? GoSecure/fq-pyrdp#1

Open

wader force-pushed the format-pyrdp branch from d5d3590 to 3f2aa75 Compare June 14, 2024 21:57

wader reviewed Jun 14, 2024

View reviewed changes

format/pyrdp/pyrdp.go Outdated Show resolved Hide resolved

wader force-pushed the format-pyrdp branch from 3f2aa75 to c462db6 Compare June 16, 2024 20:40

obilodeau and others added 17 commits June 16, 2024 22:42

Sending PyRDP parser upstream

5d35eb6

For project history see: https://github.com/GoSecure/fq-pyrdp/ Co-authored-by: Lisandro Ubiedo <[email protected]>

Clarified MIT license usage alongside copyright info

392fabe

refactor: Changed all FieldStrFn callers to FieldUTF16LE

87633e6

One fieldname change clientDir to client_dir to align with everything else

refactoring: UintMap to UintMapSymStr

3904c45

refactor: package name aligned with directory name

051493d

removed dead code

ec03f87

doc: added format to README

fd85952

Resolved all linter errors

dab2c6e

pyrdp: Add doc and move init func to top

6ae10da

help: Render mailto:addr as <addr>

a79f64f

pyrdp: Use field description for formatted timestamp

67bf55e

Same as other formats that use timestamps and makes it easier to use them in query, number instead of a formatted string. Can still use todescription/0 to get formatted timestamp.

pyrdp: camelCase vars and lowercase number literals

a6b2468

pyrdp: Less symbol stuttering

c59e7cc

pyrdp: Add unused fields to replace gap fields

3986f20

pyrdp: Simplify always true condition

36d5cd4

pyrdp: Move pdu types into pdu package and some cleeanup

86f28b6

pyrdp: Reorganize pdu consts a bit

92ad156

wader force-pushed the format-pyrdp branch from c462db6 to 92ad156 Compare June 16, 2024 20:42

wader reviewed Jun 16, 2024

View reviewed changes

pyrdp: Decode client info flags correctly

69ec44a

Flags bits are in LE byte order Removed unsued info consts

wader reviewed Jun 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyRDP format #959

PyRDP format #959

obilodeau commented Jun 4, 2024

wader commented Jun 4, 2024

wader commented Jun 8, 2024

obilodeau commented Jun 13, 2024

obilodeau commented Jun 13, 2024

wader commented Jun 14, 2024

wader commented Jun 14, 2024

wader Jun 16, 2024

obilodeau Jun 21, 2024

obilodeau Jun 22, 2024

wader Jun 22, 2024

obilodeau Jul 19, 2024

wader Jun 16, 2024

obilodeau Jun 22, 2024

obilodeau Jun 22, 2024

wader Jun 22, 2024

obilodeau Jul 19, 2024

obilodeau Jul 19, 2024

obilodeau Jul 25, 2024

wader Jul 25, 2024

PyRDP format #959

Are you sure you want to change the base?

PyRDP format #959

Conversation

obilodeau commented Jun 4, 2024

wader commented Jun 4, 2024

wader commented Jun 8, 2024

obilodeau commented Jun 13, 2024

obilodeau commented Jun 13, 2024

wader commented Jun 14, 2024

wader commented Jun 14, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment