Parser by ornj · Pull Request #1 · ornj/nginx-go-crossplane

ornj · 2024-03-01T00:32:44Z

Proposed changes

Describe the use case and detail of the change. If this PR addresses an issue on GitHub, make sure to include a link to that issue using one of the supported keywords here in this description (not in the title of the PR).

Checklist

Before creating a PR, run through this checklist and mark each as complete.

I have read the CONTRIBUTING document
If applicable, I have added tests that prove my fix is effective or that my feature works
If applicable, I have checked that any relevant tests pass after adding my changes
I have updated any relevant documentation (README.md)

Implemented `crossplane.Scanner` that follows the example of other "scanner" types implemented in the Go stdlib. The existing `Lex` uses concurrency to make tokens available to the caller while managing "state". I think this design queue was taken from Rob Pike's 2011 talk on [Lexical Scanning in Go](https://go.dev/talks/2011/lex.slide). If you look at examples from the Go stdlib-- such as `bufio.Scanner` that `Lex` depends on-- you'd find that this isn't the strategy being employed and instead there is a struct that manages the state of the scanner and a method that used by the caller to advance the scanner to obtain tokens. After a bit of Internet archeology, I found [this](https://groups.google.com/g/golang-nuts/c/q--5t2cxv78/m/Vkr9bNuhP5sJ) post on `golang-nuts` from Rob Pike himself: > That talk was about a lexer, but the deeper purpose was to demonstrate > how concurrency can make programs nice even without obvious parallelism > in the problem. And like many such uses of concurrency, the code is > pretty but not necessarily fast. > > I think it's a fine approach to a lexer if you don't care about > performance. It is significantly slower than some other approaches but > is very easy to adapt. I used it in ivy, for example, but just so you > know, I'm probably going to replace the one in ivy with a more > traditional model to avoid some issues with the lexer accessing global > state. You don't care about that for your application, I'm sure. > So: It's pretty and nice to work on, but you'd probably not choose that > approach for a production compiler. An implementation of a "scanner" using the more "traditional" model-- much of the logic is the same or very close to `Lex`-- seems to support the above statement. ``` go test -benchmem -run=^$ -bench "^BenchmarkScan|BenchmarkLex$" github.com/nginxinc/nginx-go-crossplane -count=1 -v goos: darwin goarch: arm64 pkg: github.com/nginxinc/nginx-go-crossplane BenchmarkLex BenchmarkLex/simple BenchmarkLex/simple-10 70982 16581 ns/op 102857 B/op 37 allocs/op BenchmarkLex/with-comments BenchmarkLex/with-comments-10 64125 18366 ns/op 102921 B/op 43 allocs/op BenchmarkLex/messy BenchmarkLex/messy-10 28171 42697 ns/op 104208 B/op 166 allocs/op BenchmarkLex/quote-behavior BenchmarkLex/quote-behavior-10 83667 14154 ns/op 102768 B/op 24 allocs/op BenchmarkLex/quoted-right-brace BenchmarkLex/quoted-right-brace-10 48022 24799 ns/op 103369 B/op 52 allocs/op BenchmarkScan BenchmarkScan/simple BenchmarkScan/simple-10 179712 6660 ns/op 4544 B/op 34 allocs/op BenchmarkScan/with-comments BenchmarkScan/with-comments-10 133178 7628 ns/op 4608 B/op 40 allocs/op BenchmarkScan/messy BenchmarkScan/messy-10 49251 24106 ns/op 5896 B/op 163 allocs/op BenchmarkScan/quote-behavior BenchmarkScan/quote-behavior-10 240026 4854 ns/op 4456 B/op 21 allocs/op BenchmarkScan/quoted-right-brace BenchmarkScan/quoted-right-brace-10 87468 13534 ns/op 5056 B/op 49 allocs/op PASS ok github.com/nginxinc/nginx-go-crossplane 13.676s ``` This alternative to `Lex` is probably a micro-optimization for many use cases. As the size and number of NGINX configurations that need to be analyzed grows, optimization can be a good thing as well as an API that feels familiar to Go developers who might use this tool for their own purposes. Next steps: - Use `Scanner` to "parse" NGINX configurations. I think this should be done in place so that the existing API works as is, but we should also expose a way to allow the caller to provide the scanner. - Deprecate `Lex` in favor of `Scanner`. If we leave `Lex` in place then I don't think we would need a `v2` of the crossplane package (yet).

Stores the first error encountered by Scan() and checks it to make sure scanning stops unrecoverably. The Err() method can use used to fetch the last non-EOF error.

This parser allows the user to configure what "commands" are known and supported by way of providing a list of "modules". The idea is that users can then customize Crossplane to parse NGINX configurations with specific module sets including their own that we know nothing about. We could provide presets that contain specific modules/versions that can be used with specific builds/versions of NGINX. This POC has support for a very limited set of directives and a basic capability to handle includes. Includes are currently handled similar to what Crossplane does today, but I think I'm seeing issues with that strategy. Specifically, the include's context is set when the file name is first discovered. I think if that file is included a second time in a different context, the context would be incorrect.

A list of commands can be passed to the parser that are then used to match commands (directives) that the parser should "ignore" and not include in the result. If an ignored command is a "block type" then the parser recursively parses the block and ignores any error. This is inline with existing Crossplane behavior but as others have noted, it does seem a little weird as it changes the meaning/intent of the NGINX configuration in a way that may no longer be valid. Arguably it is up to the user to handle this use case some how. I gave some thought to changing the parser "type" to be a pull model similar to `Scanner` where the user could ask for the next directive and on receiving an `include` could then choose to ignore it or even handle it as the "single file" use case where the contents are injected into the `Block` field of the parent directive instead of added to the file list for later parsing. I'm not sure how that would look yet.

Added the concept of an "extension" to the Scanner that can be used to implement alternative grammars such as Lua. Unlike the Python version of crossplane, extensions to not need to be "registered" ahead of time. Instead, when a token is detected for which you want to start scanning with a different grammar, you then use the ScanWith method to continue tokenizing the config. This is using the same `Directive` type that already exists in crossplane, but it starts to show that the existing struct may be insufficient. Lua doesn't fit well into `Directive` so like the python implementation of crossplane, the Lua script is jammed into the Args field.

ornj and others added 4 commits January 31, 2024 09:32

Adds Err() method to scanner and checks error in Scan()

8095433

Stores the first error encountered by Scan() and checks it to make sure scanning stops unrecoverably. The Err() method can use used to fetch the last non-EOF error.

NLB-4374: Add validation of set_real_ip_from

ac65344

ornj force-pushed the parser branch from 2be39e1 to 6a4dbb4 Compare March 1, 2024 00:42

ornj added 5 commits March 6, 2024 12:53

Merge remote-tracking branch 'origin' into parser

0da04b9

WIP: Added additional directives and parser tests

f2cd9c2

WIP: pattern of referencing Type.Name instead of string const

77edce8

ornj force-pushed the scanner branch 2 times, most recently from e7c97fe to fe04b93 Compare July 5, 2024 22:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser#1

Parser#1
ornj wants to merge 9 commits intoscannerfrom
parser

ornj commented Mar 1, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ornj commented Mar 1, 2024

Proposed changes

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants