This repository was archived by the owner on May 16, 2025. It is now read-only.

Commit 87b0750

Refactor AST (#15)
This is a large refactoring of the abstract syntax tree that introduces backward-incompatible changes. The AST is no longer generated directly as a Protocol Buffer, because the structures generated by the Protocol Buffer compiler are hard to work with. Instead, we have defined our own interfaces and types to represent the AST.
1 parent 83a82d2 commit 87b0750

36 files changed: +9219 -3900 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ hexgrammar:
 	flexgo -G -v -o hex/hex_lexer.go hex/hex_lexer.l && goyacc -p xx -o hex/hex_parser.go hex/hex_grammar.y
 
 proto:
-	protoc --go_out=. ast/yara.proto
+	protoc --go_out=. pb/yara.proto
 
 j2y:
 	go build github.com/VirusTotal/gyp/cmd/j2y

README.md

Lines changed: 15 additions & 191 deletions
@@ -1,200 +1,32 @@
-# gyp (go-yara-parser)
-
-`gyp` is a Go library for manipulating YARA rulesets.
-It uses the same grammar and lexer files as the original libyara to ensure that lexing and parsing work exactly like YARA.
-The grammar and lexer files have been modified to fill protocol buffers (PB) messages for ruleset manipulation instead of compiling rulesets for data matching.
-
-Using `gyp`, one will be able to read YARA rulesets to programatically change metadata, rule names, rule modifiers, tags, strings, conditions and more.
-
-Encoding rulesets as PB messages enable their manipulation in other languages.
-Additionally, the `y2j` tool is provided for serializing rulesets to JSON.
-Similarly, `j2y` provides JSON-to-YARA conversion, but do see __Limitations__ below.
-
-## `y2j` Usage
-
-Command line usage for `y2j` looks like the following:
-
-```
-$ y2j --help
-Usage of y2j: y2j [options] file.yar
-
-options:
-  -indent int
-    	Set number of indent spaces (default 2)
-  -o string
-    	JSON output file
-```
+[![GoDoc](https://godoc.org/github.com/VirusTotal/gyp?status.svg)](https://godoc.org/github.com/VirusTotal/gyp)
+[![Go Report Card](https://goreportcard.com/badge/github.com/VirusTotal/gyp)](https://goreportcard.com/report/github.com/VirusTotal/gyp)
 
-In action, `y2j` would convert the following ruleset:
-
-```yara
-import "pe"
-import "cuckoo"
-
-include "other.yar"
-
-global rule demo : tag1 {
-meta:
-    description = "This is a demo rule"
-    version = 1
-    production = false
-    description = "because we can"
-strings:
-    $string = "this is a string" nocase wide
-    $regex = /this is a regex/i ascii fullword
-    $hex = { 01 23 45 67 89 ab cd ef [0-5] ?1 ?2 ?3 }
-condition:
-    $string or $regex or $hex
-}
-```
+# gyp (go-yara-parser)
 
-to this JSON output:
-
-```json
-{
-  "imports": [
-    "pe",
-    "cuckoo"
-  ],
-  "includes": [
-    "other.yar"
-  ],
-  "rules": [
-    {
-      "modifiers": {
-        "global": true,
-        "private": false
-      },
-      "identifier": "demo",
-      "tags": [
-        "tag1"
-      ],
-      "meta": [
-        {
-          "key": "description",
-          "text": "This is a demo rule"
-        },
-        {
-          "key": "version",
-          "number": "1"
-        },
-        {
-          "key": "production",
-          "boolean": false
-        },
-        {
-          "key": "description",
-          "text": "because we can"
-        }
-      ],
-      "strings": [
-        {
-          "id": "$string",
-          "text": {
-            "text": "this is a string",
-            "modifiers": {
-              "nocase": true,
-              "ascii": false,
-              "wide": true,
-              "fullword": false,
-              "xor": false
-            }
-          }
-        },
-        {
-          "id": "$regex",
-          "regexp": {
-            "text": "this is a regex",
-            "modifiers": {
-              "nocase": false,
-              "ascii": true,
-              "wide": false,
-              "fullword": true,
-              "xor": false,
-              "i": true
-            }
-          }
-        },
-        {
-          "id": "$hex",
-          "hex": {
-            "token": [
-              {
-                "sequence": {
-                  "value": "ASNFZ4mrze8=",
-                  "mask": "//////////8="
-                }
-              },
-              {
-                "jump": {
-                  "start": "0",
-                  "end": "5"
-                }
-              },
-              {
-                "sequence": {
-                  "value": "AQID",
-                  "mask": "Dw8P"
-                }
-              }
-            ]
-          }
-        }
-      ],
-      "condition": {
-        "orExpression": {
-          "terms": [
-            {
-              "stringIdentifier": "$string"
-            },
-            {
-              "stringIdentifier": "$regex"
-            },
-            {
-              "stringIdentifier": "$hex"
-            }
-          ]
-        }
-      }
-    }
-  ]
-}
-```
+`gyp` is a Go library for parsing YARA rules. It uses the same grammar and lexer files as the original libyara to ensure that lexing and parsing work exactly like YARA. This library produces an Abstract Syntax Tree (AST) for the parsed YARA rules. Additionally, the AST can be serialized as a Protocol Buffer, which facilitate its manipulation in other programming languages.
 
 ## Go Usage
 
-Sample usage for working with rulesets in Go looks like the following:
+The example below illustrates the usage of `gyp`, this a simple program that reads a YARA source file from the standard input, creates the corresponding AST, and writes the rules back to the standard output. The resulting output won't be exactly like the input, during the parsing and re-generation of the rules the text is reformatted and comments are lost.
 
 ```go
 package main
 
 import (
-	"fmt"
-	"log"
-	"os"
-	proto "github.com/golang/protobuf/proto"
+	"log"
+	"os"
 
-	"github.com/VirusTotal/gyp"
+	"github.com/VirusTotal/gyp"
 )
 
 func main() {
-	input, err := os.Open(os.Args[1]) // Single argument: path to your file
-	if err != nil {
-		log.Fatalf("Error: %s\n", err)
-	}
-
-	ruleset, err := gyp.Parse(input)
-	if err != nil {
-		log.Fatalf(`Parsing failed: "%s"`, err)
-	}
-
-	fmt.Printf("Ruleset:\n%v\n", ruleset)
-
-	// Manipulate the first rule
-	rule := ruleset.Rules[0]
-	rule.Identifier = proto.String("new_rule_name")
-	rule.Modifiers.Global = proto.Bool(true)
-	rule.Modifiers.Private = proto.Bool(false)
+	ruleset, err := gyp.Parse(os.Stdin)
+	if err != nil {
+		log.Fatalf(`Error parsing rules: %v`, err)
+	}
+	if err = ruleset.WriteSource(os.Stdout); err != nil {
+		log.Fatalf(`Error writing rules: %v`, err)
+	}
 }
 ```
 
@@ -231,14 +63,6 @@ The `Makefile` includes targets for quickly building the parser and lexer and th
 - Build `y2j` tool: `make y2j`
 - Build `j2y` tool: `make j2y`
 
-## Limitations
-
-Currently, there are no guarantees with the library that modified rules will serialize back into a valid YARA ruleset:
-
-1. you can set `rule.Identifier = "123"`, but this would be invalid YARA.
-2. Adding or removing strings may cause a condition to become invalid.
-3. Comments cannot be retained.
-4. Numbers are always serialized in decimal base.
 
 ## License and third party code
 

adapter.go

Lines changed: 52 additions & 26 deletions
@@ -10,44 +10,43 @@ import (
 	"io/ioutil"
 
 	"github.com/VirusTotal/gyp/ast"
-	"github.com/VirusTotal/gyp/error"
+	gyperror "github.com/VirusTotal/gyp/error"
 )
 
 func init() {
 	yrErrorVerbose = true
 }
 
-// Parse parses a YARA rule from the provided input source
+// Parse parses a YARA rule from the provided input source.
 func Parse(input io.Reader) (rs *ast.RuleSet, err error) {
 	defer func() {
 		if r := recover(); r != nil {
 			if yaraError, ok := r.(gyperror.Error); ok {
 				err = yaraError
 			} else {
 				err = gyperror.Error{
-					Code: gyperror.UnknownError,
-					Data: fmt.Sprintf("%v", r),
+					Code:    gyperror.UnknownError,
+					Message: fmt.Sprintf("%v", r),
 				}
 			}
 		}
 	}()
 
-	// "Reset" the global ParsedRuleset
-	ParsedRuleset = ast.RuleSet{}
-
-	lexer := Lexer{
-		lexer: *NewScanner(),
+	lexer := &lexer{
+		scanner: *NewScanner(),
+		ruleSet: &ast.RuleSet{
+			Imports: make([]string, 0),
+			Rules:   make([]*ast.Rule, 0),
+		},
 	}
-	lexer.lexer.In = input
-	lexer.lexer.Out = ioutil.Discard
+	lexer.scanner.In = input
+	lexer.scanner.Out = ioutil.Discard
 
-	if result := yrParse(&lexer); result != 0 {
-		err = lexer.lexicalError
+	if result := yrParse(lexer); result != 0 {
+		err = lexer.err
 	}
 
-	rs = &ParsedRuleset
-
-	return
+	return lexer.ruleSet, err
 }
 
 // ParseString parses a YARA rule from the provided string.
@@ -56,24 +55,51 @@ func ParseString(s string) (*ast.RuleSet, error) {
 }
 
 // Lexer is an adapter that fits the flexgo lexer ("Scanner") into goyacc
-type Lexer struct {
-	lexer        Scanner
-	lexicalError gyperror.Error
+type lexer struct {
+	scanner Scanner
+	err     gyperror.Error
+	ruleSet *ast.RuleSet
 }
 
 // Lex provides the interface expected by the goyacc parser.
 // It sets the context's lval pointer (defined in the lexer file)
 // to the one passed as an argument so that the parser actions
 // can make use of it.
-func (l *Lexer) Lex(lval *yrSymType) int {
-	l.lexer.Context.lval = lval
-	return l.lexer.Lex().(int)
+func (l *lexer) Lex(lval *yrSymType) int {
+	l.scanner.Context.lval = lval
+	r := l.scanner.Lex()
+	if r.Error.Code != 0 {
+		r.Error.Line = l.scanner.Lineno
+		panic(r.Error)
+	}
+	return r.Token
 }
 
 // Error satisfies the interface expected of the goyacc parser.
-func (l *Lexer) Error(e string) {
-	l.lexicalError = gyperror.Error{
-		Code: gyperror.LexicalError,
-		Data: fmt.Sprintf(`@%d - "%s"`, l.lexer.Lineno, e),
+func (l *lexer) Error(msg string) {
+	l.err = gyperror.Error{
+		Code:    gyperror.LexicalError,
+		Line:    l.scanner.Lineno,
+		Message: msg,
+	}
+}
+
+// SetError sets the lexer error. The error message can be built by passing
+// a format string and arguments as fmt.Sprintf. This function returns 1 as
+// it's intended to by used in grammar.y as:
+//   return lexer.SetError(...)
+// By returning 1 from the parser the parsing is aborted.
+func (l *lexer) SetError(code gyperror.Code, format string, a ...interface{}) int {
+	l.err = gyperror.Error{
+		Code:    code,
+		Line:    l.scanner.Lineno,
+		Message: fmt.Sprintf(format, a...),
 	}
+	return 1
+}
+
+// Helper function that casts a yrLexer interface to a lexer struct. This
+// function is used in grammar.y.
+func asLexer(l yrLexer) *lexer {
+	return l.(*lexer)
 }
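
Note on the error flow in `adapter.go`: `Lex` panics with the scanner's error, and the deferred function in `Parse` recovers it and hands it back to the caller as an ordinary `error`. The standalone sketch below illustrates that panic-to-error conversion using only the standard library. It is not code from this commit: `parseError`, `safeRun`, and the two error codes are hypothetical stand-ins, with field names borrowed from the `Code`, `Line`, and `Message` fields that appear in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// parseError is a hypothetical stand-in for gyperror.Error. Only the field
// names (Code, Line, Message) are taken from the diff; the rest is assumed.
type parseError struct {
	Code    int
	Line    int
	Message string
}

// Error makes parseError usable wherever an error is expected.
func (e parseError) Error() string {
	return fmt.Sprintf("line %d: %s (code %d)", e.Line, e.Message, e.Code)
}

// Hypothetical error codes, chosen only for this example.
const (
	unknownError = 1
	lexicalError = 2
)

// safeRun calls fn and converts any panic into a returned error. A panic
// carrying a parseError is returned unchanged; any other panic value is
// wrapped in a parseError with the unknownError code.
func safeRun(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if pe, ok := r.(parseError); ok {
				err = pe
			} else {
				err = parseError{Code: unknownError, Message: fmt.Sprintf("%v", r)}
			}
		}
	}()
	fn()
	return nil
}

func main() {
	// A typed panic, similar in spirit to what Lex raises on a scanner error.
	err := safeRun(func() {
		panic(parseError{Code: lexicalError, Line: 3, Message: "unexpected character"})
	})
	fmt.Println(err)

	// An untyped panic is wrapped rather than propagated.
	err = safeRun(func() { panic("boom") })
	var pe parseError
	if errors.As(err, &pe) {
		fmt.Println(pe.Code, pe.Message)
	}
}
```

The named return value is what lets the deferred function overwrite `err` after the panic has unwound the stack, which is the same mechanism `Parse` relies on with its `(rs *ast.RuleSet, err error)` signature.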
