Refactor AST (#15)
This is a large refactoring of the abstract syntax tree (AST) that introduces backward-incompatible changes. The AST is no longer generated directly as a Protocol Buffer, because the structures generated by the Protocol Buffer compiler are hard to work with. Instead, we have defined our own interfaces and types to represent the AST.
plusvic authored Dec 3, 2019
1 parent 83a82d2 commit 87b0750
Showing 36 changed files with 9,219 additions and 3,900 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -7,7 +7,7 @@ hexgrammar:
flexgo -G -v -o hex/hex_lexer.go hex/hex_lexer.l && goyacc -p xx -o hex/hex_parser.go hex/hex_grammar.y

proto:
protoc --go_out=. ast/yara.proto
protoc --go_out=. pb/yara.proto

j2y:
go build github.com/VirusTotal/gyp/cmd/j2y
206 changes: 15 additions & 191 deletions README.md
@@ -1,200 +1,32 @@
# gyp (go-yara-parser)

`gyp` is a Go library for manipulating YARA rulesets.
It uses the same grammar and lexer files as the original libyara to ensure that lexing and parsing work exactly like YARA.
The grammar and lexer files have been modified to populate Protocol Buffer (PB) messages for ruleset manipulation instead of compiling rulesets for data matching.

Using `gyp`, one can read YARA rulesets and programmatically change metadata, rule names, rule modifiers, tags, strings, conditions and more.

Encoding rulesets as PB messages enables their manipulation in other languages.
Additionally, the `y2j` tool is provided for serializing rulesets to JSON.
Similarly, `j2y` provides JSON-to-YARA conversion, but do see __Limitations__ below.

## `y2j` Usage

Command line usage for `y2j` looks like the following:

```
$ y2j --help
Usage of y2j: y2j [options] file.yar
options:
-indent int
Set number of indent spaces (default 2)
-o string
JSON output file
```
[![GoDoc](https://godoc.org/github.com/VirusTotal/gyp?status.svg)](https://godoc.org/github.com/VirusTotal/gyp)
[![Go Report Card](https://goreportcard.com/badge/github.com/VirusTotal/gyp)](https://goreportcard.com/report/github.com/VirusTotal/gyp)

In action, `y2j` would convert the following ruleset:

```yara
import "pe"
import "cuckoo"
include "other.yar"
global rule demo : tag1 {
meta:
description = "This is a demo rule"
version = 1
production = false
description = "because we can"
strings:
$string = "this is a string" nocase wide
$regex = /this is a regex/i ascii fullword
$hex = { 01 23 45 67 89 ab cd ef [0-5] ?1 ?2 ?3 }
condition:
$string or $regex or $hex
}
```
# gyp (go-yara-parser)

to this JSON output:

```json
{
"imports": [
"pe",
"cuckoo"
],
"includes": [
"other.yar"
],
"rules": [
{
"modifiers": {
"global": true,
"private": false
},
"identifier": "demo",
"tags": [
"tag1"
],
"meta": [
{
"key": "description",
"text": "This is a demo rule"
},
{
"key": "version",
"number": "1"
},
{
"key": "production",
"boolean": false
},
{
"key": "description",
"text": "because we can"
}
],
"strings": [
{
"id": "$string",
"text": {
"text": "this is a string",
"modifiers": {
"nocase": true,
"ascii": false,
"wide": true,
"fullword": false,
"xor": false
}
}
},
{
"id": "$regex",
"regexp": {
"text": "this is a regex",
"modifiers": {
"nocase": false,
"ascii": true,
"wide": false,
"fullword": true,
"xor": false,
"i": true
}
}
},
{
"id": "$hex",
"hex": {
"token": [
{
"sequence": {
"value": "ASNFZ4mrze8=",
"mask": "//////////8="
}
},
{
"jump": {
"start": "0",
"end": "5"
}
},
{
"sequence": {
"value": "AQID",
"mask": "Dw8P"
}
}
]
}
}
],
"condition": {
"orExpression": {
"terms": [
{
"stringIdentifier": "$string"
},
{
"stringIdentifier": "$regex"
},
{
"stringIdentifier": "$hex"
}
]
}
}
}
]
}
```
`gyp` is a Go library for parsing YARA rules. It uses the same grammar and lexer files as the original libyara to ensure that lexing and parsing work exactly like YARA. This library produces an Abstract Syntax Tree (AST) for the parsed YARA rules. Additionally, the AST can be serialized as a Protocol Buffer, which facilitates its manipulation in other programming languages.

## Go Usage

Sample usage for working with rulesets in Go looks like the following:
The example below illustrates the usage of `gyp`. It is a simple program that reads a YARA source file from the standard input, creates the corresponding AST, and writes the rules back to the standard output. The resulting output won't be exactly like the input: during parsing and re-generation of the rules, the text is reformatted and comments are lost.

```go
package main

import (
"fmt"
"log"
"os"
proto "github.com/golang/protobuf/proto"
"log"
"os"

"github.com/VirusTotal/gyp"
"github.com/VirusTotal/gyp"
)

func main() {
input, err := os.Open(os.Args[1]) // Single argument: path to your file
if err != nil {
log.Fatalf("Error: %s\n", err)
}

ruleset, err := gyp.Parse(input)
if err != nil {
log.Fatalf(`Parsing failed: "%s"`, err)
}

fmt.Printf("Ruleset:\n%v\n", ruleset)

// Manipulate the first rule
rule := ruleset.Rules[0]
rule.Identifier = proto.String("new_rule_name")
rule.Modifiers.Global = proto.Bool(true)
rule.Modifiers.Private = proto.Bool(false)
ruleset, err := gyp.Parse(os.Stdin)
if err != nil {
log.Fatalf(`Error parsing rules: %v`, err)
}
if err = ruleset.WriteSource(os.Stdout); err != nil {
log.Fatalf(`Error writing rules: %v`, err)
}
}
```

@@ -231,14 +63,6 @@ The `Makefile` includes targets for quickly building the parser and lexer and th
- Build `y2j` tool: `make y2j`
- Build `j2y` tool: `make j2y`

## Limitations

Currently, there are no guarantees with the library that modified rules will serialize back into a valid YARA ruleset:

1. You can set `rule.Identifier = "123"`, but this would be invalid YARA.
2. Adding or removing strings may cause a condition to become invalid.
3. Comments cannot be retained.
4. Numbers are always serialized in decimal base.

## License and third party code

78 changes: 52 additions & 26 deletions adapter.go
@@ -10,44 +10,43 @@ import (
"io/ioutil"

"github.com/VirusTotal/gyp/ast"
"github.com/VirusTotal/gyp/error"
gyperror "github.com/VirusTotal/gyp/error"
)

func init() {
yrErrorVerbose = true
}

// Parse parses a YARA rule from the provided input source
// Parse parses a YARA rule from the provided input source.
func Parse(input io.Reader) (rs *ast.RuleSet, err error) {
defer func() {
if r := recover(); r != nil {
if yaraError, ok := r.(gyperror.Error); ok {
err = yaraError
} else {
err = gyperror.Error{
Code: gyperror.UnknownError,
Data: fmt.Sprintf("%v", r),
Code: gyperror.UnknownError,
Message: fmt.Sprintf("%v", r),
}
}
}
}()

// "Reset" the global ParsedRuleset
ParsedRuleset = ast.RuleSet{}

lexer := Lexer{
lexer: *NewScanner(),
lexer := &lexer{
scanner: *NewScanner(),
ruleSet: &ast.RuleSet{
Imports: make([]string, 0),
Rules: make([]*ast.Rule, 0),
},
}
lexer.lexer.In = input
lexer.lexer.Out = ioutil.Discard
lexer.scanner.In = input
lexer.scanner.Out = ioutil.Discard

if result := yrParse(&lexer); result != 0 {
err = lexer.lexicalError
if result := yrParse(lexer); result != 0 {
err = lexer.err
}

rs = &ParsedRuleset

return
return lexer.ruleSet, err
}
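
The deferred `recover` in `Parse` is what turns a panic raised during lexing or parsing into an ordinary returned error. The sketch below isolates that pattern so it can be run on its own. It does not use `gyp`: `sketchParseError`, `unknownCode`, and `runGuarded` are hypothetical stand-ins introduced only for illustration.

```go
package main

import "fmt"

// sketchParseError is a hypothetical stand-in for gyperror.Error.
type sketchParseError struct {
	Code    int
	Line    int
	Message string
}

// Error makes sketchParseError satisfy the built-in error interface.
func (e sketchParseError) Error() string {
	return fmt.Sprintf("line %d: %s", e.Line, e.Message)
}

// unknownCode is a hypothetical code for panics that carry no parse error.
const unknownCode = -1

// runGuarded calls fn and converts any panic into a returned error.
// A panic value that is already a sketchParseError is returned unchanged;
// any other value is wrapped with unknownCode.
func runGuarded(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if pe, ok := r.(sketchParseError); ok {
				err = pe
			} else {
				err = sketchParseError{Code: unknownCode, Message: fmt.Sprintf("%v", r)}
			}
		}
	}()
	fn()
	return nil
}

func main() {
	// A typed panic keeps its line number and message.
	err := runGuarded(func() {
		panic(sketchParseError{Code: 1, Line: 3, Message: "unterminated string"})
	})
	fmt.Println(err)

	// An untyped panic is wrapped instead of crashing the program.
	fmt.Println(runGuarded(func() { panic("index out of range") }))

	// No panic means no error.
	fmt.Println(runGuarded(func() {}))
}
```

The named return value `err` matters here: the deferred function can only change what the caller receives because it assigns to a named result.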

// ParseString parses a YARA rule from the provided string.
@@ -56,24 +55,51 @@ func ParseString(s string) (*ast.RuleSet, error) {
}

// lexer is an adapter that fits the flexgo lexer ("Scanner") into goyacc.
type Lexer struct {
lexer Scanner
lexicalError gyperror.Error
type lexer struct {
scanner Scanner
err gyperror.Error
ruleSet *ast.RuleSet
}

// Lex provides the interface expected by the goyacc parser.
// It sets the context's lval pointer (defined in the lexer file)
// to the one passed as an argument so that the parser actions
// can make use of it.
func (l *Lexer) Lex(lval *yrSymType) int {
l.lexer.Context.lval = lval
return l.lexer.Lex().(int)
func (l *lexer) Lex(lval *yrSymType) int {
l.scanner.Context.lval = lval
r := l.scanner.Lex()
if r.Error.Code != 0 {
r.Error.Line = l.scanner.Lineno
panic(r.Error)
}
return r.Token
}

// Error satisfies the interface expected by the goyacc parser.
func (l *Lexer) Error(e string) {
l.lexicalError = gyperror.Error{
Code: gyperror.LexicalError,
Data: fmt.Sprintf(`@%d - "%s"`, l.lexer.Lineno, e),
func (l *lexer) Error(msg string) {
l.err = gyperror.Error{
Code: gyperror.LexicalError,
Line: l.scanner.Lineno,
Message: msg,
}
}

// SetError sets the lexer error. The error message can be built by passing
// a format string and arguments, as with fmt.Sprintf. This function returns 1
// because it's intended to be used in grammar.y as:
// return lexer.SetError(...)
// Returning 1 from the parser aborts the parsing.
func (l *lexer) SetError(code gyperror.Code, format string, a ...interface{}) int {
l.err = gyperror.Error{
Code: code,
Line: l.scanner.Lineno,
Message: fmt.Sprintf(format, a...),
}
return 1
}
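
The "record the error, then return 1" convention described in the comment on `SetError` can be tried in isolation. The sketch below is a self-contained illustration under stated assumptions and is not code from `grammar.y`: `sketchLexer`, `sketchError`, `duplicateRuleCode`, and `declareRule` are hypothetical names, and `declareRule` only mimics what a grammar action might do.

```go
package main

import "fmt"

// sketchCode is a hypothetical stand-in for gyperror.Code.
type sketchCode int

// duplicateRuleCode is a hypothetical error code used only in this sketch.
const duplicateRuleCode sketchCode = 1

// sketchError is a hypothetical stand-in for gyperror.Error.
type sketchError struct {
	Code    sketchCode
	Line    int
	Message string
}

// sketchLexer is a hypothetical stand-in for the lexer adapter.
type sketchLexer struct {
	line int
	err  *sketchError
}

// setError records the error together with the current line and returns 1,
// so that a caller can write "return l.setError(...)" to abort.
func (l *sketchLexer) setError(code sketchCode, format string, a ...interface{}) int {
	l.err = &sketchError{Code: code, Line: l.line, Message: fmt.Sprintf(format, a...)}
	return 1
}

// declareRule mimics a grammar action: it returns 0 to continue and a
// non-zero value to abort.
func declareRule(l *sketchLexer, seen map[string]bool, name string) int {
	if seen[name] {
		return l.setError(duplicateRuleCode, "duplicate rule %q", name)
	}
	seen[name] = true
	return 0
}

func main() {
	l := &sketchLexer{line: 7}
	seen := map[string]bool{}

	fmt.Println(declareRule(l, seen, "demo")) // first declaration is accepted
	fmt.Println(declareRule(l, seen, "demo")) // second one records an error
	fmt.Printf("line %d: %s\n", l.err.Line, l.err.Message)
}
```

Returning the status code from the same call that stores the error keeps each failure site to a single statement, which is presumably why the method is shaped this way.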

// asLexer converts a yrLexer interface value to a *lexer using a type
// assertion. This function is used in grammar.y.
func asLexer(l yrLexer) *lexer {
return l.(*lexer)
}