From 6eccbe37c0a55f38e0f2e2192b4697005bfbb6ec Mon Sep 17 00:00:00 2001
From: "Victor M. Alvarez"
Date: Tue, 7 May 2024 12:43:23 +0200
Subject: [PATCH] docs: add documentation for the C API

---
 site/content/docs/api/c.md      | 444 ++++++++++++++++++++++++++++++++
 site/content/docs/api/python.md |   4 +-
 2 files changed, 446 insertions(+), 2 deletions(-)

diff --git a/site/content/docs/api/c.md b/site/content/docs/api/c.md
index c2dc6cbb1..9bb66768f 100644
--- a/site/content/docs/api/c.md
+++ b/site/content/docs/api/c.md
@@ -76,4 +76,448 @@ Windows users will find all the files needed for importing YARA-X in the
 library `yara_x_capi.dll.lib`
 * A static library `yara_x_capi.lib`
 
+## API overview
 
+Using YARA-X from C involves a two-step process: rule compilation and scanning.
+During the rule compilation phase you transform YARA rules from text
+into a compiled [YRX_RULES](#yrx_rules) object. This object is later used for
+scanning data.
+
+To compile rules, you can either use the [yrx_compile](#yrx_compile)
+function or a [YRX_COMPILER](#yrx_compiler) object. The former is simpler and
+sufficient for simpler scenarios. For more complex use-cases involving the use
+of namespaces and multiple rule sets, the latter method is necessary.
+
+Once you have a [YRX_RULES](#yrx_rules) object, you must create
+a [YRX_SCANNER](#yrx_scanner) object that will use the compiled rules for
+scanning data. A new scanner can be created
+with [yrx_scanner_create](#yrx_scanner_create). It's ok to use the
+same [YRX_RULES](#yrx_rules) object with multiple scanners, and use each scanner
+from a different thread to scan different data with the same rules in parallel.
+Each scanner must be used by a single thread, though.
+
+Scanners and rules must be destroyed by
+calling [yrx_scanner_destroy](#yrx_scanner_destroy)
+and [yrx_rules_destroy](#yrx_rules_destroy) respectively, but the rules must
+be destroyed only after all scanners using them are already destroyed.
+
+You must use [yrx_scanner_on_matching_rule](#yrx_scanner_on_matching_rule) to
+give the scanner a callback function that will be called for every matching
+rule. The callback function receives a pointer to a [YRX_RULE](#yrx_rule)
+structure representing the matching rule, which gives you access to details
+about the rule, like its identifier and namespace.
+
+## API reference
+
+### yrx_compile
+
+```c
+enum YRX_RESULT yrx_compile(
+    const char *src,
+    struct YRX_RULES **rules);
+```
+
+Function that takes a string with one or more YARA rules and produces
+a [YRX_RULES](#yrx_rules) object representing the rules in compiled form. This
+is the simplest way of compiling YARA rules. For more advanced use-cases you
+must use a [YRX_COMPILER](#yrx_compiler).
+
+------
+
+### yrx_last_error
+
+```c
+const char *yrx_last_error(void);
+```
+
+Returns the error message corresponding to the most recent invocation of a
+function in this API by the current thread. The returned pointer will be `NULL`
+if the most recent function call by the current thread was successful.
+
+Also, the pointer is only valid until the current thread calls some other
+function in this API.
+
+------
+
+### YRX_COMPILER
+
+Type that represents a YARA-X compiler. It takes one or more sets of YARA
+rules in text form and compiles them into a [YRX_RULES](#yrx_rules) object.
+
+#### yrx_compiler_create
+
+```c
+enum YRX_RESULT yrx_compiler_create(
+    struct YRX_COMPILER **compiler);
+```
+
+Creates a new compiler. It must be destroyed
+with [yrx_compiler_destroy](#yrx_compiler_destroy).
+
+#### yrx_compiler_destroy
+
+```c
+void yrx_compiler_destroy(
+    struct YRX_COMPILER *compiler);
+```
+
+Destroys the [YRX_COMPILER](#yrx_compiler) object.
+
+#### yrx_compiler_add_source
+
+```c
+enum YRX_RESULT yrx_compiler_add_source(
+    struct YRX_COMPILER *compiler,
+    const char *src);
+```
+
+Adds YARA source code to be compiled. This function can be called multiple
+times.
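Because the pointer returned by `yrx_last_error` is only valid until the current thread's next call into the API, code that wants to keep the message (for example after a failed `yrx_compiler_add_source`) has to copy it first. The following is a minimal sketch of such a copy. The helper name `copy_error_message` is hypothetical and not part of YARA-X, and it takes the message pointer as a parameter so that it compiles without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of the YARA-X API. Copies `msg` (for example,
// the pointer returned by yrx_last_error()) into `buf`, truncating if needed.
// A NULL `msg` means the last call succeeded, so `buf` becomes an empty string.
// Returns the number of bytes stored, excluding the terminating NUL.
size_t copy_error_message(const char *msg, char *buf, size_t cap) {
    if (cap == 0) {
        return 0; // no room even for the terminator
    }
    assert(buf != NULL);
    if (msg == NULL) {
        buf[0] = '\0';
        return 0;
    }
    size_t n = strlen(msg);
    if (n >= cap) {
        n = cap - 1; // leave one byte for the terminator
    }
    memcpy(buf, msg, n);
    buf[n] = '\0';
    return n;
}
```

With the library linked, the call would presumably look like `copy_error_message(yrx_last_error(), buf, sizeof buf)`, made immediately after the failing function and before any other call into the API.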
+
+#### yrx_compiler_new_namespace
+
+```c
+enum YRX_RESULT yrx_compiler_new_namespace(
+    struct YRX_COMPILER *compiler,
+    const char *namespace_);
+```
+
+Creates a new namespace. Further calls
+to [yrx_compiler_add_source](#yrx_compiler_add_source) will put the
+rules under the newly created namespace. The `namespace_` argument must be
+a pointer to a null-terminated UTF-8 string. If the string is not valid UTF-8
+the result is an `INVALID_ARGUMENT` error.
+
+#### yrx_compiler_define_global_xxxx
+
+```c
+enum YRX_RESULT yrx_compiler_define_global_str(
+    struct YRX_COMPILER *compiler,
+    const char *ident,
+    const char *value);
+
+enum YRX_RESULT yrx_compiler_define_global_bool(
+    struct YRX_COMPILER *compiler,
+    const char *ident,
+    bool value);
+
+enum YRX_RESULT yrx_compiler_define_global_int(
+    struct YRX_COMPILER *compiler,
+    const char *ident,
+    int64_t value);
+
+enum YRX_RESULT yrx_compiler_define_global_float(
+    struct YRX_COMPILER *compiler,
+    const char *ident,
+    double value);
+```
+
+Defines a global variable and sets its initial value.
+
+Global variables must be defined before
+calling [yrx_compiler_add_source](#yrx_compiler_add_source) with some YARA rule
+that uses the variable. The variable will retain its initial value when the
+[YRX_RULES](#yrx_rules) object is used for scanning data; however, each scanner
+can change the variable's value by
+calling any of the [yrx_scanner_set_global_xxxx](#yrx_scanner_set_global_xxxx)
+functions.
+
+The `ident` argument must be a pointer to a null-terminated UTF-8 string. If
+the string is not valid UTF-8 the result is an `INVALID_ARGUMENT` error.
+
+#### yrx_compiler_build
+
+```c
+struct YRX_RULES *yrx_compiler_build(struct YRX_COMPILER *compiler);
+```
+
+Builds the source code previously added to the compiler, producing
+a [YRX_RULES](#yrx_rules) object that can be used for scanning data.
+
+The [YRX_RULES](#yrx_rules) object must be destroyed
+with [yrx_rules_destroy](#yrx_rules_destroy) when it is no longer needed.
+
+After calling this function the compiler is reset to its initial state, so
+you can keep using it by adding more sources and calling this function again.
+
+------
+
+### YRX_RULES
+
+Type that represents a set of compiled rules. The compiled rules can be used for
+scanning data by creating a scanner
+with [yrx_scanner_create](#yrx_scanner_create).
+
+#### yrx_rules_destroy
+
+```c
+void yrx_rules_destroy(struct YRX_RULES *rules);
+```
+
+Destroys the [YRX_RULES](#yrx_rules) object. This function must be called only
+after all the scanners using the [YRX_RULES](#yrx_rules) object are destroyed.
+
+------
+
+### YRX_SCANNER
+
+#### yrx_scanner_create
+
+```c
+enum YRX_RESULT yrx_scanner_create(
+    const struct YRX_RULES *rules,
+    struct YRX_SCANNER **scanner);
+```
+
+Creates a [YRX_SCANNER](#yrx_scanner) object that can be used for scanning data
+with the provided [YRX_RULES](#yrx_rules).
+
+It's ok to pass the same [YRX_RULES](#yrx_rules) to multiple scanners, and use
+each scanner from a different thread. The scanner can be used as many times as
+you want, and it must be destroyed
+with [yrx_scanner_destroy](#yrx_scanner_destroy). Also, the scanner is valid
+only as long as the rules are not destroyed, so always destroy
+the [YRX_SCANNER](#yrx_scanner) object before the [YRX_RULES](#yrx_rules)
+object.
+
+#### yrx_scanner_destroy
+
+```c
+void yrx_scanner_destroy(struct YRX_SCANNER *scanner);
+```
+
+Destroys the [YRX_SCANNER](#yrx_scanner) object.
+
+#### yrx_scanner_on_matching_rule
+
+```c
+enum YRX_RESULT yrx_scanner_on_matching_rule(
+    struct YRX_SCANNER *scanner,
+    YRX_ON_MATCHING_RULE callback,
+    void *user_data);
+```
+
+Sets a callback function that is called by the scanner for each rule that
+matched during a scan.
+
+The `user_data` pointer can be used to provide additional context to your
+callback function. If the callback is not set, the scanner doesn't notify
+about matching rules.
+
+See [YRX_ON_MATCHING_RULE](#yrx_on_matching_rule) for more details.
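The `user_data` mechanism described above can be illustrated with a callback that counts matching rules. This is only a sketch: the `match_stats` struct and the `count_matching_rule` function are hypothetical names, and the `YRX_RULE` and `YRX_ON_MATCHING_RULE` declarations are repeated here only so the fragment compiles on its own. Real code would take them from the YARA-X header instead.

```c
#include <assert.h>
#include <stddef.h>

// Redeclared only so this sketch compiles without the library; real code
// would include the YARA-X header instead of these two declarations.
struct YRX_RULE;
typedef void (*YRX_ON_MATCHING_RULE)(const struct YRX_RULE *rule,
                                     void *user_data);

// Hypothetical caller-owned state, reached through `user_data`.
struct match_stats {
    size_t matching_rules;
};

// Callback with the YRX_ON_MATCHING_RULE shape: counts each matching rule.
void count_matching_rule(const struct YRX_RULE *rule, void *user_data) {
    (void)rule; // the rule itself is not inspected in this sketch
    struct match_stats *stats = user_data;
    assert(stats != NULL);
    stats->matching_rules++;
}
```

Registration would then be a call such as `yrx_scanner_on_matching_rule(scanner, count_matching_rule, &stats)` made before `yrx_scanner_scan`, with `stats` kept alive for the whole scan.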
+
+#### yrx_scanner_scan
+
+```c
+enum YRX_RESULT yrx_scanner_scan(
+    struct YRX_SCANNER *scanner,
+    const uint8_t *data,
+    size_t len);
+```
+
+#### yrx_scanner_set_timeout
+
+```c
+enum YRX_RESULT yrx_scanner_set_timeout(
+    struct YRX_SCANNER *scanner,
+    uint64_t timeout);
+```
+
+#### yrx_scanner_set_global_xxxx
+
+```c
+enum YRX_RESULT yrx_scanner_set_global_str(
+    struct YRX_SCANNER *scanner,
+    const char *ident,
+    const char *value);
+
+enum YRX_RESULT yrx_scanner_set_global_bool(
+    struct YRX_SCANNER *scanner,
+    const char *ident,
+    bool value);
+
+enum YRX_RESULT yrx_scanner_set_global_int(
+    struct YRX_SCANNER *scanner,
+    const char *ident,
+    int64_t value);
+
+enum YRX_RESULT yrx_scanner_set_global_float(
+    struct YRX_SCANNER *scanner,
+    const char *ident,
+    double value);
+```
+
+------
+
+### YRX_ON_MATCHING_RULE
+
+```c
+typedef void (*YRX_ON_MATCHING_RULE)(
+    const struct YRX_RULE *rule,
+    void *user_data);
+```
+
+Callback function passed to the scanner
+via [yrx_scanner_on_matching_rule](#yrx_scanner_on_matching_rule), which
+receives notifications about matching rules.
+
+The callback receives a pointer to the matching rule, represented by a
+[YRX_RULE](#yrx_rule) structure. This pointer is guaranteed to be valid while
+the callback function is being executed, but it may be freed after the callback
+function returns, so you cannot use the pointer outside the callback.
+
+It also receives the `user_data` pointer that was passed to
+[yrx_scanner_on_matching_rule](#yrx_scanner_on_matching_rule), which can point
+to arbitrary data owned by the user.
+
+------
+
+### YRX_RULE
+
+Represents a single YARA rule. The callback function passed to the scanner
+for reporting matches receives a pointer to a [YRX_RULE](#yrx_rule).
+
+#### yrx_rule_identifier
+
+```c
+enum YRX_RESULT yrx_rule_identifier(
+    const struct YRX_RULE *rule,
+    const uint8_t **ident,
+    size_t *len);
+```
+
+Returns the identifier of the rule represented by `rule`.
+
+Arguments `ident` and `len` are output parameters: pointers to a
+`const uint8_t*` and a `size_t` where this function will leave a pointer
+to the rule's identifier and its length, respectively. The identifier is **NOT**
+null-terminated; you must use the returned `len` as the size of the identifier.
+The `*ident` pointer will be valid as long as the [YRX_RULES](#yrx_rules) object
+that contains the rule is not destroyed. The identifier is guaranteed to be a
+valid UTF-8 string.
+
+#### yrx_rule_namespace
+
+```c
+enum YRX_RESULT yrx_rule_namespace(
+    const struct YRX_RULE *rule,
+    const uint8_t **ns,
+    size_t *len);
+```
+
+Returns the namespace of the rule represented by `rule`.
+
+Arguments `ns` and `len` are output parameters: pointers to a
+`const uint8_t*` and a `size_t` where this function will leave a pointer
+to the rule's namespace and its length, respectively. The namespace is **NOT**
+null-terminated; you must use the returned `len` as the size of the namespace.
+The `*ns` pointer will be valid as long as the [YRX_RULES](#yrx_rules) object
+that contains the rule is not destroyed. The namespace is guaranteed to be a
+valid UTF-8 string.
+
+#### yrx_rule_patterns
+
+```c
+struct YRX_PATTERNS *yrx_rule_patterns(const struct YRX_RULE *rule);
+```
+
+Returns an array with all the patterns defined by the rule.
+
+Each pattern contains information about whether it matched or not, and where
+in the data it matched. The patterns are represented by
+a [YRX_PATTERNS](#yrx_patterns) object that must be destroyed
+with [yrx_patterns_destroy](#yrx_patterns_destroy) when not needed anymore.
+
+------
+
+### YRX_PATTERNS
+
+A set of patterns defined by a rule. You will get a pointer to one of these
+structures when calling [yrx_rule_patterns](#yrx_rule_patterns); you are
+responsible for calling [yrx_patterns_destroy](#yrx_patterns_destroy) when not
+using the structure anymore.
+
+```c
+typedef struct YRX_PATTERNS {
+    // Number of patterns.
+    size_t num_patterns;
+    // Pointer to an array of YRX_PATTERN structures. The array has
+    // num_patterns items. If num_patterns is zero this pointer is
+    // invalid and should not be de-referenced.
+    struct YRX_PATTERN *patterns;
+} YRX_PATTERNS;
+```
+
+#### yrx_patterns_destroy
+
+```c
+void yrx_patterns_destroy(struct YRX_PATTERNS *patterns);
+```
+
+Destroys the [YRX_PATTERNS](#yrx_patterns) object.
+
+
+------
+
+### YRX_PATTERN
+
+An individual pattern defined in a rule. The [YRX_PATTERNS](#yrx_patterns)
+object has a pointer to an array of these structures.
+
+```c
+typedef struct YRX_PATTERN {
+    // Pattern's identifier (e.g. $a, $foo)
+    char *identifier;
+    // Number of matches found for this pattern.
+    size_t num_matches;
+    // Pointer to an array of YRX_MATCH structures describing the matches
+    // for this pattern. The array has num_matches items. If num_matches is
+    // zero this pointer is invalid and should not be de-referenced.
+    struct YRX_MATCH *matches;
+} YRX_PATTERN;
+```
+
+------
+
+### YRX_MATCH
+
+An individual match found for a pattern. The [YRX_PATTERN](#yrx_pattern)
+object has a pointer to an array of these structures.
+
+```c
+typedef struct YRX_MATCH {
+    size_t offset;
+    size_t length;
+} YRX_MATCH;
+```
+
+------
+
+### YRX_RESULT
+
+Error codes returned by multiple functions in this API.
+
+```c
+typedef enum YRX_RESULT {
+    // Everything was OK.
+    SUCCESS,
+    // A syntax error occurred while compiling YARA rules.
+    SYNTAX_ERROR,
+    // An error occurred while defining or setting a global variable. This may
+    // happen when a variable is defined twice or when you try to set a value
+    // that doesn't correspond to the variable's type.
+    VARIABLE_ERROR,
+    // An error occurred during a scan operation.
+    SCAN_ERROR,
+    // A scan operation was aborted due to a timeout.
+    SCAN_TIMEOUT,
+    // An error indicating that one of the arguments passed to a function is
+    // invalid. Usually indicates a null pointer to a scanner or compiler.
+    INVALID_ARGUMENT,
+    // An error indicating that one of the strings passed to a function is
+    // not valid UTF-8.
+    INVALID_UTF8,
+    // An error occurred while serializing/deserializing YARA rules.
+    SERIALIZATION_ERROR,
+} YRX_RESULT;
+```
\ No newline at end of file
diff --git a/site/content/docs/api/python.md b/site/content/docs/api/python.md
index 79a8efdef..67d3f5a26 100644
--- a/site/content/docs/api/python.md
+++ b/site/content/docs/api/python.md
@@ -64,8 +64,8 @@ scanning. During the rule compilation phase you transform YARA rules from text
 into a compiled [Rules](#rules) object. This object is later used for scanning
 data.
 
-To compiler rules, you can either use the [yara_x.compile(...)](#compile)
-function or a [Compiler](#compiler) object. The former is simpler and sufficent
+To compile rules, you can either use the [yara_x.compile(...)](#compile)
+function or a [Compiler](#compiler) object. The former is simpler and sufficient
 for simpler scenarios. For more complex use-cases involving the use of
 namespaces and multiple rule sets, the latter method is necessary.