CoBRA is validated against 73,136 lines drawn from 34 dataset files spanning 7 independent sources. Every expression is parsed, simplified, and spot-checked at runtime. The numbers below are enforced by automated test assertions in test/verify/test_dataset_benchmarks.cpp and verified on every CI run. OSES Fast is currently disabled (OOM on deeply nested expressions); its numbers are from the last successful run on master.
loki_tiny: 25 sections covering add, subtract, AND, OR, and XOR at depths 1-5 (5 operations × 5 depths). All 25,000 expressions are 2-variable linear MBAs.
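A 2-variable linear MBA is an integer linear combination of bitwise functions of x and y, evaluated modulo 2^n. The C++ sketch below uses two textbook identities, chosen for illustration and not taken from loki_tiny, and checks them at 64-bit width, where unsigned arithmetic wraps modulo 2^64. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical obfuscated forms (illustrative identities, not dataset lines).
// std::uint64_t arithmetic wraps modulo 2^64, the ring these identities hold in.

// x ^ y is the bitwise sum without carries; 2 * (x & y) adds the carries back.
std::uint64_t obf_add(std::uint64_t x, std::uint64_t y) {
    return (x ^ y) + 2 * (x & y);  // equals x + y
}

// x = (x & y) + (x & ~y) and y = (x & y) + (~x & y), so the shared part cancels.
std::uint64_t obf_sub(std::uint64_t x, std::uint64_t y) {
    return (x & ~y) - (~x & y);  // equals x - y
}

// Spot-check both identities against plain + and - on random 64-bit inputs.
bool check_linear_identities(int trials, std::uint64_t seed) {
    assert(trials > 0);
    std::mt19937_64 rng(seed);
    for (int i = 0; i < trials; ++i) {
        std::uint64_t x = rng(), y = rng();
        if (obf_add(x, y) != x + y || obf_sub(x, y) != x - y) return false;
    }
    return true;
}
```

A simplifier works in the opposite direction: given `(x ^ y) + 2 * (x & y)`, it should recover `x + y`.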
mba_obf_nonlinear: 500 polynomial + 500 linear expressions, all with linear ground-truth targets. All 1,000 pass full-width verification.
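"Polynomial expressions with linear ground-truth targets" means that products of bitwise subexpressions appear in the obfuscated form but cancel out. The hand-built example below is hypothetical and does not come from mba_obf_nonlinear. It makes that cancellation explicit and compares the result with its linear target on random inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical polynomial MBA whose product terms cancel exactly.
// With a = x&y, b = x&~y, c = ~x&y (disjoint bit sets): x|y = a+b+c and
// x^y = b+c, so the products sum to a(a+b+c) - a*a - a(b+c) = 0, leaving
// (x ^ y) + 2*(x & y), which equals x + y.
std::uint64_t poly_mba(std::uint64_t x, std::uint64_t y) {
    return (x | y) * (x & y) - (x & y) * (x & y) - (x & y) * (x ^ y)
         + (x ^ y) + 2 * (x & y);
}

// Compare against the linear ground truth x + y on random 64-bit inputs.
bool poly_matches_linear_target(int trials, std::uint64_t seed) {
    assert(trials > 0);
    std::mt19937_64 rng(seed);
    for (int i = 0; i < trials; ++i) {
        std::uint64_t x = rng(), y = rng();
        if (poly_mba(x, y) != x + y) return false;
    }
    return true;
}
```

The cancellation is an exact integer identity, so it also holds modulo 2^64. A simplifier therefore has to discover that the nonlinear terms vanish, not just approximate them.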
syntia: All 500 expressions simplify via the orchestrator's decomposition and lifting passes.
qsynth_ea: The most challenging dataset. 413 of 500 expressions (82.6%) simplify. Of the 87 unsupported expressions, 8 fail verification, 3 hit a representation gap, and 69 exhaust the search. These three categories account for 80 of the 87. Impact-ranked lifting with budget supplementation recovers many expressions that were previously blocked when structural redundancy exhausted the worklist budget.
oses: 479 MBA expressions extracted from the OSES synth.py evaluation script (the file also contains 1 header comment). Expressions span linear, nonlinear/product, and constant categories with 1-14 variables. The dataset is split into a fast subset (472 expressions under 50K characters) and a slow subset (7 mega-expressions over 50K characters). Both subsets are currently disabled: the fast subset runs out of memory during recursive evaluation of deeply nested expressions, and the slow subset takes minutes per expression. The numbers shown are from the last successful run on master.
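The fast/slow split described above amounts to a length threshold. This sketch assumes that "50K" means 50,000 characters and that an expression of exactly that length counts as slow. The document specifies neither detail. `OsesSplit` and `split_by_length` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ASSUMPTION: "50K" is read as 50,000 characters; the real cutoff may differ.
constexpr std::size_t kSlowThreshold = 50000;

// Hypothetical container for the two subsets.
struct OsesSplit {
    std::vector<std::string> fast;  // expressions shorter than the threshold
    std::vector<std::string> slow;  // expressions at or above the threshold
};

// Route each expression into the fast or slow subset by character count.
OsesSplit split_by_length(const std::vector<std::string>& exprs,
                          std::size_t threshold = kSlowThreshold) {
    assert(threshold > 0);
    OsesSplit out;
    for (const auto& e : exprs) {
        (e.size() < threshold ? out.fast : out.slow).push_back(e);
    }
    return out;
}
```

A size-based split is cheap to compute before parsing, which makes it easy to skip or schedule the expensive mega-expressions separately.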
ObfuscatorX Dataset
| Dataset | Total Lines | Parsed | Simplified | Unsupported | Rate |
|---|---|---|---|---|---|
| obfuscatorx.txt | 8 | 7 | 7 | 0 | 100% |
obfuscatorx: 7 expressions lifted from ObfuscatorX. All simplify via the standard pipeline.
Aggregate Summary
| Metric | Count |
|---|---|
| Total dataset lines | 73,136 |
| Comment/header lines skipped | 70 |
| Parsed expressions | 73,066 |
| Simplified | 72,909 |
| Unsupported (by design) | 157 |
| Errors / failures | 0 |
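The aggregate counts satisfy two bookkeeping identities. Total lines minus skipped comment/header lines equals parsed expressions (73,136 - 70 = 73,066), and simplified plus unsupported equals parsed (72,909 + 157 = 73,066). The sketch below encodes both as compile-time checks, in the spirit of the CI assertions. `BenchmarkTotals` and its helpers are hypothetical and do not come from test/verify/test_dataset_benchmarks.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the aggregate counts reported above.
struct BenchmarkTotals {
    std::int64_t total_lines;
    std::int64_t skipped;      // comment/header lines
    std::int64_t parsed;
    std::int64_t simplified;
    std::int64_t unsupported;
    std::int64_t errors;
};

constexpr BenchmarkTotals kTotals{73136, 70, 73066, 72909, 157, 0};

// Every non-comment line parses, every parsed expression is either simplified
// or reported as unsupported, and no run ends in an error.
constexpr bool totals_consistent(const BenchmarkTotals& t) {
    return t.total_lines - t.skipped == t.parsed &&
           t.simplified + t.unsupported == t.parsed &&
           t.errors == 0;
}
static_assert(totals_consistent(kTotals), "aggregate counts do not add up");

// Simplification rate over parsed expressions, in percent.
double simplified_rate_percent(const BenchmarkTotals& t) {
    assert(t.parsed > 0);
    return 100.0 * static_cast<double>(t.simplified) /
           static_cast<double>(t.parsed);
}
```

`simplified_rate_percent(kTotals)` returns about 99.785, because 157 of the 73,066 parsed expressions remain unsupported.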
| MBA Class | Expressions | Simplified | Rate |
|---|---|---|---|
| Linear | ~55,000 | ~55,000 | ~100% |
| Semilinear | ~1,000 | ~1,000 | ~100% |
| Polynomial | ~5,000 | ~4,950 | ~99% |
| Mixed | ~9,000 | ~8,800 | ~98% |
All simplified results are validated via spot-check (random-input evaluation) at 64-bit width. When Z3 is available, full equivalence proofs are performed.
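A spot-check evaluates the original and the simplified expression on the same random inputs and compares the results bit for bit. The minimal C++ sketch below represents an expression with a hypothetical `Expr` type, a callable over a vector of 64-bit variables. CoBRA's actual evaluator is not shown in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Hypothetical representation of a compiled expression over n 64-bit variables.
using Expr = std::function<std::uint64_t(const std::vector<std::uint64_t>&)>;

// Evaluate both expressions on `trials` random assignments and stop at the
// first mismatch. Passing is evidence, not proof: only a solver-backed check
// (e.g. with Z3) establishes equivalence.
bool spot_check(const Expr& original, const Expr& simplified,
                std::size_t num_vars, int trials, std::uint64_t seed) {
    assert(trials > 0);
    std::mt19937_64 rng(seed);
    std::vector<std::uint64_t> vars(num_vars);
    for (int i = 0; i < trials; ++i) {
        for (auto& v : vars) v = rng();  // fresh random 64-bit value per variable
        if (original(vars) != simplified(vars)) return false;
    }
    return true;
}
```

For example, if `(x | y) + (x & y)` and `x + y` are wrapped as lambdas, `spot_check` returns true. Pairing the original with `x ^ y` returns false on essentially any random input. Because a passing spot-check is only probabilistic evidence, the Z3 proof adds real assurance when it is available.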