diff --git a/claude/Makefile b/claude/Makefile
new file mode 100644
index 000000000..6414ba7d1
--- /dev/null
+++ b/claude/Makefile
@@ -0,0 +1,24 @@
+.PHONY: install uninstall package clean
+
+SKILL_NAME := ofrak-developer
+SKILL_DIR := $(CURDIR)/$(SKILL_NAME)
+CLAUDE_SKILLS_DIR := $(HOME)/.claude/skills
+INSTALL_PATH := $(CLAUDE_SKILLS_DIR)/$(SKILL_NAME)
+
+install:
+	@echo "Installing $(SKILL_NAME) skill..."
+	@mkdir -p $(CLAUDE_SKILLS_DIR)
+	@ln -sfn $(SKILL_DIR) $(INSTALL_PATH)
+	@echo "✓ Skill installed at $(INSTALL_PATH)"
+
+uninstall:
+	@echo "Uninstalling $(SKILL_NAME) skill..."
+	@rm -f $(INSTALL_PATH)
+	@echo "✓ Skill uninstalled"
+
+package:
+	@echo "Packaging $(SKILL_NAME) skill..."
+	@zip -r $(SKILL_NAME).zip $(SKILL_NAME)
+	@echo "✓ Skill packaged at $(SKILL_NAME).zip"
+
+clean: uninstall
diff --git a/claude/README.md b/claude/README.md
new file mode 100644
index 000000000..b10679483
--- /dev/null
+++ b/claude/README.md
@@ -0,0 +1,139 @@
+# OFRAK Developer Skill for Claude Code
+
+A Claude Code skill for OFRAK contributors and developers. This skill provides guidance for writing OFRAK scripts, creating/modifying OFRAK components, adding tests, fixing bugs, and contributing to OFRAK internals while maintaining 100% code coverage.
+
+## Overview
+
+This skill enables Claude Code to:
+- Write OFRAK scripts and components following best practices
+- Create comprehensive tests with 100% code coverage
+- Develop and modify OFRAK internals
+- Debug and fix OFRAK-related issues
+- Follow OFRAK coding standards and conventions
+
+## Prerequisites
+
+- Claude Code
+
+## Installation
+
+### Quick Install
+
+```bash
+# Install the skill
+make install
+```
+
+This will create a symlink from `~/.claude/skills/ofrak-developer` to the skill directory.
+
+### Manual Installation
+
+To install manually, run the following from the `claude/` directory:
+
+```bash
+mkdir -p ~/.claude/skills
+ln -sfn $(pwd)/ofrak-developer ~/.claude/skills/ofrak-developer
+```
+
+### Packaging
+
+```bash
+make package
+```
+This will create `ofrak-developer.zip`, which can be installed in the Claude Code desktop app.
+
+## Usage
+
+Once installed, the skill is automatically available in Claude Code when working on OFRAK-related projects.
+
+### Activating the Skill
+
+The skill activates automatically when you ask Claude Code to work on OFRAK development tasks, such as creating components, adding tests, or modifying OFRAK internals.
+
+### Common Use Cases
+
+#### 1. Creating a New OFRAK Component
+
+```
+Create a new OFRAK unpacker component for XYZ format
+```
+
+The skill will:
+- Generate the component following OFRAK patterns
+- Create comprehensive tests
+- Ensure proper type annotations
+- Follow OFRAK coding standards
+
+#### 2. Adding Tests
+
+```
+Add tests for the ExampleAnalyzer component
+```
+
+The skill will:
+- Create test fixtures
+- Write unit tests with 100% coverage
+- Follow OFRAK testing conventions
+- Use real test data and meaningful assertions (no mocking)
+
+#### 3. Fixing Bugs
+
+```
+Fix the bug in the ELF unpacker where sections are not properly aligned
+```
+
+The skill will:
+- Analyze the issue
+- Implement the fix
+- Add regression tests
+- Maintain code coverage
+
+#### 4. Writing OFRAK Scripts
+
+```
+Write a script to unpack and analyze all embedded resources in this firmware
+```
+
+The skill will:
+- Use OFRAK APIs correctly
+- Follow best practices
+- Add proper error handling
+- Include documentation
+
+## What Makes This Different from the General OFRAK Skill?
+
+This skill is specifically for **OFRAK contributors and developers**. It:
+- Knows OFRAK internals and architecture
+- Enforces 100% code coverage requirements
+- Follows OFRAK contribution guidelines
+- Uses OFRAK development patterns
+- Understands the OFRAK codebase structure
+
+For **using** OFRAK (not developing it), see the general `ofrak-user` skill.
+
+## Skill Features
+
+- ✅ Component development guidance
+- ✅ Test generation with full coverage
+- ✅ Code review against OFRAK standards
+- ✅ Bug fixing with regression tests
+- ✅ Documentation generation
+- ✅ Type annotation enforcement
+
+## Uninstalling
+
+To remove the skill:
+
+```bash
+make uninstall
+```
+
+Or manually:
+
+```bash
+rm ~/.claude/skills/ofrak-developer
+```
+
+## Feedback
+
+We want your feedback! Please open an issue on [OFRAK GitHub](https://github.com/redballoonsecurity/ofrak/issues).
diff --git a/claude/ofrak-developer/SKILL.md b/claude/ofrak-developer/SKILL.md
new file mode 100644
index 000000000..99e4ca193
--- /dev/null
+++ b/claude/ofrak-developer/SKILL.md
@@ -0,0 +1,461 @@
+---
+name: ofrak-developer
+description: Guide for OFRAK contributors and developers. Use this skill when writing OFRAK scripts, creating/modifying OFRAK components, adding tests, fixing bugs, or contributing to OFRAK internals. Ensures contributions follow OFRAK standards, automatically creates comprehensive tests, and maintains 100% code coverage. Superset of ofrak-user skill.
+---
+
+# OFRAK Developer
+
+## Overview
+
+Comprehensive guide for contributing to the OFRAK (Open Firmware Reverse Analysis Konsole) project and writing OFRAK scripts. This skill covers both using OFRAK (writing standalone scripts) and developing OFRAK (contributing components, fixing bugs, refactoring internals). Ensures all contributions follow OFRAK's coding standards, include comprehensive tests, and meet the 100% code coverage requirement.
+
+**This skill is stateless** - it recognizes existing code without tests and creates appropriate tests. It handles script writing, component development, bugfixes, refactoring, and ensures compliance with contribution guidelines.
+
+## When to Use This Skill
+
+Invoke this skill when:
+- **Writing OFRAK Scripts** - Creating or modifying standalone Python scripts that use OFRAK
+- **Adding Components** - Creating new analyzers, unpackers, modifiers, packers, or identifiers
+- **Modifying Components** - Fixing bugs, refactoring, or enhancing existing OFRAK components
+- **Writing Tests** - Creating tests for components that lack them (stateless - detects missing tests)
+- **Contributing to OFRAK** - Making any changes to OFRAK internals, GUI, or modules
+- **Creating Pull Requests** - Preparing contributions that meet OFRAK standards
+
+## Core Principles
+
+### 1. Stateless Test Creation
+
+**CRITICAL: Automatically detect and create missing tests.**
+
+When working on any OFRAK code:
+1. Check if tests exist for the component/function
+2. If tests are missing, create comprehensive tests automatically
+3. Ensure 100% coverage (statement or function level)
+4. Follow testing patterns from `references/testing_patterns.md`
+
+**Do NOT ask the user whether they want tests - always create them automatically.**
+
+### 2. Mandatory Documentation Reading
+
+**BEFORE implementing any component, MUST read relevant guides:**
+
+- **Always read**: `ofrak/docs/contributor-guide/getting-started.md`
+- **For components**: Read component-specific guide from `ofrak/docs/contributor-guide/component/`
+  - Unpacker → `unpacker.md`
+  - Analyzer → `analyzer.md`
+  - Modifier → `modifier.md`
+  - Packer → `packer.md`
+  - Identifier → `identifier.md`
+
+### 3. Follow OFRAK Patterns
+
+**Research similar implementations before writing code:**
+
+1. Search for similar components in the codebase
+2. Read similar component implementations
+3. Follow the same patterns and structure
+4. Check `references/component_patterns.md` for templates
+
+### 4. Focused Contributions
+
+**Each PR should focus on ONE change:**
+
+- Don't mix features, bugfixes, and refactoring
+- Break large changes into multiple focused PRs
+- Update appropriate CHANGELOG.md files
+- Use `#PLACEHOLDER` for PR numbers in changelog
+
+## Decision Tree: Which Component Type?
+
+```
+Do you need to IDENTIFY a file format?
+  └─> Identifier (adds tags, no data extraction)
+
+Do you need to EXTRACT INFORMATION (metadata, headers)?
+  └─> Analyzer (returns attributes, no children)
+
+Do you need to EXTRACT CONTENT (files, sections, embedded data)?
+  └─> Unpacker (creates children, no modification)
+
+Do you need to MODIFY DATA (patch, replace, inject)?
+  └─> Modifier (changes data, no children)
+
+Do you need to REBUILD from modified children?
+  └─> Packer (reconstructs parent from children)
+```
+
+## Common Component Combinations
+
+**New file format support:**
+1. Identifier - detect the format
+2. Analyzer - extract metadata
+3. Unpacker - extract embedded content
+4. Packer - rebuild after modifications (optional)
+
+**Analysis only:**
+1. Identifier - detect format
+2. Analyzer - extract information
+(No Unpacker/Modifier/Packer needed)
+
+**Binary patching:**
+1. Identifier - detect format (may already exist)
+2. Modifier - apply patches
+
+**Archive modification:**
+1. Identifier - detect archive type
+2. Unpacker - extract files
+3. Modifier - modify extracted files
+4. Packer - rebuild archive
+
+## Contribution Workflow
+
+### Step 0: Check for Existing Issues and PRs (Recommended)
+
+**Best practice: Check for duplicate work before starting:**
+
+To avoid duplicate effort, recommend the user check:
+- Existing GitHub issues: https://github.com/redballoonsecurity/ofrak/issues
+- Open pull requests: https://github.com/redballoonsecurity/ofrak/pulls
+
+**Search tips**:
+- Use relevant keywords (e.g., "ZIP unpacker", "ELF analyzer", "memory leak")
+- Check both open and closed issues/PRs
+- Review maintainer comments on similar requests
+
+**If a duplicate is found**:
+- For an open issue: Comment on the existing issue instead of creating a new one
+- For an open PR: Consider collaborating on the existing PR
+- For a closed issue/PR: Review why it was closed before proceeding
+
+**Creating new issues**:
+- Large features: Issue first for discussion (recommended)
+- Bug fixes: Issue optional but helpful for tracking
+- Small improvements: Can go directly to PR
+
+### Step 1: Determine Task Type
+
+Is this a:
+- **Script writing task?** → Follow Script Writing Workflow below
+- **Component development task?** → Follow Component Development Workflow below
+- **Bug fix task?** → Follow Bug Fix Workflow below
+- **Refactoring task?** → Follow Refactoring Workflow below
+
+### Step 2A: Script Writing Workflow
+
+For writing standalone OFRAK scripts:
+
+**Follow the detailed 7-step workflow in `references/ofrak_script_patterns.md`**
+
+Key requirements:
+- Main function: `async def main(ofrak_context: OFRAKContext, ...)`
+- Use `ofrak.run(main, ...)` in `if __name__ == "__main__"`
+- Include argparse for CLI arguments
+- Only use components that actually exist (verify in `references/ofrak_usage_guide.md`)
+
+### Step 2B: Component Development Workflow
+
+For creating new OFRAK components:
+
+**Follow the detailed 11-step workflow in `references/component_patterns.md` → "Component Development Workflow"**
+
+Quick summary:
+1. **Read documentation** (MANDATORY) - getting-started.md + component-specific guide
+2. **Research similar components** - Search codebase for patterns
+3. **Use component template** - Start from `assets/component_template.py.template`
+
+4. **Analyze implementation approach** (if using external tools):
+
+   **CRITICAL: Perform Python vs External Tool analysis**
+
+   If the component needs external tools, evaluate:
+
+   **Use External Tool When:**
+   - Tool is widely used and well-tested (e.g., `7z`, `squashfs-tools`)
+   - Format is complex (filesystems, compression algorithms)
+   - Tool is **cross-platform** (macOS/Linux/Windows)
+   - Tool has stable API/output format
+   - Performance is critical
+   - Reimplementation would be error-prone
+
+   **Use Pure Python When:**
+   - Format is simple or good Python libraries exist
+   - No suitable cross-platform external tool available
+   - External tool would add heavy dependency
+   - Need fine-grained control over parsing
+
+   **Cross-Platform Requirements:**
+   - External tools MUST work on macOS, Linux, AND Windows
+   - Verify availability in package managers (brew, apt, chocolatey)
+   - Document installation requirements
+   - ❌ Avoid: Linux-only tools, kernel modules, platform-specific utilities
+
+   See `references/component_patterns.md` for the detailed analysis framework.
+
+5. **Implement using Write/Test/Evaluate Loop** (CRITICAL):
+
+   **Implementation → Test → Evaluate → Repeat until ✅**
+
+   **A. Write Implementation:**
+   - Follow structure from documentation and similar components
+   - Use proper type annotations and comprehensive docstrings
+   - Handle errors appropriately, match OFRAK coding style
+   - **If adding dependencies**: See `references/contributing_guidelines.md` → "Dependency Management"
+     - Python modules → Pin to latest stable version in package's `requirements.txt`
+     - Apt packages → Add to package's `Dockerstub`
+     - Avoid dependencies requiring build from source
+
+   **B. Create Tests Automatically:**
+   - Use `assets/test_template.py.template` as a starting point
+   - Follow `references/testing_patterns.md` patterns
+   - Cover all code paths, edge cases, and error conditions
+   - **NEVER MOCK** - test with real code, real tools, real binary data
+   - **Test Data Strategy**:
+     - Write tests assuming real data exists in `tests/components/assets/`
+     - Reference asset files by path (e.g., `tests/components/assets/sample.dmg`)
+     - Instruct the user to place real test files: "Place test file at `tests/components/assets/sample.dmg`"
+     - **Remind the user**: Test data must be suitable for public distribution (self-created, public domain, or permissively licensed)
+     - ❌ Don't create synthetic data in test code
+     - ❌ Don't generate test files programmatically
+
+   **C. Run and Evaluate:**
+   - Execute: `pytest path/to/test_file.py -v --cov=module_name --cov-report=term-missing`
+   - Check: ✅ All tests pass? ✅ 100% coverage? ✅ Edge cases covered?
+   - If NO → Fix code/tests → Re-run → Repeat until all ✅
+
+   **⚠️ USER MUST VERIFY:**
+   - LLM-generated code for bugs and quality standards
+   - Tests actually test functionality (not just coverage cheating)
+   - Real data usage where applicable
+
+   **Do NOT proceed until: tests pass + 100% coverage + user verification.**
+
+6. **Update changelog**:
+   - Locate the appropriate CHANGELOG.md file for the modified package
+   - Most components go in `ofrak_core/CHANGELOG.md`
+   - Add entry with `#PLACEHOLDER` for PR number
+   - Follow format in `references/contributing_guidelines.md`
+
+### Step 2C: Bug Fix Workflow
+
+For fixing bugs in existing code:
+
+1. **Understand the bug** - Read error messages and stack traces
+2. **Locate affected code** - Find the buggy component/function
+3. **Check if tests exist**:
+   - If tests exist: Add a regression test that reproduces the bug, then fix the code
+   - If no tests: **Create tests first** (test-driven fix)
+4. **Fix the bug** - Make minimal, focused changes
+5. **Verify fix** - Run tests to ensure bug is fixed
+6. **Update changelog** - Add entry under "Fixed" section
+
+### Step 2D: Refactoring Workflow
+
+For refactoring existing code:
+
+1. **Check existing tests**:
+   - If tests missing: **Create tests first**
+   - Tests act as a safety net for refactoring
+2. **Plan refactoring** - What needs to change?
+3. **Refactor incrementally** - Small steps, run tests after each
+4. **Ensure tests still pass** - Verify behavior unchanged
+5. **Update changelog** - Add entry under "Changed" section
+
+## Component Types and Patterns
+
+### Identifier Pattern
+
+```python
+class MyFormatIdentifier(Identifier):
+    """Identify MyFormat files by checking signature."""
+
+    id = b"MyFormatIdentifier"
+    targets = ()
+
+    async def identify(self, resource: Resource, config=None) -> None:
+        data = await resource.get_data()
+        if data[:4] == b"MYFT":
+            resource.add_tag(MyFormat)
+```
+
+### Analyzer Pattern
+
+```python
+class MyFormatAnalyzer(Analyzer[None, MyFormatAttributes]):
+    """Extract metadata from MyFormat files."""
+
+    id = b"MyFormatAnalyzer"
+    targets = (MyFormat,)
+    outputs = (MyFormatAttributes,)
+
+    async def analyze(self, resource: Resource, config=None) -> MyFormatAttributes:
+        data = await resource.get_data()
+        # Extract and return attributes
+        return MyFormatAttributes(...)
+```
+
+### Unpacker Pattern
+
+```python
+class MyFormatUnpacker(Unpacker[None]):
+    """Unpack MyFormat archives."""
+
+    id = b"MyFormatUnpacker"
+    targets = (MyFormat,)
+    children = (File,)
+
+    async def unpack(self, resource: Resource, config=None) -> None:
+        data = await resource.get_data()
+        # Extract entries and create children
+        await resource.create_child(tags=(File,), data=entry_data, ...)
+```
+
+### Modifier Pattern
+
+```python
+class MyModifier(Modifier[MyModifierConfig]):
+    """Modify MyFormat resources."""
+
+    id = b"MyModifier"
+    targets = (MyFormat,)
+
+    async def modify(self, resource: Resource, config: MyModifierConfig) -> None:
+        data = await resource.get_data()
+        modified = transform(data, config)
+        resource.queue_patch(Range(0, len(data)), modified)
+```
+
+### Packer Pattern
+
+```python
+class MyFormatPacker(Packer[None]):
+    """Pack MyFormat archives."""
+
+    id = b"MyFormatPacker"
+    targets = (MyFormat,)
+
+    async def pack(self, resource: Resource, config=None) -> None:
+        children = await resource.get_children()
+        packed_data = build_archive(children)
+        resource.queue_patch(Range(0, await resource.get_data_length()), packed_data)
+```
+
+## Changelog Management
+
+**Every change requires a changelog entry.**
+
+1. **Find the correct changelog**:
+   - Locate the appropriate CHANGELOG.md file for the modified package
+   - Common locations:
+     - `ofrak_core/CHANGELOG.md` - Core components, formats, binary analysis
+     - `ofrak_patch_maker/CHANGELOG.md` - Patch maker modifications
+     - `disassemblers/ofrak_*/CHANGELOG.md` - Disassembler-specific changes
+   - If unsure, look for CHANGELOG.md in the same package as the modified file
+
+2. **Add an entry** in this format:
+   ```markdown
+   ### [Added/Fixed/Changed/etc]
+   - Brief description ([#PLACEHOLDER](https://github.com/redballoonsecurity/ofrak/pull/PLACEHOLDER))
+   ```
+
+3. **Remind the user**:
+   "Please update #PLACEHOLDER with the actual PR number after creating the PR"
+
+## Pull Request Preparation
+
+1. **Quality checklist**: Code follows patterns, tests pass (100% coverage), changelog updated with PLACEHOLDER, example provided
+2. **PR description**: Read `ofrak/.github/pull_request_template.md`, fill all sections concisely (5-7 sentences max)
+3. **Output**: Filled PR template to console for copy/paste
+4. **Post-PR**: Update #PLACEHOLDER with actual PR number, link to related issues
+
+**Important**: Don't hardcode template format (read from file), don't add "Generated with Claude Code" attributions
+
+## Important Reminders
+
+### Always DO:
+- ✅ **READ relevant contributor guides FIRST** (getting-started.md + component-specific)
+- ✅ **RESEARCH similar components** before implementing
+- ✅ **CREATE TESTS AUTOMATICALLY** - never skip, never ask
+- ✅ **ENSURE 100% test coverage** (required by CI)
+- ✅ **TEST WITH REAL DATA** - prefer actual binary samples over synthetic/mock data
+- ✅ Follow OFRAK coding patterns and style
+- ✅ Use proper type annotations throughout
+- ✅ Include comprehensive docstrings
+- ✅ Keep PRs focused on one change
+- ✅ Update the correct CHANGELOG.md file
+- ✅ Create an example usage script (output to console)
+- ✅ Read the actual PR template from `ofrak/.github/pull_request_template.md` (don't hardcode)
+- ✅ Keep PR descriptions concise (5-7 sentences max)
+- ✅ Use proper async/await patterns
+- ✅ Handle errors gracefully
+
+### Never DO:
+- ❌ Skip reading contributor documentation
+- ❌ Ask the user whether they want tests - always create them
+- ❌ Submit code without tests
+- ❌ Miss coverage requirements (100% required)
+- ❌ Create tests that cheat coverage without actually testing functionality
+- ❌ Mix multiple unrelated changes in one PR
+- ❌ Forget to update the changelog
+- ❌ Hardcode PR template format (always read from file)
+- ❌ Make PR descriptions overly verbose
+- ❌ Add "Generated with Claude Code" attributions by default
+- ❌ Invent component names without verifying they exist
+- ❌ Use synchronous patterns with OFRAK (must be async)
+- ❌ Create documentation files unless explicitly requested
+
+### ⚠️ CRITICAL: User Must Verify LLM-Generated Code
+**The user MUST manually review all LLM-generated contributions:**
+- 🔍 **Check for bugs and errors** - LLMs can make mistakes
+- 🔍 **Verify OFRAK code quality** - Ensure standards are maintained
+- 🔍 **Validate tests** - Tests must actually test, not just achieve coverage
+- 🔍 **Prefer real data** - Use actual binary samples when available, not just synthetic test data
+- 🔍 **Review logic** - Ensure implementations are correct, not just plausible
+
+**AI-generated code is a starting point, not a finished product. User review is essential.**
+
+## Script Writing (Subset: OFRAK User)
+
+This skill includes all ofrak-user functionality. For script-only tasks:
+
+- Check `references/ofrak_script_patterns.md` for patterns
+- Check `references/ofrak_usage_guide.md` for available components
+- Use `assets/script_template.py.template` as a starting point
+- Follow proper async/await structure
+- Only use verified, existing components
+
+See the `ofrak-user` skill or the bundled references for comprehensive script-writing guidance.
+
+## Additional Resources
+
+### Bundled References
+
+**For script writing:**
+- `references/ofrak_script_patterns.md` - OFRAK script patterns and **7-step workflow**
+- `references/ofrak_usage_guide.md` - Available components guide
+
+**For component development:**
+- `references/component_patterns.md` - Component implementation patterns, detailed component type guide, and **11-step development workflow**
+- `references/testing_patterns.md` - Comprehensive testing guide
+- `references/contributing_guidelines.md` - Contribution standards
+
+### Bundled Assets
+
+- `assets/script_template.py.template` - OFRAK script template
+- `assets/component_template.py.template` - Component implementation template
+- `assets/test_template.py.template` - Test suite template
+
+### External Resources (in OFRAK repository)
+
+**Must read for contributions:**
+- `ofrak/docs/contributor-guide/getting-started.md` - Coding standards, testing
+- `ofrak/docs/contributor-guide/component/[type].md` - Component-specific guides
+
+**For reference:**
+- `ofrak/examples/` - Example OFRAK scripts
+- `ofrak/.github/pull_request_template.md` - PR template
+
+## Getting Help
+
+- **OFRAK Documentation**: Available in the cloned `ofrak` repository under `ofrak/docs/`. If you don't have the repo cloned, instruct the user to clone it: `git clone https://github.com/redballoonsecurity/ofrak.git`
+- **GitHub Issues**: https://github.com/redballoonsecurity/ofrak/issues
+- **Slack Community**: https://join.slack.com/t/ofrak/shared_invite/zt-1jku9h6r5-mY7CeeZ4AT8JVmu5YWw2Qg
diff --git a/claude/ofrak-developer/assets/component_template.py.template b/claude/ofrak-developer/assets/component_template.py.template
new file mode 100644
index 000000000..3fd935fe6
--- /dev/null
+++ b/claude/ofrak-developer/assets/component_template.py.template
@@ -0,0 +1,143 @@
+"""
+[Component Name] - [Brief description of what this component does]
+
+This module provides [component type] functionality for [target resource type].
+"""
+from dataclasses import dataclass
+from typing import Optional
+
+from ofrak.component.[component_base] import [ComponentBaseClass]
+from ofrak.model.component_model import ComponentConfig
+from ofrak.model.resource_model import ResourceAttributes
+from ofrak.resource import Resource
+from ofrak.core.binary import GenericBinary
+
+
+# =============================================================================
+# Configuration (if component needs configuration)
+# =============================================================================
+
+@dataclass
+class MyComponentConfig(ComponentConfig):
+    """
+    Configuration for MyComponent.
+
+    Attributes:
+        option1: Description of option1
+        option2: Description of option2 (has default value)
+    """
+    option1: str
+    option2: int = 42
+
+
+# =============================================================================
+# Attributes (for Analyzers - what information this extracts)
+# =============================================================================
+
+@dataclass(**ResourceAttributes.DATACLASS_PARAMS)
+class MyComponentAttributes(ResourceAttributes):
+    """
+    Attributes extracted by MyComponent.
+
+    Attributes:
+        field1: Description of field1
+        field2: Description of field2
+    """
+    field1: int
+    field2: str
+
+
+# =============================================================================
+# Component Implementation
+# =============================================================================
+
+class MyComponent([ComponentBaseClass]):
+    """
+    [Detailed description of what this component does]
+
+    This component [explain what it does, when to use it, any prerequisites].
+
+    For Identifiers:
+    - Detects [format/type] by checking [signature/structure]
+    - Adds [Tag] tag to matching resources
+
+    For Analyzers:
+    - Extracts [information] from [resource type]
+    - Produces MyComponentAttributes with [fields]
+    - Requires [any prerequisites]
+
+    For Unpackers:
+    - Unpacks [format] resources
+    - Creates [child type] children
+    - Handles [compression/encoding]
+
+    For Modifiers:
+    - Modifies [what aspect] of resources
+    - Requires [configuration options]
+    - Preserves [what should be preserved]
+
+    For Packers:
+    - Packs [resource type] from children
+    - Uses [compression/encoding]
+    - Reconstructs [format structure]
+    """
+
+    # Component ID (should match class name) - must be bytes
+    id = b"MyComponent"
+
+    # For Identifiers: typically empty tuple
+    # For others: tuple of tags this component can operate on
+    targets = (GenericBinary,)
+
+    # For Unpackers: types of children created
+    children = ()
+
+    # For Analyzers: attributes produced
+    outputs = (MyComponentAttributes,)
+
+    # If the component uses external tools, declare them here as ComponentExternalTool instances
+    # external_dependencies = (MY_TOOL,)
+
+    async def [method_name](self, resource: Resource, config: Optional[MyComponentConfig] = None) -> None:
+        """
+        Execute the component. Replace [method_name] with the method your base class requires (listed below); do not override `run`, which the OFRAK framework calls internally.
+
+        For Identifiers, use: async def identify(self, resource: Resource, config=None) -> None
+        For Analyzers, use: async def analyze(self, resource: Resource, config=None) -> MyComponentAttributes
+        For Unpackers, use: async def unpack(self, resource: Resource, config=None) -> None
+        For Modifiers, use: async def modify(self, resource: Resource, config: MyComponentConfig) -> None
+        For Packers, use: async def pack(self, resource: Resource, config=None) -> None
+
+        Args:
+            resource: The resource to operate on
+            config: Component configuration (may be None if no config needed)
+        """
+        # Get resource data
+        data = await resource.get_data()
+
+        # TODO: Implement component logic
+        # For Identifiers:
+        # if matches_format(data):
+        #     resource.add_tag(MyFormat)
+
+        # For Analyzers:
+        # field1 = extract_field1(data)
+        # field2 = extract_field2(data)
+        # return MyComponentAttributes(field1=field1, field2=field2)
+
+        # For Unpackers:
+        # for entry in parse_entries(data):
+        #     await resource.create_child(
+        #         tags=(File,),
+        #         data=entry.data,
+        #         attributes=(File(entry.name, entry.size),)
+        #     )
+
+        # For Modifiers:
+        # modified_data = modify_data(data, config)
+        # resource.queue_patch(Range(0, len(data)), modified_data)
+
+        # For Packers:
+        # children = await resource.get_children()
+        # packed_data = pack_children(children)
+        # resource.queue_patch(Range(0, await resource.get_data_length()), packed_data)
diff --git a/claude/ofrak-developer/assets/script_template.py.template b/claude/ofrak-developer/assets/script_template.py.template
new file mode 100644
index 000000000..c3983a53f
--- /dev/null
+++ b/claude/ofrak-developer/assets/script_template.py.template
@@ -0,0 +1,65 @@
+"""
+[Brief description of what this script does]
+
+Usage:
+    python script_name.py INPUT_FILE OUTPUT_FILE [--option N]
+
+Example:
+    python script_name.py firmware.bin firmware_patched.bin
+"""
+import argparse
+from ofrak import OFRAK, OFRAKContext
+
+# Import OFRAK components and views as needed
+# from ofrak.core.binary import BinaryPatchConfig, BinaryPatchModifier
+# from ofrak.core.strings import StringsAnalyzer, StringsAttributes
+# from ofrak.core.elf.model import Elf
+# from ofrak.core.filesystem import File
+
+
+async def main(ofrak_context: OFRAKContext, input_file: str, output_file: str):
+    """
+    Main function that performs OFRAK operations.
+
+    Args:
+        ofrak_context: The OFRAK context for creating and managing resources
+        input_file: Path to input binary file
+        output_file: Path to output file
+    """
+    # Load the binary file
+    print(f"Loading {input_file}...")
+    root_resource = await ofrak_context.create_root_resource_from_file(input_file)
+
+    # Perform your analysis/modification here
+    # Example: Unpack the resource
+    # await root_resource.unpack()
+
+    # Example: Run an analyzer
+    # await root_resource.run(StringsAnalyzer)
+    # strings = await root_resource.analyze(StringsAttributes)
+
+    # Example: Modify the resource
+    # config = BinaryPatchConfig(offset=0x1000, patch_bytes=b"\x90\x90")
+    # await root_resource.run(BinaryPatchModifier, config)
+
+    # Save the results
+    print(f"Saving results to {output_file}...")
+    await root_resource.flush_data_to_disk(output_file)
+
+    print("Done!")
+
+
+if __name__ == "__main__":
+    # Set up command line argument parsing
+    parser = argparse.ArgumentParser(description="[Brief description of what this script does]")
+    parser.add_argument("input_file", help="Path to the input binary file")
+    parser.add_argument("output_file", help="Path to save the output file")
+    parser.add_argument(
+        "--option", type=int, default=42, help="An optional parameter (default: 42)"
+    )
+
+    args = parser.parse_args()
+
+    # Create OFRAK instance and run the main function (also pass args.option if main() needs it)
+    ofrak = OFRAK()
+    ofrak.run(main, args.input_file, args.output_file)
diff --git a/claude/ofrak-developer/assets/test_template.py.template b/claude/ofrak-developer/assets/test_template.py.template
new file mode 100644
index 000000000..14abed54d
--- /dev/null
+++ b/claude/ofrak-developer/assets/test_template.py.template
@@ -0,0 +1,279 @@
+"""
+Tests for [ComponentName].
+
+This module contains comprehensive tests for [ComponentName] to ensure
+100% code coverage as required by OFRAK.
+"""
+import pytest
+from ofrak import OFRAKContext
+from ofrak.resource import Resource
+
+from my_module import MyComponent, MyComponentConfig
+
+
+class TestMyComponent:
+    """
+    Test suite for MyComponent.
+
+    Tests cover:
+    - Basic functionality
+    - Edge cases (empty input, large input, boundary conditions)
+    - Error handling
+    - Integration with other components
+    """
+
+    # =========================================================================
+    # Fixtures
+    # =========================================================================
+
+    @pytest.fixture
+    async def test_resource(self, ofrak_context: OFRAKContext) -> Resource:
+        """
+        Create a basic test resource for MyComponent tests.
+
+        Args:
+            ofrak_context: OFRAK context fixture (provided by framework)
+
+        Returns:
+            Resource ready for testing
+        """
+        # Create test data
+        test_data = b"test binary data"
+
+        # Create resource
+        resource = await ofrak_context.create_root_resource("test.bin", test_data)
+
+        # Add any required tags
+        # resource.add_tag(MyFormat)
+
+        # Run any prerequisite components
+        # await resource.run(PrerequisiteAnalyzer)
+
+        return resource
+
+    @pytest.fixture
+    def sample_data(self) -> bytes:
+        """
+        Provide sample data for testing.
+
+        Returns:
+            Sample binary data
+        """
+        return b"sample test data"
+
+    # =========================================================================
+    # Basic Functionality Tests
+    # =========================================================================
+
+    async def test_component_basic_functionality(
+        self, test_resource: Resource, ofrak_context: OFRAKContext
+    ):
+        """
+        Test MyComponent performs its basic function correctly.
+
+        For Identifiers: Test that matching resources get tagged
+        For Analyzers: Test that attributes are extracted correctly
+        For Unpackers: Test that children are created correctly
+        For Modifiers: Test that data is modified as expected
+        For Packers: Test that children are packed correctly
+        """
+        # Run component
+        await test_resource.run(MyComponent)
+
+        # Verify results
+        # For Identifiers:
+        # assert test_resource.has_tag(MyFormat)
+
+        # For Analyzers:
+        # attrs = await test_resource.analyze(MyComponentAttributes)
+        # assert attrs.field1 == expected_value1
+        # assert attrs.field2 == expected_value2
+
+        # For Unpackers:
+        # children = list(await test_resource.get_children())
+        # assert len(children) == expected_count
+        # child_data = await children[0].get_data()
+        # assert child_data == expected_data
+
+        # For Modifiers:
+        # modified_data = await test_resource.get_data()
+        # assert modified_data == expected_modified_data
+
+        # For Packers:
+        # packed_data = await test_resource.get_data()
+        # assert packed_data.startswith(b"MAGIC")  # Check format signature
+
+    async def test_component_with_config(
+        self, test_resource: Resource, ofrak_context: OFRAKContext
+    ):
+        """Test MyComponent with custom configuration."""
+        # Create configuration
+        config = MyComponentConfig(option1="custom_value", option2=123)
+
+        # Run component with config
+        await test_resource.run(MyComponent, config)
+
+        # Verify config was applied
+        result = await test_resource.get_data()
+        # assert result == expected_output_with_config
+
+    # =========================================================================
+    # Edge Case Tests
+    # =========================================================================
+
+    async def test_empty_input(self, ofrak_context: OFRAKContext):
+        """Test component handles empty input gracefully."""
+        empty_resource = await ofrak_context.create_root_resource("empty.bin", b"")
+
+        # Should either handle gracefully or raise appropriate error
+        # If should handle:
+        await empty_resource.run(MyComponent)
+        # assert appropriate behavior
+
+        # If should raise error:
+        # with pytest.raises(ValueError):
+        #     await empty_resource.run(MyComponent)
+
+    async def test_large_input(self, ofrak_context: OFRAKContext):
+        """Test component handles large inputs."""
+        large_data = b"x" * 1024 * 1024  # 1 MB
+        resource = await ofrak_context.create_root_resource("large.bin", large_data)
+
+        await resource.run(MyComponent)
+
+        # Verify component handled large input correctly
+        # result = await resource.get_data()
+        # assert len(result) == expected_size
+
+    async def test_boundary_conditions(self, ofrak_context: OFRAKContext):
+        """Test component at boundary conditions."""
+        # Test at boundaries (e.g., minimum size, maximum values, etc.)
+        boundary_data = create_boundary_test_data()  # placeholder helper: load a real asset from tests/components/assets/ instead
+        resource = await ofrak_context.create_root_resource("boundary.bin", boundary_data)
+
+        await resource.run(MyComponent)
+
+        # Verify behavior at boundaries
+        # assert appropriate behavior
+
+    # =========================================================================
+    # Error Handling Tests
+    # =========================================================================
+
+    async def test_invalid_input_raises_error(self, ofrak_context: OFRAKContext):
+        """Test component raises appropriate error for invalid input."""
+        invalid_data = b"INVALID_FORMAT"
+        resource = await ofrak_context.create_root_resource("invalid.bin", invalid_data)
+
+        with pytest.raises(ValueError, match="Invalid format"):
+            await resource.run(MyComponent)
+
+    async def test_missing_prerequisite_raises_error(self, ofrak_context: OFRAKContext):
+        """Test component raises error when prerequisite is missing."""
+        resource = await ofrak_context.create_root_resource("test.bin", b"data")
+
+        # Don't run prerequisite component
+
+        with pytest.raises(ComponentDependencyError):  # placeholder: use the exception your component actually raises
+            await resource.run(MyComponent)
+
+    async def test_handles_corrupted_data(self, ofrak_context: OFRAKContext):
+        """Test component handles corrupted data gracefully."""
+        corrupted_data = create_corrupted_test_data()  # placeholder helper: load a real corrupted asset instead
+        resource = await ofrak_context.create_root_resource("corrupted.bin", corrupted_data)
+
+        # Should either handle or raise appropriate error
+        with pytest.raises(ValueError, match="Corrupted"):
+            await resource.run(MyComponent)
+
+    # =========================================================================
+    # Parameterized Tests
+    # =========================================================================
+
+    @pytest.mark.parametrize(
+        "input_data,expected_output",
+        [
+            (b"input1", b"output1"),
+            (b"input2", b"output2"),
+            (b"input3", b"output3"),
+        ],
+    )
+    async def test_multiple_cases(
+        self, ofrak_context: OFRAKContext, input_data: bytes, expected_output: bytes
+    ):
+        """Test component with multiple input/output pairs."""
+        resource = await ofrak_context.create_root_resource("test.bin", input_data)
+
+        await resource.run(MyComponent)
+
+        result = await resource.get_data()
+        assert result == expected_output
+
+    # =========================================================================
+    # Integration Tests
+    # =========================================================================
+
+    async def test_integration_with_other_components(self, ofrak_context: OFRAKContext):
+        """Test MyComponent works correctly with other components."""
+        resource = await ofrak_context.create_root_resource("test.bin", b"test data")
+
+        # Run prerequisite components
+        await resource.run(PrerequisiteComponent)
+
+        # Run this component
+        await resource.run(MyComponent)
+
+        # Run dependent components
+        await resource.run(DependentComponent)
+
+        # Verify full workflow
+        # result = await resource.get_data()
+        # assert result == expected_final_result
+
+    async def test_full_workflow(self, ofrak_context: OFRAKContext):
+        """Test complete workflow: identify -> analyze -> unpack -> modify -> pack."""
+        # Load or create test resource
+        resource = await ofrak_context.create_root_resource("test.bin", b"test data")
+ + # Identify + await resource.run(MyIdentifier) + assert resource.has_tag(MyFormat) + + # Analyze + await resource.run(MyAnalyzer) + attrs = await resource.analyze(MyFormatAttributes) + + # Unpack + await resource.unpack() + children = await resource.get_children() + assert len(children) > 0 + + # Modify + await resource.run(MyModifier) + + # Pack + await resource.pack() + + # Verify final result + final_data = await resource.get_data() + # assert final_data meets expectations + + +# ============================================================================= +# Helper Functions +# ============================================================================= + + +def create_boundary_test_data() -> bytes: + """Create test data at boundary conditions.""" + return b"boundary test data" + + +def create_corrupted_test_data() -> bytes: + """Create corrupted test data.""" + return b"corrupted data" + + +def create_valid_test_data() -> bytes: + """Create valid test data.""" + return b"valid test data" diff --git a/claude/ofrak-developer/references/component_patterns.md b/claude/ofrak-developer/references/component_patterns.md new file mode 100644 index 000000000..e9e2d5d7e --- /dev/null +++ b/claude/ofrak-developer/references/component_patterns.md @@ -0,0 +1,802 @@ +# OFRAK Component Patterns + +This document covers common patterns for implementing OFRAK components (Identifiers, Analyzers, Unpackers, Modifiers, and Packers). + +## Component Base Classes + +OFRAK provides five main component types: + +1. **Identifier** - Detects resource types and adds appropriate tags +2. **Analyzer** - Extracts information and creates attributes +3. **Unpacker** - Extracts embedded content and creates child resources +4. **Modifier** - Modifies resources (patches, injections, etc.) +5. 
**Packer** - Packs/compresses resources + +## Understanding Component Types + +**CRITICAL: Deep understanding of when to use each component type.** + +Choosing the wrong component type will cause implementation issues. This section provides detailed guidance for each type. + +### Identifier + +**Purpose**: Detect and tag resource types based on file signatures, magic bytes, or structure. + +**When to use**: +- Need to recognize a new file format +- Detect specific file types (ELF, PE, ZIP, etc.) +- Add tags to resources without extracting data + +**Key characteristics**: +- Reads data to check signatures/structure +- Adds tags using `resource.add_tag(TagClass)` +- Does NOT create attributes or children +- Does NOT modify data +- Typically has empty `targets` tuple (runs on unidentified resources) + +**Example use cases**: +- Identify ZIP files by checking `PK\x03\x04` signature +- Detect ELF binaries by checking magic number `\x7fELF` +- Recognize custom firmware formats + +**Common mistakes**: +- Don't use Identifier to extract metadata - use Analyzer for that +- Don't create children in Identifier - use Unpacker for that + +### Analyzer + +**Purpose**: Extract information and metadata from resources, creating attributes. 
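
As a framework-independent illustration of the parsing an Analyzer performs, the sketch below decodes the hypothetical `MYFT` header used by the Analyzer, Unpacker, and Packer patterns later in this document: a 4-byte magic, a little-endian 2-byte version, a 4-byte ASCII compression type, and a little-endian 4-byte entry count. `MyFormatHeader` and `parse_header` are illustrative names, not OFRAK APIs. Inside a real Analyzer, the parsed values would be returned as a `ResourceAttributes` subclass.

```python
import struct
from dataclasses import dataclass

# Hypothetical MYFT header layout (the toy format used in this document):
#   bytes 0-3    magic b"MYFT"
#   bytes 4-5    version, little-endian uint16
#   bytes 6-9    compression type, 4 ASCII characters
#   bytes 10-13  entry count, little-endian uint32
HEADER_FORMAT = "<4sH4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 14: "<" disables alignment padding


@dataclass
class MyFormatHeader:
    version: int
    compression_type: str
    entry_count: int


def parse_header(data: bytes) -> MyFormatHeader:
    """Decode the MYFT header, raising ValueError for short or foreign data."""
    if len(data) < HEADER_SIZE:
        raise ValueError(
            f"Invalid format: need {HEADER_SIZE} header bytes, got {len(data)}"
        )
    magic, version, compression, count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != b"MYFT":
        raise ValueError(f"Invalid format: bad magic {magic!r}")
    return MyFormatHeader(version, compression.decode("ascii"), count)


header = parse_header(b"MYFT" + (1).to_bytes(2, "little") + b"NONE" + (3).to_bytes(4, "little"))
print(header)  # MyFormatHeader(version=1, compression_type='NONE', entry_count=3)
```

Using `struct` with the `<` prefix keeps the byte layout explicit. Checking the length up front turns truncated input into a clear `ValueError` rather than a confusing result from slicing past the end of the data.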
+ +**When to use**: +- Need to parse headers or metadata +- Extract configuration or properties +- Gather information without changing the resource + +**Key characteristics**: +- Returns `ResourceAttributes` with extracted data +- Does NOT create child resources +- Does NOT modify data +- Defines `targets` (what resource types it analyzes) +- Defines `outputs` (what attributes it produces) + +**Example use cases**: +- Parse ELF header to extract entry point, architecture, sections +- Extract ZIP metadata (compression method, file count) +- Analyze PE headers for imports/exports +- Extract firmware version information +- Parse configuration data from binaries + +**Common mistakes**: +- Don't use Analyzer to create children - that's Unpacker's job +- Don't modify resource data in Analyzer - use Modifier for that +- Don't tag resources in Analyzer - use Identifier for that + +### Unpacker + +**Purpose**: Extract embedded content, creating child resources. + +**When to use**: +- Need to extract files from archives +- Decompress or decrypt embedded data +- Split a resource into meaningful parts (sections, segments, etc.) + +**Key characteristics**: +- Creates child resources using `await resource.create_child()` +- Does NOT return attributes (use Analyzer for that) +- Does NOT modify parent data +- Defines `targets` (what resource types it unpacks) +- Defines `children` (what types of children it creates) + +**Example use cases**: +- Extract files from ZIP/TAR archives +- Unpack ELF sections and segments +- Extract firmware partitions +- Decompress LZMA/GZIP data +- Extract embedded filesystems + +**Common mistakes**: +- Don't modify parent data in Unpacker - use Modifier for that +- Don't return attributes in Unpacker - use Analyzer for that +- Don't forget to tag children appropriately (use `tags=(File,)` etc.) + +### Modifier + +**Purpose**: Modify resource data (patch, inject, transform). 
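
One way to reason about queued patches and offset calculations is the toy model below. It is not OFRAK's implementation. The `(start, end, replacement)` tuples are an assumption that loosely mirrors `queue_patch(Range, bytes)`, and every offset is measured against the original data. The patches are applied from the highest offset down, so a patch that changes the data's length never shifts the offsets of patches that are still waiting to be applied.

```python
from typing import List, Tuple

# A queued patch: (start, end, replacement), offsets relative to the ORIGINAL data.
Patch = Tuple[int, int, bytes]


def apply_patches(data: bytes, patches: List[Patch]) -> bytes:
    """Apply non-overlapping range patches, all expressed against the original data."""
    ordered = sorted(patches, key=lambda p: p[0])
    for start, end, _ in ordered:
        if not 0 <= start <= end <= len(data):
            raise ValueError(f"Patch range {start}-{end} is outside the data")
    # Reject overlaps: each patch must start at or after the previous one ends.
    for (s1, e1, _), (s2, e2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError(f"Overlapping patches {s1}-{e1} and {s2}-{e2}")
    out = bytearray(data)
    # Highest offset first, so a length-changing patch never shifts the
    # offsets of the patches that are still waiting to be applied.
    for start, end, replacement in reversed(ordered):
        out[start:end] = replacement
    return bytes(out)


patched = apply_patches(b"Hello, world", [(0, 5, b"Howdy"), (7, 12, b"OFRAK!")])
print(patched)  # b'Howdy, OFRAK!'
```

The second patch grows the data by one byte. Because it is applied first, the first patch's offsets `0-5` remain valid.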
+ +**When to use**: +- Need to patch bytes at specific offsets +- Replace strings or values +- Inject code or data +- Transform data in-place + +**Key characteristics**: +- Uses `resource.queue_patch()` to queue modifications (NOT async) +- Must call `await resource.save()` to apply queued patches +- Does NOT create children +- Does NOT return attributes +- Takes configuration specifying what to modify +- Defines `targets` (what resource types it can modify) + +**Example use cases**: +- Patch bytes at offset (NOP instructions, change values) +- Replace strings in binaries +- Inject shellcode or payloads +- Change configuration values +- Modify firmware checksums + +**Common mistakes**: +- Must use `queue_patch()`, not direct data modification +- Don't forget to call `await resource.save()` after queueing patches +- Don't create children in Modifier - use Unpacker for that +- Don't forget to handle offset calculations correctly + +### Packer + +**Purpose**: Reconstruct parent resource from modified children (reverse of Unpacker). 
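
To make the "reverse of Unpacker" relationship concrete, here is a framework-independent sketch that uses the same hypothetical `MYFT` layout as the Unpacker and Packer patterns later in this document. `pack_entries` and `unpack_entries` are illustrative helper names, not OFRAK APIs. A real Packer would read its children through OFRAK and queue the packed bytes as a patch.

```python
from typing import List, Tuple

MAGIC = b"MYFT"
HEADER_SIZE = 14  # magic(4) + version(2) + compression(4) + entry count(4)


def pack_entries(entries: List[Tuple[str, bytes]], version: int = 1) -> bytes:
    """Build a MYFT archive from (name, data) pairs."""
    out = bytearray(MAGIC)
    out += version.to_bytes(2, "little")
    out += b"NONE"  # compression type: 4 ASCII bytes
    out += len(entries).to_bytes(4, "little")
    for name, data in entries:
        name_bytes = name.encode("utf-8")
        out += len(name_bytes).to_bytes(2, "little")
        out += name_bytes
        out += len(data).to_bytes(4, "little")
        out += data
    return bytes(out)


def unpack_entries(blob: bytes) -> List[Tuple[str, bytes]]:
    """Inverse of pack_entries: recover the (name, data) pairs in order."""
    if len(blob) < HEADER_SIZE or blob[:4] != MAGIC:
        raise ValueError("Invalid format: not a MYFT archive")
    count = int.from_bytes(blob[10:14], "little")
    offset = HEADER_SIZE
    entries = []
    for _ in range(count):
        name_len = int.from_bytes(blob[offset:offset + 2], "little")
        offset += 2
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        size = int.from_bytes(blob[offset:offset + 4], "little")
        offset += 4
        if offset + size > len(blob):
            raise ValueError("Corrupted archive: entry runs past the end of the data")
        entries.append((name, blob[offset:offset + size]))
        offset += size
    return entries


files = [("a.txt", b"hello"), ("b.bin", b"\x00\x01")]
blob = pack_entries(files)
assert unpack_entries(blob) == files
assert pack_entries(unpack_entries(blob)) == blob
```

The round-trip check `pack_entries(unpack_entries(blob)) == blob` is a cheap property to assert in tests. If it fails, the Packer does not match the structure created by the corresponding Unpacker.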
+ +**When to use**: +- Need to rebuild archive after modifying extracted files +- Recompress data after changes +- Reconstruct binary format after child modifications + +**Key characteristics**: +- Reads children and rebuilds parent data +- Uses `resource.queue_patch()` to queue updates (NOT async) +- Must call `await resource.save()` to apply queued patches +- Pairs with corresponding Unpacker +- Defines `targets` (what resource types it packs) + +**Example use cases**: +- Rebuild ZIP archive after modifying files +- Reconstruct ELF after modifying sections +- Recompress firmware after patches +- Pack squashfs after file changes +- Rebuild TAR archives + +**Common mistakes**: +- Packer is NOT always needed - only when format requires reconstruction +- Must match the structure created by corresponding Unpacker +- Don't forget to update checksums/sizes if format requires them + +### Quick Reference Table + +| Component Type | Creates Children | Returns Attributes | Modifies Data | Primary Use | +|---------------|------------------|-------------------|---------------|-------------| +| **Identifier** | ❌ | ❌ | ❌ | Detect file format | +| **Analyzer** | ❌ | ✅ | ❌ | Extract metadata | +| **Unpacker** | ✅ | ❌ | ❌ | Extract content | +| **Modifier** | ❌ | ❌ | ✅ | Patch/transform | +| **Packer** | ❌ | ❌ | ✅ | Rebuild from children | + +## Component Development Workflow + +When adding a new component (using Unpacker as example), follow these steps: + +1. **READ contributor documentation (MANDATORY)** + - Read `ofrak/docs/contributor-guide/getting-started.md` + - Read component-specific guide: `ofrak/docs/contributor-guide/component/unpacker.md` + +2. **SEARCH for similar components** + - Use Glob to find similar implementations: `Glob("**/zip*.py")`, `Glob("**/tar*.py")` + - Search for components of the same type + - Look for similar file formats or functionality + +3. 
**READ similar unpacker implementation** +   - Study how existing components are structured +   - Note patterns: how they create children, handle errors, tag resources +   - Understand the coding style and conventions + +4. **USE assets/component_template.py as starting point** +   - Start from the provided template +   - Follow structure from similar components +   - Adapt template to your specific needs + +5. **IMPLEMENT unpacker following patterns** +   - Follow structure from documentation and similar components +   - Define `targets` (what resource types to unpack) +   - Define `children` (what types of children are created) +   - Use `await resource.create_child()` to create children +   - Tag children appropriately + +6. **CREATE TESTS AUTOMATICALLY** +   - Use `assets/test_template.py.template` as a starting point +   - Follow patterns from `references/testing_patterns.md` +   - Test with real binary data (not mocks) +   - Cover edge cases: empty archives, corrupted data, large files + +7. **ENSURE 100% coverage** +   - Execute: `pytest path/to/test_file.py -v --cov=module_name` +   - Fix any failures +   - Add tests until all code paths are covered + +8. **CREATE example script (output to console)** +   - Write example usage showing practical application +   - Follow patterns from `ofrak/examples/` +   - Output to console, don't create a file +   - Show how to use the new component + +9. **UPDATE changelog with #PLACEHOLDER** +   - Find the appropriate CHANGELOG.md (likely `ofrak_core/CHANGELOG.md`) +   - Add the entry under "Added" in the "Unreleased" section +   - Format: `- Add support for XYZ format unpacking ([#PLACEHOLDER](...))` + +10. **READ ofrak/.github/pull_request_template.md** +   - Read the actual PR template from the file +   - Don't hardcode or assume format + +11.
**FILL and OUTPUT PR template for easy copy/paste** +   - Fill in all sections concisely (5-7 sentences max) +   - Include links to related issues or "N/A" +   - Output to console for easy copying +   - Remind the user to update #PLACEHOLDER with the actual PR number + +**Note**: These steps apply to all component types (Identifier, Analyzer, Unpacker, Modifier, Packer). Adjust step 1 to read the appropriate component-specific guide. + +## General Component Structure + +All components follow this basic pattern: + +```python +from dataclasses import dataclass +from ofrak.component.abstract import ComponentSubprocessRunner +from ofrak.resource import Resource +from ofrak.model.component_model import ComponentConfig + +@dataclass +class MyComponentConfig(ComponentConfig): +    """Configuration for MyComponent.""" +    option1: str +    option2: int = 42  # Default value + +class MyComponent(ComponentSubprocessRunner): +    """ +    Brief description of what this component does. + +    Detailed explanation of component behavior, requirements, etc. +    """ + +    # Component metadata (id must be bytes) +    id = b"MyComponent" + +    async def run(self, resource: Resource, config: MyComponentConfig) -> None: +        """ +        Execute the component logic. + +        Args: +            resource: The resource to operate on +            config: Component configuration +        """ +        # Implementation here +``` + +## Identifier Pattern + +Identifiers detect resource types and add appropriate tags. + +```python +from ofrak.component.identifier import Identifier +from ofrak.resource import Resource + +class MyFormatIdentifier(Identifier): +    """ +    Identify MyFormat files by checking file signature. +    """ + +    id = b"MyFormatIdentifier" + +    # Tags a resource must have for this identifier to run on it +    targets = () + +    async def identify(self, resource: Resource, config=None) -> None: +        """ +        Identify MyFormat files and add MyFormat tag.
+ + Args: + resource: Resource to identify + config: Unused + """ + # Get file data + data = await resource.get_data() + + # Check for MyFormat signature (magic bytes) + if data[:4] == b"MYFT": + # Add the tag + resource.add_tag(MyFormat) +``` + +**Key points:** +- Inherit from `Identifier` +- Implement `identify()` method +- Add tags using `resource.add_tag(TagClass)` +- Check file signatures, magic numbers, or structure +- Don't create attributes or children (that's for Analyzers/Unpackers) + +## Analyzer Pattern + +Analyzers extract information and create attributes. + +```python +from dataclasses import dataclass +from ofrak.component.analyzer import Analyzer +from ofrak.model.resource_model import ResourceAttributes +from ofrak.resource import Resource + +@dataclass +class MyFormatAttributes(ResourceAttributes): + """Attributes for MyFormat resources.""" + version: int + compression_type: str + entry_count: int + +class MyFormatAnalyzer(Analyzer[None, MyFormatAttributes]): + """ + Analyze MyFormat files and extract metadata. + """ + + id = b"MyFormatAnalyzer" + targets = (MyFormat,) # Only run on MyFormat resources + outputs = (MyFormatAttributes,) # What this analyzer produces + + async def analyze(self, resource: Resource, config=None) -> MyFormatAttributes: + """ + Extract MyFormat metadata. 
+ + Args: + resource: MyFormat resource to analyze + config: Unused + + Returns: + MyFormatAttributes with extracted metadata + """ + data = await resource.get_data() + + # Parse header + version = int.from_bytes(data[4:6], "little") + compression_type = data[6:10].decode("ascii") + entry_count = int.from_bytes(data[10:14], "little") + + return MyFormatAttributes( + version=version, + compression_type=compression_type, + entry_count=entry_count + ) +``` + +**Key points:** +- Inherit from `Analyzer[ConfigType, OutputAttributeType]` +- Define `targets` - what resource types this analyzer applies to +- Define `outputs` - what attributes this analyzer produces +- Implement `analyze()` method that returns attributes +- Don't modify resource or create children + +## Unpacker Pattern + +Unpackers extract embedded content and create child resources. + +```python +from ofrak.component.unpacker import Unpacker +from ofrak.resource import Resource +from ofrak.core.filesystem import File + +class MyFormatUnpacker(Unpacker[None]): + """ + Unpack MyFormat archives and extract contained files. + """ + + id = b"MyFormatUnpacker" + targets = (MyFormat,) # Only run on MyFormat resources + children = (File,) # What types of children this creates + + async def unpack(self, resource: Resource, config=None) -> None: + """ + Extract files from MyFormat archive. 
+ +        Args: +            resource: MyFormat archive resource +            config: Unused +        """ +        data = await resource.get_data() + +        # Get format attributes (runs MyFormatAnalyzer if they are not present yet) +        attrs = await resource.analyze(MyFormatAttributes) + +        # Parse entries +        offset = 14  # After header +        for _ in range(attrs.entry_count): +            # Read entry metadata +            name_len = int.from_bytes(data[offset:offset+2], "little") +            offset += 2 +            name = data[offset:offset+name_len].decode("utf-8") +            offset += name_len + +            file_size = int.from_bytes(data[offset:offset+4], "little") +            offset += 4 + +            # Extract file data +            file_data = data[offset:offset+file_size] +            offset += file_size + +            # Create child resource for this file (File is a view, so create it from the view) +            await resource.create_child_from_view( +                File(name=name, stat=None, xattrs=None), +                data=file_data, +            ) +``` + +**Key points:** +- Inherit from `Unpacker[ConfigType]` +- Define `targets` - what this unpacker can unpack +- Define `children` - what types of children it creates +- Implement `unpack()` method +- Create children using `resource.create_child()`, or `resource.create_child_from_view()` for view types such as `File` +- Can use attributes from analyzers + +## Modifier Pattern + +Modifiers change resource data. + +```python +from dataclasses import dataclass +from ofrak.component.modifier import Modifier +from ofrak.model.component_model import ComponentConfig +from ofrak.resource import Resource +from ofrak_type.range import Range + +@dataclass +class MyModifierConfig(ComponentConfig): +    """Configuration for MyModifier.""" +    target_string: bytes +    replacement: bytes + +class MyModifier(Modifier[MyModifierConfig]): +    """ +    Replace occurrences of a string in MyFormat resources. 
+    """ + +    id = b"MyModifier" +    targets = (MyFormat,) + +    async def modify(self, resource: Resource, config: MyModifierConfig) -> None: +        """ +        Replace target string with replacement in resource data.
+ +        Args: +            resource: Resource to modify +            config: Modification configuration +        """ +        # Get current data +        data = await resource.get_data() + +        # Perform modification +        modified_data = data.replace(config.target_string, config.replacement) + +        # Queue modification (NOT async) +        resource.queue_patch(Range(0, len(data)), modified_data) + +        # Apply the patch (required!) +        await resource.save() +``` + +**Key points:** +- Inherit from `Modifier[ConfigType]` +- Define `targets` - what this modifier can modify +- Implement `modify()` method +- Use `resource.queue_patch()` to queue modifications (NOT async) +- Call `await resource.save()` to apply queued patches +- Don't write data directly - use the patching system + +## Packer Pattern + +Packers compress or pack resources (reverse of unpackers). + +```python +from ofrak.component.packer import Packer +from ofrak.resource import Resource +from ofrak.core.filesystem import File +from ofrak_type.range import Range + +class MyFormatPacker(Packer[None]): +    """ +    Pack files into MyFormat archive. +    """ + +    id = b"MyFormatPacker" +    targets = (MyFormat,) + +    async def pack(self, resource: Resource, config=None) -> None: +        """ +        Pack child files into MyFormat archive format.
+ +        Args: +            resource: MyFormat resource with children to pack +            config: Unused +        """ +        # Get all file children (as a list, so len() works below) +        children = list(await resource.get_children_as_view(File)) + +        # Build header +        header = bytearray(b"MYFT")  # Magic +        header.extend((1).to_bytes(2, "little"))  # Version +        header.extend(b"NONE")  # Compression type +        header.extend(len(children).to_bytes(4, "little"))  # Entry count + +        # Build entries +        entries = bytearray() +        for child_file in children: +            # Get file data +            child_data = await child_file.resource.get_data() + +            # Write entry +            name_bytes = child_file.name.encode("utf-8") +            entries.extend(len(name_bytes).to_bytes(2, "little")) +            entries.extend(name_bytes) +            entries.extend(len(child_data).to_bytes(4, "little")) +            entries.extend(child_data) + +        # Combine and queue patch (NOT async) +        packed_data = bytes(header + entries) +        original_size = await resource.get_data_length() +        resource.queue_patch(Range(0, original_size), packed_data) + +        # Apply the patch (required!) +        await resource.save() +``` + +**Key points:** +- Inherit from `Packer[ConfigType]` +- Define `targets` - what this packer can pack +- Implement `pack()` method +- Reconstruct parent data from children +- Use `resource.queue_patch()` to queue updates (NOT async) +- Call `await resource.save()` to apply queued patches + +## External Tool Integration + +**IMPORTANT: When adding external dependencies, see `contributing_guidelines.md` → "Dependency Management" section.** + +- **Python dependencies** → Update package's `requirements.txt` (pinned to latest stable version) +- **Apt dependencies** → Update package's `Dockerstub` +- **Avoid** dependencies requiring build from source + +For components that use external tools: + +```python +from ofrak.component.abstract import ComponentSubprocessRunner +from ofrak.core.binary import GenericBinary +from ofrak.resource import Resource +import tempfile +import subprocess + +class MyExternalToolComponent(ComponentSubprocessRunner): +    """ +    Component that uses an external tool.
+ """ + + id = b"MyExternalToolComponent" + targets = (GenericBinary,) + + # External dependencies + external_dependencies = ("my-external-tool",) + + async def run(self, resource: Resource, config=None) -> None: + """ + Run external tool on resource. + + Args: + resource: Resource to process + config: Unused + """ + # Write data to temp file + with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f: + temp_path = f.name + data = await resource.get_data() + f.write(data) + + try: + # Run external tool + result = subprocess.run( + ["my-external-tool", "--option", temp_path], + capture_output=True, + check=True + ) + + # Process results + output = result.stdout.decode("utf-8") + # ... handle output ... + + finally: + # Clean up temp file + import os + os.unlink(temp_path) +``` + +**Key points:** +- Inherit from `ComponentSubprocessRunner` for external tools +- Declare `external_dependencies` tuple +- **CRITICAL: Only use cross-platform external tools (macOS/Linux/Windows compatible)** +- Perform pro/con analysis before choosing Python vs external tool +- Use `tempfile` for temporary file I/O +- Always clean up temporary files +- Handle subprocess errors properly + +## Choosing Between Pure Python vs External Tools + +**When deciding implementation approach, analyze trade-offs:** + +### Use External Tool When: +- ✅ Tool is widely used and well-tested (e.g., `7z`, `squashfs-tools`) +- ✅ Format is complex (filesystems, compression algorithms) +- ✅ Tool is cross-platform (available on macOS/Linux/Windows) +- ✅ Tool has stable API/output format +- ✅ Performance is critical (native code often faster) +- ✅ Reimplementation would be error-prone + +**Examples**: `7z` for archives, `unsquashfs` for SquashFS, `e2fsprogs` for ext2/3/4 + +### Use Pure Python When: +- ✅ Format is simple or well-documented +- ✅ Good Python libraries exist (e.g., `zipfile`, `tarfile`) +- ✅ No suitable cross-platform external tool available +- ✅ External tool would add heavy dependency +- 
✅ Need fine-grained control over parsing +- ✅ Easier testing and debugging + +**Examples**: ZIP (use `zipfile`), TAR (use `tarfile`), JSON parsing + +### Cross-Platform Requirements: +**External tools MUST work on all three platforms:** +- macOS +- Linux (various distributions) +- Windows + +**How to verify cross-platform compatibility:** +1. Check if tool is in standard package managers: + - macOS: Homebrew (`brew`) + - Linux: apt, yum, pacman + - Windows: chocolatey, scoop +2. Test on multiple platforms or research tool availability +3. Document installation requirements in component docstring + +**Red flags (avoid these):** +- ❌ Linux-only tools without Windows/macOS alternatives +- ❌ Platform-specific utilities (`dd`, `losetup` without alternatives) +- ❌ Tools requiring kernel modules or drivers +- ❌ Tools with incompatible versions across platforms + +## Resource View Pattern + +Resource views provide convenient access to resources with specific tags: + +```python +from dataclasses import dataclass +from ofrak.model.resource_model import ResourceAttributes +from ofrak.resource_view import ResourceView +from ofrak.core.addressable import Addressable + +@dataclass +class MyFormatAttributes(ResourceAttributes): + version: int + entry_count: int + +class MyFormat(ResourceView): + """ + View for MyFormat resources. + + Provides convenient access to MyFormat-specific attributes. + """ + + # Required views that this view depends on + # Empty if no dependencies + view_dependencies = () + + # Attributes this view uses + # Can be retrieved with self. + + async def get_version(self) -> int: + """Get MyFormat version.""" + attrs = await self.resource.analyze(MyFormatAttributes) + return attrs.version + + async def extract_entry(self, index: int) -> bytes: + """ + Extract specific entry by index. 
+ +        Args: +            index: Entry index to extract + +        Returns: +            Entry data as bytes +        """ +        # Implementation +        pass +``` + +## Common Patterns + +### Pattern: Checking Dependencies + +```python +async def analyze(self, resource: Resource, config=None): +    # resource.analyze() returns the attributes if they are already present +    # and otherwise runs the analyzer that produces them, so no pre-check is needed +    required_attrs = await resource.analyze(RequiredAttributes) +    # Use required_attrs... +``` + +### Pattern: Creating Tagged Children + +```python +# Create a File child with an extra tag (ExecutableFile is a placeholder tag) +await resource.create_child_from_view( +    File(name="binary.elf", stat=None, xattrs=None), +    data=file_data, +    additional_tags=(ExecutableFile,), +) +``` + +### Pattern: Handling Errors Gracefully + +```python +try: +    result = external_tool_call() +except subprocess.CalledProcessError as e: +    raise ComponentError( +        f"External tool failed: {e.stderr.decode()}" +    ) +``` + +### Pattern: Lazy Attribute Access + +```python +class MyFormat(ResourceView): +    async def get_header(self): +        """Get header (cached after first access).""" +        if not hasattr(self, "_header"): +            data = await self.resource.get_data(Range(0, 16)) +            self._header = parse_header(data) +        return self._header +``` + +## Testing Components + +Every component needs comprehensive tests. See `testing_patterns.md` for details. + +## Component Registration + +OFRAK discovers its core components automatically. To register custom components, pass the module or package that defines them to `OFRAK.discover()`: + +```python +from ofrak import OFRAK + +import my_components  # placeholder: the package that defines MyCustomComponent + +ofrak = OFRAK() +ofrak.discover(my_components) +``` + +## Best Practices + +1. **Single Responsibility**: Each component should do one thing well +2. **Clear Targets**: Define precise targets to avoid running on wrong resources +3. 
**Proper Dependencies**: Use `view_dependencies` and check required attributes +4. **Error Handling**: Raise appropriate exceptions with clear messages +5. **Documentation**: Include comprehensive docstrings +6.
**Testing**: Write tests for all code paths (100% coverage required) +7. **Type Annotations**: Use proper type hints throughout +8. **Efficient Data Access**: Use ranges when reading specific offsets +9. **Clean External Tools**: Always clean up temporary files +10. **Follow Patterns**: Look at similar existing components for patterns + +## Reference Implementation Examples + +For real examples of each component type, examine these files in the OFRAK repository: + +- **Identifiers**: `ofrak_core/ofrak/core/zip.py` (ZipIdentifier) +- **Analyzers**: `ofrak_core/ofrak/core/elf/model.py` (ElfAnalyzer) +- **Unpackers**: `ofrak_core/ofrak/core/zip.py` (ZipUnpacker) +- **Modifiers**: `ofrak_core/ofrak/core/binary.py` (BinaryPatchModifier) +- **Packers**: `ofrak_core/ofrak/core/zip.py` (ZipPacker) + +Always refer to existing implementations when creating new components to ensure you follow established patterns. diff --git a/claude/ofrak-developer/references/contributing_guidelines.md b/claude/ofrak-developer/references/contributing_guidelines.md new file mode 100644 index 000000000..485cc5c4d --- /dev/null +++ b/claude/ofrak-developer/references/contributing_guidelines.md @@ -0,0 +1,202 @@ +# OFRAK Contributing Guidelines + +This document contains the key contributing guidelines extracted from the OFRAK repository. + +## Pull Request Best Practices + +### Keep PRs Focused +- Each PR should focus on ONE change +- Avoid large changes that affect functionality in multiple ways +- Break up large changes into multiple pull requests +- This makes PRs easier to review and reduces the chance of merge conflicts + +### Review Size Guidelines +When reviewing, it can take developers a little over an hour to get through a few hundred lines of code and find most defects. Keep your contributions to a reasonable review size. 
+ +## Changelog Requirements + +The following OFRAK packages maintain changelogs and MUST be updated if changes affect them: + +- `ofrak_core` → `ofrak_core/CHANGELOG.md` +- `ofrak_io` → `ofrak_io/CHANGELOG.md` +- `ofrak_patch_maker` → `ofrak_patch_maker/CHANGELOG.md` +- `ofrak_type` → `ofrak_type/CHANGELOG.md` +- `ofrak_tutorial` → `ofrak_tutorial/CHANGELOG.md` +- `ofrak_angr` → `disassemblers/ofrak_angr/CHANGELOG.md` +- `ofrak_capstone` → `disassemblers/ofrak_capstone/CHANGELOG.md` +- `ofrak_ghidra` → `disassemblers/ofrak_ghidra/CHANGELOG.md` +- `ofrak_pyghidra` → `disassemblers/ofrak_pyghidra/CHANGELOG.md` +- `ofrak_cached_disassembly` → `disassemblers/ofrak_cached_disassembly/CHANGELOG.md` + +### Changelog Format + +Changelogs follow the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format and adhere to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +Each changelog has sections: +- **Added** - for new features +- **Fixed** - for bug fixes +- **Changed** - for changes in existing functionality +- **Deprecated** - for soon-to-be removed features +- **Removed** - for now removed features +- **Security** - for security-related changes + +### Changelog Entry Format + +Add your changes to the `Unreleased` section. The format is: + +```markdown +- Brief description of the change ([#PR_NUMBER](https://github.com/redballoonsecurity/ofrak/pull/PR_NUMBER)) +``` + +**Since the PR hasn't been created yet, use a placeholder:** +```markdown +- Brief description of the change ([#PLACEHOLDER](https://github.com/redballoonsecurity/ofrak/pull/PLACEHOLDER)) +``` + +**Ask the user to update the placeholder with the actual PR number once the PR is published.** + +### Example Changelog Entries + +Good examples from existing changelogs: + +```markdown +### Added +- Add `-V, --version` flag to ofrak cli ([#652](https://github.com/redballoonsecurity/ofrak/pull/652)) +- Add modifier to add and remove sections using lief. 
([#443](https://github.com/redballoonsecurity/ofrak/pull/443)) +- Add UEFI binary unpacker. ([#399](https://github.com/redballoonsecurity/ofrak/pull/399)) + +### Fixed +- Fix `java` and `apktool` CLI arguments for checking components. ([#390](https://github.com/redballoonsecurity/ofrak/pull/390)) +- Fixed front end "Replace" button. Before it was appending new data instead of replacing it as intended. ([#403](https://github.com/redballoonsecurity/ofrak/pull/403)) +- Fix bug in OFRAK GUI server which causes an error when parsing a default config value of bytes. ([#409](https://github.com/redballoonsecurity/ofrak/pull/409)) + +### Changed +- By default, the ofrak log is now `ofrak-YYYYMMDDhhmmss.log` rather than just `ofrak.log` and the name can be specified on the command line ([#480](https://github.com/redballoonsecurity/ofrak/pull/480)) +- `Resource.flush_to_disk` method renamed to `Resource.flush_data_to_disk`. ([#373](https://github.com/redballoonsecurity/ofrak/pull/373)) +``` + +## Testing Requirements + +**The packages in this repository maintain 100% test coverage, either at the statement or function level.** + +This test coverage is enforced in the CI pipeline. Pull Requests that do not meet this requirement will not be merged. + +When contributing: +1. Always create tests for your changes +2. Ensure your tests cover all new code paths +3. Run tests locally before submitting PR +4. Tests should be placed in the appropriate `tests/` directory for the module + +## Pre-commit Hooks + +Please install and run the `pre-commit` hooks before submitting your PR. This helps maintain code quality and consistency. + +## Code Review Guidelines + +1. Please be respectful. Remember to discuss the merits of the idea, not the individual. +2. Please back your code review suggestions with technical reasoning. +3. If the value of your code review suggestion is subjective, please use words like "I think...". +4. 
If you have to write a long-winded explanation in the review, we expect to see some code comments. +5. Please keep your contributions within the scope of the proposed fix, feature, or maintenance task. + +## Python Coding Standard + +Please see `ofrak/docs/contributor-guide/getting-started.md` in the OFRAK repository for functional and stylistic expectations. + +## Linking to Issues + +Please link your Pull Request to an outstanding issue if one exists. For small fixes in docs or typos, you probably won't need to create an issue first. + +For feature proposals or very large fixes, create an issue first to discuss it beforehand. + +## Dependency Management + +**OFRAK uses a multi-package structure where each package manages its own dependencies.** + +### Package Structure + +Each OFRAK package has its own: +- `requirements.txt` - Python dependencies +- `Dockerstub` - System/apt dependencies +- `CHANGELOG.md` - Version history + +When adding dependencies to a specific OFRAK package, you MUST update that package's dependency files, not a global one. + +### Adding Python Dependencies + +**When adding a Python module dependency:** + +1. **Check PyPI for the latest stable version** + - Visit https://pypi.org/project/package-name/ + - Identify the latest stable release version + - Verify it's not a pre-release (alpha, beta, rc) + +2. **Pin to the exact latest version** + - Use `==` notation with the specific version + - Example: `package-name==2.5.3` (not `>=` or `~=`) + +3. **Update the package's `requirements.txt` file** + - Locate: `ofrak_core/requirements.txt`, `ofrak_patch_maker/requirements.txt`, etc. + - Add the pinned dependency + +4. **Update the package's `CHANGELOG.md`** + - Note the new dependency under "Added" or "Changed" section + +**Example:** +``` +# ofrak_core/requirements.txt +existing-package==1.2.3 +new-package==2.5.3 # Latest stable version as of 2024-01-15 +``` + +### Adding System Dependencies + +**When adding apt/system dependencies:** + +1. 
Locate the appropriate package's `Dockerstub` file + - Example: `ofrak_core/Dockerstub`, `disassemblers/ofrak_ghidra/Dockerstub` +2. Add the apt package(s) +3. Update the package's `CHANGELOG.md` noting the new dependency + +**Example Dockerstub:** +```dockerfile +# ofrak_core/Dockerstub (new-tool is the newly added dependency) +RUN apt-get update && \ + apt-get install -y --no-install-recommends \ + existing-tool \ + new-tool && \ + rm -rf /var/lib/apt/lists/* +``` + +### Dependency Guidelines + +**IMPORTANT RULES:** + +1. ✅ **Always check PyPI for latest stable version before pinning** +2. ✅ **Pin Python dependencies to exact version (use `==`)** +3. ✅ **Use packages from PyPI or standard apt repositories** +4. ✅ **Prefer well-maintained, cross-platform dependencies** +5. ❌ **AVOID dependencies that must be built from source** +6. ❌ **AVOID platform-specific dependencies without alternatives** +7. ❌ **AVOID dependencies with complex manual installation** + +### Common Package Locations + +- **Core components**: `ofrak_core/requirements.txt` and `ofrak_core/Dockerstub` +- **Patch maker**: `ofrak_patch_maker/requirements.txt` and `ofrak_patch_maker/Dockerstub` +- **Type definitions**: `ofrak_type/requirements.txt` and `ofrak_type/Dockerstub` +- **Disassemblers**: `disassemblers/ofrak_*/requirements.txt` and `disassemblers/ofrak_*/Dockerstub` +- **I/O operations**: `ofrak_io/requirements.txt` and `ofrak_io/Dockerstub` + +### Verifying Dependencies + +Before submitting a PR: + +1. Test that dependencies install correctly +2. Verify Docker build succeeds with new dependencies +3. Document any special installation requirements in component docstring +4. Update CHANGELOG.md with dependency changes + +## Maintainers + +Every Pull Request requires at least one review by an OFRAK maintainer. You may request review from specific maintainers in your PR, or a maintainer will pick up your PR for review.
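The rule of pinning Python dependencies with `==` is easy to spot-check before submitting a PR. The following is a minimal sketch, not an OFRAK tool. The hypothetical `find_unpinned` helper returns the lines of a `requirements.txt` body that are not pinned with `==`. It skips comments, blank lines, and pip options such as `-r`.

```python
import re
from typing import List

# A requirement counts as pinned if it looks like "name==version" or "name[extras]==version".
_PINNED = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*==\s*[^=\s;#]+")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r / -e
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = """
# ofrak_core/requirements.txt
existing-package==1.2.3
new-package==2.5.3  # Latest stable version
loose-package>=1.0
"""
    print(find_unpinned(sample))  # ['loose-package>=1.0']
```

For example, `find_unpinned(Path("ofrak_core/requirements.txt").read_text())` should return an empty list when every dependency in that file is pinned.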
diff --git a/claude/ofrak-developer/references/ofrak_script_patterns.md b/claude/ofrak-developer/references/ofrak_script_patterns.md new file mode 100644 index 000000000..843ebf780 --- /dev/null +++ b/claude/ofrak-developer/references/ofrak_script_patterns.md @@ -0,0 +1,409 @@ +# OFRAK Script Patterns + +This document covers common patterns for writing standalone Python scripts that use the OFRAK library. + +## Script Writing Workflow + +When writing an OFRAK script, follow these steps: + +1. **UNDERSTAND what script should do** + - Define the goal clearly + - Identify input files/parameters + - Determine expected output + +2. **CHECK references/ofrak_usage_guide.md for components** + - Verify which components you'll need + - Check component names and usage patterns + - Understand configuration requirements + +3. **VERIFY components exist (don't invent)** + - Only use components documented in ofrak_usage_guide.md + - Don't assume components exist based on naming patterns + - Check OFRAK documentation if uncertain + +4. **USE assets/script_template.py** + - Start from the provided template + - Follow the established structure + - Include proper imports and argparse setup + +5. **IMPLEMENT with proper async/await** + - All OFRAK operations must use `await` + - Main function must be `async def main(ofrak_context: OFRAKContext, ...)` + - Use `ofrak.run(main, ...)` in `if __name__ == "__main__"` + +6. **TEST script is valid Python** + - Check syntax is correct + - Verify all imports are available + - Run with sample data if possible + +7. **OUTPUT script to user** + - Provide complete, runnable script + - Include usage instructions + - Show example invocation + +## Basic Script Structure + +All OFRAK scripts follow a consistent structure: + +```python +""" +Brief description of what this script does. 
+""" +import argparse +from ofrak import OFRAK, OFRAKContext +# Import OFRAK components and views as needed + +async def main(ofrak_context: OFRAKContext, arg1: str, arg2: int): + """ + Main async function that performs the OFRAK operations. + + Args: + ofrak_context: The OFRAK context for creating and managing resources + arg1: Description of argument 1 + arg2: Description of argument 2 + """ + # Create root resource + root_resource = await ofrak_context.create_root_resource_from_file(arg1) + + # Perform operations on the resource + # ... + + # Save results if needed + await root_resource.flush_data_to_disk("output.bin") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Description of what the script does") + parser.add_argument("input_file", help="Input binary file") + parser.add_argument("--option", type=int, default=42, help="Optional parameter") + args = parser.parse_args() + + # Create OFRAK instance and run + ofrak = OFRAK() + ofrak.run(main, args.input_file, args.option) +``` + +## Key Patterns + +### 1. Creating Resources + +**From a file:** +```python +root_resource = await ofrak_context.create_root_resource_from_file(file_path) +``` + +**From bytes:** +```python +data = b"\x7fELF..." +root_resource = await ofrak_context.create_root_resource( + name="mybinary.bin", + data=data +) +``` + +### 2. Unpacking Resources + +**Basic unpacking:** +```python +# Automatically selects appropriate unpacker based on resource type +await root_resource.unpack() +``` + +**Recursive unpacking:** +```python +# Unpacks resource and all children +await root_resource.unpack_recursively() +``` + +### 3. 
Working with Resource Views + +**Get a specific view:** +```python +from ofrak.core.elf.model import Elf + +# Unpack first so the ELF header exists as a child resource, then view as ELF +await root_resource.unpack() +elf = await root_resource.view_as(Elf) +header = await elf.get_header() +print(f"Entry point: {hex(header.e_entry)}") +``` + +**Check if resource has a tag:** +```python +from ofrak.core.elf.model import Elf + +if root_resource.has_tag(Elf): + elf = await root_resource.view_as(Elf) + # Work with ELF +``` + +### 4. Accessing Children + +**Get all children:** +```python +children = await root_resource.get_children() +for child in children: + print(f"Child: {child.get_caption()}") +``` + +**Get children with specific tag:** +```python +from ofrak.core.filesystem import File + +files = await root_resource.get_children_as_view(File) +for file in files: + size = len(await file.resource.get_data()) + print(f"File: {file.name}, Size: {size}") +``` + +**Get descendants (children, grandchildren, and so on):** +```python +descendants = await root_resource.get_descendants() +``` + +### 5. Running Components + +**Run a specific component:** +```python +from ofrak.core.strings import StringsAnalyzer, StringsAttributes + +# Run component with default config +await root_resource.run(StringsAnalyzer) + +# Access the results +strings = await root_resource.analyze(StringsAttributes) +for offset, string in strings.strings.items(): + print(f"{hex(offset)}: {string}") +``` + +**Run component with custom config:** +```python +from ofrak.core.strings import StringsAnalyzer, StringsAnalyzerConfig + +config = StringsAnalyzerConfig(min_length=10) +await root_resource.run(StringsAnalyzer, config) +``` + +### 6.
Modifying Resources + +**Patch data at offset:** +```python +from ofrak.core.binary import BinaryPatchConfig, BinaryPatchModifier + +config = BinaryPatchConfig(offset=0x1000, patch_bytes=b"\x90\x90\x90\x90") +await root_resource.run(BinaryPatchModifier, config) +``` + +**Search and replace:** +```python +from ofrak.core.strings import StringFindReplaceConfig, StringFindReplaceModifier + +# to_find and replace_with are str; by default a null terminator is appended and the result must fit +config = StringFindReplaceConfig( + to_find="old_string", + replace_with="new_str" +) +await root_resource.run(StringFindReplaceModifier, config) +``` + +### 7. Saving Results + +**Save to file:** +```python +await root_resource.flush_data_to_disk("output.bin") +``` + +**Save child to file:** +```python +child = await root_resource.get_only_child() +await child.flush_data_to_disk("extracted_file.bin") +``` + +**Get data as bytes:** +```python +data = await root_resource.get_data() +print(f"Size: {len(data)} bytes") +``` + +### 8. Error Handling + +**Handle missing components:** +```python +from ofrak.service.error import ComponentNotFoundError + +try: + await root_resource.run(SomeComponent) +except ComponentNotFoundError: + print("Component not available, skipping...") +``` + +**Handle unpacking failures:** +```python +try: + await root_resource.unpack() +except Exception as e: + print(f"Failed to unpack: {e}") + # Handle failure or continue +``` + +## Common Script Templates + +### Analysis Script + +```python +"""Analyze a binary and extract information.""" +import argparse +from ofrak import OFRAK, OFRAKContext +from ofrak.core.strings import StringsAnalyzer, StringsAttributes + +async def main(ofrak_context: OFRAKContext, binary_path: str): + # Load binary + root_resource = await ofrak_context.create_root_resource_from_file(binary_path) + + # Run analysis + await root_resource.run(StringsAnalyzer) + + # Get results + strings = await root_resource.analyze(StringsAttributes) + print(f"Found {len(strings.strings)} strings") + + # Print strings + for offset, string in sorted(strings.strings.items()):
+ print(f"{hex(offset)}: {string}") + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("binary", help="Binary file to analyze") + args = parser.parse_args() + + ofrak = OFRAK() + ofrak.run(main, args.binary) +``` + +### Unpacking Script + +```python +"""Unpack an archive or firmware image.""" +import argparse +from ofrak import OFRAK, OFRAKContext +from ofrak.core.filesystem import File + +async def main(ofrak_context: OFRAKContext, archive_path: str, output_dir: str): + # Load archive + root_resource = await ofrak_context.create_root_resource_from_file(archive_path) + + # Recursively unpack + await root_resource.unpack_recursively() + + # Extract all files + files = await root_resource.get_descendants_as_view(File) + + for file in files: + # Save each file + output_path = f"{output_dir}/{file.name}" + await file.resource.flush_data_to_disk(output_path) + print(f"Extracted: {output_path}") + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("archive", help="Archive to unpack") + parser.add_argument("output_dir", help="Output directory") + args = parser.parse_args() + + ofrak = OFRAK() + ofrak.run(main, args.archive, args.output_dir) +``` + +### Patching Script + +```python +"""Patch a binary at specific locations.""" +import argparse +from ofrak import OFRAK, OFRAKContext +from ofrak.core.binary import BinaryPatchConfig, BinaryPatchModifier + +async def main(ofrak_context: OFRAKContext, binary_path: str, output_path: str): + # Load binary + root_resource = await ofrak_context.create_root_resource_from_file(binary_path) + + # Apply patches + patches = [ + (0x1000, b"\x90\x90\x90\x90"), # NOP at 0x1000 + (0x2000, b"\xc3"), # RET at 0x2000 + ] + + for offset, patch_data in patches: + config = BinaryPatchConfig(offset=offset, patch_bytes=patch_data) + await root_resource.run(BinaryPatchModifier, config) + print(f"Patched at {hex(offset)}: {patch_data.hex()}") + + # Save patched binary + await
root_resource.flush_data_to_disk(output_path) + print(f"Saved to: {output_path}") + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("input", help="Input binary") + parser.add_argument("output", help="Output patched binary") + args = parser.parse_args() + + ofrak = OFRAK() + ofrak.run(main, args.input, args.output) +``` + +## Best Practices + +1. **Always use async/await**: OFRAK operations are asynchronous +2. **Use argparse for CLI**: Makes scripts more usable and professional +3. **Include docstrings**: Document what the script does and its parameters +4. **Handle errors gracefully**: Don't let exceptions crash your script +5. **Clean up resources**: OFRAK handles this automatically when using `ofrak.run()` +6. **Use type hints**: Makes code clearer and catches errors earlier +7. **Keep scripts focused**: One script = one task +8. **Use meaningful variable names**: `root_resource` not just `r` +9. **Print progress**: Let users know what's happening +10. **Validate inputs**: Check file paths, ranges, etc. 
before processing + +## Common Imports + +```python +# Core OFRAK +from ofrak import OFRAK, OFRAKContext + +# Binary operations +from ofrak.core.binary import BinaryPatchConfig, BinaryPatchModifier + +# Strings +from ofrak.core.strings import ( + StringsAnalyzer, + StringsAttributes, + StringFindReplaceConfig, + StringFindReplaceModifier +) + +# File formats +from ofrak.core.elf.model import Elf +from ofrak.core.pe.model import Pe +from ofrak.core.filesystem import File, Folder, FilesystemRoot + +# Archives +from ofrak.core.zip import ZipArchive +from ofrak.core.tar import TarArchive + +# Code analysis +from ofrak.core.code_region import CodeRegion +from ofrak.core.basic_block import BasicBlock +from ofrak.core.instruction import Instruction +``` + +## Discovering Available Components + +When writing a script, you may need to find which components are available: + +```python +# List all available components +from ofrak.service.component_locator import ComponentLocator + +locator = ComponentLocator() +components = locator.get_components_matching_filter() + +for component in components: + print(f"{component.get_id()}: {component.__doc__}") +``` + +For detailed information about specific components, refer to the OFRAK documentation in the `ofrak/docs/` directory of the cloned repository. diff --git a/claude/ofrak-developer/references/ofrak_usage_guide.md b/claude/ofrak-developer/references/ofrak_usage_guide.md new file mode 100644 index 000000000..7043a1780 --- /dev/null +++ b/claude/ofrak-developer/references/ofrak_usage_guide.md @@ -0,0 +1,418 @@ +# OFRAK Usage Guide + +This guide explains how to effectively use OFRAK components in standalone Python scripts. + +## Understanding OFRAK Components + +OFRAK provides four main types of components: + +1. **Identifiers** - Detect file types and formats +2. **Analyzers** - Extract information from resources +3. **Unpackers** - Extract embedded content (files, sections, etc.) +4. 
**Modifiers** - Modify resources (patch, inject, etc.) + +## Component Discovery + +### Finding Available Components + +OFRAK automatically discovers and selects appropriate components based on resource type. You generally don't need to manually specify which unpacker or identifier to use. + +**Automatic component selection:** +```python +# OFRAK automatically detects file type +root_resource = await ofrak_context.create_root_resource_from_file("firmware.bin") + +# OFRAK automatically selects appropriate unpacker +await root_resource.unpack() +``` + +**Manual component execution:** +```python +# Run a specific analyzer +from ofrak.core.strings import StringsAnalyzer +await root_resource.run(StringsAnalyzer) +``` + +### Understanding Component Requirements + +Components have **targets**: the resource types they can operate on. OFRAK ensures components only run on appropriate resources. + +**Example:** +- `ElfUnpacker` targets `Elf` resources +- `StringsAnalyzer` can target any binary resource +- `BinaryPatchModifier` targets binary data + +## Working with Different File Formats + +### ELF Binaries + +```python +from ofrak.core.elf.model import Elf, ElfSection, ElfSegment + +# Load the ELF and unpack it; the header, sections, and segments become child resources +root_resource = await ofrak_context.create_root_resource_from_file("binary.elf") +await root_resource.unpack() +elf = await root_resource.view_as(Elf) + +# Access ELF header info +header = await elf.get_header() +print(f"Entry point: {hex(header.e_entry)}") +print(f"Architecture: {header.e_machine}") + +# Get all sections +sections = await root_resource.get_children_as_view(ElfSection) +for section in sections: + print(f"Section: {section.name} at {hex(section.virtual_address)}") +``` + +### PE Binaries + +```python +from ofrak.core.pe.model import Pe + +# Load and view as PE +root_resource = await ofrak_context.create_root_resource_from_file("binary.exe") +pe = await root_resource.view_as(Pe) + +# Access PE info +print(f"Machine type: {pe.machine_type}")
+print(f"Entry point: {hex(pe.entry_point)}") +``` + +### Archives (ZIP, TAR, etc.) + +```python +from ofrak.core.zip import ZipArchive +from ofrak.core.filesystem import File + +# Load archive +root_resource = await ofrak_context.create_root_resource_from_file("archive.zip") + +# Unpack archive +await root_resource.unpack() + +# Get all files, including those inside folders +files = await root_resource.get_descendants_as_view(File) +for file in files: + data = await file.resource.get_data() + print(f"File: {file.name}, Size: {len(data)}") + # Process file data +``` + +### Filesystem Images + +```python +from ofrak.core.filesystem import FilesystemRoot, File, Folder + +# Load filesystem image +root_resource = await ofrak_context.create_root_resource_from_file("filesystem.img") + +# Unpack filesystem +await root_resource.unpack_recursively() + +# Navigate filesystem structure +fs_root = await root_resource.get_only_child_as_view(FilesystemRoot) + +# Find specific files +descendants = await root_resource.get_descendants() +for desc in descendants: + if desc.has_tag(File): + file = await desc.view_as(File) + if file.name.endswith(".conf"): + print(f"Found config: {file.name}") +``` + +## Common Analysis Tasks + +### String Extraction + +```python +from ofrak.core.strings import StringsAnalyzer, StringsAttributes + +# Run strings analysis +await root_resource.run(StringsAnalyzer) + +# Get results +strings_attr = await root_resource.analyze(StringsAttributes) + +# Print strings with offsets +for offset, string in sorted(strings_attr.strings.items()): + print(f"{hex(offset)}: {string}") + +# Filter strings +long_strings = {off: s for off, s in strings_attr.strings.items() if len(s) > 20} +``` + +### Code Analysis + +```python +from ofrak.core.code_region import CodeRegion +from ofrak.core.basic_block import BasicBlock +from ofrak.core.instruction import Instruction + +# Unpack to get code regions +await root_resource.unpack_recursively() + +# Find all basic blocks +blocks = await
 root_resource.get_descendants_as_view(BasicBlock) + +for block in blocks: + print(f"Basic block at {hex(block.virtual_address)}, size: {block.size}") + + # Get instructions in block + instructions = await block.resource.get_children_as_view(Instruction) + for instr in instructions: + print(f" {hex(instr.virtual_address)}: {instr.mnemonic} {instr.operands}") +``` + +### Memory Mapping Analysis + +```python +from ofrak.core.memory_region import MemoryRegion + +# Get memory regions +regions = await root_resource.get_descendants_as_view(MemoryRegion) + +for region in regions: + print(f"Region: {hex(region.virtual_address)} - {hex(region.virtual_address + region.size)}") + print(f" Permissions: {'R' if region.readable else '-'}{'W' if region.writable else '-'}{'X' if region.executable else '-'}") +``` + +## Common Modification Tasks + +### Binary Patching + +```python +from ofrak.core.binary import BinaryPatchConfig, BinaryPatchModifier + +# Patch at specific offset +config = BinaryPatchConfig(offset=0x1000, patch_bytes=b"\x90" * 4) +await root_resource.run(BinaryPatchModifier, config) +``` + +### String Replacement + +```python +from ofrak.core.strings import StringFindReplaceConfig, StringFindReplaceModifier + +# Replace string: by default a null terminator is appended, and the result must fit in the original +config = StringFindReplaceConfig( + to_find="debug_mode_off", + replace_with="debug_mode_on" +) +await root_resource.run(StringFindReplaceModifier, config) +``` + +### Code Modification + +```python +from ofrak.core.instruction import Instruction +from ofrak.core.binary import BinaryPatchModifier, BinaryPatchConfig + +# Find instruction to patch +instructions = await root_resource.get_descendants_as_view(Instruction) +for instr in instructions: + if instr.mnemonic == "jne" and instr.virtual_address == 0x401234: + # Change a short jne (opcode 0x75) to a short jmp (opcode 0xeb) + # base_address is a hypothetical value: the virtual address at which root_resource's data is loaded + patch_offset = instr.virtual_address - base_address + config = BinaryPatchConfig(offset=patch_offset, patch_bytes=b"\xeb") + await root_resource.run(BinaryPatchModifier,
 config) + break +``` + +## Resource Navigation + +### Finding Specific Resources + +```python +# Find by tag +from ofrak.core.elf.model import ElfSection + +sections = await root_resource.get_descendants_as_view(ElfSection) + +# Find specific section +text_section = None +for section in sections: + if section.name == ".text": + text_section = section + break + +if text_section: + print(f".text section at {hex(text_section.virtual_address)}") +``` + +### Working with Resource Hierarchy + +```python +# Get immediate children only +children = await root_resource.get_children() + +# Get all descendants (children, grandchildren, etc.) +descendants = await root_resource.get_descendants() + +# Get parent +child = next(iter(children)) # get_children() returns an iterable, not necessarily a list +parent = await child.get_parent() + +# Get ancestors +ancestors = await child.get_ancestors() +``` + +### Filtering Resources + +```python +# Get only resources with specific tag +from ofrak.core.filesystem import File + +files = [r for r in await root_resource.get_descendants() if r.has_tag(File)] + +# Filter by data size +large_files = [] +for file_resource in files: + if len(await file_resource.get_data()) > 1024 * 1024: # > 1MB + large_files.append(await file_resource.view_as(File)) +``` + +## Configuration Options + +### Component Configs + +Many components accept configuration to customize behavior: + +```python +from ofrak.core.strings import StringsAnalyzer, StringsAnalyzerConfig + +# Custom min length for strings +config = StringsAnalyzerConfig(min_length=10) +await root_resource.run(StringsAnalyzer, config) +``` + +### OFRAK Context Options + +```python +from ofrak import OFRAK +import my_custom_components # hypothetical module that defines MyCustomComponent + +ofrak = OFRAK() + +# Register custom components: discover() takes a module and registers the components in it +ofrak.discover(my_custom_components) + +# Run main(); ofrak.run() is synchronous and should not be awaited +ofrak.run(main, arg1, arg2) +``` + +## Error Handling Best Practices + +### Handle Missing Components + +```python +from ofrak.service.error import ComponentNotFoundError + +try: + await root_resource.run(SomeOptionalComponent) +except
 ComponentNotFoundError: + print("Component not available, using fallback") + # Use alternative approach +``` + +### Handle Unpacking Failures + +```python +try: + await root_resource.unpack() +except Exception as e: + print(f"Warning: Failed to unpack {root_resource.get_caption()}: {e}") + # Continue with analysis on packed data +``` + +### Validate Resource State + +```python +# Check if resource has expected tag before using view +from ofrak.core.elf.model import Elf + +if root_resource.has_tag(Elf): + elf = await root_resource.view_as(Elf) + # Safe to use ELF-specific operations +else: + print("Not an ELF file") +``` + +## Performance Considerations + +### Recursive Operations + +```python +# For deep unpacking, use unpack_recursively +await root_resource.unpack_recursively() + +# For controlled unpacking, manually iterate +await root_resource.unpack() +children = await root_resource.get_children() +for child in children: + if should_unpack(child): # should_unpack is a hypothetical, user-defined predicate + await child.unpack() +``` + +### Resource Querying + +```python +from ofrak.core.elf.model import ElfSection +from ofrak.service.resource_service_i import ResourceAttributeValueFilter, ResourceFilter + +# Efficient: let the resource service filter by section name +text_sections = await root_resource.get_children_as_view( + ElfSection, + r_filter=ResourceFilter( + attribute_filters=(ResourceAttributeValueFilter(ElfSection.SectionName, ".text"),) + ), +) + +# Less efficient: Get all then filter in Python +all_sections = await root_resource.get_children_as_view(ElfSection) +text_sections = [s for s in all_sections if s.name == ".text"] +``` + +## Debugging Scripts + +### Print Resource Information + +```python +# Print resource tree (must be async because it awaits get_children) +async def print_tree(resource, indent=0): + print(" " * indent + resource.get_caption()) + children = await resource.get_children() + for child in children: + await print_tree(child, indent + 1) + +await print_tree(root_resource) +``` + +### Inspect Resource Attributes + +```python +# Get all attributes +attributes = await root_resource.get_attributes() +for attr_type, attr in attributes.items(): + print(f"{attr_type.__name__}: {attr}") +``` + +### Enable Logging + +```python +import logging + +# Enable OFRAK debug logging
+logging.basicConfig(level=logging.DEBUG) +``` + +## Reference Documentation + +For comprehensive API documentation, see the cloned OFRAK repository: +- **OFRAK User Guide**: `ofrak/docs/user-guide/` directory +- **OFRAK API Reference**: `ofrak/docs/reference/` directory +- **Example Scripts**: `ofrak/examples/` directory + +If you don't have the OFRAK repository cloned, instruct the user to clone it: `git clone https://github.com/redballoonsecurity/ofrak.git` + +## Getting Help + +- **OFRAK Documentation**: Available in `ofrak/docs/` of the cloned repository +- **GitHub Issues**: https://github.com/redballoonsecurity/ofrak/issues +- **Slack Community**: https://join.slack.com/t/ofrak/shared_invite/zt-1jku9h6r5-mY7CeeZ4AT8JVmu5YWw2Qg diff --git a/claude/ofrak-developer/references/testing_patterns.md b/claude/ofrak-developer/references/testing_patterns.md new file mode 100644 index 000000000..b6b7d60ec --- /dev/null +++ b/claude/ofrak-developer/references/testing_patterns.md @@ -0,0 +1,760 @@ +# OFRAK Testing Patterns + +This document covers patterns for writing comprehensive tests for OFRAK components and scripts. + +## Testing Requirements + +**OFRAK requires 100% test coverage at the statement or function level.** + +This is enforced by CI. Pull requests that don't meet this requirement will not be merged. + +## Test Data Strategy + +**CRITICAL: Always use real binary data. Never create synthetic test data in code.** + +### Test Asset Workflow + +1. **Write tests assuming data exists** in `tests/components/assets/` directory +2. **Reference asset files by path** in your test code +3. 
**Instruct user to provide real data** at the specified path + +### Example Pattern + +```python +class TestMyFormatUnpacker: + """Tests for MyFormatUnpacker.""" + + @pytest.fixture + def sample_file(self) -> bytes: + """Load real sample file for testing.""" + # Reference asset file that user must provide + asset_path = "tests/components/assets/sample.myformat" + with open(asset_path, "rb") as f: + return f.read() + + async def test_unpacks_real_file( + self, ofrak_context: OFRAKContext, sample_file: bytes + ): + """Test unpacker with real MyFormat file.""" + resource = await ofrak_context.create_root_resource("sample.myformat", sample_file) + resource.add_tag(MyFormat) + + await resource.run(MyFormatUnpacker) + + children = await resource.get_children() + assert len(children) > 0 +``` + +### User Instructions + +When creating tests, instruct the user: + +``` +Please place a real MyFormat test file at: + tests/components/assets/sample.myformat + +IMPORTANT: Test data must be suitable for public distribution, as the OFRAK +repository is open source and publicly accessible. 
Use test data that is: +- Created by you +- Public domain +- Permissively licensed (e.g., CC0, MIT, BSD) +- Otherwise freely redistributable + +You can obtain appropriate test data by: +- Creating your own test file with [tool] +- Using public domain samples from [source] +- Generating test cases yourself +- Extracting from openly licensed firmware/archives +``` + +### What NOT to Do + +❌ **Don't create synthetic data:** +```python +# BAD - Don't do this +test_data = b"MYFT" + b"\x00" * 100 # Fake data +``` + +❌ **Don't generate test files programmatically:** +```python +# BAD - Don't do this +def create_fake_myformat(): + return build_fake_structure() # Generated data +``` + +✅ **Do use real files:** +```python +# GOOD - Do this +with open("tests/components/assets/real_sample.myformat", "rb") as f: + real_data = f.read() +``` + +### When Synthetic Data is Acceptable + +Minimal synthetic data is acceptable ONLY for: +- Testing error conditions with intentionally malformed inputs +- Unit testing specific parsing functions with small, well-defined inputs +- Testing edge cases where real data would be impractical (e.g., 2GB files) + +Even in these cases, prefer real data when possible. + +## Test Structure + +### Basic Test Pattern + +```python +import pytest +from ofrak import OFRAKContext +from ofrak.resource import Resource + +from my_module import MyComponent, MyComponentConfig + + +class TestMyComponent: + """Tests for MyComponent.""" + + @pytest.fixture + async def test_resource(self, ofrak_context: OFRAKContext) -> Resource: + """ + Create a test resource for MyComponent tests. 
+ + Args: + ofrak_context: OFRAK context fixture + + Returns: + Test resource + """ + # Load real test data (user must provide tests/components/assets/sample.bin) + with open("tests/components/assets/sample.bin", "rb") as f: + test_data = f.read() + + # Create resource + resource = await ofrak_context.create_root_resource("test.bin", test_data) + return resource + + async def test_my_component_basic( + self, test_resource: Resource, ofrak_context: OFRAKContext + ): + """Test MyComponent with basic input.""" + # Run component + await test_resource.run(MyComponent) + + # Verify results + result = await test_resource.get_data() + assert result == b"expected output" + + async def test_my_component_with_config( + self, test_resource: Resource, ofrak_context: OFRAKContext + ): + """Test MyComponent with custom configuration.""" + config = MyComponentConfig(option1="value", option2=123) + await test_resource.run(MyComponent, config) + + # Verify results + result = await test_resource.get_data() + assert result == b"expected output with config" +``` + +## Fixture Patterns + +### Standard Fixtures + +OFRAK's test framework provides standard fixtures such as `ofrak_context`. Use them directly in your tests; do not redefine them in your test modules. For reference, the fixture looks like this: + +```python +@pytest.fixture +async def ofrak_context() -> OFRAKContext: + """Provides an OFRAK context for tests.""" + # Provided by the OFRAK test framework; shown for reference only +``` + +### Custom Fixtures for Test Data + +**Always reference real asset files:** + +```python +@pytest.fixture +def sample_elf_binary() -> bytes: + """Load real ELF binary for testing.
+ + User must provide: tests/components/assets/sample.elf + """ + asset_path = "tests/components/assets/sample.elf" + with open(asset_path, "rb") as f: + return f.read() + +@pytest.fixture +async def elf_resource( + ofrak_context: OFRAKContext, sample_elf_binary: bytes +) -> Resource: + """Create ELF resource for testing.""" + resource = await ofrak_context.create_root_resource( + "test.elf", sample_elf_binary + ) + return resource +``` + +### Parameterized Fixtures + +```python +@pytest.fixture(params=[ + ("input1.bin", b"expected1"), + ("input2.bin", b"expected2"), + ("input3.bin", b"expected3"), +]) +def test_case(request) -> tuple[str, bytes]: + """Parameterized test cases.""" + return request.param +``` + +## Testing Identifiers + +```python +from ofrak.core.magic import MagicMimeIdentifier, MagicDescriptionIdentifier + + +class TestMyFormatIdentifier: + """Tests for MyFormatIdentifier.""" + + @pytest.fixture + def valid_myformat_file(self) -> bytes: + """Load real valid MyFormat file. + + User must provide: tests/components/assets/valid.myf + """ + with open("tests/components/assets/valid.myf", "rb") as f: + return f.read() + + @pytest.fixture + def invalid_format_file(self) -> bytes: + """Load real non-MyFormat file. 
+ + User must provide: tests/components/assets/not_myformat.bin + """ + with open("tests/components/assets/not_myformat.bin", "rb") as f: + return f.read() + + async def test_identifies_valid_format( + self, ofrak_context: OFRAKContext, valid_myformat_file: bytes + ): + """Test that valid MyFormat files are identified.""" + resource = await ofrak_context.create_root_resource("test.myf", valid_myformat_file) + + # Run identifier + await resource.run(MyFormatIdentifier) + + # Verify tag was added + assert resource.has_tag(MyFormat) + + async def test_does_not_identify_invalid_format( + self, ofrak_context: OFRAKContext, invalid_format_file: bytes + ): + """Test that non-MyFormat files are not identified.""" + resource = await ofrak_context.create_root_resource("test.bin", invalid_format_file) + await resource.run(MyFormatIdentifier) + + # Verify tag was not added + assert not resource.has_tag(MyFormat) + + async def test_identifies_with_magic( + self, ofrak_context: OFRAKContext, valid_myformat_file: bytes + ): + """Test identification after magic identifiers run.""" + resource = await ofrak_context.create_root_resource("test.myf", valid_myformat_file) + + # Run magic identifiers first (simulating real workflow) + await resource.run(MagicMimeIdentifier) + await resource.run(MagicDescriptionIdentifier) + await resource.run(MyFormatIdentifier) + + assert resource.has_tag(MyFormat) +``` + +## Testing Analyzers + +```python +class TestMyFormatAnalyzer: + """Tests for MyFormatAnalyzer.""" + + @pytest.fixture + def myformat_sample(self) -> bytes: + """Load real MyFormat file for testing. 
+ + User must provide: tests/components/assets/sample_v2.myf + (A MyFormat file with version=2, compression=GZIP, entry_count=5) + """ + with open("tests/components/assets/sample_v2.myf", "rb") as f: + return f.read() + + @pytest.fixture + async def myformat_resource( + self, ofrak_context: OFRAKContext, myformat_sample: bytes + ) -> Resource: + """Create MyFormat test resource from real data.""" + resource = await ofrak_context.create_root_resource("test.myf", myformat_sample) + resource.add_tag(MyFormat) + return resource + + async def test_analyzes_format(self, myformat_resource: Resource): + """Test MyFormatAnalyzer extracts correct attributes.""" + # Run analyzer + await myformat_resource.run(MyFormatAnalyzer) + + # Get attributes + attrs = await myformat_resource.analyze(MyFormatAttributes) + + # Verify + assert attrs.version == 2 + assert attrs.compression_type == "GZIP" + assert attrs.entry_count == 5 + + async def test_analyzes_different_versions(self, ofrak_context: OFRAKContext): + """Test analyzer handles different format versions. + + User must provide: tests/components/assets/sample_v1.myf + (A MyFormat file with version=1, compression=NONE) + """ + with open("tests/components/assets/sample_v1.myf", "rb") as f: + data_v1 = f.read() + + resource = await ofrak_context.create_root_resource("test.myf", data_v1) + resource.add_tag(MyFormat) + + await resource.run(MyFormatAnalyzer) + attrs = await resource.analyze(MyFormatAttributes) + + assert attrs.version == 1 + assert attrs.compression_type == "NONE" +``` + +## Testing Unpackers + +```python +class TestMyFormatUnpacker: + """Tests for MyFormatUnpacker.""" + + @pytest.fixture + def archive_sample(self) -> bytes: + """Load real MyFormat archive for testing. 
+ + User must provide: tests/components/assets/archive_2files.myf + (A MyFormat archive containing file1.txt and file2.txt) + """ + with open("tests/components/assets/archive_2files.myf", "rb") as f: + return f.read() + + @pytest.fixture + async def archive_resource( + self, ofrak_context: OFRAKContext, archive_sample: bytes + ) -> Resource: + """Create MyFormat archive resource from real data.""" + resource = await ofrak_context.create_root_resource("archive.myf", archive_sample) + resource.add_tag(MyFormat) + return resource + + async def test_unpacks_files(self, archive_resource: Resource): + """Test unpacker extracts all files correctly.""" + # Run analyzer first (unpacker depends on it) + await archive_resource.run(MyFormatAnalyzer) + + # Run unpacker + await archive_resource.run(MyFormatUnpacker) + + # Get children + from ofrak.core.filesystem import File + children = await archive_resource.get_children_as_view(File) + + # Verify + assert len(children) == 2 + + file1 = [f for f in children if f.name == "file1.txt"][0] + file2 = [f for f in children if f.name == "file2.txt"][0] + + file1_data = await file1.resource.get_data() + file2_data = await file2.resource.get_data() + + assert file1_data == b"Hello, World!" + assert file2_data == b"Goodbye!" + + async def test_unpacks_empty_archive(self, ofrak_context: OFRAKContext): + """Test unpacker handles empty archives. 
+ + User must provide: tests/components/assets/archive_empty.myf + (An empty MyFormat archive with 0 entries) + """ + with open("tests/components/assets/archive_empty.myf", "rb") as f: + empty_archive = f.read() + + resource = await ofrak_context.create_root_resource("empty.myf", empty_archive) + resource.add_tag(MyFormat) + + await resource.run(MyFormatAnalyzer) + await resource.run(MyFormatUnpacker) + + children = await resource.get_children() + assert len(children) == 0 +``` + +## Testing Modifiers + +```python +class TestMyModifier: + """Tests for MyModifier.""" + + @pytest.fixture + def test_binary(self) -> bytes: + """Load real binary for modification testing. + + User must provide: tests/components/assets/test_binary.bin + (Binary file containing the target strings to replace) + """ + with open("tests/components/assets/test_binary.bin", "rb") as f: + return f.read() + + async def test_replaces_string( + self, ofrak_context: OFRAKContext, test_binary: bytes + ): + """Test modifier replaces target string in real binary.""" + resource = await ofrak_context.create_root_resource("test.bin", test_binary) + + # Run modifier + config = MyModifierConfig( + target_string=b"Hello", + replacement=b"Goodbye" + ) + await resource.run(MyModifier, config) + + # Verify replacement occurred + modified_data = await resource.get_data() + assert b"Goodbye" in modified_data + assert b"Hello" not in modified_data + + async def test_no_match_no_change( + self, ofrak_context: OFRAKContext, test_binary: bytes + ): + """Test modifier leaves data unchanged when no match.""" + resource = await ofrak_context.create_root_resource("test.bin", test_binary) + + config = MyModifierConfig( + target_string=b"NotFound", + replacement=b"Something" + ) + await resource.run(MyModifier, config) + + modified_data = await resource.get_data() + assert modified_data == test_binary +``` + +## Testing Packers + +```python +class TestMyFormatPacker: + """Tests for MyFormatPacker.""" + + async def 
test_packs_files(self, ofrak_context: OFRAKContext): + """Test packer creates valid archive from files.""" + from ofrak.core.filesystem import File + + # Create root resource + resource = await ofrak_context.create_root_resource("archive.myf", b"") + resource.add_tag(MyFormat) + + # Add child files (File is a resource view, so create the children from views) + await resource.create_child_from_view( + File(name="file1.txt", stat=None, xattrs=None), + data=b"File 1 content", + ) + await resource.create_child_from_view( + File(name="file2.txt", stat=None, xattrs=None), + data=b"File 2 content", + ) + + # Run packer + await resource.run(MyFormatPacker) + + # Verify packed data + packed_data = await resource.get_data() + + # Check magic + assert packed_data[:4] == b"MYFT" + + # Verify can be unpacked + await resource.run(MyFormatUnpacker) + children = await resource.get_children_as_view(File) + assert len(children) == 2 +``` + +## Testing External Tool Components + +**CRITICAL: Do NOT mock external tools. Test with real tools installed.** + +```python +class TestExternalToolComponent: + """Tests for components using external tools.""" + + async def test_with_external_tool(self, ofrak_context: OFRAKContext): + """Test component that uses external tool. + + User must provide: tests/components/assets/external_tool_input.bin + (A real input file that the external tool processes successfully) + """ + # Load real binary data that exercises the external tool + with open("tests/components/assets/external_tool_input.bin", "rb") as f: + real_input = f.read() + resource = await ofrak_context.create_root_resource("test.bin", real_input) + + # Run component with real external tool + await resource.run(MyExternalToolComponent) + + # Verify component processed real tool output correctly + result = await resource.get_data() + assert result == b"expected real output" + + async def test_handles_tool_failure(self, ofrak_context: OFRAKContext): + """Test component handles external tool failures.""" + # Create resource with data that will cause tool to fail + invalid_data = b"corrupted or invalid data" + resource = await ofrak_context.create_root_resource("test.bin", invalid_data) + + # Verify component raises appropriate error when tool fails + with pytest.raises(ComponentError): + await
resource.run(MyExternalToolComponent) +``` + +**Requirements for external tool testing:** +- External tools must be installed in the test environment +- Use real binary test data that exercises the tool +- Test both success and failure cases with real tool behavior +- Document required external dependencies in test docstrings + +## Testing Resource Views + +```python +class TestMyFormatView: + """Tests for MyFormat resource view.""" + + @pytest.fixture + async def myformat_view(self, ofrak_context: OFRAKContext) -> MyFormat: + """Create MyFormat view from a real sample file. + + User must provide: tests/components/assets/sample_v1.myf + (A MyFormat file with version=1) + """ + with open("tests/components/assets/sample_v1.myf", "rb") as f: + data = f.read() + resource = await ofrak_context.create_root_resource("test.myf", data) + resource.add_tag(MyFormat) + return await resource.view_as(MyFormat) + + async def test_get_version(self, myformat_view: MyFormat): + """Test getting format version.""" + version = await myformat_view.get_version() + assert version == 1 + + async def test_extract_entry(self, myformat_view: MyFormat): + """Test extracting specific entry.""" + entry_data = await myformat_view.extract_entry(0) + assert entry_data == b"expected entry data" +``` + +## Parameterized Tests + +**Use real test files for parameterized testing:** + +```python +@pytest.mark.parametrize("input_file,expected_file", [ + ("tests/components/assets/sample1.bin", "tests/components/assets/expected1.bin"), + ("tests/components/assets/sample2.bin", "tests/components/assets/expected2.bin"), + ("tests/components/assets/sample3.bin", "tests/components/assets/expected3.bin"), +]) +async def test_multiple_cases( + ofrak_context: OFRAKContext, input_file: str, expected_file: str +): + """Test component with multiple real file pairs.
+ + User must provide: + - tests/components/assets/sample1.bin and expected1.bin + - tests/components/assets/sample2.bin and expected2.bin + - tests/components/assets/sample3.bin and expected3.bin + """ + with open(input_file, "rb") as f: + test_input = f.read() + with open(expected_file, "rb") as f: + expected = f.read() + + resource = await ofrak_context.create_root_resource("test.bin", test_input) + await resource.run(MyComponent) + + result = await resource.get_data() + assert result == expected +``` + +## Error Handling Tests + +**Prefer real malformed files, but minimal synthetic data is acceptable for error testing:** + +```python +async def test_invalid_format_raises_error(self, ofrak_context: OFRAKContext): + """Test component raises appropriate error for invalid input. + + Option 1 (preferred): Use real corrupted file + User provides: tests/components/assets/corrupted.myf + + Option 2 (acceptable): Minimal synthetic malformed data + """ + # Preferred: Load real corrupted file + # with open("tests/components/assets/corrupted.myf", "rb") as f: + # invalid_data = f.read() + + # Acceptable for error testing: minimal synthetic invalid data + invalid_data = b"INVALID" + + resource = await ofrak_context.create_root_resource("test.bin", invalid_data) + + with pytest.raises(ValueError, match="Invalid format"): + await resource.run(MyComponent) + +async def test_missing_dependency_raises_error( + self, ofrak_context: OFRAKContext, test_binary: bytes +): + """Test component raises error when dependency missing.""" + resource = await ofrak_context.create_root_resource("test.bin", test_binary) + + # Don't run required analyzer + with pytest.raises(ComponentDependencyError): + await resource.run(MyComponent) +``` + +## Integration Tests + +```python +async def test_full_workflow(self, ofrak_context: OFRAKContext): + """Test complete workflow with multiple components. 
+ + User must provide: tests/components/assets/test_workflow.myf + (A complete MyFormat file for end-to-end testing) + """ + # Load real test file + with open("tests/components/assets/test_workflow.myf", "rb") as f: + test_data = f.read() + + resource = await ofrak_context.create_root_resource("test.myf", test_data) + + # Identify + await resource.run(MyFormatIdentifier) + assert resource.has_tag(MyFormat) + + # Analyze + await resource.run(MyFormatAnalyzer) + attrs = await resource.analyze(MyFormatAttributes) + assert attrs.version == 1 + + # Unpack + await resource.unpack() + children = await resource.get_children() + assert len(children) > 0 + + # Modify + config = MyModifierConfig(target_string=b"old", replacement=b"new") + await resource.run(MyModifier, config) + + # Pack + await resource.pack() + + # Save + await resource.flush_data_to_disk("modified.myf") +``` + +## Test Organization + +### File Structure + +``` +tests/components/ +├── __init__.py +├── test_my_identifier.py # Identifier tests +├── test_my_analyzer.py # Analyzer tests +├── test_my_unpacker.py # Unpacker tests +├── test_my_modifier.py # Modifier tests +├── test_my_packer.py # Packer tests +├── test_my_view.py # Resource view tests +└── assets/ # Real test data files (tests/components/assets/...) + ├── sample1.myf + ├── sample2.myf + └── ... +``` + +### Test Class Organization + +```python +class TestMyComponent: + """Tests for MyComponent. + + Organized by feature/scenario: + - Basic functionality + - Edge cases + - Error handling + - Integration with other components + """ + + # Fixtures (use descriptive names; older pytest versions treat a method named "setup" as a nose-style hook) + @pytest.fixture + def sample_data(self): + ... + + # Basic functionality tests + async def test_basic_case(self): + ... + + # Edge case tests + async def test_empty_input(self): + ... + + async def test_large_input(self): + ... + + # Error handling tests + async def test_invalid_input_raises_error(self): + ... + + # Integration tests + async def test_works_with_other_component(self): + ... +``` + +## Coverage Best Practices + +1.
**Test all code paths**: Ensure every branch is tested +2. **Test error conditions**: Don't just test happy paths +3. **Test edge cases**: Empty inputs, maximum sizes, boundary conditions +4. **Test integrations**: How components work together +5. **Use parameterized tests**: Cover multiple scenarios efficiently +6. **Use real data and tools**: Test with actual binaries and real external tools (no mocking) +7. **Keep tests isolated**: Each test should be independent +8. **Name tests clearly**: Test names should describe what they test +9. **Use appropriate assertions**: Be specific about what you're checking +10. **Document complex tests**: Add comments explaining non-obvious test logic + +## Running Tests + +```bash +# Run all tests +pytest + +# Run specific test file +pytest tests/test_my_component.py + +# Run specific test +pytest tests/test_my_component.py::TestMyComponent::test_basic_case + +# Run with coverage +pytest --cov=my_module tests/ + +# Generate coverage report +pytest --cov=my_module --cov-report=html tests/ +``` + +## CI/CD Integration + +Tests are automatically run by GitHub Actions on every PR. 
Ensure: +- All tests pass locally before pushing +- Coverage meets the 100% requirement +- Tests are reasonably fast (keep test data small rather than resorting to mocks) +- Tests are deterministic (no random failures) +- Required external tools are available in the CI environment + +## Additional Resources + +- pytest documentation: https://docs.pytest.org/ +- OFRAK testing examples: existing tests in the `ofrak_core/test_ofrak/` directory of the OFRAK repository +- Coverage documentation: https://coverage.readthedocs.io/ diff --git a/ofrak_core/CHANGELOG.md b/ofrak_core/CHANGELOG.md index e78d62ef6..77c87c6c1 100644 --- a/ofrak_core/CHANGELOG.md +++ b/ofrak_core/CHANGELOG.md @@ -6,6 +6,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) ## [Unreleased](https://github.com/redballoonsecurity/ofrak/tree/master) ### Added +- Add ofrak-developer Claude Skill for LLM assisted OFRAK development ([#663](https://github.com/redballoonsecurity/ofrak/pull/663)) - Add OFRAK requirements, requirement to test mapping, test specifications ([#656](https://github.com/redballoonsecurity/ofrak/pull/656)) - Add `-V, --version` flag to ofrak cli ([#652](https://github.com/redballoonsecurity/ofrak/pull/652))