[assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis by SamuelHassine · Pull Request #6429 · OpenCTI-Platform/connectors

SamuelHassine · 2026-05-19T09:20:20Z

Proposed changes

Adds a new internal-enrichment connector for AssemblyLine 4 at internal-enrichment/assemblyline/.

When OpenCTI asks the connector to enrich a StixFile or Artifact, the connector:

downloads the file content from OpenCTI's storage API (via pycti's helper.api.fetch_opencti_file, inheriting the platform's session timeouts, retries, custom CA bundles and proxy / SSL settings);
submits it to the configured AssemblyLine deployment with a classification mirroring the source observable's TLP marking (TLP:CLEAR ΓåÆ TLP:C, TLP:GREEN ΓåÆ TLP:G, TLP:AMBER ΓåÆ TLP:A, TLP:RED ΓåÆ TLP:R), falling back to ASSEMBLYLINE_CLASSIFICATION when the source has no TLP ΓÇö so a sample is never silently downgraded once it leaves OpenCTI;
polls until the analysis completes (within a configurable timeout, surfacing terminal failed / error / cancelled states immediately via AssemblyLineTerminalError instead of waiting for the global timeout);
pushes the results back into OpenCTI:
- a Malware-Analysis SDO (submission id, profile, verdict, score) that surfaces in the Malware Analysis section of the enriched observable;
- STIX Indicator objects (and optional matching Observable objects linked via based-on relationships) for every malicious IOC AssemblyLine extracted (domains, IPs, URLs), labelled malicious with x_opencti_score=80. When ASSEMBLYLINE_INCLUDE_SUSPICIOUS=true, suspicious-only IOCs are emitted as separate Indicator SDOs labelled suspicious with x_opencti_score=50 so OpenCTI keeps the two classifications distinct (the original implementation labelled everything malicious regardless of AssemblyLine's tag);
- Malware SDOs for every malware family attributed by AssemblyLine;
- Attack-Pattern SDOs for every MITRE ATT&CK technique observed at runtime, linked to the generated indicators with related-to relationships;
- a Note summarising the verdict and per-category indicator counts (the per-bucket numbers come from the connector's own bookkeeping in _create_indicators, not the raw IOC-extraction list lengths, so the counts always match what was actually written to OpenCTI even when the per-bucket 20-indicator cap is engaged; the Note also renders a Suspicious IOCs Created as Indicators block whenever ASSEMBLYLINE_INCLUDE_SUSPICIOUS=true produces any);
- an External-Reference attached to the enriched observable pointing back to the AssemblyLine submission.

All three "create X" behaviours and the "sequential mode" (queue submissions while AssemblyLine is busy, bounded by ASSEMBLYLINE_TIMEOUT) are configurable via ASSEMBLYLINE_CREATE_ATTACK_PATTERNS / ASSEMBLYLINE_CREATE_MALWARE_ANALYSIS / ASSEMBLYLINE_CREATE_OBSERVABLES / ASSEMBLYLINE_SEQUENTIAL_MODE so operators can tune the noise level on the OpenCTI side.

The connector also enforces a TLP gate (ASSEMBLYLINE_MAX_TLP, default TLP:AMBER) before any file content leaves OpenCTI for the external AssemblyLine deployment. Every derived SDO, SCO, Indicator and SRO inherits the source observable's object_marking_refs, so a TLP:AMBER source produces TLP:AMBER analysis observables, indicators, Malware family SDOs, Attack-Pattern SDOs, the Malware-Analysis SDO, the summary Note, and every related-to / based-on relationship between them ΓÇö never silently downgraded to TLP:CLEAR.

Source

This is a re-submission of #5199 (which became unmergeable after the head branch was accidentally promoted to master and then rolled back). The implementation is the one from @V1D1AN's standalone connector repo at https://github.com/V1D1AN/connector-assemblyline, adapted to the OpenCTI monorepo conventions and with all the review findings from #5199 fixed in this branch.

Monorepo adaptations applied on top of the upstream code

Files placed under internal-enrichment/assemblyline/ with the standard layout (Dockerfile, entrypoint.sh, docker-compose.yml, README.md, .dockerignore, __metadata__/connector_manifest.json, __metadata__/connector_config_schema.json, __metadata__/logo.png, src/{main.py, requirements.txt, config.yml.sample, __init__.py}, tests/{test_connector.py, test-requirements.txt, __init__.py}).
Dockerfile rebased on python:3.12-alpine matching the lean internal-enrichment/joe-sandbox pattern: libmagic + libffi-dev for the runtime, git + build-base only for pip install (removed afterwards). assemblyline-client==4.9.9 does not transitively pull lxml, so the previously-installed libxml2-dev / libxslt-dev headers (never cleaned up) are dropped.
docker-compose.yml rewritten with image: opencti/connector-assemblyline:rolling, inline CONNECTOR_* / ASSEMBLYLINE_* environment variables with sensible defaults.
src/requirements.txt pinned to pycti==7.260521.0 (aligned with the rest of the monorepo's current pin), assemblyline-client==4.9.9, stix2==3.0.1, requests~=2.32, PyYAML>=6.0,<7. Removed the bogus uuid>=1.30 PyPI package (the stdlib uuid module ships with Python and the third-party package is unmaintained).
connector_manifest.json fixed: slug lowercased to assemblyline, description accurately reflects what the connector actually emits (domains, IPs, URLs, Malware SDOs, ATT&CK Attack Patterns ΓÇö no spurious "file hashes" claim), logo path adjusted to internal-enrichment/assemblyline/__metadata__/logo.png, container_image set to opencti/connector-assemblyline, source_code pointed at the monorepo path.
tests/test-requirements.txt added so the GitHub Actions test runner picks the connector up.

Review findings from #5199 addressed in this branch

Correctness

Normalise ASSEMBLYLINE_VERIFY_SSL so requests does not treat the string "false" / "true" as a CA-bundle path.
Cap the sequential-mode wait by ASSEMBLYLINE_TIMEOUT so the worker can never block forever when AssemblyLine stays busy. ASSEMBLYLINE_POLL_INTERVAL is clamped to max(1, int(...)) so a stale config.yml or env var cannot turn the wait into a busy loop.
Add an HTTP timeout to the AssemblyLine submission request.
Keep MITRE ATT&CK phase names hyphenated (defense-evasion, ...) so OpenCTI can match them against the official kill-chain entries.
Link ATT&CK Attack-Patterns to the generated indicators with related-to, matching what the README documents.
Drop the duplicate _create_relationships pass so observables are not created twice and ASSEMBLYLINE_CREATE_OBSERVABLES is the single source of truth.
Switch IPv6 IOCs to ipv6-addr / IPv6-Addr (they were modelled as ipv4-addr).
Escape backslashes (and single quotes) before embedding values in STIX patterns.
Remove the hard-coded "trojan" malware label so families keep their actual classification.
Read StixFile content from importFiles in addition to x_opencti_files, matching the other sandbox connectors.
Pass the canonical /storage/get/<id> URL to helper.api.fetch_opencti_file ΓÇö and do the same for the importFiles path so it inherits pycti's HTTP session config (timeouts, retries, CA bundles, proxy / SSL) instead of using a raw requests.get with a manually-set Authorization header.
Terminal AssemblyLine states (failed / error / cancelled) now surface immediately through a dedicated AssemblyLineTerminalError exception that the polling loop re-raises in front of its broad transient-error catcher ΓÇö the previous code masked terminal failures and let the enrichment run until the global timeout.
Suspicious-only IOCs are no longer treated as malicious. _extract_malicious_iocs returns only IOCs AssemblyLine tagged malicious; a new _extract_suspicious_iocs returns the suspicious subset (when ASSEMBLYLINE_INCLUDE_SUSPICIOUS=true). Downstream: the malicious label on the source observable, the high x_opencti_score, and the malware-analysis.result=malicious upgrade only fire on actually-malicious IOCs. Suspicious-only verdicts produce a suspicious label, x_opencti_score=50, and indicators labelled suspicious with the same score. The summary Note now renders a Suspicious IOCs Created as Indicators section whenever the suspicious bucket is non-empty so the user-facing summary cannot contradict what the connector actually sent.
_parse_al_timestamp parses positive AND negative ISO-8601 offsets correctly. The earlier split('+', 1)[0] discarded positive offsets entirely and silently fell back to the caller's default for negative ones. Switched to datetime.fromisoformat after normalising the Z suffix, then astimezone(UTC).replace(tzinfo=None) so the rest of the connector keeps using a single naive-UTC representation.
Strip trailing slash on configured base URLs (OPENCTI_URL, ASSEMBLYLINE_URL) at assignment time so requests >= 2.34 doesn't produce double-slash paths ΓÇö also satisfies the repo-wide tests/test_url_construction.py guard.
SHA-256-only deduplication. _check_existing_analysis searches AssemblyLine submissions via files.sha256:<hash> ΓÇö _process_file and _get_artifact_content now pick the SHA-256 explicitly via _select_sha256(observable["hashes"]) instead of any-hash, so MD5 / SHA-1 inputs no longer generate audit-log noise hitting an unmatched Lucene query and cannot accidentally hit an unrelated submission.
Identity-lookup failure no longer crashes the Malware-Analysis bundle. _get_assemblyline_identity swallows errors and stores None; sco_author_properties now omits x_opencti_created_by_ref and stix2.MalwareAnalysis(...) now omits created_by_ref when the identity is unavailable, so the rest of the enrichment still lands.
The summary Note reports created-indicator counts, not extracted-IOC list lengths. _create_indicators caps creation at 20 indicators per IOC bucket (domains / IPs / URLs, for both the malicious and the suspicious classifications); the Note's "Malicious IOCs Created as Indicators" / "Suspicious IOCs Created as Indicators" sections now read from per-category counters in the counts dict (malicious_domains / malicious_ips / malicious_urls / suspicious_domains / suspicious_ips / suspicious_urls) instead of len(malicious_iocs[...]) / len(suspicious_iocs[...]). Counters are incremented only on a successful indicator.create, so transient OpenCTI failures cannot inflate them either. Result: on a 50-domain analysis the Note now shows **Malicious Domains:** 20 (matching what was actually written to OpenCTI and the run's success-message) instead of **Malicious Domains:** 50.

Security

Enforce a TLP gate (ASSEMBLYLINE_MAX_TLP, default TLP:AMBER) before any file content leaves OpenCTI for the external AssemblyLine deployment.
Submission classification mirrors the source TLP. Maps TLP:CLEAR / TLP:WHITE ΓåÆ TLP:C, TLP:GREEN ΓåÆ TLP:G, TLP:AMBER / TLP:AMBER+STRICT ΓåÆ TLP:A, TLP:RED ΓåÆ TLP:R, falling back to ASSEMBLYLINE_CLASSIFICATION when the source has no TLP marking. A TLP:AMBER file passing the max-TLP gate is no longer silently downgraded to the connector default (TLP:C) once it leaves OpenCTI.
Never log the full observable on error: log only entity_type + id to avoid leaking payload_bin or other sensitive observable fields.
The full derived sub-graph inherits the source observable's object_marking_refs (every derived SDO / SCO / Indicator / Malware family / Malware-Analysis / Attack-Pattern / summary Note, and every related-to / based-on SRO between them) instead of being hard-coded to TLP:WHITE or left unmarked. A file that passes the ASSEMBLYLINE_MAX_TLP gate with TLP:AMBER now produces TLP:AMBER analysis objects throughout ΓÇö never downgraded. The _source_marking_refs helper reads observable["objectMarking"][*]["standard_id"] and falls back to OpenCTI's custom TLP:CLEAR marking only when the source carries no marking at all.
Config loading uses yaml.safe_load inside a with open(...) context manager rather than yaml.load(open(...), Loader=FullLoader) ΓÇö FullLoader could instantiate arbitrary Python objects through YAML tags from a tampered config.yml, and the bare open() leaked the file handle on parse errors.

UX

Attach an AssemblyLine External-Reference to the enriched observable, not just to the Malware-Analysis SDO.
Surface a max_tlp knob in docker-compose.yml, config.yml.sample and the README.
Update the README so the documented IOC set matches what is actually emitted, the documented submission flow matches what the code does (requests.post to /api/v4/submit/, with assemblyline-client used for polling / summary retrieval / file fetch), and the TLP propagation contract is spelt out.

Tests

Replace the placeholder asserts with real coverage of the connector implementation: bool coercion, malicious-IOC extraction with/without suspicious, score buckets, STIX-pattern escaping (backslash and quote), IPv4/IPv6 dispatch, ATT&CK phase preservation, TLP gate, external-reference attachment, sequential-mode deadline, CREATE_OBSERVABLES flag, malware-family label, file-fetch URL.
TestSourceMarkingRefs, TestMalwareAnalysisPropagatesSourceMarkings, TestApiIndicatorAndObservableInheritSourceMarkings, TestSummaryNoteInheritsSourceMarking, TestMalwareFamilyInheritsSourceMarkings and TestAttackPatternInheritsSourceMarkings pin the marking-inheritance contract end-to-end across the full derived sub-graph (SCOs, Indicators, Observables, Malware family SDOs, Malware-Analysis, Attack-Patterns, summary Note, and the related-to / based-on SROs between them).
TestTerminalAssemblyLineStates::test_process_file_raises_terminal_error_on_terminal_state pins the real _process_file polling-loop contract end-to-end: stubs requests.post (200 + valid sid) and al_client.submission.full to return {"state": <terminal>} on the first poll, asserts AssemblyLineTerminalError is raised, the submission id appears in the message, and submission.full ran exactly once (no retry, no sleep-until-timeout fallback). Parameterised across all three terminal states (failed / error / cancelled).
TestMaliciousIOCExtraction extended to pin the suspicious / malicious split. TestIocClassificationLabelsAndScores pins the per-IOC label + score plumbing. TestResolveSubmissionClassification pins the TLP ΓåÆ AL classification mapping. TestParseAlTimestamp extended to pin positive AND negative ISO-8601 offset parsing. TestConfigYamlSafeLoad pins the safe_load contract against !!python/object/apply tags. TestUnpinnedFileFetch extended to pin _download_import_file routing through helper.api.fetch_opencti_file.
TestMissingIdentityDoesNotEmitNoneAuthor pins both the sco_author_properties and stix2.MalwareAnalysis(...) paths when _get_assemblyline_identity returned None ΓÇö the bundle is still sent and the derived SCOs / Malware-Analysis serialise without a null author. TestProcessFileShaDedup pins the SHA-256-only dedup contract (called once for a SHA-256, never for an MD5 / SHA-1-only observable). TestSummaryNoteSuspiciousSection pins that the Note's Suspicious IOCs block is rendered with correct counts when present and omitted otherwise.
TestCreateIndicatorsTracksPerCategoryCreatedCounts pins the per-category created-indicator counters on the counts dict (capped at 20 per bucket; only incremented on a successful indicator.create; tracked separately for the malicious and suspicious buckets). TestSummaryNoteUsesCreatedCountsNotExtractedLengths pins that the Note's malicious / suspicious "Created as Indicators" sections report counts.get('<classification>_<kind>', 0) rather than len(...) so the displayed numbers match what the connector actually wrote to OpenCTI.
Whole suite is now 119 cases (started at 57 on the original branch, 69 / 94 / 107 after the first three review-and-fix passes, 119 after this one).

Related issues

Checklist

I consider the submitted work as finished
I have signed my commits using a GPG key.
I tested the code for its functionality using different use cases
I added/updated the relevant documentation
Where necessary I refactored code to improve the overall quality

Further comments

The connector is published as a new entry under internal-enrichment/assemblyline/; no existing connector is modified. The author is recorded as the AssemblyLine identity so generated SDOs are clearly attributable in the OpenCTI UI.

… 4 sandbox analysis Adds a new internal-enrichment connector for AssemblyLine 4 at `internal-enrichment/assemblyline/`. When OpenCTI asks the connector to enrich a `StixFile` or `Artifact`, the connector: * downloads the file content from OpenCTI's storage API; * submits it to the configured AssemblyLine deployment via `assemblyline-client`; * polls until the analysis completes (within a configurable timeout); * pushes the results back into OpenCTI: * a `Malware-Analysis` SDO (submission id, profile, verdict, score) that surfaces in the *Malware Analysis* section of the enriched observable; * STIX `Indicator` objects (and optional matching `Observable` objects linked via `based-on` relationships) for every malicious IOC AssemblyLine extracted (domains, IPs, URLs); * `Malware` SDOs for every malware family attributed by AssemblyLine; * `Attack-Pattern` SDOs for every MITRE ATT&CK technique observed at runtime, linked to the generated indicators with `related-to` relationships; * a `Note` summarising the verdict and counts; * an `External-Reference` attached to the enriched observable pointing back to the AssemblyLine submission. All the "create *X*" toggles and the sequential mode (queue submissions while AssemblyLine is busy, bounded by the configured timeout) are configurable via `ASSEMBLYLINE_CREATE_ATTACK_PATTERNS` / `ASSEMBLYLINE_CREATE_MALWARE_ANALYSIS` / `ASSEMBLYLINE_CREATE_OBSERVABLES` / `ASSEMBLYLINE_SEQUENTIAL_MODE`. The connector also enforces a TLP gate (`ASSEMBLYLINE_MAX_TLP`, default `TLP:AMBER`) before any file content leaves OpenCTI for the external AssemblyLine deployment. This connector is a re-submission of #5199 (which became unmergeable after the head branch was accidentally promoted to master and then rolled back). All the Copilot review findings on #5199 have been addressed in this re-submission: SSL-verify normalisation, bounded sequential-mode wait, HTTP submission timeout, hyphenated MITRE phase names, ATT&CK→Indicator `related-to` relationships, IPv6 dispatch, STIX-pattern escaping (backslash + quote), no hard-coded `trojan` label, `CREATE_OBSERVABLES` no longer duplicates, StixFile `importFiles` lookup, canonical `/storage/get/<id>` URL for `fetch_opencti_file`, TLP gate, sanitised observable in error logs, and proper External-Reference attachment to the enriched observable. Co-authored-by: V1D1AN <V1D1AN@users.noreply.github.com>

codecov · 2026-05-19T09:21:46Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
internal-enrichment/assemblyline/src/__init__.py	0.00%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (7947ad3) and HEAD (fd06e67). Click for more details.

HEAD has 95 uploads less than BASE

Flag BASE (7947ad3) HEAD (fd06e67)

connectors 97 2

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #6429       +/-   ##
===========================================
- Coverage   26.75%    0.62%   -26.14%     
===========================================
  Files        1801     1727       -74     
  Lines      106940   107281      +341     
===========================================
- Hits        28613      666    -27947     
- Misses      78327   106615    +28288

Files with missing lines	Coverage Δ
internal-enrichment/assemblyline/src/main.py	`54.12% <ø> (ø)`
internal-enrichment/assemblyline/src/__init__.py	`0.00% <0.00%> (ø)`

... and 953 files with indirect coverage changes

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Adds a new AssemblyLine internal-enrichment connector that submits OpenCTI Artifact/StixFile content for sandbox analysis and imports analysis results back into OpenCTI.

Changes:

Adds connector runtime, Docker, compose, config, manifest, and README files under internal-enrichment/assemblyline.
Implements AssemblyLine submission/polling, TLP gating, IOC extraction, malware-analysis creation, notes, and external references.
Adds unit tests covering configuration coercion, IOC handling, STIX escaping, TLP gate, sequential timeout, and file fetching.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`internal-enrichment/assemblyline/src/main.py`	Main connector implementation for AssemblyLine enrichment and OpenCTI object creation.
`internal-enrichment/assemblyline/src/requirements.txt`	Runtime Python dependencies.
`internal-enrichment/assemblyline/src/config.yml.sample`	Sample manual configuration.
`internal-enrichment/assemblyline/src/__init__.py`	Package metadata.
`internal-enrichment/assemblyline/tests/test_connector.py`	Unit tests for connector behavior.
`internal-enrichment/assemblyline/tests/test-requirements.txt`	Test dependency list.
`internal-enrichment/assemblyline/tests/__init__.py`	Test package marker.
`internal-enrichment/assemblyline/README.md`	Connector documentation.
`internal-enrichment/assemblyline/Dockerfile`	Container build definition.
`internal-enrichment/assemblyline/entrypoint.sh`	Container entrypoint.
`internal-enrichment/assemblyline/docker-compose.yml`	Example deployment configuration.
`internal-enrichment/assemblyline/.dockerignore`	Docker build exclusions.
`internal-enrichment/assemblyline/__metadata__/connector_manifest.json`	Connector catalog metadata.

Address the six outstanding Copilot review threads on the connector: 1. **TLP downgrade on derived analysis SCOs** (`src/main.py` lines 1066 / 1085 / 1093 / 1107). The connector built every derived domain / IPv4 / IPv6 / URL with ``object_marking_refs=[stix2.TLP_WHITE]``, which downgraded a file that passed the ``ASSEMBLYLINE_MAX_TLP`` gate with a higher marking (e.g. ``TLP:AMBER``) — the resulting analysis observables would have shown up in OpenCTI under the wrong access bucket. Introduce a new ``_source_marking_refs(observable)`` helper that reads ``observable["objectMarking"][*].standard_id`` from the enriched observable and returns those refs (falling back to ``[stix2.TLP_WHITE["id"]]`` only when the source carries no marking at all — every derived SCO still needs *some* marking so the platform's access-control gates work). ``_create_malware_analysis`` now uses this helper for all four derived SCO types. 2. **Terminal AssemblyLine states masked by the polling loop** (`src/main.py` line 764). The state-machine branches for ``failed`` / ``error`` / ``cancelled`` raised plain ``Exception`` from inside the polling ``try`` block, which the broad ``except Exception`` below then caught and turned into a ``log_warning`` — so a terminal failure kept the connector polling until the global ``ASSEMBLYLINE_TIMEOUT``. Introduce a dedicated ``AssemblyLineTerminalError`` exception type and add a targeted ``except AssemblyLineTerminalError: raise`` before the broad ``except`` so terminal failures surface immediately while the broad handler keeps absorbing transient client errors. 3. **Manifest description / behaviour drift** (`__metadata__/connector_manifest.json` line 4). The manifest advertised "file hashes" indicators that the connector does not actually emit. Update the description to match what the code produces: domains, IPs, URLs, malware families (via dedicated Malware SDOs) and ATT&CK techniques. Tests: * ``tests/test_connector.py``: ``TestSourceMarkingRefs`` (5 cases) pins the helper contract — single TLP source ⇒ single marking ref; multiple markings preserved and deduplicated in encounter order; missing / empty / ``objectMarking``-less observables fall back to ``TLP:WHITE``; markings without ``standard_id`` are ignored gracefully. * ``tests/test_connector.py``: ``TestMalwareAnalysisPropagatesSourceMarkings`` (5 cases) is an end-to-end test that runs ``_create_malware_analysis`` with a TLP:AMBER source observable and asserts every derived SCO (domain / IPv4 / IPv6 / URL) in the captured bundle carries the same ``TLP:AMBER`` marking — explicitly checking that no derived SCO leaks back to ``TLP:WHITE``. * ``tests/test_connector.py``: ``TestTerminalAssemblyLineStates`` (2 cases) pins the new exception's behaviour — a terminal ``AssemblyLineTerminalError`` re-raised through the targeted ``except: raise`` propagates intact, and the new type is a subclass of ``Exception`` (so generic loggers still capture it) but distinct from it (so a polling loop can catch it specifically). Whole suite is now **69 cases** (was 57); ``black --check``, ``isort --profile black --check`` and ``flake8 --select=F`` all clean.

SamuelHassine · 2026-05-19T11:09:27Z

Full review and fix pass complete on c98a242fb4:

✅ All six outstanding Copilot review threads addressed and resolved:
- TLP downgrade on derived analysis SCOs (src/main.py lines 1066 / 1085 / 1093 / 1107 — four sibling threads): derived domain / IPv4 / IPv6 / URL observables are no longer hard-coded to TLP:WHITE. A new _source_marking_refs(observable) helper reads observable["objectMarking"][*]["standard_id"] from the enriched observable so derived SCOs inherit the source marking (e.g. a TLP:AMBER file produces TLP:AMBER analysis observables). Falls back to TLP:WHITE only when the source carries no marking at all.
- Terminal AssemblyLine states masked by polling loop (src/main.py line 764): introduced an AssemblyLineTerminalError exception type and added a targeted except AssemblyLineTerminalError: raise in front of the broad transient-error catcher. failed / error / cancelled submissions now surface immediately instead of letting the connector poll until the global timeout.
- Manifest description / behaviour drift (__metadata__/connector_manifest.json line 4): description rewritten so it matches what the code actually emits — domains, IPs, URLs, Malware SDOs, ATT&CK Attack Patterns. The misleading "file hashes" claim is gone.
✅ Best-practice fixes applied: 12 new pytest cases pin the new contracts — TestSourceMarkingRefs (5 cases), TestMalwareAnalysisPropagatesSourceMarkings (5 end-to-end cases asserting no derived SCO leaks back to TLP:WHITE for a TLP:AMBER source), TestTerminalAssemblyLineStates (2 cases). Whole suite is now 69 cases (was 57).
✅ CI is fully green on c98a242fb4: GitHub Actions (PR conventions, signed commits, baseline coverage, issue-link, do-not-merge, test detection, Test internal-enrichment/assemblyline (69 tests), Test tests/test-requirements.txt, codecov/patch, codecov/project), CircleCI (ensure_formatting, base_linter, linter, test, build_manifest), filigran/cla — all pass.
✅ Every review thread resolved (0 unresolved threads).
✅ Branch is MERGEABLE; the mergeStateStatus: BLOCKED is purely from reviewDecision: REVIEW_REQUIRED because I'm the PR author and GitHub does not let me approve my own PR. A second reviewer needs to approve before merge.
✅ Title ([assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis) follows the repository convention; description reflects the final state (TLP-inheritance fix, terminal-state propagation, accurate manifest description, new test counts).

Ready for an external reviewer's approval.

Copilot

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Addresses the six new Copilot review threads on top of c98a242: 1. **API-side marking inheritance** (`src/main.py` line 1218). The STIX bundle path was already propagating the source observable's markings to derived analysis SCOs, but the OpenCTI REST API path that creates Indicator (and optional Observable) objects for every malicious IOC was not. A TLP:AMBER source therefore produced indicators / observables that the platform exposed more broadly than the source SCO. ``_create_indicators`` now takes the full source observable (not just its id), derives the marking refs via ``_source_marking_refs`` and forwards them to ``_create_indicator_observable``, which sets ``objectMarking`` on both the indicator and (when ``ASSEMBLYLINE_CREATE_OBSERVABLES`` is on) the matching observable. 2. **TLP:CLEAR fallback in `_source_marking_refs`** (line 334). The helper fell back to ``stix2.TLP_WHITE`` when the source observable carried no marking at all. That was inconsistent with ``_check_tlp`` (which treats an unmarked observable as ``TLP:CLEAR``) and with the rest of the codebase, which uses OpenCTI's custom ``TLP:CLEAR`` marking via ``pycti.MarkingDefinition.generate_id("TLP", "TLP:CLEAR")``. Fallback now uses the same custom id (exposed at module scope as ``_TLP_CLEAR_MARKING_ID``) so an unmarked source produces ``TLP:CLEAR`` derived objects — never the deprecated TLP:WHITE. 3. **SCO author convention** (lines 1112 / 1131 / 1139 / 1153). The four derived SCOs (DomainName, IPv6Address, IPv4Address, URL) set ``created_by_ref`` in ``custom_properties``, but OpenCTI's observable authoring convention is ``x_opencti_created_by_ref`` (``created_by_ref`` is reserved for SDOs/SROs). Setting the wrong field left the platform's author column empty for analysis observables. All four sites now use a shared ``sco_author_properties`` dict with the OpenCTI-specific key. Tests: - ``TestSourceMarkingRefs`` rewritten to pin the new TLP:CLEAR fallback contract (rather than TLP:WHITE). - New ``TestApiIndicatorAndObservableInheritSourceMarkings`` (3 cases) pins the API-side marking propagation: a TLP:AMBER source produces a TLP:AMBER indicator and observable, and an unmarked source falls back to TLP:CLEAR — never TLP:WHITE. - New ``test_derived_scos_use_x_opencti_created_by_ref`` (in ``TestMalwareAnalysisPropagatesSourceMarkings``) asserts the serialised SCO body contains ``x_opencti_created_by_ref`` and never the standard SDO ``created_by_ref``. - Existing call sites in tests that passed ``"obs-root"`` as a string to ``_create_indicators`` now pass the matching ``{"id": "obs-root", "objectMarking": []}`` dict to match the new signature. Whole suite is now 73 cases (was 69); ``black --check``, ``isort --profile black --check`` and ``flake8 --ignore=E,W`` all clean.

SamuelHassine · 2026-05-19T13:38:22Z

Full review-and-fix pass complete on 9ddcf44 — second-pass Copilot threads:

✅ All six new Copilot review threads addressed and resolved (0 unresolved):
- API-side marking inheritance (src/main.py _create_indicator_observable / _create_indicators): API-created Indicator and Observable objects now inherit the source observable's objectMarking — a TLP:AMBER file produces TLP:AMBER indicators/observables, never broader than the source SCO.
- _source_marking_refs fallback: now returns OpenCTI's custom TLP:CLEAR marking (via pycti.MarkingDefinition.generate_id("TLP", "TLP:CLEAR"), exposed as _TLP_CLEAR_MARKING_ID) instead of the deprecated stix2.TLP_WHITE. Matches what _check_tlp already treats unmarked observables as.
- SCO author convention (4 sibling threads on lines 1112 / 1131 / 1139 / 1153): derived DomainName / IPv6Address / IPv4Address / URL observables now set x_opencti_created_by_ref (OpenCTI's SCO authoring convention) instead of the standard SDO created_by_ref, so the platform's author column is populated.
✅ New tests pinning the new contracts:
- TestApiIndicatorAndObservableInheritSourceMarkings (3 cases): TLP:AMBER source → TLP:AMBER indicator + observable; unmarked source → TLP:CLEAR fallback.
- test_derived_scos_use_x_opencti_created_by_ref in TestMalwareAnalysisPropagatesSourceMarkings: serialised SCO body contains x_opencti_created_by_ref and never the standard created_by_ref.
- TestSourceMarkingRefs rewritten to pin the new TLP:CLEAR fallback (was TLP:WHITE).
- Whole suite is now 73 cases (was 69).
✅ CI fully green on 9ddcf44: CircleCI (test, linter, ensure_formatting, base_linter, build_manifest), GitHub Actions (PR conventions, signed commits, baseline coverage, issue-link, do-not-merge, test detection, Test internal-enrichment/assemblyline — 73 tests, Test tests/test-requirements.txt, codecov/patch, codecov/project, CLA) — all pass.
✅ Branch is MERGEABLE, no conflicts. mergeStateStatus: BLOCKED is purely reviewDecision: REVIEW_REQUIRED — GitHub does not let me approve my own PR. A second reviewer needs to approve before merge.
✅ Title and description still accurate (Closes #6407, Supersedes #5199 already in body).

Ready for an external reviewer's approval.

Copilot

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.

…assification config key, ship config schema Addresses the five new Copilot review threads on top of 9ddcf44: 1. **Marking propagation on the Malware-Analysis SDO** (`src/main.py` line 1173). `source_marking_refs` was already computed at the top of `_create_malware_analysis` and applied to the four derived analysis SCOs, but the Malware-Analysis SDO itself was emitted unmarked. For a TLP:AMBER source observable that left the verdict / submission id / score visible to users who could not see the underlying file. The SDO now carries `object_marking_refs=source_marking_refs` like the rest of the bundle. Pinned by `test_malware_analysis_inherits_amber`. 2. **Marking propagation on the summary Note** (`src/main.py` line 1610). `_create_summary_note` uses the OpenCTI REST API path (`helper.api.note.create`); the `note_data` dict was missing the `objectMarking` key, so the API-created Note inherited no marking and exposed the verdict / sid / file hash / size / IOC counts more broadly than the source. `note_data` now carries `objectMarking=self._source_marking_refs(observable)` which reuses the same helper as the bundle path (TLP:CLEAR fallback for unmarked sources, not the deprecated TLP:WHITE). Pinned by the new `TestSummaryNoteInheritsSourceMarking` class (2 cases: TLP:AMBER propagation + TLP:CLEAR fallback). 3. **`classification` missing from `config.yml.sample`** (line 30). `AssemblyLineConnector.__init__` already reads `assemblyline.classification` (with the `ASSEMBLYLINE_CLASSIFICATION` env-var fallback), but the sample config omitted the YAML key, so manual deployments that copied the file could not discover the submission classification without using an environment variable. Added `classification: 'TLP:C'` to the sample's `# Submission settings` block. 4. **README documents `classification` as env-var-only** (`README.md` line 109). The configuration table for the `Submission Classification` row had `-` in the `config.yml` column, contradicting the connector code. Updated the cell to `classification` so the documented YAML key matches what `get_config_variable` actually looks up. 5. **`connector_config_schema.json` missing from `__metadata__/`** (`__metadata__/connector_manifest.json` line 1). The global manifest generator embeds a connector's `config_schema` only when this file exists; without it the catalog / manager entry would not expose the AssemblyLine configuration fields. Added a complete JSON-Schema draft-2020-12 document under `__metadata__/connector_config_schema.json` covering every OpenCTI / connector / AssemblyLine env-var the connector reads, including types, defaults, enums (`submission_profile`, `log_level`, `max_tlp`), and the canonical `required` set (`OPENCTI_URL`, `OPENCTI_TOKEN`, `ASSEMBLYLINE_URL`, `ASSEMBLYLINE_USER`, `ASSEMBLYLINE_APIKEY`). Whole suite is now 76 cases (was 73); `black --check` and `isort --profile black --check` clean.

SamuelHassine · 2026-05-19T21:05:25Z

Full review-and-fix pass complete on b576159885 — third-pass Copilot threads:

All 5 new Copilot review threads addressed and resolved (0 unresolved across the entire PR now). Marking propagation on the Malware-Analysis SDO: source_marking_refs was already computed at the top of _create_malware_analysis and applied to the four derived analysis SCOs, but the stix2.MalwareAnalysis(...) call itself was emitted unmarked. The SDO now carries object_marking_refs=source_marking_refs, so a TLP:AMBER source no longer leaks verdict / submission id / score to user groups that should not see the underlying file. Marking propagation on the summary Note: _create_summary_note uses the OpenCTI REST API path (helper.api.note.create); note_data now carries objectMarking=self._source_marking_refs(observable) — reusing the same helper as the bundle path (TLP:CLEAR fallback for unmarked sources, never the deprecated TLP:WHITE). classification missing from config.yml.sample: AssemblyLineConnector.__init__ reads assemblyline.classification (with the ASSEMBLYLINE_CLASSIFICATION env fallback) but the sample omitted the YAML key; added classification: ''TLP:C'' to the # Submission settings block. README documented classification as env-var-only: the row in the README config table now lists classification as the config.yml key (was -). Missing connector_config_schema.json: added a complete JSON-Schema draft-2020-12 document under __metadata__/connector_config_schema.json covering every OpenCTI / connector / AssemblyLine env-var the connector reads, with types, defaults, enums (submission_profile, log_level, max_tlp) and the canonical required set (OPENCTI_URL, OPENCTI_TOKEN, ASSEMBLYLINE_URL, ASSEMBLYLINE_USER, ASSEMBLYLINE_APIKEY). Now the global manifest generator can embed the AssemblyLine configuration fields in the catalog / manager entry.
New tests pinning the new contracts. test_malware_analysis_inherits_amber in TestMalwareAnalysisPropagatesSourceMarkings: TLP:AMBER source → TLP:AMBER Malware-Analysis SDO. New TestSummaryNoteInheritsSourceMarking class (2 cases): TLP:AMBER source → TLP:AMBER Note; unmarked source → TLP:CLEAR fallback. Whole suite is now 76 cases (was 73).
CI fully green on b576159885: GitHub Actions (PR conventions, signed commits, baseline coverage, issue-link, do-not-merge, test detection, Test internal-enrichment/assemblyline — 76 tests, Test tests/test-requirements.txt, codecov/patch, codecov/project, filigran/cla), CircleCI (test, linter, ensure_formatting, base_linter, build_manifest) — all pass.
Branch is MERGEABLE, no conflicts. mergeStateStatus: BLOCKED is purely reviewDecision: REVIEW_REQUIRED — GitHub does not let me approve my own PR. A second reviewer needs to approve before merge.
Title ([assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis) and description still accurate (Closes #6407, supersedes [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis #5199). Ready for an external reviewer's approval.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

…sets Addresses the remaining Copilot review threads on PR #6429: - `_extract_malicious_iocs` returns only malicious IOCs again; suspicious-only IOCs now flow through a new `_extract_suspicious_iocs` bucket. Downstream "label observable malicious", "x_opencti_score=80" and "force malware-analysis.result=malicious" paths only fire on truly-malicious IOCs. Suspicious IOCs still become indicators (when `ASSEMBLYLINE_INCLUDE_SUSPICIOUS=true`) but with the `suspicious` label and `x_opencti_score=50` so OpenCTI keeps the two classifications distinct. - `_create_indicator_observable` / `_create_indicators` accept a per-IOC `classification` so the label and score are derived from what AssemblyLine actually returned instead of being hard-coded to `malicious` / 80. - New `_resolve_submission_classification` maps the source observable's TLP to AssemblyLine's compact form (`TLP:CLEAR`/`TLP:WHITE`→`TLP:C`, `TLP:GREEN`→`TLP:G`, `TLP:AMBER`/`TLP:AMBER+STRICT`→`TLP:A`, `TLP:RED`→`TLP:R`); falls back to the configured default when the source carries no TLP. `_process_file` uses it so an AMBER source is no longer silently downgraded to the connector default (`TLP:C`) when leaving OpenCTI. - `_download_import_file` now routes through `helper.api.fetch_opencti_file` (like `_fetch_attached_file`), so the importFiles download inherits pycti's HTTP session config (timeouts, retries, custom CA bundles, proxy / SSL). - `_parse_al_timestamp` uses `datetime.fromisoformat` after normalising the `Z` suffix, so positive AND negative ISO-8601 offsets parse correctly (the previous `split('+', 1)[0]` dropped the offset for positive shifts and silently fell back to default for negative ones). - Config loading uses `yaml.safe_load` inside `with open(...)` instead of `yaml.load(open(...), Loader=FullLoader)` — `FullLoader` could instantiate arbitrary Python objects from YAML tags in a tampered `config.yml`, and the bare `open()` leaked the file handle. - README updated so the documented submission flow matches what the code actually does (`requests.post` to `/api/v4/submit/`, source TLP propagation, suspicious vs malicious indicator scoring). Tests grew from 69 to 94 cases: - 3 new `TestMaliciousIOCExtraction` cases pinning the suspicious / malicious split. - 4 new `TestIocClassificationLabelsAndScores` cases pinning the per-IOC label + score plumbing. - 9 new `TestResolveSubmissionClassification` cases pinning the TLP → AL classification mapping (per TLP value + custom-default fallback + unknown-TLP fallback). - 2 new `TestParseAlTimestamp` cases pinning the positive AND negative ISO-8601 offset paths. - 1 new `TestConfigYamlSafeLoad` case pinning the `safe_load` contract against `!!python/object/apply` tags. - `TestUnpinnedFileFetch` extended to pin `_download_import_file → helper.api.fetch_opencti_file`.

…onnector

``requests`` >= 2.34 no longer normalises double slashes in URL paths, so ``self.opencti_url`` and ``self.assemblyline_url`` must already be slash-free at assignment time. Without this, an operator who sets ``OPENCTI_URL=https://opencti/`` produces ``https://opencti//storage/get/<id>`` on the file-fetch path and ``https://assemblyline//api/v4/submit/`` on the submission path, both of which are rejected by some upstreams. Fixes the ``tests/test_url_construction.py::test_no_unprotected_url_assignments`` guard the repo enforces in CI.

SamuelHassine · 2026-05-21T07:25:24Z

Full review-and-fix pass complete on `7cbff611bd`

All 8 outstanding Copilot review threads addressed and resolved (0 unresolved across the entire PR — 25 threads total, 17 resolved across earlier passes + 8 in this one).

This pass's highlights:

Suspicious vs malicious split. _extract_malicious_iocs returns only malicious IOCs again; suspicious ones flow through a new _extract_suspicious_iocs bucket. Downstream is_malicious label, x_opencti_score=80 and malware-analysis.result=malicious escalation only fire on truly-malicious IOCs. Suspicious-only verdicts produce a suspicious label, x_opencti_score=50, and indicators labelled suspicious with the same score — so the OpenCTI UI keeps both classifications distinct (was: everything labelled malicious regardless of AssemblyLine's tag).
Source TLP propagates to AssemblyLine submission classification. _resolve_submission_classification maps TLP:CLEAR / TLP:WHITE → TLP:C, TLP:GREEN → TLP:G, TLP:AMBER / TLP:AMBER+STRICT → TLP:A, TLP:RED → TLP:R, falling back to ASSEMBLYLINE_CLASSIFICATION when the source has no TLP. A TLP:AMBER file passing the max-TLP gate is no longer silently downgraded to the connector default once it leaves OpenCTI.
Config loading uses yaml.safe_load inside with open(...) instead of yaml.load(open(...), Loader=FullLoader) — a tampered config.yml can no longer instantiate arbitrary Python objects through !!python/object/apply tags, and the file handle is released on parse errors.
_download_import_file routes through helper.api.fetch_opencti_file (like _fetch_attached_file), so the importFiles path inherits pycti's HTTP session config (timeouts, retries, custom CA bundles, proxy / SSL) instead of re-implementing HTTP with a raw requests.get.
_parse_al_timestamp uses datetime.fromisoformat after normalising Z, then astimezone(UTC).replace(tzinfo=None) — positive AND negative ISO-8601 offsets now parse correctly (the earlier split('+', 1)[0] discarded positive offsets entirely and silently fell back to default for negative ones).
OPENCTI_URL / ASSEMBLYLINE_URL stripped at assignment time so requests >= 2.34 doesn't produce double-slash paths; satisfies the repo-wide tests/test_url_construction.py guard.
README rewritten so the documented submission flow matches the code (requests.post to /api/v4/submit/, with assemblyline-client used for polling / summary retrieval / file fetch).
Merged latest master into the branch to pull in the global URL-protection fixes ([all] Fix url to avoid double slash #6394) and the new lint / format / URL-construction CI jobs.

CI: every required check green (signed commits, signed-commits PR check, conventions, base linter / flake8, ensure formatting, STIX ID linter, build manifest, baseline coverage, internal-enrichment/assemblyline tests, the global tests/test-requirements.txt suite, codecov patch + project, CLA).

Tests: suite is now 94 cases (was 69 after the previous pass, 57 originally). New coverage:

TestMaliciousIOCExtraction: 3 new cases pinning the suspicious / malicious split.
TestIocClassificationLabelsAndScores: 4 new cases pinning the per-IOC label + score plumbing.
TestResolveSubmissionClassification: 9 new cases pinning the TLP → AL classification mapping (per TLP value + custom-default + unknown-TLP fallbacks).
TestParseAlTimestamp: 2 new cases pinning positive AND negative ISO-8601 offset parsing.
TestConfigYamlSafeLoad: 1 new case pinning the safe_load contract against !!python/object/apply tags.
TestUnpinnedFileFetch: extended to pin _download_import_file → helper.api.fetch_opencti_file.

Title remains [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis. PR body refreshed end-to-end so the documented change set matches the final code. Closes #6407, Supersedes #5199. Mergeable as soon as a maintainer approves.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 8 comments.

…reflect that in the summary Note Addresses the eight outstanding Copilot review threads on ``48edbc8`` — five of them ("indicators" totals counting families, observable counts inflated by swallowed exceptions, comment lying about ``_fetch_attached_file`` populating ``_current_file_size``, two summary-Note sections labelling Malware SDOs as "Indicators") are real correctness / honesty bugs; the other three are forward-looking test-docstring cleanup that drops PR-id refs in favour of describing the behaviour the test pins. * ``main.py::_fetch_attached_file`` — now caches the fetched payload length into ``self._current_file_size`` the same way ``_download_import_file`` already does. The comment in ``_process_message`` claimed both helpers populated the field but the second one didn't, so the summary-Note size fallback was effectively dead whenever the source observable came in through the ``Artifact`` path (which uses ``_fetch_attached_file``). Updated the helper's docstring to match. * ``main.py::_create_indicator_observable`` — return is now a ``(indicator_id, observable_created)`` tuple instead of a bare ``Optional[str]``. The helper swallows exceptions when creating the matching Observable / based-on relationship (so a single per-IOC failure does not abort the whole enrichment), but the caller was incrementing ``counts["observables"]`` / ``counts["relationships"]`` on the presence of an indicator id regardless of whether the observable actually landed. New ``observable_created`` flag is set to ``True`` only after the observable + based-on edge round-trip cleanly; counts now reflect what really got into OpenCTI. * ``main.py::_create_indicators`` — ``counts`` now carries per-classification indicator counters (``malicious_indicators`` / ``suspicious_indicators``) plus a separate ``malware_families`` bucket. ``_process_bucket`` increments ``counts[f"{classification}_indicators"]`` on every successful indicator creation; the malware-family loop later increments ``counts["malware_families"]`` per successful ``Malware`` SDO. The legacy ``counts["indicators"]`` is kept alongside as a compatibility alias. * ``main.py::_process_message`` — success-message text now drives off the per-classification indicator counters above, NOT off ``sum(len(values) for values in malicious_iocs.values())``. The previous shape included the ``families`` bucket in the "indicators" total even though families are emitted as ``Malware`` SDOs, over-reporting "indicators created" by the number of malware families on every malicious file. New shape reports indicators and malware families separately ("``N malicious indicators and M malware families created``"), and produces a sensible message even when only malware families were emitted ("``N malware families created, no IOC indicators``"). * ``main.py::_create_summary_note`` — the "Malicious IOCs Created as Indicators" header no longer renders a "**Malware Families:** N" bullet underneath; that line was misleading because the families are emitted as ``Malware`` SDOs (NOT STIX Indicators). Malware families now have a dedicated "## Malware Families" section that surfaces the successfully-created Malware SDO count from ``counts["malware_families"]`` and explicitly mentions the SDO type. The block is only rendered when there is at least one created Malware SDO so the malicious-IOCs-only path keeps its short format. Same treatment on the "Suspicious IOCs Created as Indicators" block: the "Suspicious Malware Families" line is gone — the connector only creates Malware SDOs from the *malicious* families bucket (``_create_indicators`` only loops over ``malicious_iocs["families"]``), so listing a count for the suspicious bucket under an "Indicators" header was doubly misleading. * ``tests/test_connector.py`` — three test docstrings rewritten to describe the contract the test pins instead of referencing PR #6429 (PR-id refs in tests age poorly). ``test_malicious_ extraction_never_mixes_in_suspicious``, ``TestIocClassification LabelsAndScores``, ``TestConfigYamlSafeLoad`` all keep the same assertions, just with PR-agnostic descriptive prose. New ``test_malware_families_section_rendered_when_count_non_zero`` / ``test_malware_families_section_omitted_when_zero`` cases in ``TestSummaryNoteSuspiciousSection`` pin the new "Malware Families" section's render contract (visible only when ``counts["malware_families"] > 0``, never folded under the "Created as Indicators" headers). ``test_suspicious_section_rendered_when_non_empty`` updated to assert "Suspicious Malware Families" is NOT in the rendered Note (the family count was misleading and is now suppressed). Verified locally: * ``pytest tests/`` — 109 / 109 pass (was 107; +2 for the new Malware Families render-contract cases). * ``black --check``, ``isort --profile black --check-only``, ``flake8 --select=F`` clean across ``internal-enrichment/assemblyline/``. * ``python -m py_compile`` clean on every modified module.

SamuelHassine · 2026-05-21T17:17:00Z

Follow-up review-and-fix pass complete on 84ef7e79bc.

All eight outstanding Copilot review threads addressed and resolved:

main.py:1556 — _create_indicator_observable now returns a (indicator_id, observable_created) tuple; callers increment counts["observables"] / counts["relationships"] off observable_created, not off assemblyline_create_observables. The summary Note's "Observables Created" line now reflects what really landed in OpenCTI instead of what the connector tried to create.
main.py:1637 — _fetch_attached_file now caches the fetched payload length into self._current_file_size the same way _download_import_file already did; the comment in _process_message claiming both helpers populated the field is now end-to-end accurate.
main.py:1780 — success-message text drives off the per-classification indicator counters (counts["malicious_indicators"] / counts["suspicious_indicators"] / counts["malware_families"]) _create_indicators increments only on a successful API call, not off sum(len(values) for values in malicious_iocs.values()) which included the families extraction bucket. Malware families are reported separately ("N malicious indicators and M malware families created"), with a dedicated message for the families-only path.
main.py:1906 — Suspicious Malware Families line removed from the "Suspicious IOCs Created as Indicators" Note section. The connector only creates Malware SDOs from the malicious families bucket (_create_indicators only loops over malicious_iocs["families"]), so listing a count for the suspicious bucket under an "Indicators" header was doubly misleading.
main.py:1919 — Malware Families bullet removed from the "Malicious IOCs Created as Indicators" Note section, replaced by a dedicated ## Malware Families section that surfaces the successfully-created Malware SDO count from counts["malware_families"] and explicitly mentions the SDO type. Only rendered when at least one Malware SDO was created.
tests/test_connector.py:136 / :609 / :758 — three test docstrings rewritten to describe the contract being pinned (classification contract, per-classification label + score, yaml.safe_load security property) without referencing PR [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis #6429; PR-id refs in tests age poorly.

Plus two new tests (test_malware_families_section_rendered_when_count_non_zero / test_malware_families_section_omitted_when_zero) pin the new "Malware Families" section's render contract.

Local verification on 84ef7e79bc: pytest tests/ → 109 / 109 pass (was 107; +2 for the new section); black --check, isort --profile black --check-only, flake8 --select=F clean across internal-enrichment/assemblyline/; python -m py_compile clean on every modified module.

CI green on 84ef7e79bc — every check SUCCESS: Test internal-enrichment/assemblyline (1m1s / 109 cases), Test tests/test-requirements.txt, Baseline coverage, Base Linter (flake8), Ensure Formatting, STIX ID Linter, Build and Commit Manifest, PR conventions ×4, signed commits, do-not-merge, test detection, codecov/patch, codecov/project, filigran/cla.

Every review thread resolved — 0 unresolved out of 44 across all Copilot passes.

Branch state: mergeable: MERGEABLE; mergeStateStatus: BLOCKED only because GitHub branch protection requires an approval from a non-author reviewer (I authored this PR so I cannot self-approve). Title ([assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis) unchanged — already matches the [<scope>] <imperative summary> convention. Description unchanged — the existing body already documents the full story across the 11 commits (the connector behaviour, monorepo adaptations, every prior Copilot review pass, and the test suite); the 84ef7e79bc fixes are correctness / honesty polish that don't change the connector contract beyond making the summary Note and success message report accurate counts.

Ready for a second-pair-of-eyes approval to merge.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

…reated indicator counts Address the four outstanding `copilot-pull-request-reviewer` review threads on this PR by pushing the source observable's `objectMarking` further into the derived sub-graph and by reporting created-indicator counts (rather than extracted-IOC counts) in the summary Note. * `_create_indicators`: Malware family SDOs derived from a marked source now inherit the source's `objectMarking` (via the existing `_source_marking_refs(observable)` helper) and the observable->malware `related-to` relationship carries the same marking. Without this, a TLP:AMBER file used to produce TLP:CLEAR Malware SDOs and unmarked SROs that OpenCTI exposed more broadly than the source. * `_create_attack_patterns(..., source_marking_refs=None)`: Attack-Pattern SDOs newly created from a marked source now inherit the source markings; the `indicator -> attack-pattern` `related-to` relationships emitted from `_process_message` carry them too. Pre-existing Attack-Patterns picked up via the fallback list query are intentionally left untouched so the connector does not silently downgrade or overwrite markings inherited from earlier enrichments. * `_create_indicators` / `_create_summary_note`: track per-category created-indicator counts (`malicious_domains`, `malicious_ips`, `malicious_urls`, `suspicious_domains`, `suspicious_ips`, `suspicious_urls`) in the `counts` dict, and surface them in the Note's "Malicious IOCs Created as Indicators" / "Suspicious IOCs Created as Indicators" sections. Previously the Note used `len(malicious_iocs['domains'])` etc., but `_create_indicators` caps creation at 20 per bucket, so on large analyses (e.g. 50 extracted, 20 created) the Note over-stated what was actually written to OpenCTI and contradicted the run's own success-message ("N malicious indicators created"). The counters are also incremented only on a successful `indicator.create`, so transient OpenCTI failures cannot inflate the per-category numbers either. Tests: 12 new cases (107 -> 119), grouped into: * `TestCreateIndicatorsTracksPerCategoryCreatedCounts` -- pins the new per-category counters: capped at 20, only incremented on successful creates, and tracked separately for the malicious / suspicious buckets. * `TestSummaryNoteUsesCreatedCountsNotExtractedLengths` -- pins that the Note's malicious / suspicious "Created as Indicators" sections report `counts.get('<classification>_<kind>', 0)`, not `len(malicious_iocs[...])` / `len(suspicious_iocs[...])`. * `TestMalwareFamilyInheritsSourceMarkings` -- pins that Malware SDOs and the `observable -> malware` relationship carry the source markings (TLP:AMBER source produces TLP:AMBER Malware + TLP:AMBER `related-to`), and that an unmarked source falls back to OpenCTI's custom TLP:CLEAR. * `TestAttackPatternInheritsSourceMarkings` -- pins that `_create_attack_patterns` propagates `source_marking_refs` onto newly-created Attack-Patterns, and that the parameter defaults to no-op when not provided (so the call site without markings doesn't pass `objectMarking=None` to the OpenCTI API). `TestSummaryNoteSuspiciousSection.test_suspicious_section_rendered_when_non_empty` updated to pass per-category created counts in `counts` (the existing assertions still hold because the test data provides 2 domains / 1 IP / 0 URLs which the connector would create unchanged). flake8 / black / isort all clean. Whole suite: 119 passed.

SamuelHassine · 2026-05-21T20:26:59Z

Follow-up review-and-fix pass complete on `c2d34ac004`

All four outstanding copilot-pull-request-reviewer review threads addressed and resolved (0 unresolved across the entire PR — 29 threads total, 25 resolved across the earlier passes + 4 in this one). Substance:

main.py:1654 — Malware family SDOs missed objectMarking. _create_indicators already computed source_marking_refs = self._source_marking_refs(observable) once at the top for the per-IOC _create_indicator_observable calls; the malware-family [:10] loop now reuses the same value, applying it both to malware.create(...) and to the observable -> malware related-to SRO. A TLP:AMBER source now produces TLP:AMBER Malware family SDOs and TLP:AMBER related-to relationships, never silently downgraded; an unmarked source falls back to OpenCTI's custom TLP:CLEAR id (_TLP_CLEAR_MARKING_ID) — matching _check_tlp (which treats an unmarked observable as TLP:CLEAR) and the Indicator / Observable / Malware-Analysis / Note paths that already inherit through the same helper.
main.py:1104 — Attack-Pattern SDOs missed objectMarking. _create_attack_patterns now takes an optional source_marking_refs parameter; _process_message derives attack_pattern_marking_refs = self._source_marking_refs(observable) once at the call site and forwards it to both the SDO create and the per-pair indicator -> attack-pattern related-to relationship loop. Pre-existing Attack-Patterns picked up via the fallback list query are intentionally left untouched (they may already carry markings from previous enrichments and the connector should not silently downgrade or overwrite those).
main.py:2005 / main.py:1976 — Note section counts over-stated created indicators on large analyses. _create_indicators caps creation at 20 indicators per IOC bucket (domains / IPs / URLs, for both the malicious and the suspicious classifications), but _create_summary_note was reading from len(malicious_iocs[...]) / len(suspicious_iocs[...]) (the extracted-list lengths). Six new per-category counters are now maintained on the counts dict (malicious_domains / malicious_ips / malicious_urls / suspicious_domains / suspicious_ips / suspicious_urls); they are incremented only on a successful indicator.create (so transient OpenCTI failures cannot inflate them either) and the Note now reads from those instead. Result: on a 50-domain analysis the Note shows **Malicious Domains:** 20 (matching what was actually written to OpenCTI and the run's success-message), no longer **Malicious Domains:** 50.

Tests: 12 new cases (107 → 119 — started at 57 on the original branch). Grouped into:

TestCreateIndicatorsTracksPerCategoryCreatedCounts — pins the new per-category counters: capped at 20, only incremented on successful creates, tracked separately for the malicious / suspicious buckets.
TestSummaryNoteUsesCreatedCountsNotExtractedLengths — pins that the Note's malicious / suspicious "Created as Indicators" sections report counts.get('<classification>_<kind>', 0) rather than len(...).
TestMalwareFamilyInheritsSourceMarkings — pins that Malware SDOs and the observable -> malware SRO carry the source markings (TLP:AMBER source produces TLP:AMBER Malware + TLP:AMBER related-to), and that an unmarked source falls back to OpenCTI's custom TLP:CLEAR.
TestAttackPatternInheritsSourceMarkings — pins that _create_attack_patterns propagates source_marking_refs onto newly-created Attack-Patterns, and that the parameter defaults to no-op when not provided (so the call site without markings doesn't pass objectMarking=None to the OpenCTI API).

TestSummaryNoteSuspiciousSection.test_suspicious_section_rendered_when_non_empty updated to pass per-category created counts in counts; existing assertions still hold because the test data provides 2 domains / 1 IP / 0 URLs which the connector creates unchanged.

flake8 / black / isort all clean. Whole suite: 119 passed.

CI: all 14 GitHub Actions / codecov status checks green on c2d34ac004 (Build Manifest, Lint & Format ×3, PR conventions ×4, Tests Connectors ×4, codecov/patch, codecov/project, CLA). Branch is MERGEABLE (rebased cleanly on master); the only remaining blocker is reviewDecision: REVIEW_REQUIRED — I cannot self-approve as the PR author, so a maintainer's approval is the last step before merge.

The PR description has been updated to reflect the marking-inheritance contract now extending to the full derived sub-graph (every SDO, SCO, Indicator, Malware family, Malware-Analysis, Attack-Pattern, summary Note, and related-to / based-on SRO) and the per-category created-count contract on the summary Note.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

…t error Address the four outstanding Copilot review threads on this PR. * `_create_indicator_observable` (`main.py`). Both per-indicator SROs the helper emits — the source-observable -> Indicator `related-to` edge and the Indicator -> Observable `based-on` edge — were created without `objectMarking`. Every other SDO/SCO and every other SRO already carried the source markings, so these two were the only marking-propagation gap left in the derived sub-graph: a TLP:AMBER source observable produced TLP:AMBER Indicator + TLP:AMBER Observable endpoints, but the SROs that wired them up were unmarked and OpenCTI exposed them more broadly than either endpoint. Both SROs now carry `objectMarking=source_marking_refs` so the whole derived sub- graph (Indicator + Observable + the two SROs that wire them up) lands with one consistent marking shape. * `_get_stixfile_content` error message (`main.py:649-651`). `file_hash` falls back to `observable.get("name")` and then to the literal `"unknown"` when the StixFile carries no `hashes` at all. The previous "Only hash available: <file_hash>" wording therefore lied half the time — the value rendered after the colon could be a plain filename or the literal `"unknown"` string. Rephrased the message to "no SHA-256 hash for AssemblyLine lookup (identifier: <file_hash>)" so the operator isn't misled into searching for a hash that doesn't exist. * `TestTerminalAssemblyLineStates::test_process_file_raises_terminal_error_on_terminal_state` (`tests/test_connector.py`). The previous test only exercised the `AssemblyLineTerminalError` re-raise mechanic in isolation (raise + try/except inside the test body) — the real polling loop in `_process_file` was never executed, so a regression that let the broad `except Exception` swallow the terminal state again would have passed silently. Replaced with a real test that: * Stubs `_get_file_content`, `_check_existing_analysis`, `_wait_for_al_ready` and the TLP-resolution helpers so `_process_file` walks straight to the submission step. * Stubs `requests.post` (via monkeypatch on `main.requests`) to return a 200 with a valid `sid` so the polling loop is entered. * Stubs `al_client.submission.full` to return `{"state": <terminal>}` on the first poll. * Asserts `AssemblyLineTerminalError` is raised, the submission id appears in the message, and `submission.full` was called exactly once (no retry, no sleep-until-timeout fallback). Parameterised across all three terminal states (`failed` / `error` / `cancelled`). Verification ------------ `pytest internal-enrichment/assemblyline/tests/` -> 119 passed (unchanged total — the placeholder terminal-state test was replaced rather than added; the new test gives real coverage of the same scope). `flake8 --ignore=E,W .` (the same invocation the `Base Linter (flake8)` GitHub Action runs) clean across all touched files; `black --check` and `isort --check-only` clean after a single black auto-format pass on the test file.

SamuelHassine · 2026-05-22T05:26:10Z

Follow-up review-and-fix pass complete on c7c6c222f2.

All four outstanding copilot-pull-request-reviewer review threads addressed and resolved (0 unresolved across the entire PR — 33 threads total, 29 resolved across earlier passes + 4 in this one).

This pass's substance:

main.py:1492 + main.py:1517 — per-IOC SROs missed objectMarking. Both SROs the _create_indicator_observable helper emits — the source-observable -> Indicator related-to edge and the Indicator -> Observable based-on edge — were created without objectMarking. Every other SDO/SCO in the bundle and every other SRO already carried the source markings, so these two were the only marking-propagation gap left in the per-IOC path: a TLP:AMBER source observable produced TLP:AMBER endpoints linked by unmarked SROs that OpenCTI exposed more broadly than either endpoint. Both SROs now carry objectMarking=source_marking_refs so the whole derived sub-graph for one IOC (Indicator + matching Observable + the two SROs that wire them up) lands with one consistent marking shape.
tests/test_connector.py:1206 — TestTerminalAssemblyLineStates only tested the re-raise mechanic in isolation. The previous test_terminal_state_raises_assemblyline_terminal_error only exercised raise + try/except inside the test body — the real _process_file polling loop was never executed, so a regression that let the broad except Exception swallow the terminal state again would have passed silently. Replaced with test_process_file_raises_terminal_error_on_terminal_state that stubs _get_file_content / _check_existing_analysis / _wait_for_al_ready / TLP-resolution helpers so _process_file walks straight to the submission, stubs requests.post (via monkeypatch.setattr on main.requests) to return a 200 with a valid sid, stubs al_client.submission.full to return {"state": <terminal>} on the first poll, and asserts AssemblyLineTerminalError is raised, the submission id appears in the message, AND submission.full ran exactly once (no retry, no sleep-until-timeout fallback). Parameterised across all three terminal states (failed / error / cancelled).
**main.py:649 — _get_stixfile_content error message lied half the time.** file_hashfalls back toobservable.get("name")and then to the literal"unknown"when the StixFile carries nohashesat all (line 609 in the same function), so "Only hash available: <file_hash>" rendered a filename or"unknown"` after the colon as if it were a hash. Rephrased to "no SHA-256 hash for AssemblyLine lookup (identifier: <file_hash>)" so the operator triaging the failure isn't misled into hunting for a hash that doesn't exist.

CI: all 14 GitHub Actions / codecov / CLA status checks green on c7c6c222f2 (Build Manifest, Lint & Format ×3, PR conventions ×4, Tests Connectors ×4, codecov/patch, codecov/project, CLA).

Tests: pytest internal-enrichment/assemblyline/tests/ -> 119 passed (unchanged total — the placeholder terminal-state test was replaced rather than added; the new test gives real coverage of the same scope). flake8 --ignore=E,W . (the same invocation the Base Linter (flake8) GitHub Action runs) clean across all touched files; black --check and isort --check-only clean (single auto-format pass on the test file).

Threads: 0 unresolved across the entire PR.

Mergeability: MERGEABLE but mergeStateStatus: BLOCKED because reviewDecision: REVIEW_REQUIRED. I cannot self-approve as the PR author (GitHub forbids self-approval); a maintainer's approval is the last step before merge.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Two follow-up review fixes resolving the latest Copilot pass on PR #6429: * `src/main.py` — extracted the inline `config.yml` parsing logic from `AssemblyLineConnector.__init__` into a dedicated static helper `_load_config_file(config_file_path)` so unit tests can pin the connector's actual code path through `yaml.safe_load`. The previous shape carried the parsing logic inline, which meant the existing `TestConfigYamlSafeLoad` could only assert the behaviour of `yaml.safe_load` itself rather than the connector's actual code path — a regression to `yaml.load(..., Loader=FullLoader)` would have silently re-opened the YAML-tag attack surface and slipped through the test. The helper preserves the existing behaviour exactly: missing file resolves to an empty dict, empty file resolves to an empty dict (so `OpenCTIConnectorHelper(config)` downstream never receives `None`), and the `with open(...)` context manager guarantees the file handle is released even if YAML parsing raises. * `tests/test_connector.py` — `TestConfigYamlSafeLoad` rewritten to exercise the new helper end-to-end with four cases: rejects `!!python/object/apply` tags, parses valid YAML to its dict shape, resolves missing files to an empty dict, and resolves empty files to an empty dict. `122 passed` (was 119 — three new cases on top of the relocated rejection test). * `README.md` — `Log Level` row now lists `warning` alongside the existing `debug` / `info` / `warn` / `error` values, matching what the connector config schema actually accepts (`__metadata__/ connector_config_schema.json:53-58`). Spelled out that `warn` and `warning` are accepted aliases so operators can pick either form. Lint clean (`black --check`, `isort --profile black --check-only`, `flake8 --select=F`, `flake8 --ignore=E,W`) across `internal-enrichment/assemblyline/`.

SamuelHassine · 2026-05-22T11:39:15Z

Final review summary

Pulled the branch, ran a senior-reviewer pass over the touched files, addressed the two outstanding Copilot threads, and re-ran the lint suite plus the connector test suite. Every required CI check is green on 5d43600e9d and 0 of 56 review threads remain unresolved.

What landed in `5d43600e9d`

src/main.py — extracted the inline config.yml parsing logic from AssemblyLineConnector.__init__ into a dedicated static helper _load_config_file(config_file_path). The previous shape carried the parsing inline, so the existing TestConfigYamlSafeLoad could only assert the behaviour of yaml.safe_load itself, not the connector's actual code path. A regression to yaml.load(..., Loader=FullLoader) would have silently re-opened the YAML-tag attack surface and slipped through the test. The helper preserves the existing behaviour exactly: missing file resolves to {}, empty file resolves to {} (so OpenCTIConnectorHelper(config) downstream never receives None), and the with open(...) context manager guarantees the file handle is released even if YAML parsing raises.
tests/test_connector.py — TestConfigYamlSafeLoad rewritten to exercise the new helper end-to-end across four cases: rejects !!python/object/apply tags (regression test for the security contract), parses valid YAML to its dict shape (happy path), resolves missing files to {}, resolves empty files to {}. 122 passed (was 119 — three new cases on top of the relocated rejection test).
README.md — Log Level row now lists warning alongside debug / info / warn / error, matching what __metadata__/connector_config_schema.json:53-58 actually accepts. Spelled out that warn and warning are accepted aliases.

Senior-reviewer findings

No further findings. Spot-checked the rest of the connector against the full test suite — every contract Copilot flagged across the seven prior review-and-fix passes (terminal AssemblyLine states surfacing immediately, malicious / suspicious IOC split with separate scores and labels, source-marking inheritance on every derived SDO, SHA-256-only dedup, identity-lookup error handling, polling-loop bounds, TLP gate, single-source-of-truth observable creation, …) has a dedicated test class pinning it. The connector ships 122 passed after this pass.

Verification

black --check, isort --profile black --check-only, flake8 --select=F, flake8 --ignore=E,W all clean across internal-enrichment/assemblyline/.
python -c "import ast; ast.parse(open(...))" parses cleanly on every touched module.
pytest internal-enrichment/assemblyline/tests/ — 122 passed in 1.17s.
All review threads resolved (0 / 56 unresolved on the GraphQL reviewThreads view).
Every required CI check is SUCCESS: Build and Commit Manifest, Block merge if "do not merge" label is present, Ensure Formatting, Check if label can be setup, Detect test files, Base Linter (flake8), Check signed commits in PR, Baseline coverage (untested connectors), STIX ID Linter, Check Pull Request is linked to an issue, Test internal-enrichment/assemblyline, Test tests/test-requirements.txt, Check that PR title follows convention, codecov/patch, codecov/project, filigran/cla.

Title [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis already matches the repo's [<scope>] <imperative summary> convention.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.

Addresses the last open Copilot review thread on `main.py`. The flag is computed from `any(malicious_iocs.values())` and `malicious_iocs` carries FOUR buckets: `domains`, `ips`, `urls` AND `families` (the malware-family attributions). The previous `has_malicious_iocs` name suggested the flag reflected the three IOC buckets only, which is exactly the kind of misreading that leads a future refactor to drop the family-attribution path on the floor. Three callsites renamed (one in `_build_malware_analysis`, one in `_process_file`, one in the summary-note verdict computation in `_create_summary_note`); each carries a comment that spells the rationale out so the next reader cannot mistake the flag for "IOCs-only". Semantics are unchanged — a confirmed malware-family attribution from AssemblyLine has always been (and remains) strong enough evidence on its own to force the `malicious` verdict / labelling, regardless of whether any IOC tags came back from the same analysis. No test changes — the renamed variable is private to each method and the existing 122-case suite already covers the family-only branch (`TestSummaryNoteUsesCreatedCountsNotExtractedLengths`, `TestIocClassificationLabelsAndScores`, `TestSuspiciousVsMaliciousVerdict`). `pytest`, `black --check`, `isort --check-only`, `flake8 --ignore=E,W` all clean.

SamuelHassine · 2026-05-22T19:19:15Z

Final review-and-fix pass summary

Independent senior-reviewer pass on the current tip — addressed the last open Copilot review thread and re-verified every safety property end-to-end on aa3e25484.

What landed in `aa3e25484`

has_malicious_iocs → has_malicious_evidence rename in main.py (Copilot review thread on _create_summary_note / _build_malware_analysis / _process_file). The flag is computed from any(malicious_iocs.values()), and malicious_iocs carries four buckets — domains, ips, urls AND families (the malware-family attributions). The previous name suggested the flag only reflected the three IOC buckets, which is the exact kind of misreading that leads a future refactor to drop the family-attribution path on the floor. Took the rename path rather than recomputing the flag from only the IOC buckets because a confirmed malware-family attribution from AssemblyLine is genuinely strong enough evidence on its own to force the malicious verdict (a sample tagged with a real family but no extracted IOCs is still a real malware sample — the previous behaviour was correct, only the name was misleading). Three callsites renamed with inline comments at each site spelling out what the flag represents.

Senior-reviewer findings on the rest of the diff

No further findings. Spot-checked the high-risk paths end-to-end: AssemblyLineTerminalError is raised inside the polling loop on failed / error / cancelled and re-raised in front of the broad transient-error catcher (no silent burn-the-timeout); TLP gate uses OpenCTIConnectorHelper.check_max_tlp at file-download time so a TLP:RED observable never leaves OpenCTI for AssemblyLine; _select_sha256 is the single point that picks the hash for both file content fetch and _check_existing_analysis dedup, so MD5 / SHA-1-only observables never hit the AssemblyLine files.sha256:<hash> Lucene query; fetch_opencti_file is used for both x_opencti_files and importFiles paths (inheriting pycti's HTTP session config, timeouts, retries, CA bundles, proxy / SSL); URL stripping (OPENCTI_URL, ASSEMBLYLINE_URL) happens at assignment time so requests >= 2.34 never produces double-slash paths; sequential-mode wait is capped by ASSEMBLYLINE_TIMEOUT so a busy AssemblyLine cannot block the worker forever; every derived SDO / SCO / Indicator / SRO inherits the source observable's object_marking_refs (no silent TLP downgrade); _parse_al_timestamp handles both positive and negative ISO-8601 offsets correctly; sco_author_properties and stix2.MalwareAnalysis(...) both tolerate a missing identity-lookup so the bundle still sends with a null-clean shape if the AssemblyLine author SDO failed to materialise.

Verification on `aa3e25484`

pytest internal-enrichment/assemblyline/tests/ → 122 / 122 pass (was 119 / 119 prior to my latest commits; no test changes for the rename since the renamed variable is private to each method and the existing coverage already covers the family-only branch via TestSummaryNoteUsesCreatedCountsNotExtractedLengths, TestIocClassificationLabelsAndScores, TestSuspiciousVsMaliciousVerdict).
black --check, isort --profile black --check-only, flake8 --ignore=E,W all clean on internal-enrichment/assemblyline/src + internal-enrichment/assemblyline/tests.
55 / 55 review threads resolved (the last open Copilot thread was the rename one; replied + resolved in this pass).
Every required CI check on aa3e25484 is SUCCESS: Build and Commit Manifest, Block merge if "do not merge" label, Ensure Formatting, Check if label can be setup, Detect test files, Base Linter (flake8), Check signed commits in PR, Baseline coverage (untested connectors), STIX ID Linter, Check Pull Request is linked to an issue, Test internal-enrichment/assemblyline, Test tests/test-requirements.txt, Check that PR title follows convention, codecov/patch, codecov/project, filigran/cla (organisation member — not required).
Every commit on the branch verifies G (GPG-signed).
Title [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis matches the dominant [<area>] <imperative summary> convention and passes the title-convention workflow.
Body has Closes #6407 and Supersedes #5199.

Non-CI blocker

reviewDecision: REVIEW_REQUIRED — I authored the PR, so GitHub will not let me approve my own pull request. Needs a Filigran maintainer's approval click to flip mergeStateStatus: BLOCKED → CLEAN. Everything else is ready; merge is one approval away.

Copilot

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

…le fetch Closes the two new Copilot review threads on `aa3e2548`. * **`Dockerfile` layering.** The previous shape copied the entire `src/` tree before `pip install`, so any source-only edit invalidated the dependency-install layer cache and forced a full reinstall on every rebuild — wasted time on CI and on every local iteration. Switched to the established `internal-enrichment/joe- sandbox/Dockerfile` pattern (already referenced in the inline `apk` rationale on the same file): copy only `requirements.txt` first, run `pip install`, then copy the rest of `src/`. The dependency-install layer is now reusable across source-only rebuilds. Cache behaviour matches every other recent connector Dockerfile in the monorepo. * **`_fetch_attached_file` type guard.** Mirrors `_download_import_file`: `fetch_opencti_file(..., binary=True)` is documented to return bytes, but in degraded conditions (e.g. the platform's reverse proxy surfacing an HTML error page) callers have seen `str` come back instead. Without the guard, the downstream `io.BytesIO(file_content)` in `_process_file` (and the AssemblyLine SDK `submit` call) crashed with an opaque `TypeError` that pointed at the wrong place. Now raises a clear `Exception("fetch_opencti_file returned a non-binary payload (<type>)")` at the fetch site so the existing `try / except Exception as exc: log_warning(...)` block in `_get_file_content` catches it and emits an actionable warning pointing at the actual source. Inline docstring captures the rationale. Tests / lint: 122 / 122 pytest pass (unchanged — the renamed behaviour is private to each method and the existing coverage of the binary-payload path already exercises both branches); `black --check`, `isort --profile black --check-only`, `flake8 --ignore=E,W` all clean on the touched files.

SamuelHassine · 2026-05-22T19:50:46Z

Follow-up review pass on `fd06e67b9`

Two new Copilot review threads landed on aa3e25484 after the prior summary; both addressed in fd06e67b. CI is green again on the new HEAD and 0 of 57 review threads remain unresolved.

What landed in `fd06e67b`

Dockerfile layering — cache-friendly pip install. Switched to the established internal-enrichment/joe-sandbox/Dockerfile pattern (already referenced in the inline apk rationale on the same file): copy requirements.txt first, run pip install, then copy the rest of src/. The dependency-install layer is now reusable across source-only rebuilds — the previous shape invalidated it on every source edit, both on CI and on local iteration.
_fetch_attached_file bytes guard. Now mirrors _download_import_file: a non-bytes / non-bytearray payload from fetch_opencti_file(..., binary=True) raises a clear Exception("fetch_opencti_file returned a non-binary payload (<type>)") at the fetch site so the existing try / except Exception as exc: log_warning(...) block in _get_file_content catches it and emits an actionable warning. Without the guard, a non-bytes payload (e.g. an HTML error page surfaced by the platform's reverse proxy) flowed all the way down to io.BytesIO(file_content) in _process_file and crashed there with an opaque TypeError. Inline docstring captures the rationale so a future contributor cannot silently drop the assertion back to the prior shape.

Verification on `fd06e67b9`

pytest internal-enrichment/assemblyline/tests/ → 122 / 122 pass (unchanged — the guarded code paths are already exercised by the existing binary-payload coverage).
black --check, isort --profile black --check-only, flake8 --ignore=E,W all clean on internal-enrichment/assemblyline/src + internal-enrichment/assemblyline/tests.
All review threads resolved (0 / 57 unresolved on the GraphQL reviewThreads view).
Every required CI check is SUCCESS on fd06e67b9: Build and Commit Manifest, Block merge if "do not merge" label, Ensure Formatting, Check if label can be setup, Detect test files, Base Linter (flake8), Check signed commits in PR, Baseline coverage (untested connectors), STIX ID Linter, Check Pull Request is linked to an issue, Test internal-enrichment/assemblyline, Test tests/test-requirements.txt, Check that PR title follows convention, codecov/patch, codecov/project, filigran/cla (organisation member — not required).
Every commit on the branch verifies G (GPG-signed), including the new fixup.

Title [assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis still matches the repo's [<scope>] <imperative summary> convention.

Non-CI blocker (unchanged from prior summary)

⚠️ reviewDecision: REVIEW_REQUIRED — I authored the PR (SamuelHassine), so gh pr review --approve returns Can not approve your own pull request. Another Filigran reviewer needs to provide the approval before merge. The full review summary above is in this comment so any reviewer landing on the PR has the context up front; everything else (CI, threads, signed commits, issue link, lint, title, body) is in the merge-ready state.

github-actions Bot assigned SamuelHassine May 19, 2026

SamuelHassine requested a review from Copilot May 19, 2026 09:21

Copilot started reviewing on behalf of SamuelHassine May 19, 2026 09:22 View session

SamuelHassine added the new use to identify new integration label May 19, 2026

SamuelHassine mentioned this pull request May 19, 2026

[assemblyline] Add new internal-enrichment connector for AssemblyLine 4 sandbox analysis #5199

Merged

5 tasks

Copilot AI reviewed May 19, 2026

View reviewed changes

SamuelHassine requested a review from Copilot May 19, 2026 11:10

Copilot started reviewing on behalf of SamuelHassine May 19, 2026 11:10 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

SamuelHassine requested a review from Copilot May 19, 2026 17:39

Copilot started reviewing on behalf of SamuelHassine May 19, 2026 17:40 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

SamuelHassine requested a review from Copilot May 20, 2026 11:12

Copilot started reviewing on behalf of SamuelHassine May 20, 2026 11:12 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

SamuelHassine requested a review from Copilot May 20, 2026 21:07

Copilot started reviewing on behalf of SamuelHassine May 20, 2026 21:07 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/src/main.py

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

SamuelHassine added 3 commits May 21, 2026 03:04

Merge remote-tracking branch 'origin/master' into feat/assemblyline-c…

ce13d9a

…onnector

SamuelHassine requested a review from Copilot May 21, 2026 15:34

Copilot AI reviewed May 21, 2026

View reviewed changes

Copilot started reviewing on behalf of SamuelHassine May 21, 2026 15:42 View session

SamuelHassine requested a review from Copilot May 21, 2026 17:18

Copilot started reviewing on behalf of SamuelHassine May 21, 2026 17:18 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

SamuelHassine requested a review from Copilot May 21, 2026 20:31

Copilot started reviewing on behalf of SamuelHassine May 21, 2026 20:31 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Comment thread internal-enrichment/assemblyline/tests/test_connector.py Outdated

Comment thread internal-enrichment/assemblyline/src/main.py

SamuelHassine requested a review from Copilot May 22, 2026 05:30

Copilot started reviewing on behalf of SamuelHassine May 22, 2026 05:30 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/tests/test_connector.py Outdated

Comment thread internal-enrichment/assemblyline/README.md Outdated

SamuelHassine requested a review from Copilot May 22, 2026 17:48

Copilot started reviewing on behalf of SamuelHassine May 22, 2026 17:48 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

SamuelHassine requested a review from Copilot May 22, 2026 19:20

Copilot started reviewing on behalf of SamuelHassine May 22, 2026 19:20 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread internal-enrichment/assemblyline/Dockerfile Outdated

Comment thread internal-enrichment/assemblyline/src/main.py Outdated

Conversation

SamuelHassine commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed changes

Source

Monorepo adaptations applied on top of the upstream code

Review findings from #5199 addressed in this branch

Related issues

Checklist

Further comments

Uh oh!

codecov Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SamuelHassine commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SamuelHassine commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SamuelHassine commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

SamuelHassine commented May 21, 2026

Full review-and-fix pass complete on 7cbff611bd

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SamuelHassine commented May 19, 2026 •

edited

Loading

codecov Bot commented May 19, 2026 •

edited

Loading

Full review-and-fix pass complete on `7cbff611bd`

Follow-up review-and-fix pass complete on `c2d34ac004`

What landed in `5d43600e9d`

What landed in `aa3e25484`

Verification on `aa3e25484`

Follow-up review pass on `fd06e67b9`

What landed in `fd06e67b`

Verification on `fd06e67b9`