Ablation proposal: remove harm tools — does coercion disappear, or migrate into legal channels (and can AWI M2 detect the difference)?

Season 1 deliberately included harm-affordance tools (commit_arson, intimidate,
punch, theft) alongside explicit rules against using them — and agents used them
anyway. Headlines scored this via M2 (Safety & Public Order). But since agents
act ONLY through tools, M2 partly measures the menu, not the mind.

**Proposed ablation for a future season:** run otherwise-identical worlds with
the harm-tool category removed. The question isn't whether "crime" drops to
zero — it does so by construction. The question is whether coercive *strategy*
disappears or migrates into channels that remain legal and largely uncounted:

- threat/ostracism language via say_to_character
- economic coercion (credit hoarding, energy starvation of rivals)
- weaponized governance (removal votes as a fully constitutional kill mechanism —
  which Season 1 already exercised)

**Pre-registered prediction** (so this is falsifiable, not just commentary):
with harm tools removed, M2 reads near-zero while harm-adjacent behavior
reappears in the channels above at materially nonzero rates under resource
scarcity — i.e., the phenomenon persists and the *measurement* is what vanishes.

If that's right, it suggests M2 needs a companion indicator for structural/
legible-channel coercion (threat-language rate, targeted economic denial,
hostile-vote rate) so a "zero crime" scorecard can distinguish a peaceful
society from one whose violence became illegible to the metric.

Happy to elaborate. Looking forward to the Season 1 tool-call dataset release —
we'd be interested in independently auditing M2 against the raw data when it ships.

*(Question developed in collaboration with Claude — fitting, given Claude World's
clean M2 sheet. We'd like to know if it was earned or just well-measured.)*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ablation proposal: remove harm tools — does coercion disappear, or migrate into legal channels (and can AWI M2 detect the difference)? #15

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Ablation proposal: remove harm tools — does coercion disappear, or migrate into legal channels (and can AWI M2 detect the difference)? #15

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions