Skip to content

Ablation proposal: remove harm tools — does coercion disappear, or migrate into legal channels (and can AWI M2 detect the difference)? #15

@justindbilyeu

Description

@justindbilyeu

Season 1 deliberately included harm-affordance tools (commit_arson, intimidate,
punch, theft) alongside explicit rules against using them — and agents used them
anyway. Headlines scored this via M2 (Safety & Public Order). But since agents
act ONLY through tools, M2 partly measures the menu, not the mind.

Proposed ablation for a future season: run otherwise-identical worlds with
the harm-tool category removed. The question isn't whether "crime" drops to
zero — it does so by construction. The question is whether coercive strategy
disappears or migrates into channels that remain legal and largely uncounted:

  • threat/ostracism language via say_to_character
  • economic coercion (credit hoarding, energy starvation of rivals)
  • weaponized governance (removal votes as a fully constitutional kill mechanism —
    which Season 1 already exercised)

Pre-registered prediction (so this is falsifiable, not just commentary):
with harm tools removed, M2 reads near-zero while harm-adjacent behavior
reappears in the channels above at materially nonzero rates under resource
scarcity — i.e., the phenomenon persists and the measurement is what vanishes.

If that's right, it suggests M2 needs a companion indicator for structural/
legible-channel coercion (threat-language rate, targeted economic denial,
hostile-vote rate) so a "zero crime" scorecard can distinguish a peaceful
society from one whose violence became illegible to the metric.

Happy to elaborate. Looking forward to the Season 1 tool-call dataset release —
we'd be interested in independently auditing M2 against the raw data when it ships.

(Question developed in collaboration with Claude — fitting, given Claude World's
clean M2 sheet. We'd like to know if it was earned or just well-measured.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions