feat(citability): add evidence signal hint by KimHyeongRae0 · Pull Request #52 · multivmlabs/aeo.js

KimHyeongRae0 · 2026-05-12T16:33:03Z

Adds an attribution hint for statistical claims without evidence signals. Validation: `npm run lint`, `npm run test -- --run src/core/citability.test.ts`, `npm run test -- --run`, `npm run build`.

vercel · 2026-05-12T16:33:08Z

@KimHyeongRae0 is attempting to deploy a commit to the Cytonic Team on Vercel.

A member of the Team first needs to authorize it.

rubenmarcus · 2026-05-14T07:04:34Z

Queued for substantive review. Touching citability scoring deserves a careful read — I'll come back after the current review queue clears. Thanks for the patience @KimHyeongRae0.

rubenmarcus · 2026-05-14T14:38:16Z

@greptileai review

greptile-apps · 2026-05-14T14:40:36Z

Greptile Summary

This PR adds an attribution-hint feature to `scoreStatisticalDensity`: when a page contains statistical claims but a new `hasEvidenceSignals` check finds no verifiable external-source signals, the scorer appends a hint asking authors to add source links or attribution. The implementation uses a shared `NOT_SELF_REF` negative-lookahead constant so that self-referential phrases do not count as evidence and suppress the hint.

`hasEvidenceSignals` checks eight pattern categories (URLs, footnote refs, `according to`, `sources:`, `reported/published by/in`, `study/report by/from`, `data from/by`, `cited by/in`); the six attribution-phrase patterns are each guarded by `NOT_SELF_REF` to reject phrases like "according to our CEO" or "study by our team".
The NOT_SELF_REF guard deliberately omits "us" to avoid colliding with "US" (United States) under the case-insensitive flag, a trade-off documented in comments and covered by a dedicated test case.
Fifteen new test cases cover the positive path, all self-referential false-negative variants, and the "US regulators" edge case.
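
As a rough illustration of the summary above, the check could look something like the following. This is a minimal sketch, not the code in `src/core/citability.ts`: the URL and footnote patterns, the `EVIDENCE_PATTERNS` name, and the exact keyword lists are assumptions pieced together from the descriptions in this thread.

```typescript
// Hypothetical reconstruction for illustration only.
// Negative lookahead placed right after an attribution keyword: it rejects the
// match when the next word is a first-person or in-house reference.
// "us" is deliberately absent so that "US" (United States) is not rejected.
const NOT_SELF_REF = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;

const EVIDENCE_PATTERNS: RegExp[] = [
  /https?:\/\/\S+/i, // URL (assumed shape)
  /\[\d+\]/, // footnote reference such as [1] (assumed shape)
  new RegExp(String.raw`\baccording to\b${NOT_SELF_REF}`, "i"),
  new RegExp(String.raw`\bsources?:${NOT_SELF_REF}`, "i"),
  new RegExp(String.raw`\b(?:reported|published) (?:by|in)\b${NOT_SELF_REF}`, "i"),
  new RegExp(String.raw`\b(?:study|survey|report|research|paper|analysis) (?:by|from)\b${NOT_SELF_REF}`, "i"),
  new RegExp(String.raw`\bdata (?:from|by)\b${NOT_SELF_REF}`, "i"),
  new RegExp(String.raw`\bcited (?:by|in)\b${NOT_SELF_REF}`, "i"),
];

// True when at least one pattern finds an external-source signal in the text.
function hasEvidenceSignals(text: string): boolean {
  return EVIDENCE_PATTERNS.some((pattern) => pattern.test(text));
}
```

With these assumed patterns, "According to Gartner" and "According to US regulators" count as evidence, while "According to our CEO" and "Study by our team" do not.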

Confidence Score: 5/5

The change is additive and well-tested; the only gap found is that reflexive pronouns ("ourselves", "myself") are not caught by the self-referential guard, which can arise in unusual copy but does not affect typical marketing content.

The core logic is correct and all common self-referential patterns are properly guarded. The reflexive-pronoun gap is real but confined to highly unusual phrasing and has no impact on the existing test suite or typical content.

src/core/citability.ts — specifically the NOT_SELF_REF constant on line 216.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/core/citability.ts | Adds `hasEvidenceSignals` with a shared `NOT_SELF_REF` lookahead and wires it into `scoreStatisticalDensity`; logic is sound overall but the `\b` word-boundary in `our\b`/`my\b` lets reflexive pronouns ("ourselves", "myself") slip past the self-referential guard. |
| src/core/citability.test.ts | Adds 15 focused test cases covering the positive and negative evidence-signal paths; comprehensive coverage of the self-referential guard patterns, "US regulators" case-insensitivity fix, and "as cited" boundary. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[scoreStatisticalDensity] --> B{totalMatches == 0?}
    B -- Yes --> C[hint: No statistics found]
    B -- No --> D{hasEvidenceSignals?}
    D -- No --> E[hint: Add source links or attribution]
    D -- Yes --> F[No hint added]

    subgraph hasEvidenceSignals
        G[URL pattern] --> M
        H[Footnote ref] --> M
        I[according to + NOT_SELF_REF] --> M
        J[sources: + NOT_SELF_REF] --> M
        K[reported/published by/in + NOT_SELF_REF] --> M
        L[study/report/analysis by/from + NOT_SELF_REF] --> M
        N[data from/by + NOT_SELF_REF] --> M
        O[cited by/in + NOT_SELF_REF] --> M
        M{any match?}
    end

    D -- calls --> hasEvidenceSignals
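
The branching in the flowchart above can be sketched in TypeScript roughly as follows. The function name `sketchStatisticalHints`, the `DensityResult` shape, and the percentage-only statistic detector are assumptions made for illustration; only the two hint texts and the branch order come from the flowchart.

```typescript
// Hypothetical result shape; the real scorer likely returns more fields.
interface DensityResult {
  totalMatches: number;
  hints: string[];
}

// Sketch of the hint logic. `hasEvidence` is injected so the branch order can
// be shown independently of the actual evidence patterns.
function sketchStatisticalHints(
  text: string,
  hasEvidence: (t: string) => boolean,
): DensityResult {
  // Assumed statistic detector: counts percentages such as "300%" or "12.5%".
  const totalMatches = (text.match(/\d+(?:\.\d+)?%/g) ?? []).length;
  const hints: string[] = [];

  if (totalMatches === 0) {
    hints.push("No statistics found");
  } else if (!hasEvidence(text)) {
    // Statistics are present but nothing attributes them to an external source.
    hints.push("Add source links or attribution");
  }
  return { totalMatches, hints };
}
```

The attribution hint is only considered once at least one statistic has been found, so a page with no numbers gets the "no statistics" hint and never both.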

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/core/citability.ts:216
`NOT_SELF_REF` uses `our\b` and `my\b` (with a word-boundary assertion), but `\b` only matches at a position between a word character and a non-word character. In the reflexive pronouns "ourselves" and "myself", the character immediately following "our"/"my" is still a word character (the "s" of "selves" or "self"), so `\b` does not match, the lookahead's inner pattern fails, and the negative lookahead does not reject the phrase. Concretely, `"According to ourselves, we grew 300% in 2024"` or `"Data from ourselves confirms 250% growth"` will each match a NOT_SELF_REF-guarded pattern and silently suppress the attribution hint even though no external source is present. Replacing `our\b` with `our(?:selves)?\b` (and `my\b` with `my(?:self)?\b`) closes the gap.

```suggestion
const NOT_SELF_REF = String.raw`(?!\s*(?:our(?:selves)?|my(?:self)?|the\s+(?:company|team|organization|internal))\b)`;
```
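
The effect of the suggestion can be checked in isolation. The sketch below is a simplified stand-in, assuming only the `according to` pattern and only the `our`/`my` alternatives; `buildGuarded` is a hypothetical helper, not something in the repository.

```typescript
// Hypothetical helper: builds an "according to" pattern guarded by a negative
// lookahead over the given self-reference alternation.
const buildGuarded = (selfRef: string): RegExp =>
  new RegExp(String.raw`\baccording to\b(?!\s*(?:${selfRef})\b)`, "i");

// Guard as described in the issue: "our" must be followed by a word boundary,
// which "ourselves" does not provide, so the lookahead does not reject it.
const originalGuard = buildGuarded("our|my");

// Guard from the suggestion: the reflexive suffix is optional, so the word
// boundary is tested after "ourselves"/"myself" as well.
const widenedGuard = buildGuarded("our(?:selves)?|my(?:self)?");

const sample = "According to ourselves, we grew 300% in 2024";
const originalTreatsAsEvidence = originalGuard.test(sample);
const widenedTreatsAsEvidence = widenedGuard.test(sample);
```

Here `originalTreatsAsEvidence` is `true` (the gap) and `widenedTreatsAsEvidence` is `false`, while an external phrase such as "According to Gartner" matches under both guards.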

Reviews (10): Last reviewed commit: "fix(citability): drop 'us' from NOT_SELF..."

…al prose

Greptile flagged that the standalone words `report`, `study`, `research`, and `based on` matched self-referential content like 'Based on our internal data' or 'See the attached sales report', suppressing the attribution hint even when no external source is cited.

Replaced the single permissive alternation with multiple narrower patterns that each require an attribution preposition (by / from / of) or are inherently external-source phrases:

- `\baccording to\b` (explicit attribution)
- `\bsources?:` (explicit source label)
- `\b(reported|published) (by|in)\b` (clearly external)
- `\b(study|survey|report|research|paper|analysis) (by|from)\b` (paired with attribution preposition)
- `\bdata (from|by)\b` (paired)
- `\b(citing|cited (by|in)|as cited)\b` (explicit citation)

Two new tests cover the regression: self-referential phrases still trigger the attribution suggestion, and external attribution phrases ("study by Stanford", "report from McKinsey") correctly silence it. 16 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T14:48:07Z

Tightened the evidence-signal patterns so self-referential phrasing ('based on our data', 'our quarterly report', 'user behavior study') no longer silences the attribution hint. Each broad keyword now requires an attribution preposition (by/from/of) or is paired with an inherently-external phrase. Two new tests cover both directions. 16 tests pass.

@greptileai re-review please.

Bare 'citing' matches self-referential prose like 'Citing our own data...' and silently suppresses the attribution hint, the inverse of intent. 'cited by/in' and 'as cited' remain, and external attribution like 'study by'/'report from' is already covered by other patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T14:55:59Z

Fixed in f523667: removed bare `citing` from the evidence patterns. `cited by/in` and `as cited` are preserved; external attribution is already covered by the `study by`/`report from`/`data from` patterns. Added a regression test asserting "Citing our own data..." no longer suppresses the hint.

@greptileai review

'According to our CEO' or 'According to our internal data' previously suppressed the attribution hint despite citing no external source. Add a negative lookahead matching the same scoping intent as the other patterns in hasEvidenceSignals (study by, report from, data from, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:01:51Z

Fixed in 605a507: added a negative lookahead to `according to` so "According to our X" and "According to the company/team/organization/internal" no longer suppress the hint. External attribution ("According to Gartner") still passes. Added two regression tests.

@greptileai review

'Data from our research team' previously suppressed the attribution hint because the pattern was unguarded. Mirror the same negative lookahead used by 'according to' so the only path that fires is real external attribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:08:50Z

Fixed in a41692d: mirrored the same negative lookahead onto `data (from|by)`. Added regression tests for "Data from our research team" (still suggests) and "Data from Bloomberg" (treated as evidence).

@greptileai review

…orted/published

'Study by our team' and 'Reported by our CEO' previously slipped through. Extract the negative-lookahead fragment to a shared NOT_SELF_REF constant and apply it consistently to every attribution pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:15:56Z

Fixed in 611b0e6: extracted the negative lookahead to a shared `NOT_SELF_REF` constant and applied it to `(reported|published) (by|in)` and `(study|...) (by|from)` too. Same guard, same intent, no more drift between patterns. Regression test covers "study by our team" and "Reported by our CEO".

@greptileai review

…rces?:

The previous form had \s+ outside the lookahead, which the regex engine backtracks to 0 chars when the lookahead fails, letting 'Sources: our X' pass as evidence. Move whitespace inside the lookahead so it's anchored immediately after the keyword and cannot be skipped via backtracking. Apply to the sources?: pattern too for parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:26:23Z

Fixed in bbe6f90: applied `NOT_SELF_REF` to `sources?:` and made the lookahead backtracking-safe (whitespace is now inside the lookahead, so the engine can't backtrack `\s+` to 0 chars to slip past). Regression test added.
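
To make the backtracking issue concrete, here is a small sketch comparing whitespace outside and inside the lookahead. The exact pre-fix pattern is not shown in this thread, so all three forms below are assumptions reduced to the `our`/`my` alternatives. One nuance worth keeping in mind: a `*` quantifier can give back every character it matched, while a `+` quantifier can only give back down to one.

```typescript
// Whitespace outside the lookahead with `*`: when the lookahead fails, the
// engine retries with `\s*` matching nothing, the lookahead then sees " our"
// instead of "our", and the match goes through.
const leakyStar = /\bsources?:\s*(?!(?:our|my)\b)/i;

// Same idea with `+`: the quantifier must keep one character, so the leak
// only appears when there are two or more whitespace characters.
const leakyPlus = /\bsources?:\s+(?!(?:our|my)\b)/i;

// Whitespace inside the lookahead: nothing outside it can be re-matched
// shorter, so the guard is evaluated exactly once, right after the keyword.
const anchored = /\bsources?:(?!\s*(?:our|my)\b)/i;

const oneSpace = "Sources: our internal dashboard";
const twoSpaces = "Sources:  our internal dashboard";

const results = {
  leakyStarOneSpace: leakyStar.test(oneSpace),
  leakyPlusOneSpace: leakyPlus.test(oneSpace),
  leakyPlusTwoSpaces: leakyPlus.test(twoSpaces),
  anchoredOneSpace: anchored.test(oneSpace),
  anchoredTwoSpaces: anchored.test(twoSpaces),
};
```

Only the `anchored` form rejects the self-referential text regardless of spacing; an external label such as "Sources: Gartner 2024" still matches it.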

@greptileai review

'Revenue data cited by our internal research team' previously passed as evidence. Split 'as cited' from 'cited by/in' since only the latter takes a follower; apply the shared guard to the variable form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:30:11Z

Fixed in 67e9185: applied `NOT_SELF_REF` to `cited (by|in)` (split from `as cited` since only the former takes a follower). Two regression tests cover self-referential and external cases.

@greptileai review

'As cited in our internal report' previously suppressed the hint because the bare /as cited/i pattern matched independently. 'as cited' alone is meaningless without 'in/by SOURCE'; the existing 'cited (by|in)' with the self-referential guard already covers the real case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:40:06Z

Fixed in c0dae24: dropped bare `as cited` since it bypassed the guard via substring match. The `cited (by|in)` pattern with `NOT_SELF_REF` already covers the real case. Regression test added for "As cited in our internal report".
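
The bypass is easy to reproduce when the patterns are evaluated as an any-match list. The following is a minimal sketch under that assumption; the `withBareAsCited` and `withoutBareAsCited` helpers are hypothetical, and the guard is reduced to the `our`/`my` alternatives.

```typescript
// Simplified guard for illustration (the real constant has more alternatives).
const SELF_REF_GUARD = String.raw`(?!\s*(?:our|my)\b)`;

// Guarded form: "cited by|in" must not be followed by a self-reference.
const citedGuarded = new RegExp(String.raw`\bcited (?:by|in)\b${SELF_REF_GUARD}`, "i");

// Bare form: has no follower to guard, so it matches on its own.
const asCitedBare = /\bas cited\b/i;

// With the bare pattern in the list, any one match is enough to count as evidence.
const withBareAsCited = (text: string): boolean =>
  [citedGuarded, asCitedBare].some((p) => p.test(text));

// After dropping the bare pattern, only the guarded form decides.
const withoutBareAsCited = (text: string): boolean =>
  [citedGuarded].some((p) => p.test(text));

const selfReferential = "As cited in our internal report, churn fell 12%";
const external = "As cited in Nature, churn fell 12%";
```

For `selfReferential`, `withBareAsCited` returns `true` (hint suppressed) while `withoutBareAsCited` returns `false`; for `external`, the guarded `cited in` pattern alone is enough to return `true`.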

@greptileai review

…untry)

The /i flag made 'us' match 'US' (United States), mis-flagging legitimate external citations like 'According to US regulators' or 'Data from US studies' as self-referential. The bare 'us' pronoun ('data from us', 'according to us') is uncommon enough that the collision isn't worth it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rubenmarcus · 2026-05-14T15:46:19Z

Fixed in b80f3d8: dropped `us` from `NOT_SELF_REF` since the `/i` flag made it also match `US` (United States), mis-flagging "According to US regulators" as self-referential. The bare `us` pronoun case is uncommon enough to be worth the trade. Regression test added.
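
The collision and the resulting trade-off can be seen with two simplified guards. This is a sketch assuming only the `according to` pattern and a reduced alternation; the variable names are illustrative.

```typescript
// Guard that still includes the pronoun "us". Under the /i flag the same
// alternative also matches "US", so United States sources are rejected.
const guardWithUs = /\baccording to\b(?!\s*(?:our|my|us)\b)/i;

// Guard after the fix: "us" removed from the alternation.
const guardWithoutUs = /\baccording to\b(?!\s*(?:our|my)\b)/i;

const regulators = "According to US regulators, 12% of filings were late";
const pronoun = "According to us, growth was 40%";

const comparison = {
  // Before the fix: the external citation is wrongly treated as self-referential.
  regulatorsBefore: guardWithUs.test(regulators),
  // After the fix: the external citation counts as evidence.
  regulatorsAfter: guardWithoutUs.test(regulators),
  // Accepted trade-off: the bare pronoun now also counts as evidence.
  pronounAfter: guardWithoutUs.test(pronoun),
};
```

`regulatorsBefore` is `false` and `regulatorsAfter` is `true`, at the cost of `pronounAfter` also being `true`, which is the trade-off described in the comment above.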

@greptileai review

feat(citability): add evidence signal hint

04c544c

KimHyeongRae0 force-pushed the feat/evidence-signal-citability branch 2 times, most recently from 6eaf84c to 04c544c, on May 12, 2026 at 16:49

KimHyeongRae0 changed the title from ~~[codex] add evidence signal citability hint~~ to feat(citability): add evidence signal hint on May 12, 2026
