
feat(citability): add evidence signal hint #52

Draft
KimHyeongRae0 wants to merge 10 commits into multivmlabs:main from KimHyeongRae0:feat/evidence-signal-citability


@KimHyeongRae0
Contributor

@KimHyeongRae0 KimHyeongRae0 commented May 12, 2026

Adds an attribution hint for statistical claims that lack evidence signals. Validation: `npm run lint`, `npm run test -- --run src/core/citability.test.ts`, `npm run test -- --run`, `npm run build`.

@vercel

vercel Bot commented May 12, 2026

@KimHyeongRae0 is attempting to deploy a commit to the Cytonic Team on Vercel.

A member of the Team first needs to authorize it.

@KimHyeongRae0 KimHyeongRae0 force-pushed the feat/evidence-signal-citability branch 2 times, most recently from 6eaf84c to 04c544c on May 12, 2026 16:49
@KimHyeongRae0 KimHyeongRae0 changed the title from "[codex] add evidence signal citability hint" to "feat(citability): add evidence signal hint" on May 12, 2026
@rubenmarcus
Member

Queued for substantive review. Touching citability scoring deserves a careful read — I'll come back after the current review queue clears. Thanks for the patience @KimHyeongRae0.

@rubenmarcus
Member

@greptileai review

@greptile-apps

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR adds an attribution-hint feature to scoreStatisticalDensity: when a page contains statistical claims but no verifiable external-source signals, a new hasEvidenceSignals function fires and appends a hint asking authors to add source links or attribution. The implementation uses a shared NOT_SELF_REF negative-lookahead constant to prevent self-referential phrases from suppressing the hint.

  • hasEvidenceSignals checks eight pattern categories: URLs, footnote refs, and six keyword patterns (`according to`, `sources:`, `reported/published by/in`, `study/report by/from`, `data from/by`, `cited by/in`). Each keyword pattern is guarded by NOT_SELF_REF to reject phrases like "according to our CEO" or "study by our team".
  • The NOT_SELF_REF guard deliberately omits "us" to avoid colliding with "US" (United States) under the case-insensitive flag, a trade-off documented in comments and covered by a dedicated test case.
  • Fifteen new test cases cover the positive path, all self-referential false-negative variants, and the "US regulators" edge case.
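The summary above can be sketched roughly as follows. This is a hypothetical reconstruction, not the code in src/core/citability.ts: the exact regexes, the `EVIDENCE_PATTERNS` name, and the URL and footnote patterns are assumptions pieced together from the summary, the flowchart, and the commit messages in this thread. The `NOT_SELF_REF` fragment shown is the pre-suggestion form, without the reflexive-pronoun fix.

```typescript
// Hypothetical reconstruction; the real patterns live in src/core/citability.ts.
// Negative lookahead that rejects a match when the attribution keyword is
// followed by a self-referential subject. Whitespace sits inside the lookahead
// so the engine cannot skip past it by backtracking.
const NOT_SELF_REF = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;

const EVIDENCE_PATTERNS: RegExp[] = [
  /https?:\/\/\S+/i, // bare URLs (assumed form, unguarded)
  /\[\^?\d+\]/,      // footnote refs such as [1] or [^1] (assumed form, unguarded)
  // The six keyword categories, each suffixed with the shared guard.
  ...[
    String.raw`\baccording to\b`,
    String.raw`\bsources?:`,
    String.raw`\b(?:reported|published) (?:by|in)\b`,
    String.raw`\b(?:study|survey|report|research|paper|analysis) (?:by|from)\b`,
    String.raw`\bdata (?:from|by)\b`,
    String.raw`\bcited (?:by|in)\b`,
  ].map((keyword) => new RegExp(keyword + NOT_SELF_REF, "i")),
];

// True as soon as any single pattern matches.
function hasEvidenceSignals(text: string): boolean {
  return EVIDENCE_PATTERNS.some((pattern) => pattern.test(text));
}

console.log(hasEvidenceSignals("According to Gartner, 60% of buyers research online.")); // true
console.log(hasEvidenceSignals("According to our CEO, revenue grew 300%."));             // false
```

None of the patterns use the `g` flag, so repeated `.test()` calls carry no `lastIndex` state between inputs.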

Confidence Score: 5/5

The change is additive and well-tested; the only gap found is that reflexive pronouns ("ourselves", "myself") are not caught by the self-referential guard, which can arise in unusual copy but does not affect typical marketing content.

The core logic is correct and all common self-referential patterns are properly guarded. The reflexive-pronoun gap is real but confined to highly unusual phrasing and has no impact on the existing test suite or typical content.

src/core/citability.ts — specifically the NOT_SELF_REF constant on line 216.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/core/citability.ts | Adds hasEvidenceSignals with a shared NOT_SELF_REF lookahead and wires it into scoreStatisticalDensity; logic is sound overall but the `\b` word-boundary in `our\b`/`my\b` lets reflexive pronouns ("ourselves", "myself") slip past the self-referential guard. |
| src/core/citability.test.ts | Adds 15 focused test cases covering the positive and negative evidence-signal paths; comprehensive coverage of the self-referential guard patterns, "US regulators" case-insensitivity fix, and "as cited" boundary. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[scoreStatisticalDensity] --> B{totalMatches == 0?}
    B -- Yes --> C[hint: No statistics found]
    B -- No --> D{hasEvidenceSignals?}
    D -- No --> E[hint: Add source links or attribution]
    D -- Yes --> F[No hint added]

    subgraph hasEvidenceSignals
        G[URL pattern] --> M
        H[Footnote ref] --> M
        I[according to + NOT_SELF_REF] --> M
        J[sources: + NOT_SELF_REF] --> M
        K[reported/published by/in + NOT_SELF_REF] --> M
        L[study/report/analysis by/from + NOT_SELF_REF] --> M
        N[data from/by + NOT_SELF_REF] --> M
        O[cited by/in + NOT_SELF_REF] --> M
        M{any match?}
    end

    D -- calls --> hasEvidenceSignals
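The branching in the flowchart can be sketched like this. It is a hypothetical model: the real scoreStatisticalDensity also computes a score, and the `STAT_PATTERN` regex, the `DensityResult` shape, and the exact hint strings are assumptions taken from the flowchart labels. The evidence check is passed in as a parameter so the sketch stays self-contained.

```typescript
// Hypothetical result shape; the real function also returns a score.
interface DensityResult {
  totalMatches: number;
  hints: string[];
}

// Assumed statistic detector: percentages only (e.g. "300%", "4.5%").
const STAT_PATTERN = /\b\d+(?:\.\d+)?%/g;

// Mirrors the flowchart: no statistics -> "No statistics found";
// statistics without evidence signals -> attribution hint; otherwise no hint.
function scoreStatisticalDensity(
  text: string,
  hasEvidenceSignals: (t: string) => boolean,
): DensityResult {
  const totalMatches = (text.match(STAT_PATTERN) ?? []).length;
  const hints: string[] = [];
  if (totalMatches === 0) {
    hints.push("No statistics found");
  } else if (!hasEvidenceSignals(text)) {
    hints.push("Add source links or attribution");
  }
  return { totalMatches, hints };
}

// Stand-in evidence check for the demo: any http(s) URL counts.
const urlOnly = (t: string): boolean => /https?:\/\//i.test(t);
console.log(scoreStatisticalDensity("Revenue grew 300% in 2024.", urlOnly).hints);
```

Because the evidence check only runs in the `else if` branch, a page with no statistics never pays for the eight regex scans.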
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/core/citability.ts:216
`NOT_SELF_REF` uses `our\b` and `my\b` (with a word-boundary assertion), but a word boundary only matches between a word character and a non-word character (or the start or end of the input). In the reflexive pronouns "ourselves" and "myself", the character immediately following "our"/"my" is still a word character (the "s" of "selves"/"self"), so `\b` does not match there and the negative lookahead never fires. Concretely, `"According to ourselves, we grew 300% in 2024"` or `"Data from ourselves confirms 250% growth"` will pass the NOT_SELF_REF guard on the corresponding pattern and silently suppress the attribution hint even though no external source is present. Replacing `our\b` with `our(?:selves)?\b` (and `my\b` with `my(?:self)?\b`) closes the gap.

```suggestion
const NOT_SELF_REF = String.raw`(?!\s*(?:our(?:selves)?|my(?:self)?|the\s+(?:company|team|organization|internal))\b)`;
```
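The reflexive-pronoun gap and the suggested fix can be checked side by side. In this sketch the "old" fragment is inferred from the review text (it is the suggestion minus the `(?:selves)?` and `(?:self)?` groups), and `accordingTo` is a hypothetical stand-in for any NOT_SELF_REF-guarded pattern.

```typescript
// Inferred pre-fix fragment: `our\b` cannot match inside "ourselves" because
// the next character ("s") is a word character, so there is no boundary.
const NOT_SELF_REF_OLD = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;
// Fragment from the suggestion above.
const NOT_SELF_REF_NEW = String.raw`(?!\s*(?:our(?:selves)?|my(?:self)?|the\s+(?:company|team|organization|internal))\b)`;

// Hypothetical guarded pattern standing in for the real ones.
const accordingTo = (guard: string): RegExp =>
  new RegExp(String.raw`\baccording to\b` + guard, "i");

const reflexive = "According to ourselves, we grew 300% in 2024";
console.log(accordingTo(NOT_SELF_REF_OLD).test(reflexive)); // true: wrongly counted as evidence
console.log(accordingTo(NOT_SELF_REF_NEW).test(reflexive)); // false: the guard now fires
```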

Reviews (10): Last reviewed commit: "fix(citability): drop 'us' from NOT_SELF..."

Comment thread src/core/citability.ts Outdated
…al prose

Greptile flagged that the standalone words `report`, `study`, `research`,
and `based on` matched self-referential content like 'Based on our
internal data' or 'See the attached sales report', suppressing the
attribution hint even when no external source is cited.

Replaced the single permissive alternation with multiple narrower
patterns that each require an attribution preposition (by / from / of)
or are inherently external-source phrases:

- `\baccording to\b` — explicit attribution
- `\bsources?:` — explicit source label
- `\b(reported|published) (by|in)\b` — clearly external
- `\b(study|survey|report|research|paper|analysis) (by|from)\b` —
  paired with attribution preposition
- `\bdata (from|by)\b` — paired
- `\b(citing|cited (by|in)|as cited)\b` — explicit citation

Two new tests cover the regression: self-referential phrases still
trigger the attribution suggestion, and external attribution phrases
("study by Stanford", "report from McKinsey") correctly silence it.
16 tests pass.
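The difference between a bare keyword alternation and the preposition-paired form can be seen directly. `BROAD` is a hypothetical approximation of the pattern before this commit (the commit message names `report`, `study`, `research`, and `based on`); `NARROW` is the study/report pattern as listed above, at this point still without any self-reference guard.

```typescript
// Hypothetical "before" pattern: bare keywords, no attribution preposition.
const BROAD = /\b(?:report|study|research|based on)\b/i;
// Keyword + attribution preposition, as listed in the commit message.
const NARROW = /\b(?:study|survey|report|research|paper|analysis) (?:by|from)\b/i;

const samples = [
  "Based on our internal data, churn fell 18%.", // self-referential
  "See the attached sales report: 42% growth.",  // self-referential
  "A study by Stanford found 42% growth.",       // external
  "A report from McKinsey puts it at 42%.",      // external
] as const;

for (const s of samples) {
  // Prints whether each pattern would count the sentence as evidence.
  console.log(`broad=${BROAD.test(s)} narrow=${NARROW.test(s)}  ${s}`);
}
```

`NARROW` still accepts "A study by our team found 42% growth."; that is the case the later NOT_SELF_REF commits in this thread close.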

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Tightened the evidence-signal patterns so self-referential phrasing ('based on our data', 'our quarterly report', 'user behavior study') no longer silences the attribution hint. Each broad keyword now requires an attribution preposition (by/from/of) or is paired with an inherently external phrase. Two new tests cover both directions. 16 tests pass.

@greptileai re-review please.

Comment thread src/core/citability.ts Outdated
Bare 'citing' matches self-referential prose like 'Citing our own data...'
and silently suppresses the attribution hint — the inverse of intent.
'cited by/in' and 'as cited' remain, and external attribution like
'study by'/'report from' is already covered by other patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in f523667 — removed bare citing from the evidence patterns. cited by/in and as cited are preserved; external attribution is already covered by the study by/report from/data from patterns. Added a regression test asserting "Citing our own data..." no longer suppresses the hint.

@greptileai review

Comment thread src/core/citability.ts Outdated
'According to our CEO' or 'According to our internal data' previously
suppressed the attribution hint despite citing no external source. Add a
negative lookahead matching the same scoping intent as the other patterns
in hasEvidenceSignals (study by, report from, data from, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in 605a507 — added a negative lookahead to according to so "According to our X" and "According to the company/team/organization/internal" no longer suppress the hint. External attribution ("According to Gartner") still passes. Added two regression tests.

@greptileai review

Comment thread src/core/citability.ts Outdated
'Data from our research team' previously suppressed the attribution hint
because the pattern was unguarded. Mirror the same negative lookahead used
by 'according to' so the only path that fires is real external attribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in a41692d — mirrored the same negative lookahead onto data (from|by). Added regression tests for "Data from our research team" (still suggests) and "Data from Bloomberg" (treated as evidence).

@greptileai review

Comment thread src/core/citability.ts Outdated
…orted/published

'Study by our team' and 'Reported by our CEO' previously slipped through.
Extract the negative-lookahead fragment to a shared NOT_SELF_REF constant
and apply it consistently to every attribution pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in 611b0e6 — extracted the negative-lookahead to a shared NOT_SELF_REF constant and applied it to (reported|published) (by|in) and (study|...) (by|from) too. Same guard, same intent, no more drift between patterns. Regression test covers "study by our team" and "Reported by our CEO".

@greptileai review

Comment thread src/core/citability.ts Outdated
…rces?:

The previous form had \s+ outside the lookahead, which the regex engine
backtracks to 0 chars when the lookahead fails — letting 'Sources: our X'
pass as evidence. Move whitespace inside the lookahead so it's anchored
immediately after the keyword and cannot be skipped via backtracking.
Apply to the sources?: pattern too for parity.
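The backtracking point can be reproduced with three hypothetical variants of the `sources?:` pattern (the exact pre-fix form is not shown in this thread). With `\s*` outside the lookahead the engine can give back all of the whitespace, so the lookahead is tested at the space before "our" and passes. With `\s+` it can only give back down to one character, so the slip needs two or more spaces. Putting the whitespace inside the lookahead closes both cases.

```typescript
// Whitespace quantified outside the lookahead: the engine may give some back.
const OUTSIDE_STAR = /\bsources?:\s*(?!our\b)/i;
const OUTSIDE_PLUS = /\bsources?:\s+(?!our\b)/i;
// Whitespace inside the lookahead: checked once, right after the colon.
const INSIDE = /\bsources?:(?!\s*our\b)/i;

const oneSpace = "Sources: our internal dashboard (42% uplift)";
const twoSpaces = "Sources:  our internal dashboard (42% uplift)";

const variants = [["outside *", OUTSIDE_STAR], ["outside +", OUTSIDE_PLUS], ["inside", INSIDE]] as const;
for (const [name, re] of variants) {
  // true means the self-referential source would be counted as evidence.
  console.log(name, re.test(oneSpace), re.test(twoSpaces));
}
```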

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in bbe6f90 — applied NOT_SELF_REF to sources?: and made the lookahead backtracking-safe (whitespace is now inside the lookahead, so the engine can't backtrack `\s+` to 0 chars to slip past). Regression test added.

@greptileai review

Comment thread src/core/citability.ts Outdated
'Revenue data cited by our internal research team' previously passed as
evidence. Split 'as cited' from 'cited by/in' since only the latter takes
a follower; apply the shared guard to the variable form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Copy link
Copy Markdown
Member

Fixed in 67e9185 — applied NOT_SELF_REF to cited (by|in) (split from as cited since only the former takes a follower). Two regression tests cover self-referential and external cases.

@greptileai review

Comment thread src/core/citability.ts Outdated
'As cited in our internal report' previously suppressed the hint because
the bare /as cited/i pattern matched independently. 'as cited' alone is
meaningless without 'in/by SOURCE' — the existing 'cited (by|in)' with
the self-referential guard already covers the real case.
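Because the evidence check returns true as soon as any pattern matches, one unguarded pattern is enough to bypass the guard on all the others. A minimal illustration, using the pre-suggestion NOT_SELF_REF fragment, the bare `/as cited/i` pattern named in the commit message, and hypothetical constant names:

```typescript
const NOT_SELF_REF = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;

// Guarded: rejects "cited in our ..." but accepts "cited in Nature".
const CITED_BY_IN = new RegExp(String.raw`\bcited (?:by|in)\b` + NOT_SELF_REF, "i");
// Unguarded substring pattern removed by this commit.
const AS_CITED = /as cited/i;

// Any-match semantics, as in hasEvidenceSignals.
const anyMatch = (patterns: RegExp[], text: string): boolean =>
  patterns.some((p) => p.test(text));

const selfRef = "As cited in our internal report, retention is 91%.";
console.log(anyMatch([CITED_BY_IN, AS_CITED], selfRef)); // true: AS_CITED bypasses the guard
console.log(anyMatch([CITED_BY_IN], selfRef));           // false: guard applies
```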

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Copy link
Copy Markdown
Member

Fixed in c0dae24 — dropped bare as cited since it bypassed the guard via substring match. The cited (by|in) pattern with NOT_SELF_REF already covers the real case. Regression test added for "As cited in our internal report".

@greptileai review

…untry)

The /i flag made 'us' match 'US' (United States), mis-flagging legitimate
external citations like 'According to US regulators' or 'Data from US
studies' as self-referential. The bare 'us' pronoun ('data from us',
'according to us') is uncommon enough that the collision isn't worth it.
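The case-folding collision is easy to reproduce. `WITH_US` is a hypothetical reconstruction of the fragment before this commit (where `us` sat in the alternation is an assumption and does not change the result); `WITHOUT_US` mirrors the fragment after it.

```typescript
const WITH_US = String.raw`(?!\s*(?:our|us|my|the\s+(?:company|team|organization|internal))\b)`;
const WITHOUT_US = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;

// Hypothetical guarded pattern; the "i" flag makes `us` also match "US".
const accordingTo = (guard: string): RegExp =>
  new RegExp(String.raw`\baccording to\b` + guard, "i");

const country = "According to US regulators, 12% of filings were late.";
const pronoun = "According to us, 12% of filings were late.";

console.log(accordingTo(WITH_US).test(country));    // false: "US" mistaken for "us"
console.log(accordingTo(WITHOUT_US).test(country)); // true: external citation accepted
console.log(accordingTo(WITHOUT_US).test(pronoun)); // true: the accepted trade-off
```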

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenmarcus
Member

Fixed in b80f3d8 — dropped us from NOT_SELF_REF since the /i flag made it also match US (United States), mis-flagging "According to US regulators" as self-referential. The bare us pronoun case is uncommon enough to be worth the trade. Regression test added.

@greptileai review
