feat(citability): add evidence signal hint #52
6eaf84c to 04c544c
|
Queued for substantive review. Touching citability scoring deserves a careful read — I'll come back after the current review queue clears. Thanks for the patience @KimHyeongRae0. |
|
@greptileai review |
Greptile Summary
This PR adds an attribution-hint feature to
Confidence Score: 5/5
The change is additive and well-tested. The only gap found is that reflexive pronouns ("ourselves", "myself") are not caught by the self-referential guard; this can arise in unusual copy but does not affect typical marketing content or the existing test suite. The core logic is correct and all common self-referential patterns are properly guarded.
See src/core/citability.ts, specifically the `NOT_SELF_REF` constant on line 216.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[scoreStatisticalDensity] --> B{totalMatches == 0?}
B -- Yes --> C[hint: No statistics found]
B -- No --> D{hasEvidenceSignals?}
D -- No --> E[hint: Add source links or attribution]
D -- Yes --> F[No hint added]
subgraph hasEvidenceSignals
G[URL pattern] --> M
H[Footnote ref] --> M
I[according to + NOT_SELF_REF] --> M
J[sources: + NOT_SELF_REF] --> M
K[reported/published by/in + NOT_SELF_REF] --> M
L[study/report/analysis by/from + NOT_SELF_REF] --> M
N[data from/by + NOT_SELF_REF] --> M
O[cited by/in + NOT_SELF_REF] --> M
M{any match?}
end
D -- calls --> hasEvidenceSignals
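The top half of the flowchart can be read as a small decision function. The following is only a sketch under assumptions: `statisticalDensityHints` is a hypothetical name, the evidence check is passed in as a boolean, and the hint strings are copied from the flowchart labels, which may be shortened relative to the real messages in src/core/citability.ts.

```typescript
// Hypothetical sketch of the hint decision shown in the flowchart.
// `totalMatches` is the number of statistics found in the text;
// `hasEvidence` stands in for the result of hasEvidenceSignals(text).
function statisticalDensityHints(
  totalMatches: number,
  hasEvidence: boolean,
): string[] {
  const hints: string[] = [];
  if (totalMatches === 0) {
    // No statistics at all: the attribution question does not arise.
    hints.push("No statistics found");
  } else if (!hasEvidence) {
    // Statistics are present but nothing points at a source.
    hints.push("Add source links or attribution");
  }
  // Statistics with evidence signals: no hint is added.
  return hints;
}

console.log(statisticalDensityHints(3, false));
```

The attribution hint is only considered once at least one statistic has been found, which matches the `B -- No --> D` edge in the diagram.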
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
src/core/citability.ts:216
`NOT_SELF_REF` uses `our\b` and `my\b` (with a word-boundary assertion), but a word boundary only matches between a word character and a non-word character. In the reflexive pronouns "ourselves" and "myself", the character immediately following "our"/"my" is still a word character (the "s" in "selves"/"self"), so `\b` does not match and the negative lookahead does not reject the phrase. Concretely, `"According to ourselves, we grew 300% in 2024"` or `"Data from ourselves confirms 250% growth"` will pass the corresponding NOT_SELF_REF-guarded pattern and silently suppress the attribution hint even though no external source is present. Replacing `our\b` with `our(?:selves)?\b` (and `my\b` with `my(?:self)?\b`) closes the gap.
```suggestion
const NOT_SELF_REF = String.raw`(?!\s*(?:our(?:selves)?|my(?:self)?|the\s+(?:company|team|organization|internal))\b)`;
```
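The word-boundary gap described in this issue can be reproduced in isolation. This is a minimal sketch, not the repository code: `GUARD_BEFORE`, `GUARD_AFTER`, and `accordingTo` are hypothetical names, and the guards are trimmed to the pronoun alternatives only.

```typescript
// Hypothetical, simplified guards. GUARD_AFTER follows the shape of the
// suggestion above; GUARD_BEFORE is an assumed earlier form without the
// optional reflexive suffixes.
const GUARD_BEFORE = String.raw`(?!\s*(?:our|my)\b)`;
const GUARD_AFTER = String.raw`(?!\s*(?:our(?:selves)?|my(?:self)?)\b)`;

// Builds an "according to" pattern with the given guard appended.
const accordingTo = (guard: string): RegExp =>
  new RegExp(String.raw`\baccording to` + guard, "i");

const reflexive = "According to ourselves, we grew 300% in 2024";

// `our\b` cannot match inside "ourselves" because "r" and "s" are both
// word characters, so the negative lookahead lets the phrase through.
const passesBefore = accordingTo(GUARD_BEFORE).test(reflexive); // true

// With the optional suffix, the guard consumes "ourselves" up to the
// boundary before the comma and blocks the match.
const passesAfter = accordingTo(GUARD_AFTER).test(reflexive); // false

console.log({ passesBefore, passesAfter });
```

Plain possessives such as "According to our CEO" are rejected by both forms, so the change only affects the reflexive pronouns.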
Reviews (10): Last reviewed commit: "fix(citability): drop 'us' from NOT_SELF..." |
…al prose
Greptile flagged that the standalone words `report`, `study`, `research`,
and `based on` matched self-referential content like 'Based on our
internal data' or 'See the attached sales report', suppressing the
attribution hint even when no external source is cited.
Replaced the single permissive alternation with multiple narrower
patterns that each require an attribution preposition (by / from / in)
or are inherently external-source phrases:
- `\baccording to\b` — explicit attribution
- `\bsources?:` — explicit source label
- `\b(reported|published) (by|in)\b` — clearly external
- `\b(study|survey|report|research|paper|analysis) (by|from)\b` —
paired with attribution preposition
- `\bdata (from|by)\b` — paired
- `\b(citing|cited (by|in)|as cited)\b` — explicit citation
Two new tests cover the regression: self-referential phrases still
trigger the attribution suggestion, and external attribution phrases
("study by Stanford", "report from McKinsey") correctly silence it.
16 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
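As an illustration of the pattern list in this commit message, the sketch below collects the six expressions into an array and checks them against the example phrases mentioned above. The names `ATTRIBUTION_PHRASES` and `hasAttributionPhrase` are hypothetical, the `i` flag is an assumption, and the URL and footnote checks from the flowchart are left out.

```typescript
// Patterns as listed in the commit message, with an assumed /i flag.
// This reflects the list at this commit only, before any self-reference
// guard was added to the individual patterns.
const ATTRIBUTION_PHRASES: RegExp[] = [
  /\baccording to\b/i,
  /\bsources?:/i,
  /\b(reported|published) (by|in)\b/i,
  /\b(study|survey|report|research|paper|analysis) (by|from)\b/i,
  /\bdata (from|by)\b/i,
  /\b(citing|cited (by|in)|as cited)\b/i,
];

// Hypothetical helper: true when any attribution phrase is present.
function hasAttributionPhrase(text: string): boolean {
  return ATTRIBUTION_PHRASES.some((pattern) => pattern.test(text));
}

// Bare keywords no longer count as evidence on their own.
console.log(hasAttributionPhrase("Based on our internal data")); // false
console.log(hasAttributionPhrase("See the attached sales report")); // false
// Keyword plus attribution preposition does count.
console.log(hasAttributionPhrase("A study by Stanford")); // true
console.log(hasAttributionPhrase("A report from McKinsey")); // true
```

None of the regexes use the `g` flag, so repeated `test` calls do not carry `lastIndex` state between inputs.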
|
Tightened the evidence-signal patterns so self-referential phrasing ('based on our data', 'our quarterly report', 'user behavior study') no longer silences the attribution hint. Each broad keyword now requires an attribution preposition (by/from/in) or is paired with an inherently external phrase. Two new tests cover both directions. 16 tests pass. @greptileai re-review please. |
Bare 'citing' matches self-referential prose like 'Citing our own data...' and silently suppresses the attribution hint, the inverse of the intent. 'cited by/in' and 'as cited' remain, and external attribution like 'study by'/'report from' is already covered by other patterns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Fixed in f523667 — removed bare @greptileai review |
'According to our CEO' or 'According to our internal data' previously suppressed the attribution hint despite citing no external source. Add a negative lookahead matching the same scoping intent as the other patterns in hasEvidenceSignals (study by, report from, data from, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Fixed in 605a507 — added a negative lookahead to @greptileai review |
'Data from our research team' previously suppressed the attribution hint because the pattern was unguarded. Mirror the same negative lookahead used by 'according to' so the only path that fires is real external attribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Fixed in a41692d — mirrored the same negative lookahead onto @greptileai review |
…orted/published
'Study by our team' and 'Reported by our CEO' previously slipped through. Extract the negative-lookahead fragment to a shared NOT_SELF_REF constant and apply it consistently to every attribution pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
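One way to share a single guard across every attribution pattern is to keep it as a raw string fragment and append it when each regex is built. The sketch below assumes that approach; `SELF_REF_GUARD`, `guarded`, and `citesExternalSource` are hypothetical names, and the guard body is an assumed form rather than the constant as it stood at this commit.

```typescript
// Hypothetical shared guard: rejects a follower that starts with a
// first-person possessive or "the company/team/organization/internal".
const SELF_REF_GUARD = String.raw`(?!\s*(?:our|my|the\s+(?:company|team|organization|internal))\b)`;

// Appends the shared guard to an attribution prefix.
const guarded = (prefix: string): RegExp =>
  new RegExp(prefix + SELF_REF_GUARD, "i");

const GUARDED_PATTERNS: RegExp[] = [
  guarded(String.raw`\baccording to`),
  guarded(String.raw`\b(?:reported|published) (?:by|in)`),
  guarded(String.raw`\b(?:study|survey|report|research|paper|analysis) (?:by|from)`),
  guarded(String.raw`\bdata (?:from|by)`),
];

// Hypothetical helper: true only when an attribution phrase is followed
// by something other than a self-reference.
const citesExternalSource = (text: string): boolean =>
  GUARDED_PATTERNS.some((pattern) => pattern.test(text));

console.log(citesExternalSource("Study by our team")); // false
console.log(citesExternalSource("Reported by our CEO")); // false
console.log(citesExternalSource("Study by Stanford")); // true
```

Keeping the guard in one place means a later fix to the fragment applies to every pattern at once instead of being repeated per regex.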
|
Fixed in 611b0e6 — extracted the negative-lookahead to a shared @greptileai review |
…rces?:
The previous form had \s+ outside the lookahead, so the regex engine could give that whitespace back through backtracking when the lookahead failed, letting 'Sources: our X' pass as evidence. Move whitespace inside the lookahead so it's anchored immediately after the keyword and cannot be skipped via backtracking. Apply to the sources?: pattern too for parity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
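The backtracking pitfall can be shown with two small patterns. Both are hypothetical, since the earlier regex is not quoted in this thread. The first uses `\s*` so the zero-width case is visible; with `\s+` the same effect appears once two or more whitespace characters separate the keyword from the pronoun.

```typescript
// Whitespace consumed OUTSIDE the lookahead can be given back: after the
// lookahead fails at "our", the engine retries with less whitespace, and
// the lookahead then sees a space instead of the pronoun.
const whitespaceOutside = /\bsources?:\s*(?!our\b)/i;
const plusOutside = /\bsources?:\s+(?!our\b)/i;

// Whitespace matched INSIDE the lookahead is checked from a fixed
// position right after the keyword, so it cannot be skipped.
const whitespaceInside = /\bsources?:(?!\s*our\b)/i;

const selfReferential = "Sources: our internal dashboard";
const doubleSpaced = "Sources:  our internal dashboard";
const externalSource = "Sources: Gartner, 2024";

console.log(whitespaceOutside.test(selfReferential)); // true (guard bypassed)
console.log(plusOutside.test(doubleSpaced)); // true (guard bypassed)
console.log(whitespaceInside.test(selfReferential)); // false
console.log(whitespaceInside.test(doubleSpaced)); // false
console.log(whitespaceInside.test(externalSource)); // true
```

Because a lookahead is zero-width, moving the whitespace inside it does not change what the overall pattern consumes, only where the guard is evaluated.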
|
Fixed in bbe6f90 — applied NOT_SELF_REF to @greptileai review |
'Revenue data cited by our internal research team' previously passed as evidence. Split 'as cited' from 'cited by/in' since only the latter takes a follower; apply the shared guard to the variable form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Fixed in 67e9185 — applied NOT_SELF_REF to @greptileai review |
'As cited in our internal report' previously suppressed the hint because the bare /as cited/i pattern matched independently. 'as cited' alone is meaningless without 'in/by SOURCE'; the existing 'cited (by|in)' with the self-referential guard already covers the real case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Fixed in c0dae24 — dropped bare @greptileai review |
…untry)
The /i flag made 'us' match 'US' (United States), mis-flagging legitimate
external citations like 'According to US regulators' or 'Data from US
studies' as self-referential. The bare 'us' pronoun ('data from us',
'according to us') is uncommon enough that the collision isn't worth it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
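The collision between the pronoun 'us' and the country abbreviation 'US' follows directly from case-insensitive matching. A minimal sketch, using hypothetical guard patterns reduced to the relevant alternatives:

```typescript
// Hypothetical guards: one still lists the pronoun "us", one does not.
const guardWithUs = /\baccording to(?!\s*(?:our|us)\b)/i;
const guardWithoutUs = /\baccording to(?!\s*our\b)/i;

const usRegulators = "According to US regulators";
const pronounUs = "According to us";

// Under /i, `us` also matches "US", so the external citation is rejected.
console.log(guardWithUs.test(usRegulators)); // false (mis-flagged)
console.log(guardWithoutUs.test(usRegulators)); // true

// The trade-off: the bare pronoun now passes as if it were a source.
console.log(guardWithoutUs.test(pronounUs)); // true
```

The last line shows the cost the commit message accepts: 'according to us' is no longer caught, on the judgment that it is rarer than references to the United States.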
|
Fixed in b80f3d8 — dropped @greptileai review |
Adds an attribution hint for statistical claims without evidence signals. Validation:
`npm run lint`, `npm run test -- --run src/core/citability.test.ts`, `npm run test -- --run`, `npm run build`.