Skip to content

feat(sources): add Discord source with live API indexing + knowledge graph#728

Open
aaf2tbz wants to merge 3 commits into
mainfrom
alex/discord-source
Open

feat(sources): add Discord source with live API indexing + knowledge graph#728
aaf2tbz wants to merge 3 commits into
mainfrom
alex/discord-source

Conversation

@aaf2tbz
Copy link
Copy Markdown
Collaborator

@aaf2tbz aaf2tbz commented May 17, 2026

Summary

  • Adds a "discord" source kind that indexes Discord guilds, channels, and threads into Signet's recall system via live Discord REST API v10.
  • Follows the Obsidian source pattern: direct embedding into embeddings table + knowledge graph construction — not the connector/document pipeline pattern.
  • Bot token resolved via Signet Secrets, server/channel/thread hierarchy in knowledge graph, conversation chunking by time proximity.

Validation

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/discord-source-embeddings.test.ts platform/daemon/src/discord-source-graph.test.ts
  • bunx biome check platform/daemon/src/discord-source-fetch.ts platform/daemon/src/discord-source-embeddings.ts platform/daemon/src/discord-source-graph.ts platform/daemon/src/discord-source-bridge.ts platform/core/src/sources-config.ts surfaces/cli/src/commands/sources.ts surfaces/cli/src/features/sources.ts platform/daemon/src/routes/sources-routes.ts platform/daemon/src/daemon.ts
  • 16 tests pass (8 config, 4 embeddings, 3 graph, 1 pre-existing)
  • All 3 code quality bot findings resolved in 7ec5e51

Notes

  • No schema or migration changes — Discord source entries are stored in the existing sources.json config with a settings field.
  • No discord.js dependency — raw fetch() against Discord REST v10.
  • root field left empty for Discord sources (no filesystem root, like GitHub sources).
  • Existing discord-parser.ts left untouched — this is a live API path, not static DiscordChatExporter parsing.

PR Readiness (MANDATORY)

  • Spec alignment validated (INDEX.md + dependencies.yaml)
  • Agent scoping verified on all new/changed data queries
  • Input/config validation and bounds checks added
  • Error handling and fallback paths tested (no silent swallow)
  • Security checks applied to admin/mutation endpoints
  • Docs updated for API/spec/status changes
  • Regression tests added for each bug fix
  • Lint/typecheck/tests pass locally

Migration Notes (if applicable)

  • Migration is idempotent
  • Daemon Rust parity reviewed or explicitly N/A
  • Rollback / compatibility note included in PR description

Rollback / compatibility: no migration or persisted data change; rollback is removing the source config entry and reverting the commit.

…graph

Adds a 'discord' source kind that connects to Discord's REST API v10
using a bot token stored in Signet Secrets. Indexes opted-in guilds,
channels, and threads as source-labeled conversation context with
server/channel progression in the knowledge graph.

Architecture follows the Obsidian source pattern (direct embedding into
embeddings table + knowledge graph construction) and mirrors the GitHub
source PR #727 structure:

- discord-source-fetch.ts: Discord REST API client (guilds, channels,
  messages, threads) with rate limit handling via X-RateLimit-* headers.
  No external discord.js dependency — raw fetch() only.
- discord-source-embeddings.ts: Chunks conversations into embeddable
  segments grouped by reply chains and time proximity. Source type
  'source_discord_chunk'.
- discord-source-graph.ts: Knowledge graph hierarchy:
  source -> guild (community) -> channel -> thread/conversation.
  Dependencies for containment and participant cross-references.
- discord-source-bridge.ts: Sync orchestration, resolves bot token from
  Signet Secrets, walks channels/threads, indexes on daemon startup.

Config: DiscordSourceSettings stored in settings field on SignetSourceEntry.
CLI: signet sources add discord --guild-id ID --token-ref REF
API: POST /api/sources/discord, DELETE /api/sources/:sourceId (with purge)
Tests: 8 new tests (config validation, chunking, graph structure)
Comment thread platform/daemon/src/discord-source-bridge.ts Fixed
Comment thread platform/daemon/src/daemon.ts Fixed
Comment thread platform/daemon/src/discord-source-embeddings.test.ts Fixed
- Remove unused purgeDiscordSource import from daemon.ts
- Remove unused DISCORD_CHUNK_SOURCE_TYPE import from test
- Change let guildName to declaration without initializer in bridge
@aaf2tbz
Copy link
Copy Markdown
Collaborator Author

aaf2tbz commented May 17, 2026

All three CodeQL findings resolved in 7ec5e51:

  1. discord-source-bridge.ts:70 — Removed useless let guildName initializer, now declared without value before the fetchGuild assignment
  2. daemon.ts:51 — Removed unused purgeDiscordSource import
  3. discord-source-embeddings.test.ts:2 — Removed unused DISCORD_CHUNK_SOURCE_TYPE import

@aaf2tbz aaf2tbz marked this pull request as ready for review May 17, 2026 23:56
@PR-Reviewer-Ant
Copy link
Copy Markdown
Collaborator

Hi @aaf2tbz - I'm taking a look at the feature work in feat(sources): add Discord source with live API indexing + knowledge graph (commit e925ff83) and will follow up shortly.

This comment is updated in place by pr-reviewer.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: e925ff83

I found two correctness issues that undermine the claimed live Discord indexing behavior: active threads use a decommissioned API route, and the advertised since filter is never applied. There is also a data-quality issue where participant entities are keyed by mutable display names rather than stable Discord user IDs.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The active-thread URL is visible in discord-source-fetch.ts and conflicts with Discord API v10 docs, which list /guilds/{guild.id}/threads/active and note /channels/{channel.id}/threads/active was decommissioned. The unused since setting is directly visible because syncDiscordSource parses settings but calls fetchChannelMessages without passing any since-related bound.

}

export async function fetchActiveThreads(
config: DiscordFetchConfig,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This active-thread endpoint is wrong for Discord API v10. Discord decommissioned GET /channels/{channel.id}/threads/active in favor of GET /guilds/{guild.id}/threads/active (see the official Discord threads docs: https://docs.discord.com/developers/topics/threads). As written, active threads will fail to fetch, so the PR's claim that it indexes threads via live REST API is only partially true.


for (const channel of filteredChannels) {
const channelName = channel.name ?? channel.id;
try {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings.since is never used during sync. The CLI exposes --since <date> as "Only index messages after this ISO date" and the source config stores since, but the bridge always calls fetchChannelMessages(config, channel.id, settings.maxMessagesPerChannel) with no lower bound. That makes the option a silent no-op and can unexpectedly index the full channel history.

} catch (err) {
logger.warn("discord-source", "Failed to sync thread", {
threadId: thread.id,
error: err instanceof Error ? err.message : String(err),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Participant identity is reduced to global_name ?? username before graph indexing, even though each message has a stable author.id. That will merge distinct Discord users who share a display name and split the same user if they rename themselves, corrupting the source graph over time. Please carry the user ID through and use display names only as labels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants