feat(sources): add Discord source with live API indexing + knowledge graph #728
aaf2tbz wants to merge 3 commits into
…graph

Adds a 'discord' source kind that connects to Discord's REST API v10 using a bot token stored in Signet Secrets. Indexes opted-in guilds, channels, and threads as source-labeled conversation context with server/channel progression in the knowledge graph.

Architecture follows the Obsidian source pattern (direct embedding into embeddings table + knowledge graph construction) and mirrors the GitHub source PR #727 structure:

- discord-source-fetch.ts: Discord REST API client (guilds, channels, messages, threads) with rate limit handling via X-RateLimit-* headers. No external discord.js dependency; raw fetch() only.
- discord-source-embeddings.ts: Chunks conversations into embeddable segments grouped by reply chains and time proximity. Source type 'source_discord_chunk'.
- discord-source-graph.ts: Knowledge graph hierarchy: source -> guild (community) -> channel -> thread/conversation. Dependencies for containment and participant cross-references.
- discord-source-bridge.ts: Sync orchestration, resolves bot token from Signet Secrets, walks channels/threads, indexes on daemon startup.

Config: DiscordSourceSettings stored in settings field on SignetSourceEntry.
CLI: signet sources add discord --guild-id ID --token-ref REF
API: POST /api/sources/discord, DELETE /api/sources/:sourceId (with purge)
Tests: 8 new tests (config validation, chunking, graph structure)
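The rate limit handling via X-RateLimit-* headers described for discord-source-fetch.ts might look roughly like the sketch below. This is not the PR's code: `rateLimitDelayMs` and `HeaderReader` are hypothetical names, and the 1-second fallback on a 429 without a usable Retry-After value is an assumption. The header names (X-RateLimit-Remaining, X-RateLimit-Reset-After, Retry-After) are the ones Discord documents.

```typescript
// Minimal header reader so the helper works with a fetch Response's headers
// or with a plain test double. (Hypothetical type, not from the PR.)
export interface HeaderReader {
	get(name: string): string | null;
}

// Hypothetical helper: how many milliseconds to wait before the next request.
export function rateLimitDelayMs(status: number, headers: HeaderReader): number {
	// A 429 response carries Retry-After, expressed in seconds.
	if (status === 429) {
		const retryAfter = Number(headers.get("Retry-After"));
		// Assumption: fall back to 1 second if the header is missing or invalid.
		return Number.isFinite(retryAfter) && retryAfter > 0
			? Math.ceil(retryAfter * 1000)
			: 1000;
	}

	// On success, pause only when the current bucket is exhausted.
	const remaining = headers.get("X-RateLimit-Remaining");
	const resetAfter = Number(headers.get("X-RateLimit-Reset-After"));
	if (remaining === "0" && Number.isFinite(resetAfter) && resetAfter > 0) {
		return Math.ceil(resetAfter * 1000);
	}
	return 0;
}
```

A caller would sleep for the returned duration before issuing the next request to the same route, and retry the request itself when the status was 429.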
- Remove unused purgeDiscordSource import from daemon.ts
- Remove unused DISCORD_CHUNK_SOURCE_TYPE import from test
- Change let guildName to declaration without initializer in bridge
All three CodeQL findings resolved in
Hi @aaf2tbz - I'm taking a look at the feature work in
This comment is updated in place by pr-reviewer.
PR-Reviewer-Ant left a comment
Review metadata
- Reviewer: pr-reviewer
- Model: gpt-5.5
- Commit: e925ff83
I found two correctness issues that undermine the claimed live Discord indexing behavior: active threads use a decommissioned API route, and the advertised since filter is never applied. There is also a data-quality issue where participant entities are keyed by mutable display names rather than stable Discord user IDs.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The active-thread URL is visible in discord-source-fetch.ts and conflicts with Discord API v10 docs, which list /guilds/{guild.id}/threads/active and note /channels/{channel.id}/threads/active was decommissioned. The unused since setting is directly visible because syncDiscordSource parses settings but calls fetchChannelMessages without passing any since-related bound.
}

export async function fetchActiveThreads(
	config: DiscordFetchConfig,
This active-thread endpoint is wrong for Discord API v10. Discord decommissioned GET /channels/{channel.id}/threads/active in favor of GET /guilds/{guild.id}/threads/active (see the official Discord threads docs: https://docs.discord.com/developers/topics/threads). As written, active threads will fail to fetch, so the PR's claim that it indexes threads via live REST API is only partially true.
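One way to address this is to fetch active threads once per guild and then group them by parent channel. The sketch below assumes the documented response shape of GET /guilds/{guild.id}/threads/active (a `threads` array whose entries carry `parent_id`); the names `activeThreadsUrl`, `threadsForChannel`, and `fetchGuildActiveThreads` are hypothetical and do not come from the PR.

```typescript
const DISCORD_API_BASE = "https://discord.com/api/v10";

// Only the fields this sketch needs from a Discord thread channel object.
export interface ThreadSummary {
	id: string;
	parent_id: string | null;
	name?: string;
}

export interface ActiveThreadsResponse {
	threads: ThreadSummary[];
}

// Guild-level route that replaced the per-channel active-threads route.
export function activeThreadsUrl(guildId: string): string {
	return `${DISCORD_API_BASE}/guilds/${encodeURIComponent(guildId)}/threads/active`;
}

// Select the active threads that belong to one parent channel.
export function threadsForChannel(
	response: ActiveThreadsResponse,
	channelId: string,
): ThreadSummary[] {
	return response.threads.filter((thread) => thread.parent_id === channelId);
}

// Hypothetical fetch wrapper; the bot token is sent as "Bot <token>".
export async function fetchGuildActiveThreads(
	guildId: string,
	botToken: string,
): Promise<ActiveThreadsResponse> {
	const res = await fetch(activeThreadsUrl(guildId), {
		headers: { Authorization: `Bot ${botToken}` },
	});
	if (!res.ok) {
		throw new Error(`Active thread fetch failed with status ${res.status}`);
	}
	return (await res.json()) as ActiveThreadsResponse;
}
```

Calling the guild route once and filtering locally also avoids one request per channel, which matters under Discord's rate limits.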
for (const channel of filteredChannels) {
	const channelName = channel.name ?? channel.id;
	try {
settings.since is never used during sync. The CLI exposes --since <date> as "Only index messages after this ISO date" and the source config stores since, but the bridge always calls fetchChannelMessages(config, channel.id, settings.maxMessagesPerChannel) with no lower bound. That makes the option a silent no-op and can unexpectedly index the full channel history.
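A possible fix is to convert the since date into a snowflake and pass it as the `after` query parameter of GET /channels/{channel.id}/messages. The sketch below relies on Discord's documented snowflake layout (milliseconds since 2015-01-01T00:00:00Z, shifted left by 22 bits); `snowflakeFromIsoDate` and `messagesQuery` are hypothetical helpers, not functions from the PR.

```typescript
// Discord epoch: first millisecond of 2015 (UTC).
const DISCORD_EPOCH_MS = BigInt(1420070400000);

// Smallest snowflake whose timestamp is at or after the given ISO date.
export function snowflakeFromIsoDate(iso: string): string {
	const ms = Date.parse(iso);
	if (Number.isNaN(ms)) {
		throw new Error(`Invalid ISO date: ${iso}`);
	}
	const offset = BigInt(ms) - DISCORD_EPOCH_MS;
	// Dates before the Discord epoch clamp to the lowest possible ID.
	if (offset < BigInt(0)) {
		return "0";
	}
	// The timestamp occupies the bits above the low 22 bits of a snowflake.
	return (offset << BigInt(22)).toString();
}

// Build the query string for a messages request, applying `since` if set.
export function messagesQuery(limit: number, since?: string): string {
	// Discord accepts a per-request limit between 1 and 100.
	const clamped = Math.min(Math.max(limit, 1), 100);
	const params = new URLSearchParams({ limit: String(clamped) });
	if (since) {
		params.set("after", snowflakeFromIsoDate(since));
	}
	return params.toString();
}
```

With this in place the bridge could pass settings.since through to the fetch layer; paginating beyond one page would still need to advance the `after` cursor to the newest message ID seen so far.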
} catch (err) {
	logger.warn("discord-source", "Failed to sync thread", {
		threadId: thread.id,
		error: err instanceof Error ? err.message : String(err),
Participant identity is reduced to global_name ?? username before graph indexing, even though each message has a stable author.id. That will merge distinct Discord users who share a display name and split the same user if they rename themselves, corrupting the source graph over time. Please carry the user ID through and use display names only as labels.
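As an illustration of the requested change, participants can be keyed by author.id while the display name is kept only as a label. The types and the `collectParticipants` function below are hypothetical and only mirror the author fields mentioned above (id, username, global_name); how the PR's graph code would consume the result is not shown.

```typescript
// Subset of Discord's user object used for participant identity.
export interface DiscordAuthor {
	id: string;
	username: string;
	global_name?: string | null;
}

export interface Participant {
	id: string; // stable Discord user ID, used as the entity key
	label: string; // display name, used for presentation only
	messageCount: number;
}

// Group messages by stable author ID; the most recently seen name is the label.
export function collectParticipants(
	messages: { author: DiscordAuthor }[],
): Map<string, Participant> {
	const byId = new Map<string, Participant>();
	for (const message of messages) {
		const author = message.author;
		const label = author.global_name ?? author.username;
		const existing = byId.get(author.id);
		if (existing) {
			// Same user, possibly renamed: refresh the label, keep the key.
			existing.label = label;
			existing.messageCount += 1;
		} else {
			byId.set(author.id, { id: author.id, label, messageCount: 1 });
		}
	}
	return byId;
}
```

Two users who share a display name stay separate because their IDs differ, and a user who renames themselves remains a single entity whose label is updated.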
Summary
- Adds a "discord" source kind that indexes Discord guilds, channels, and threads into Signet's recall system via live Discord REST API v10.
- Direct embedding into the embeddings table + knowledge graph construction, not the connector/document pipeline pattern.

Validation
- bun test platform/core/src/sources-config.test.ts platform/daemon/src/discord-source-embeddings.test.ts platform/daemon/src/discord-source-graph.test.ts
- bunx biome check platform/daemon/src/discord-source-fetch.ts platform/daemon/src/discord-source-embeddings.ts platform/daemon/src/discord-source-graph.ts platform/daemon/src/discord-source-bridge.ts platform/core/src/sources-config.ts surfaces/cli/src/commands/sources.ts surfaces/cli/src/features/sources.ts platform/daemon/src/routes/sources-routes.ts platform/daemon/src/daemon.ts
- 7ec5e51

Notes
- sources.json config with a settings field.
- No external discord.js dependency: raw fetch() against Discord REST v10.
- root field left empty for Discord sources (no filesystem root, like GitHub sources).
- discord-parser.ts left untouched: this is a live API path, not static DiscordChatExporter parsing.

PR Readiness (MANDATORY)
- INDEX.md + dependencies.yaml)

Migration Notes (if applicable)
Rollback / compatibility: no migration or persisted data change; rollback is removing the source config entry and reverting the commit.