Add related episode embeddings feature #2019

@wesbos

Product Requirements Document (PRD)

Title:

Related Episodes, AI-Generated Tags & 2D Visualization for Syntax.fm


Overview:

We want to enhance the discoverability and exploration of episodes on syntax.fm by implementing:

  1. AI-powered related episode recommendations.
  2. Auto-generated episode tags.
  3. A tag-driven episode directory.
  4. A 2D graph visualization of episode clusters.

Goals:

  • Improve episode discoverability.
  • Increase user engagement and time on site.
  • Provide visual insight into content clustering over time.

Features & Requirements:

1. Related Episodes

  • Description: Recommend similar episodes based on content.

  • Implementation:

    • Use the OpenAI Embeddings API (e.g., text-embedding-3-small) on each episode’s title + show notes.
    • Store the resulting vector in the database alongside episode metadata.
    • At request time or in a batch job, compute the cosine similarity between the current episode’s embedding and every other episode’s embedding.
    • Return the top 3–5 most similar episodes.
  • Frontend:

    • New Svelte component <RelatedEpisodes> placed below the show notes.
    • Show thumbnail, title, and link to each related episode.
  • Backend:

    • Add a new embedding_vector column to the episodes table.
    • Add endpoint: GET /api/episodes/:id/related
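
The similarity step above can be sketched as follows. This is a minimal illustration, not the project's implementation: the EpisodeEmbedding shape and the findRelated helper are hypothetical names, and the sketch assumes the embeddings have already been loaded from the database as plain number arrays.

```typescript
// Hypothetical shape: an episode ID plus its stored embedding vector.
interface EpisodeEmbedding {
  id: number;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every other episode against the current one and keep the best matches.
function findRelated(
  current: EpisodeEmbedding,
  all: EpisodeEmbedding[],
  limit = 5
): { id: number; score: number }[] {
  return all
    .filter((ep) => ep.id !== current.id)
    .map((ep) => ({ id: ep.id, score: cosineSimilarity(current.vector, ep.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A linear scan like this is likely fine for a catalogue of hundreds or a few thousand episodes; a vector index would only become relevant at a much larger scale.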

2. AI-Generated Tags

  • Description: Create a set of tags for each episode using an AI summarization/tagging model.

  • Implementation:

    • Use OpenAI GPT-4 or similar to analyze episode show notes + title.
    • Return 3–10 concise tags per episode.
    • Run as a weekly cron job that processes new episodes and backfills older ones.
    • Store tags in DB in a tags field (array of strings).
  • Infrastructure:

    • Add cron job (e.g., GitHub Actions or Vercel Scheduled Functions).
    • Optional admin override via CMS interface.
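
Model output tends to vary in casing, spacing, and duplication, so the tags will probably need cleaning before they are stored. The sketch below shows one possible cleanup pass. The normalizeTags helper and the MAX_TAGS constant are hypothetical, and the sketch assumes the model's response has already been parsed into an array of strings.

```typescript
// Upper bound from the requirement of 3–10 tags per episode.
const MAX_TAGS = 10;

// Trim, lowercase, collapse inner whitespace, drop empties and duplicates,
// and cap the list at MAX_TAGS. Order of first appearance is preserved.
function normalizeTags(raw: string[]): string[] {
  const seen = new Set<string>();
  const tags: string[] = [];
  for (const candidate of raw) {
    const tag = candidate.trim().toLowerCase().replace(/\s+/g, " ");
    if (tag === "" || seen.has(tag)) continue;
    seen.add(tag);
    tags.push(tag);
    if (tags.length === MAX_TAGS) break;
  }
  return tags;
}
```

The sketch does not enforce the lower bound of 3 tags; the caller would decide whether to retry or flag an episode that comes back with fewer.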

3. Tags Page

  • Description: Allow users to explore episodes by tag.

  • Implementation:

    • Create /tags route in SvelteKit.
    • Display all tags as clickable pills or a tag cloud.
    • Clicking a tag leads to /tags/[tag] showing all episodes with that tag.
    • Sort by most recent or relevance.
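
The data behind /tags and /tags/[tag] could come from an index that maps each tag to its episodes. The sketch below builds one in memory. The TaggedEpisode shape and the buildTagIndex helper are hypothetical, and the sketch assumes dates are stored as ISO 8601 strings, which sort correctly as plain text.

```typescript
// Hypothetical shape for an episode with its stored tags.
interface TaggedEpisode {
  number: number;
  title: string;
  date: string; // ISO 8601, e.g. "2024-03-01"
  tags: string[];
}

// Map each tag to the episodes carrying it, most recent first.
function buildTagIndex(episodes: TaggedEpisode[]): Map<string, TaggedEpisode[]> {
  const index = new Map<string, TaggedEpisode[]>();
  for (const ep of episodes) {
    for (const tag of ep.tags) {
      const list = index.get(tag) ?? [];
      list.push(ep);
      index.set(tag, list);
    }
  }
  // ISO date strings compare lexicographically, so a string compare is enough.
  index.forEach((list) => list.sort((a, b) => b.date.localeCompare(a.date)));
  return index;
}
```

The keys of the map would feed the tag pills on /tags, and a single lookup would feed each /tags/[tag] page.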

4. 2D Graph Visualization

  • Description: Visualize all episodes on a 2D plane to reflect tag or content similarity.

  • Implementation:

    • Use dimensionality reduction on embeddings (e.g., t-SNE, UMAP).
    • Preprocess offline and store x,y coordinates per episode.
    • Update weekly alongside tag generation.
    • Use canvas/SVG/WebGL to render the graph (e.g., D3, Plotly, or custom SvelteCanvas).
    • Hover/Click reveals episode info or links.
  • Frontend:

    • New /visualize route.
    • Responsive layout with zoom/pan support.
  • Backend:

    • Store x and y floats in the episodes table or a new episode_graph table.
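
Coordinates produced by t-SNE or UMAP have no fixed range, so the frontend would need to scale them before drawing. The sketch below covers only that scaling step and takes the stored x and y values as given. The Point shape and the fitToViewport helper are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

// Linearly rescale points so they fill a width x height viewport,
// leaving `padding` pixels free on every side.
function fitToViewport(
  points: Point[],
  width: number,
  height: number,
  padding = 20
): Point[] {
  if (points.length === 0) return [];
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  // Fall back to 1 when all points share a coordinate, to avoid dividing by zero.
  const spanX = Math.max(...xs) - minX || 1;
  const spanY = Math.max(...ys) - minY || 1;
  return points.map((p) => ({
    x: padding + ((p.x - minX) / spanX) * (width - 2 * padding),
    y: padding + ((p.y - minY) / spanY) * (height - 2 * padding),
  }));
}
```

This version scales each axis independently, which stretches the layout to fill the viewport. Using a single scale factor for both axes would preserve relative distances at the cost of unused space.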

Technical Considerations:

  • Data Storage:

    • Add embedding_vector, tags, and optional coordinates fields to episodes.
  • OpenAI API:

    • Use secure server-side calls only.
    • Rate-limit embedding and tag generation requests to stay within the API token quota.
  • Performance:

    • Cache related episodes for each episode ID.
    • Cache the graph data client-side (IndexedDB or localStorage) if the 2D graph payload grows too large to refetch on each visit.
  • Design Consistency:

    • New components should match the current design system.
    • Mobile support for graph is desirable but optional for v1.
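
Caching related episodes per episode ID could be as simple as memoizing the lookup. The sketch below is one in-memory option, assuming a single long-lived server process; a serverless deployment would more likely need a database column or an external cache. The createRelatedCache helper and its compute callback are hypothetical.

```typescript
// Wrap a (possibly expensive) related-episode lookup with an in-memory cache.
// `compute` is a hypothetical callback that returns related episode IDs.
function createRelatedCache(compute: (episodeId: number) => number[]) {
  const cache = new Map<number, number[]>();
  return {
    // Return the cached result, computing and storing it on first access.
    get(episodeId: number): number[] {
      const hit = cache.get(episodeId);
      if (hit !== undefined) return hit;
      const result = compute(episodeId);
      cache.set(episodeId, result);
      return result;
    },
    // Drop everything, e.g. after the weekly job refreshes embeddings.
    invalidate(): void {
      cache.clear();
    },
  };
}
```

Calling invalidate() after each weekly batch run would keep cached recommendations in step with newly embedded episodes.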

Milestones:

Milestone | Task                                  | Owner      | ETA
M1        | Set up embedding pipeline & DB fields | Dev        | TBD
M2        | Related episodes component            | Dev/Design | TBD
M3        | AI tagging cron job                   | Dev        | TBD
M4        | Tag browser page                      | Dev        | TBD
M5        | 2D graph pipeline + frontend          | Dev        | TBD
