diff --git a/src/ai-tools/README.md b/src/ai-tools/README.md index b527174726c5..4c9d7ef7225f 100644 --- a/src/ai-tools/README.md +++ b/src/ai-tools/README.md @@ -1,56 +1,183 @@ # AI-powered tools -A CLI tool for using AI to edit documentation according to defined prompts. +The ai-tools subject provides CLI tools for using AI to edit and refine documentation based on prompt-driven guidelines. It integrates with GitHub Models API to apply automated improvements to content files. -This tool refines content files using AI based on an (extensible) set of prompt-driven guidelines. The default is versioning refinement. In the future we might add: scannability, readability, style, technical accuracy. +## Purpose & Scope -This script calls the [Models API](https://docs.github.com/en/rest/models/inference?apiVersion=2022-11-28#run-an-inference-request). It requires a personal access token with Models scopes in your `.env` file. +This subject is responsible for: +- AI-powered content refinement (versioning, intro, etc.) +- Prompt-driven content editing with LLMs +- Integration with GitHub Models API +- Copilot Spaces export and conversion to prompts +- Automated content quality improvements +- Extensible prompt system for different refinement types -## Usage +Current refinements: versioning, intro. Future: scannability, readability, style, technical accuracy. -```sh -<<<<<<< HEAD:src/ai-editors/README.md -tsx src/ai-editors/scripts/ai-edit.ts --editor --response --files -||||||| 5ae4ec0f5cb:src/ai-editors/README.md -tsx src/ai-editors/scripts/ai-edit.js --editor --response --files -======= +## Architecture & Key Assets + +### Key capabilities and their locations + +- `scripts/ai-tools.ts` - Main CLI tool for running AI refinements +- `lib/call-models-api.ts` - Client for GitHub Models API inference +- `lib/prompt-utils.ts` - Loads prompts and executes refinements +- `prompts/*.md` - Prompt templates for different refinement types + +## Setup & Usage + +### Requirements + +Add GitHub token with Models scopes to `.env`: + +```bash +GITHUB_TOKEN=ghp_your_token_here +``` + +### Running refinements + +```bash # Direct command -tsx src/ai-tools/scripts/ai-tools.ts --refine --files +tsx src/ai-tools/scripts/ai-tools.ts --refine versioning --files content/path/to/file.md -# Or via npm script -npm run ai-tools -- --refine --files ->>>>>>> origin/main:src/ai-tools/README.md +# Via npm script +npm run ai-tools -- --refine versioning --files content/path/to/file.md ``` -* `--files, -f`: One or more content file paths to process (required). -* `--refine, -r`: Specify one or more refinement types (default: `versioning`). 
+### Options -**Examples:** +- `--files, -f`: One or more content file paths (required) +- `--refine, -r`: Refinement type(s) - `versioning`, `intro` (default: `versioning`) +- `--write, -w`: Write changes to files (default: false, shows diff only) +- `--verbose, -v`: Verbose output for debugging +- `--space, -s`: Use Copilot Space as prompt +- `--exportSpace`: Export Copilot Space to prompt file -```sh -<<<<<<< HEAD:src/ai-editors/README.md -tsx src/ai-editors/scripts/ai-edit.ts --files content/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo.md --editor versioning --response list -||||||| 5ae4ec0f5cb:src/ai-editors/README.md -tsx src/ai-editors/scripts/ai-edit.js --files content/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo.md --editor versioning --response list -======= -# Direct command -tsx src/ai-tools/scripts/ai-tools.ts --files content/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo.md --refine versioning +### Examples -# Via npm script -npm run ai-tools -- --files content/copilot/tutorials/coding-agent/get-the-best-results.md --refine intro ->>>>>>> origin/main:src/ai-tools/README.md +Refine versioning in a file: +```bash +npm run ai-tools -- --files content/copilot/tutorials/coding-agent/get-the-best-results.md --refine versioning ``` -## Requirements +Refine intro: +```bash +npm run ai-tools -- --files content/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo.md --refine intro +``` + +Multiple files: +```bash +npm run ai-tools -- --files file1.md file2.md file3.md --refine versioning +``` + +Write changes (not just preview): +```bash +npm run ai-tools -- --files content/path/to/file.md --refine versioning --write +``` + +## Data & External Dependencies + +### Data inputs +- Content markdown files with frontmatter +- Prompt templates in `prompts/` directory +- GitHub Models API for inference + +### Dependencies +- GitHub Models API - Requires `GITHUB_TOKEN` with Models scopes +- `commander` - CLI argument parsing +- `dotenv` - Environment variable loading +- Copilot Spaces (optional) - Can export/import prompts + +### Data outputs +- Refined markdown content (preview or written to files) +- Diffs showing proposed changes +- Merged frontmatter updates + +## Cross-links & Ownership + +### Related subjects +- [`src/content-render`](../content-render/README.md) - Content parsing and rendering +- Content files in `content/` - Target of refinements + +### Internal documentation +- [GitHub Models API docs](https://docs.github.com/en/rest/models/inference) +- Copilot Spaces for prompt management + +### Ownership +- Team: Docs Content (for use and development) +- Note: These tools are for the docs-content team. We welcome them to use Copilot to support and develop these tools, but docs-engineering is largely hands off. + +## Current State & Next Steps + +### Available refinement types + +Current prompts: +- `versioning` - Refines version-related content +- `intro` - Improves article introductions + +Each prompt defines: +- Instructions for the LLM +- Expected output format +- Quality criteria + +### Adding new refinements + +1. Create prompt file in `prompts/` (e.g., `readability.md`). +2. Write prompt instructions and examples. +3. Test with Models UI first. +4. Use `--refine readability` to apply. + +Prompt template in `prompts/prompt-template.yml`. 
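+Under the hood, each refinement sends a prompt plus the file contents to the Models inference endpoint linked above. A minimal sketch of that call (hypothetical helper; the real client lives in `lib/call-models-api.ts` and its exact shape may differ):
+
+```typescript
+// Minimal sketch of a GitHub Models inference request, assuming a
+// GITHUB_TOKEN with Models scopes in the environment. The model name,
+// helper name, and response handling are illustrative.
+async function runRefinement(prompt: string, article: string): Promise<string> {
+  const res = await fetch('https://models.github.ai/inference/chat/completions', {
+    method: 'POST',
+    headers: {
+      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
+      'Content-Type': 'application/json',
+    },
+    body: JSON.stringify({
+      model: 'openai/gpt-4o', // illustrative model name
+      messages: [
+        { role: 'system', content: prompt }, // e.g. the contents of prompts/versioning.md
+        { role: 'user', content: article },
+      ],
+    }),
+  })
+  if (!res.ok) throw new Error(`Models API returned ${res.status}`)
+  const data = await res.json()
+  return data.choices[0].message.content
+}
+```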
+ +### Copilot Spaces integration + +Export Space to prompt: +```bash +npm run ai-tools -- --exportSpace space-id --output prompts/my-prompt.md +``` + +Use Space as prompt: +```bash +npm run ai-tools -- --space space-id --files content/path/to/file.md +``` + +### Known limitations +- Requires GitHub token with Models scopes +- API rate limits apply +- Quality depends on prompt engineering +- Currently manual execution (not in CI) +- No automated testing/evals yet + +### Best practices + +**Prompt engineering:** +- Test prompts in GitHub Models UI first +- Include clear examples in prompts +- Define expected output format +- Iterate on prompts based on results + +**File selection:** +- Start with single files to test +- Use glob patterns for batch processing +- Preview changes before writing + +**Quality assurance:** +- Always review AI suggestions +- Don't blindly accept all changes +- Consider subject matter expertise needed +- Test refined content for correctness + +### Troubleshooting -* A valid `GITHUB_TOKEN` with Models scopes in your local `.env` file. +**Missing token error:** +Add `GITHUB_TOKEN` to `.env` with Models scopes. -## Future development ideas +**API errors:** +- Check token permissions +- Verify rate limits +- Check Models API status -* Add prompts to support all available editors. -* Test prompts in Models UI and add evals to prevent regressions. -* Enable running in CI. -* Explore the new `llm` plugin for GitHub Models (see https://github.com/github/copilot-productivity/discussions/5937). -* Add MCP for more comprehensive context. -* Integrate with Copilot Edit mode in VS Code. -* Add unit tests. +**Poor refinement quality:** +- Refine the prompt template +- Add more examples +- Test in Models UI first +- Consider different model/parameters diff --git a/src/app/README.md b/src/app/README.md new file mode 100644 index 000000000000..02da61ada4ef --- /dev/null +++ b/src/app/README.md @@ -0,0 +1,71 @@ +# App Router (`src/app`) + +This directory contains the [Next.js App Router](https://nextjs.org/docs/app) implementation for GitHub Docs. It currently serves as the application shell, handling the root layout, global providers, and 404 error pages, while coexisting with the Pages Router implementation. + +## Purpose & Scope + +The `src/app` directory is the entry point for the Next.js App Router. Its primary responsibilities are: +- Defining the root HTML structure and metadata. +- Initializing global client-side context providers (Theme, Locale, Languages). +- Handling global "Not Found" (404) scenarios. +- Providing a bridge between the modern App Router architecture and the `MainContext` used by existing components. + +We began this migration because we anticipate `@primer/react` will eventually drop support for the Pages Router. If that happens, we will be unable to upgrade `@primer/react`, effectively blocking us from receiving any future design system updates. Moving to the App Router prevents this and aligns us with the broader Next.js ecosystem. + +## Architecture & Key Assets + +### Directory Structure + +- `layout.tsx`: The server-side Root Layout. It sets up the `` and `` tags, loads global styles, and defines metadata/viewport settings. +- `client-layout.tsx`: A client component (`'use client'`) that wraps the application in necessary React Context providers. This allows server components to compose client-side logic for theming and state management. +- `not-found.tsx`: The UI for 404 errors within the App Router. 
+- `lib/`: Utilities for context adaptation and routing logic. + - `app-router-context.ts`: Generates context data based on the current request path. + - `main-context-adapter.ts`: Adapts App Router data structures to match the `MainContext` shape, ensuring backward compatibility for components. +- `components/`: Client-side components specific to the App Router shell (e.g., wrappers for 404 pages, context providers). + +### Key Concepts + +- **Context Adaptation**: Since much of the codebase relies on a monolithic `MainContext`, this directory implements adapters to construct a compatible context object from App Router primitives. This allows us to reuse existing components without rewriting them immediately. +- **Hybrid Routing**: The application currently operates in a hybrid mode. While `src/app` defines the outer shell, many specific routes and page rendering logic may still reside in the Pages Router (`src/pages`) or are being incrementally migrated. + +## Setup & Usage + +### Development + +Standard Next.js App Router conventions apply. To add a new route using the App Router, create a folder with a `page.tsx` file inside `src/app`. + +Useful documentation: +- [Next.js App Router Documentation](https://nextjs.org/docs/app) +- [Migrating from Pages to App Router](https://nextjs.org/docs/app/building-your-application/upgrading/app-router-migration) + +### Testing + +Tests for App Router logic should be placed alongside the components if applicable. + +Tests that verify Next.js behavior (like 404 handling) can be found in `src/frame/tests/next.ts`. + +## Data & External Dependencies + +- **Data Sources**: + - Consumes UI strings and language data from `src/data-directory` (via `getUIDataMerged`). + - Uses `src/languages` for locale definitions. +- **External Libraries**: + - `@primer/react`: Used for the design system and theming provider. + - `next`: The core framework. + +## Cross-links & Ownership + +- **Owner**: Docs Engineering +- **Related Directories**: + - `src/pages`: The Pages Router implementation. + - `src/frame`: Server and middleware logic that interacts with routing. + - `src/data-directory`: Source of static data used in layouts. + +## Current State & Next Steps + +- **Current State**: The App Router handles the root layout and 404s. It provides a compatibility layer for existing contexts. +- **Next Steps**: + - Migrate individual page routes from `src/pages` to `src/app`. + - Refactor components to reduce dependency on the monolithic `MainContext`. + - Improve data fetching patterns to use React Server Components (RSC) more effectively. \ No newline at end of file diff --git a/src/deployments/README.md b/src/deployments/README.md index 9efbd1f1b0df..f5129374476e 100644 --- a/src/deployments/README.md +++ b/src/deployments/README.md @@ -1,6 +1,122 @@ # Deployments -Documentation and build files for our deployments. +The deployments subject contains documentation and build scripts for deploying docs.github.com to production and staging environments. This includes Docker build configuration, repository fetching scripts, and deployment procedures. 
+ +## Purpose & Scope + +This subject is responsible for: +- Production deployment configuration and automation +- Staging server deployment processes +- Docker image build scripts for fetching repos (early-access, translations) +- Deployment documentation and procedures +- Integration with Moda (GitHub's internal deployment platform) + +## Architecture & Key Assets + +### Key capabilities and their locations + +- `production/build-scripts/fetch-repos.sh` - Main script that orchestrates fetching docs-early-access and all translation repos during Docker build +- `production/build-scripts/clone-or-use-cached-repo.sh` - Utility function to clone repos or use cached versions +- `production/build-scripts/merge-early-access.sh` - Merges early-access content into production content +- `Dockerfile` (root) - Multi-stage Docker build for production deployments +- `.github/workflows/moda-ci.yaml` - CI/CD workflow that builds and deploys via Moda + +## Setup & Usage + +### Production deployments + +Production deploys happen automatically: +- Pushing to `main` branch triggers automatic deployment to production +- Status updates posted in `#docs-ops` Slack channel +- Deployment uses Moda (GitHub's internal deployment platform) + +### Building production Docker image locally + +```bash +docker build -t docs:latest . --secret id=DOCS_BOT_PAT_BASE,src=<(echo "") +``` + +Requirements for PAT: +- Must have `contents: read` access to: + 1. All `docs-internal.` translation repos + 2. `docs-early-access` repo + +Run the built image locally. + +### Staging deployments + +To deploy to a staging server: +1. Push your branch to `docs-internal` +2. Open a draft PR +3. Wait for `docs-internal Moda CI` checks to pass +4. In `#docs-ops` Slack, run: `.deploy docs-internal/ to staging-` +5. Access at `https://docs-staging-.service.iad.github.net` (requires Developer VPN) + +### Running tests + +No subject-specific tests exist currently. CI/CD validation happens through Moda workflows. + +## Data & External Dependencies + +### Data inputs +- Dockerfile and build scripts +- GitHub PAT for private repo access (`DOCS_BOT_PAT_BASE`) +- Early access repo (`docs-early-access`) + +### Dependencies +- Docker and Docker BuildKit +- Moda platform (GitHub internal) +- GitHub Vault for secrets management +- Node.js (installed during Docker build) +- Git (for cloning repos during build) + +### Build process +1. Clone docs-internal content, assets, data +2. Fetch and merge docs-early-access +3. Fetch all translation repos in parallel +4. Install production dependencies +5. Build Next.js application +6. 
Create production Docker image
+
+### Data outputs
+- Docker image with full site content (public + early-access + translations)
+- Deployed application on Moda infrastructure
+- Deployment notifications in Slack
+
+## Cross-links & Ownership
+
+### Related subjects
+- [`src/early-access`](../early-access/README.md) - Early access content merged during build
+- [`src/languages`](../languages/README.md) - Translation repos fetched during build
+- Root `Dockerfile` - Docker build configuration
+- `.github/workflows/moda-ci.yaml` - CI/CD workflow
+
+### Internal documentation
+For detailed internal documentation, see:
+- `Moda` directory in internal Docs Engineering repo
+- Production deploy procedures (internal docs)
+
+### Ownership
+- Team: Docs Engineering
+- Slack: #build, #deploy-support (for deployment status and commands)
+
+## Current State & Next Steps
+
+### Known limitations
+- Build scripts are shell scripts, not TypeScript (different from rest of codebase)
+- Staging servers require Developer VPN access
+- Local Docker builds require manual PAT management
+
+### Deployment monitoring
+- Deployment status posted to `#docs-ops` Slack channel
+- Moda provides deployment dashboards and logs (internal)
+- Rollback procedures documented in internal Docs Engineering repo
+
+### Required secrets
+- `DOCS_BOT_PAT_BASE` - GitHub PAT with access to private repos
+- Managed via GitHub Vault
+- Passed securely to Docker build via `--secret` flag
+
+### Rollback procedures
+Rollback procedures are documented in the internal Docs Engineering repository. Contact @docs-engineering or ask in #deploy-support for assistance.

-- For production deploys: [src/deployments/production](./production/)
-- For staging deploys: [src/deployments/staging](./staging/)
diff --git a/src/dev-toc/README.md b/src/dev-toc/README.md
index 5820ec371030..0a5a0d60fb34 100644
--- a/src/dev-toc/README.md
+++ b/src/dev-toc/README.md
@@ -1,33 +1,143 @@
# Developer table of contents

-This directory generates a full table of contents for the docs.github.com site.
+The dev-toc subject generates a static, browsable table of contents (TOC) for local development. It creates HTML files showing the complete documentation site structure across all versions, making it easy to navigate and verify the site hierarchy during development.
+
+## Purpose & Scope
+
+This subject is responsible for:
+- Generating static HTML TOC files for all versions
+- Rendering page titles with Liquid templating
+- Creating expandable/collapsible navigation tree
+- Supporting auto-expansion of specific product sections
+- Opening generated TOC in browser automatically
+- Developer tool for site navigation and verification

-The table of contents is generated locally within the `static` subdirectory as a series of `index.html` files, within version subdirectories such as `free-pro-team@latest` and `enterprise-cloud@latest` etc.
+## Architecture & Key Assets
+
+### Key capabilities and their locations
+
+- `generate.ts` - Main script that builds TOC HTML files for all versions
+- `layout.html` - HTML template with expandable/collapsible tree structure
+- `static/` - Output directory containing generated `index.html` files per version (not committed)

-## Generating the table of contents
+## Setup & Usage

-To generate the table of contents, run the following command from the Terminal:
+### Generating the TOC
+
+Generate the complete table of contents:

```bash
npm run dev-toc
```

-After generating the files, the ToC should open in your default browser. 
If it doesn't, open your browser and navigate to `file:///PATH/TO/docs-internal/src/dev-toc/static/free-pro-team@latest/index.html`. +This will: +1. Generate HTML files in `src/dev-toc/static/` for each version +2. Render all page titles (including Liquid templates) +3. Open the TOC in your default browser at the FPT version + +If it doesn't open automatically, navigate to the generated file in your browser. -## Generating the ToC with one or more sections auto-expanded +### Auto-expanding specific sections -Alternatively, you can generate the table of contents with a specific top-level section of the docs auto-expanded by running the following command: +Generate TOC with specific product sections pre-expanded: ```bash -tsx src/dev-toc/generate.ts -o PRODUCT-ID [PRODUCT-ID PRODUCT-ID ...] +tsx src/dev-toc/generate.ts -o actions ``` -where `PRODUCT-ID` is the first part of the URL for the top-level section of the docs. For example, the `actions` section of the docs has the URL `https://docs.github.com/en/actions`, so the `PRODUCT-ID` is `actions`. So the command would be: - +Multiple sections: ```bash -tsx src/dev-toc/generate.ts -o actions +tsx src/dev-toc/generate.ts -o actions copilot pull-requests ``` -Note: if you generate the table more than once, with a different product ID flag you will need to refresh the page to see the changes. +Where product IDs match the URL structure (e.g., `actions` for `https://docs.github.com/en/actions`). + +Note: If regenerating with different flags, refresh your browser to see changes. + +### What gets generated + +For each version, the script: +1. Loads the site tree for that version +2. Recursively renders all page titles (handles Liquid templating) +3. Builds an HTML tree with expandable/collapsible sections +4. 
Writes to `static/{version}/index.html` + +## Data & External Dependencies + +### Data inputs +- Site tree from content files (all pages and their hierarchy) +- Page frontmatter (titles, which may include Liquid) +- Version information from `@/versions/lib/all-versions` + +### Dependencies +- `@/frame/middleware/context/context` - Context initialization +- `@/content-render` - Liquid template rendering +- `@/versions` - Version enumeration +- HTML template in `layout.html` +- Browser (for opening generated TOC) + +### Data outputs +- Static HTML files in `static/` directory +- One `index.html` per version +- Expandable tree navigation with page links +- Links that can open in VS Code or browser + +## Cross-links & Ownership + +### Related subjects +- [`src/frame`](../frame/README.md) - Site tree structure and context +- [`src/content-render`](../content-render/README.md) - Liquid rendering for titles +- [`src/versions`](../versions/README.md) - Version enumeration +- Content files - Source of page hierarchy + +### Ownership +- Team: Docs Content (with engineering support and reviews as needed) + +## Current State & Next Steps + +### Use cases + +The dev-toc is useful for: +- **Navigation** - Quickly browse entire site structure during development +- **Verification** - Check that pages appear in correct hierarchy +- **Title debugging** - See rendered titles (including Liquid output) +- **Structure review** - Review content organization across versions +- **Link checking** - Verify navigation structure makes sense + +### Known limitations +- Generated files are static (don't auto-update when content changes) +- Must regenerate when content structure changes +- Only shows structure, not content +- Requires re-running script to see updates +- Links open file paths, not the dev server + +### Output directory +The `static/` directory is gitignored because it contains generated files that vary by local environment. + +### Development workflow + +Common pattern: +1. Make content structure changes +2. Run `npm run dev-toc` +3. Review structure in generated TOC +4. Verify pages appear in correct locations +5. Check that parent/child relationships are correct + +### Troubleshooting + +**TOC doesn't open:** +Manually navigate to `src/dev-toc/static/free-pro-team@latest/index.html` in your file browser. + +**Missing pages:** +- Ensure content files have proper frontmatter +- Check that pages are included in parent's `children` array +- Regenerate TOC after content changes + +**Liquid rendering errors:** +Check console output when running `generate.ts` for template errors. + +**Wrong sections expanded:** +Regenerate with correct `-o` flags, then refresh browser (hard refresh may be needed). + diff --git a/src/events/README.md b/src/events/README.md index 5f973804633c..1a8681c5ce7f 100644 --- a/src/events/README.md +++ b/src/events/README.md @@ -1,21 +1,147 @@ # Events -We record events from the browser into our data pipeline to aggregate anonymous data about how folks are using the Docs. +The events subject handles client-side analytics by recording user interactions from the browser and sending them to GitHub's data pipeline. Events track anonymous usage data to help understand how users interact with docs.github.com and identify areas for improvement. -## Why events +## Purpose & Scope -Data helps us to understand where our Docs are successful, and where we need to improve. +This subject is responsible for: +- Recording browser events (page views, clicks, searches, surveys, etc.) 
+- Validating event data against JSON schemas +- Sending events to Hydro (GitHub's data warehouse) +- Analyzing survey comments with sentiment analysis +- Providing React components for event tracking +- Server-side event endpoint (`POST /events`) -## How to view events +## Architecture & Key Assets -1. We send a `POST /events` request from the browser. -2. Any data sent we check against our JSON schema. -3. After passing the schema check, we send the data along the path to the warehouse. +### Key capabilities and their locations -## How to work on event +- `middleware.ts` - Express router handling `POST /events` endpoint, validates and publishes events +- `lib/schema.ts` - JSON Schema definitions for all event types using AJV validation +- `components/events.ts` - Client-side utilities for sending events from the browser -When adding or changing properties, make sure to update the schema in both the JS file as well as the schema for the warehouse. +## Setup & Usage -## How to get help for events +### Event flow + +1. Browser sends `POST /events` request with event data +2. Middleware validates against JSON schema +3. If valid, event is sent to Hydro data warehouse +4. If invalid, validation error is logged (not sent to warehouse) + +### Event types + +Supported event types (see `EventType` enum): +- `page` - Page view +- `exit` - User leaving page +- `link` - Link click +- `search` - Search query +- `survey` - Survey response + +### Sending events from the browser + +```typescript +import { sendEvent } from '@/events/components/events' + +sendEvent({ + type: 'link', + link_url: 'https://example.com', +}) +``` + +### Event schema structure + +All events require a `context` object with: +- `event_id` (UUID) +- `user` (UUID) - Anonymous user identifier +- `version` - Schema version +- `created` - Timestamp +- `path` - Current page path +- Browser metadata (user agent, viewport size, etc.) + +Each event type has additional required/optional fields defined in `lib/schema.ts`. + +### Local testing + +Test event validation locally: +```bash +npm run test -- src/events/tests +``` + +Test comment analysis: +```bash +tsx src/events/scripts/analyze-comment-cli.ts "This is a great article!" +``` + +## Data & External Dependencies + +### Data inputs +- Browser events from client-side JavaScript +- Survey responses and comments +- User context (language, version, product, path) +- Browser metadata (user agent, viewport, etc.) 
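+The event schema structure described above translates into payloads like this hypothetical `page` event (all values below are made up; the authoritative definitions live in `lib/schema.ts`):
+
+```typescript
+// Illustrative `page` event payload. Required fields and formats are
+// enforced by the JSON schemas in lib/schema.ts.
+const examplePageEvent = {
+  type: 'page',
+  context: {
+    event_id: '123e4567-e89b-12d3-a456-426614174000', // per-event UUID
+    user: '123e4567-e89b-12d3-a456-426614174001', // anonymous user UUID
+    version: '1.0.0', // schema version (illustrative value)
+    created: new Date().toISOString(), // timestamp
+    path: '/en/actions/quickstart', // current page path
+    // ...plus browser metadata such as user agent and viewport size
+  },
+}
+```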
+ +### Dependencies +- Hydro API - GitHub's data warehouse +- AJV - JSON schema validation +- AI comment analysis service (internal) +- `@/versions`, `@/products`, `@/languages` - For enum validation + +### Schema validation + +Schemas enforce: +- Required fields for each event type +- Enum values (languages, versions, products, tools) +- Format validation (UUID, date-time, URI) +- Additional properties not allowed + +### Data outputs +- Events sent to Hydro data warehouse +- Validation errors logged to Failbot (production) +- Survey sentiment analysis results + +## Cross-links & Ownership + +### Related subjects +- [`src/observability`](../observability/README.md) - Error logging and monitoring +- [`src/versions`](../versions/README.md) - Version enum validation +- [`src/products`](../products/README.md) - Product enum validation +- [`src/languages`](../languages/README.md) - Language enum validation +- [`src/tools`](../tools/README.md) - Tool enum validation + +### Internal documentation +For detailed internal documentation about the data pipeline and Hydro, see the internal Docs Engineering repository. + +### Ownership +- Team: Docs Engineering (code and analytics), Data Engineering (data pipeline) + +## Current State & Next Steps + +### Known limitations +- Survey comment sentiment analysis requires network call (adds latency) +- Event validation errors are deduplicated with LRU cache to prevent spam +- In production, events are fire-and-forget (don't wait for response) +- Validation errors sent to Hydro to track schema mismatches + +### Adding a new event type + +1. Add event type to `EventType` enum in `types.ts` +2. Add type-specific properties to `EventPropsByType` in `types.ts` +3. Add schema definition to `lib/schema.ts` +4. Update warehouse schema (internal process) +5. Add client-side tracking code in components as needed +6. Test validation with unit tests + +### Survey comment analysis + +Survey responses with comments are analyzed for sentiment: +- Positive/negative/neutral rating assigned +- Language detection for comment text +- Results stored in `survey_rating` and `survey_comment_language` fields + +### Monitoring and debugging + +- Validation errors appear in server logs +- Production validation errors sent to Hydro for tracking +- Use `analyze-comment-cli.ts` to test sentiment analysis locally -For hubbers, see the internal docs in the internal engineering repository. diff --git a/src/frame/README.md b/src/frame/README.md index a8d5302ace56..3ba5bce80ee6 100644 --- a/src/frame/README.md +++ b/src/frame/README.md @@ -1,5 +1,138 @@ # Frame -This is the outlining directory that makes the site work, the spine of the site. +The frame subject provides the foundational infrastructure for the docs.github.com application. It serves as the "spine" of the site, handling core Express server setup, middleware orchestration, shared React components, and fundamental utilities that don't belong to a specific subject. + +## Purpose & Scope + +This subject is responsible for: +* Server initialization and Express app creation +* Middleware orchestration across all subjects +* Context management (`req.context` object building) +* Shared React components (layouts, navigation, error pages) +* Fundamental utilities (path parsing, frontmatter validation, page data) +* Next.js integration and page routing + +Philosophy: The preference is to move code into more specific subject folders when possible. 
Frame should contain only cross-cutting concerns that truly span multiple subjects or don't have a clear subject-specific home. + +## Architecture & Key Assets + +``` +src/frame/ +├── components/ # Shared React components (DefaultLayout, Link, article, page-header/footer) +├── lib/ # Core utilities (app.ts, page.ts, frontmatter.ts, path-utils.ts) +├── middleware/ # Express middleware pipeline and context builders +├── pages/ # Next.js pages directory (legacy) +├── stylesheets/ # Global CSS and SCSS +├── server.ts # Server entry point +└── start-server.ts # Server startup logic +``` + +### Key files and functions + +- `lib/app.ts` - `createApp()`: Creates and configures the Express application with all middleware +- `lib/warm-server.ts` - `warmServer()`: Pre-loads pages, redirects, and site tree on startup +- `lib/page.ts` - `Page` class: Represents a content page with rendering and metadata methods +- `lib/frontmatter.ts` - AJV schema: Validates frontmatter structure for all markdown files +- `lib/path-utils.ts` - Path parsing functions: Extract version, product, language from URLs +- `middleware/index.ts` - Orchestrates the full middleware pipeline across all subjects +- `middleware/context/context.ts` - `contextualize()`: Initializes base `req.context` object +- `middleware/find-page.ts` - Locates the matching page in the site tree +- `middleware/render-page.ts` - Renders page content and sends response +- `components/DefaultLayout.tsx` - Main layout wrapper for all pages + +## Setup & Usage + +### Running the server locally + +```bash +npm run dev +# Server starts at http://localhost:4000 +``` + +### Running tests + +```bash +npm run test -- src/frame/tests +``` + +### Middleware pipeline order + +The middleware in `middleware/index.ts` executes in a specific order. Key stages: +1. Connection management (timeout, abort handling) +2. Security and headers (helmet, CORS) +3. Language detection and context initialization +4. URL normalization and redirects +5. Page finding and subject-specific middleware +6. Context enrichment (breadcrumbs, TOC, features) +7. Page rendering +8. Error handling + +### Adding new middleware + +When adding middleware, consider: +- Does it belong in a specific subject? If so, add it there and import into `middleware/index.ts` +- Where does it fit in the pipeline order? +- Does it need to modify `req.context`? +- Add to `middleware/index.ts` in the appropriate position + +## Data & External Dependencies + +### Data inputs +- Content files (`content/` directory) parsed and loaded into pages +- Data files (`data/` directory) loaded for variables, features, versions +- Frontmatter schema defines required/optional fields for all pages + +### Dependencies +- Express.js for HTTP server and middleware +- Next.js for some routing and SSR (transitioning away from pages/ directory) +- AJV for frontmatter validation +- Various subject middleware (versions, languages, redirects, etc.) 
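+As an illustration of the pattern described in "Adding new middleware" above, a context-enriching middleware might look roughly like this (hypothetical example, not an actual file in the pipeline):
+
+```typescript
+import type { Request, Response, NextFunction } from 'express'
+
+// The codebase augments Express's Request with a context object; this local
+// interface stands in for that augmentation in this sketch.
+interface ContextRequest extends Request {
+  context?: Record<string, unknown>
+}
+
+// Hypothetical middleware: must run after contextualize() so req.context
+// exists, and would be registered at the appropriate stage in middleware/index.ts.
+export default function addExampleData(req: ContextRequest, res: Response, next: NextFunction) {
+  if (!req.context) return next(new Error('contextualize() must run first'))
+  req.context.exampleData = { loadedAt: Date.now() } // hypothetical property
+  return next()
+}
+```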
+
+### Data outputs
+- `req.context` object: Populated and passed to all downstream middleware and components
+- Site tree: Navigation structure built from content files
+- Rendered HTML: Final page output sent to clients
+
+## Cross-links & Ownership
+
+### Related subjects
+Nearly every subject interacts with frame:
+- [`src/versions`](../versions/README.md) - Version detection and middleware
+- [`src/languages`](../languages/README.md) - Language detection and translation
+- [`src/redirects`](../redirects/README.md) - URL redirect handling
+- [`src/content-render`](../content-render/README.md) - Markdown rendering
+- [`src/landings`](../landings/README.md) - Landing page layouts
+- [`src/learning-track`](../learning-track/README.md) - Learning track navigation
+
+### Ownership
+- Team: Docs Engineering
+
+## Current State & Next Steps
+
+### Known limitations
+- Middleware pipeline complexity: Many middleware pieces interact, making debugging challenging
+- Context object size: `req.context` accumulates many properties across middleware
+- Mixed patterns: Some components are in `components/`, others in subject folders
+- Legacy pages directory: Transitioning from Next.js pages/ to app/ router
+
+### Migration in progress
+- Moving from Next.js pages router to app router
+- Refactoring subject-specific code out of frame when possible
+- Consolidating similar patterns across middleware
+
+### When to add code to frame
+Add code here only if:
+- It's truly cross-cutting (used by 3+ subjects)
+- It's fundamental infrastructure (server, middleware orchestration)
+- No specific subject is a clear fit
+
+Otherwise, prefer adding to a subject-specific directory.
+
+### Future improvements: possible extractions
+- Extract subject-specific middleware to their own directories (e.g., move language detection middleware to `src/languages/middleware`)
+- Extract redirect logic to `src/redirects/middleware`
+- Extract version detection to `src/versions/middleware`
+- Extract search-specific middleware to `src/search/middleware`
+- Extract context object building to individual subjects

-The preference would be to put these files into a more specific subject folder when possible.
diff --git a/src/metrics/README.md b/src/metrics/README.md
index e41d9bc3702a..6f288d50bd6b 100644
--- a/src/metrics/README.md
+++ b/src/metrics/README.md
@@ -1,76 +1,210 @@
-# Kusto tooling
+# Metrics

-CLI tools to fetch data from the Kusto API.
+The metrics subject provides CLI tools for fetching analytics data from Kusto (Azure Data Explorer) about GitHub Docs usage. These tools help content strategists, writers, and engineers understand page performance, user behavior, and content effectiveness.

-## Installation and authentication
+## Purpose & Scope

-1. Install the Azure CLI with `brew install azure-cli`.
- * If you have the option to **not** update all your brew packages, choose that, or it will take a really long time.
-1. Run `az login`.
- * You'll have to run `az login` whenever your session expires. The sessions are fairly long lasting.
-1. Enter your `@githubazure.com` credentials.
- * These will get cached for future logins.
-1. At the prompt in Terminal asking which subscription you want to use, just press Enter to choose the default.
-1. Open or create an `.env` file in the root directory of your checkout (this file is already in `.gitignore` so it won't be tracked by Git).
-1. 
Add the `KUSTO_CLUSTER` and `KUSTO_DATABASE` values to the `.env` (_these values are pinned in slack_): - ``` - KUSTO_CLUSTER='' - KUSTO_DATABASE='' - ``` +This subject is responsible for: +- Providing CLI tools to query Kusto for docs analytics +- `docstat` - Get metrics for a single URL (views, users, bounces, etc.) +- `docsaudit` - Get metrics for an entire content directory +- Kusto query abstractions for common metrics +- Authentication and connection to Azure Kusto +- Date range calculations for time-series queries -## docstat usage +## Architecture & Key Assets -Run `npm run docstat -- ` on any GitHub Docs URL to gather a set of default metrics about it, including 30d views, users, view duration, bounces, helpfulness score, and exits to support. +### Key capabilities and their locations -Notes: -* If the URL doesn't include a version, `docstat` will return data that includes **all versions** (so FPT, Cloud, Server, etc.). - * If you want data for FPT only, pass the `--fptOnly` option. -* `docstat` only accepts URLs with an `en` language code or no language code, and it only fetches English data. +- `lib/kusto-client.ts` - `getKustoClient()`: Creates authenticated Kusto client using Azure CLI +- `lib/kusto-client.ts` - `runQuery()`: Executes Kusto queries and returns results +- `scripts/docstat.ts` - CLI tool: Fetches metrics for a single docs URL +- `scripts/docsaudit.ts` - CLI tool: Audits entire content directories with CSV output +- `queries/*.ts` - Pre-defined Kusto queries for specific metrics -To see all the options: -``` -npm run docstat -- --help -``` -You can combine options like this: -``` -npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --compare --range 60 -``` -Use `--redirects` to include `redirect_from` frontmatter paths in the queries (this is helpful if the article may have moved recently): -``` -npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --redirects -``` -Use the `--json` (or `-j`) option to output JSON: -``` -npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --json -``` -If you want to pass the results of the JSON to `jq`, you need to use `silent` mode: -``` -npm run --silent docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --json | jq .data.users -``` +## Setup & Usage + +### Installation and authentication + +1. Install Azure CLI: + ```bash + brew install azure-cli + ``` + +2. Login with Azure credentials: + ```bash + az login + ``` + Use your `@githubazure.com` credentials. -## docsaudit usage +3. Add Kusto configuration to `.env` file (values pinned in Slack): + ``` + KUSTO_CLUSTER='' + KUSTO_DATABASE='' + ``` -Run `npm run docsaudit` on a top-level content directory to gather data about its files—including title, path, versions, 30d views, and 30d users—and output it to a CSV file. 
+### docstat usage -To see all the options: +Get metrics for a single URL: + +```bash +npm run docstat -- ``` -npm run docsaudit -- --help + +Example: +```bash +npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code ``` -Run the script on any top-level content directory: + +Default metrics returned: +- 30-day views +- 30-day unique users +- Average view duration +- Bounce rate +- Helpfulness score (survey data) +- Exits to support + +#### Options + +```bash +# Compare with previous period +npm run docstat -- --compare + +# Custom date range (60 days) +npm run docstat -- --range 60 + +# Include redirects from frontmatter +npm run docstat -- --redirects + +# FPT data only (default includes all versions) +npm run docstat -- --fptOnly + +# JSON output +npm run docstat -- --json + +# Combine options +npm run docstat -- --compare --range 60 --redirects ``` -npm run docsaudit -- + +#### JSON output with jq + +```bash +npm run --silent docstat -- --json | jq .data.users ``` -For example: + +### docsaudit usage + +Audit an entire content directory: + +```bash +npm run docsaudit -- ``` + +Example: +```bash npm run docsaudit -- actions ``` -## Future development +Output includes: +- Title +- Path +- Versions +- 30-day views +- 30-day unique users + +Results are saved to a CSV file in the project root. + +## Data & External Dependencies + +### Data sources +- Kusto (Azure Data Explorer) - GitHub's data warehouse for analytics +- Docs event data - Page views, user interactions, surveys +- Content frontmatter - For path resolution and redirect detection + +### Dependencies +- `azure-kusto-data` - Official Azure Kusto SDK +- Azure CLI - For authentication (`az login`) +- Environment variables: `KUSTO_CLUSTER`, `KUSTO_DATABASE` + +### Authentication +- Uses Azure CLI identity via `withAzLoginIdentity()` +- Sessions are long-lasting but expire periodically +- Re-run `az login` when session expires + +### Queries +Pre-defined queries in `queries/` directory: +- `views.ts` - Total page views +- `users.ts` - Unique users +- `view-duration.ts` - Average session duration +- `bounces.ts` - Percentage of single-page sessions +- `survey-score.ts` - Helpfulness rating from surveys +- `exits-to-support.ts` - Clicks on support links + +## Cross-links & Ownership + +### Related subjects +- [`src/events`](../events/README.md) - Source of analytics event data +- [`src/frame`](../frame/README.md) - Frontmatter reading for path resolution +- Kusto database - Contains aggregated event data + +### Internal documentation +For Kusto cluster details and database schema, see internal Docs Engineering documentation. Credentials are pinned in the #docs-engineering Slack channel. + +### Ownership +- Team: Docs Content (with engineering support and reviews) +- Data questions: #docs-data + +## Current State & Next Steps + +### Known limitations +- Date range only accepts start date (end date is always current) +- Only English (`en`) language data is supported +- Queries are hardcoded in `queries/` directory +- URLs without version include all versions (FPT, GHEC, GHES combined) + +### Metrics available +Current metrics: +- Views (page view count) +- Users (unique user count) +- View duration (average time on page) +- Bounces (single-page sessions) +- Survey score (helpfulness rating) +- Exits to support (support link clicks) + +### Adding a new query + +1. Create new file in `src/metrics/queries/` +2. Export a function that returns a Kusto query string +3. 
Import and call in `docstat.ts` or `docsaudit.ts` +4. Update CLI options if needed + +Example: +```typescript +// queries/my-metric.ts +export function getMyMetric(path: string, startDate: string, endDate: string): string { + return ` + PageViews + | where Timestamp between (datetime(${startDate}) .. datetime(${endDate})) + | where Path == "${path}" + | summarize Count = count() + ` +} +``` + +### Troubleshooting + +**Azure login expired:** +```bash +az login +``` -Applies to all scripts: +**Missing environment variables:** +Check `.env` file has `KUSTO_CLUSTER` and `KUSTO_DATABASE` (values in Slack) -* The date range option only accepts a start date (via `-r `, where the number means "`` days ago"). The end date will always be the current date. - * In the future, we can add an option to set a custom end date. +**No data found:** +- Verify URL is correct and includes `https://docs.github.com` +- Check date range (older content may have limited data) +- Try `--redirects` if article was recently moved -* The only Kusto queries available are hardcoded in the `kusto/queries` directory. - * In the future, we can hardcode more queries, add the ability to send custom queries, or perhaps create pre-defined sets of queries. \ No newline at end of file +**Permission errors:** +Ensure your Azure account has read access to the Kusto database. Contact #docs-data if needed. diff --git a/src/observability/README.md b/src/observability/README.md index 2a2ffb5069e3..4fb9f995a06c 100644 --- a/src/observability/README.md +++ b/src/observability/README.md @@ -1,11 +1,208 @@ # Observability -Observability, for lack of simpler term, is our ability to collect data about how the Docs operates. These tools allow us to monitor the health of our systems, catch any errors, and get paged if a system stops working. +The observability subject provides logging, error tracking, and monitoring infrastructure for docs.github.com. These tools help monitor system health, catch errors, and provide operational visibility through structured logging and alerting. -In this directory we have files that connect us to our observability tools, as well as high-level error handling that helps keep our systems resilient. +## Purpose & Scope -We collect data in our observability systems to track the health of the Docs systems, not to track user behaviors. User behavior data collection is under the `src/events` directory. +This subject is responsible for: +- Structured logging with logfmt format in production +- Logger abstraction over `console.log` for server-side code +- Error handling and resilience (catch and report errors) +- Integration with Sentry for error tracking +- Integration with StatsD for metrics +- Integration with Failbot for alerts +- Automatic request logging middleware +- Request context tracking via `requestUuid` -## Logging +Note: This tracks system health, not user behavior. User behavior tracking is in [`src/events`](../events/README.md). 
+ +## Architecture & Key Assets + +### Key capabilities and their locations + +- `logger/index.ts` - `createLogger()`: Creates logger instance for a module +- `logger/middleware/get-automatic-request-logger.ts` - Express middleware for automatic request logging +- `middleware/handle-errors.ts` - Global Express error handler that logs and reports errors +- `middleware/catch-middleware-error.ts` - Wraps async middleware to catch errors +- `lib/failbot.ts` - Reports errors to Failbot for alerting +- `lib/statsd.ts` - Sends metrics to StatsD for monitoring + +## Setup & Usage + +### Using the logger + +Instead of `console.log`, use the logger: + +```typescript +import { createLogger } from '@/observability/logger' + +// Pass import.meta.url to include filename in logs +const logger = createLogger(import.meta.url) + +// Log levels: error, warn, info, debug +logger.info('Processing request', { userId: '123' }) +logger.error('Failed to process', { error }) +``` + +Log levels (highest to lowest): +1. `error` - Errors that need attention +2. `warn` - Warnings that may need attention +3. `info` - Informational messages +4. `debug` - Detailed debugging information + +Set `LOG_LEVEL` environment variable to filter logs: +```bash +LOG_LEVEL=info npm run dev # Filters out debug logs +``` + +### Benefits of structured logging + +1. **Logfmt format in production** - Easy to query in Splunk with key-value pairs +2. **Log level grouping** - Filter by severity (`error`, `warn`, `info`, `debug`) +3. **Request context** - Every log includes `path` and `requestUuid` +4. **Sentry integration** - Errors in Sentry include `requestUuid` to find related logs +5. **Development clarity** - Simple string logs in development, structured in production + +### Automatic request logging + +Request logging happens automatically via middleware: +- Development: `GET /en 200 2ms` +- Production: Logfmt with full context including `requestUuid` + +All application logs from the same request share the same `requestUuid`. + +### Error handling + +Wrap async middleware to catch errors: + +```typescript +import catchMiddlewareError from '@/observability/middleware/catch-middleware-error' + +router.get('/path', catchMiddlewareError(async (req, res) => { + // Errors here are caught and handled + const data = await fetchData() + res.json(data) +})) +``` + +Global error handler in `middleware/handle-errors.ts` catches all Express errors. + +## Data & External Dependencies + +### Data inputs +- Application logs from `logger.()` calls +- Request metadata (path, method, status, duration) +- Error objects with stack traces +- Request context (`requestUuid`, user agent, etc.) 
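+For intuition, the logfmt format mentioned above is plain key=value serialization; a rough sketch follows (illustrative only, not the actual logger implementation):
+
+```typescript
+// Serializes fields to logfmt, quoting values that contain whitespace,
+// as logfmt conventionally does.
+function toLogfmt(fields: Record<string, string | number>): string {
+  return Object.entries(fields)
+    .map(([key, value]) => {
+      const str = String(value)
+      return /\s/.test(str) ? `${key}="${str}"` : `${key}=${str}`
+    })
+    .join(' ')
+}
+
+// toLogfmt({ level: 'info', requestUuid: 'abc-123', msg: 'Cache hit' })
+// -> 'level=info requestUuid=abc-123 msg="Cache hit"'
+```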
+ +### Dependencies +- **Splunk** - Log aggregation and querying (index: `docs-internal`) +- **Sentry** - Error tracking and alerting +- **StatsD** - Metrics collection +- **Failbot** - Error reporting and alerting +- **Logfmt** - Log format library + +### Data outputs +- Structured logs sent to Splunk +- Errors reported to Sentry with context +- Metrics sent to StatsD +- Alerts sent via Failbot + +## Cross-links & Ownership + +### Related subjects +- [`src/events`](../events/README.md) - User behavior analytics (separate from observability) +- [`src/frame`](../frame/README.md) - Middleware pipeline where error handlers run +- All subjects - All should use `createLogger()` instead of `console.log` + +### Internal documentation +- Splunk dashboard: https://splunk.githubapp.com/en-US/app/gh_reference_app/search +- For detailed logging guide, see `logger/README.md` in this directory +- Sentry dashboard: (internal link) +- On-call runbooks: (internal Docs Engineering repo) + +### Ownership +- Team: Docs Engineering +- Note: We don't own Datadog or the observability infrastructure itself - we're working with what the observability team provides. + +## Current State & Next Steps + +### Querying logs in Splunk + +All queries should specify index: +```splunk +index=docs-internal +``` + +Find logs by request: +```splunk +index=docs-internal requestUuid="abc-123" +``` + +Find errors: +```splunk +index=docs-internal level=error +``` + +Find logs from specific module: +```splunk +index=docs-internal module="src/search/middleware/general-search.ts" +``` + +### Request context + +Every log includes: +- `requestUuid` - Unique ID for the request +- `path` - Request path +- `method` - HTTP method +- `statusCode` - Response status +- `duration` - Request duration +- `module` - Source file (from `import.meta.url`) + +### Error reporting flow + +1. Error occurs in application code +2. Caught by `catchMiddlewareError` or global error handler +3. Logged with `logger.error()` including stack trace +4. Reported to Sentry with `requestUuid` +5. Critical errors trigger Failbot alerts + +### Adding observability to new code + +1. Import and create logger at top of file: + ```typescript + import { createLogger } from '@/observability/logger' + const logger = createLogger(import.meta.url) + ``` + +2. Log important events: + ```typescript + logger.info('Cache hit', { key }) + logger.warn('Rate limit approaching', { count }) + logger.error('Database connection failed', { error }) + ``` + +3. Wrap async middleware: + ```typescript + import catchMiddlewareError from '@/observability/middleware/catch-middleware-error' + router.use(catchMiddlewareError(myMiddleware)) + ``` + +### Known limitations +- Logs are verbose in production (logfmt includes full context) +- `requestUuid` tracking requires middleware initialization +- Development logs are simplified strings (less structured) + +### Planned work +- We have an epic to improve our logging + +### Monitoring and alerting + +Active monitoring: +- Error rates tracked in Sentry +- Performance metrics tracked in StatsD +- Critical errors trigger Failbot alerts to #docs-ops +- On-call rotation notified for production incidents + +For on-call procedures and escalation, see internal Docs Engineering runbooks. -Please see the [logger README](./logger/README.md). diff --git a/src/pages/README.md b/src/pages/README.md index ef61355b45bd..f0c364a983bc 100644 --- a/src/pages/README.md +++ b/src/pages/README.md @@ -1,9 +1,143 @@ # Pages -This is the Next.js pages directory. 
+The pages subject is the Next.js pages directory that defines route structure for docs.github.com. This directory acts as a thin routing layer that delegates to actual page implementations in subject-specific directories. -See +## Purpose & Scope -There is almost no code in this directory, instead the actual pages live with their subject siblings. These files directly export from the page files in the relative subjects. +This subject is responsible for: +* Defining Next.js routes using file-system routing +* Re-exporting page components from subject directories +* Custom `_app.tsx` for application wrapper +* Custom `_document.tsx` for HTML document structure +* Custom `_error.tsx` for error page handling +* Route structure matching content hierarchy + +Note: Actual page implementations live in subject directories (e.g., `src/landings/pages/`, `src/rest/pages/`). This directory contains mostly re-exports and special Next.js files. + +## Architecture & Key Assets + +### Key capabilities and their locations + +- `_app.tsx` - Application wrapper, imports global styles, re-exports from `@/frame/pages/app` +- `_document.tsx` - Custom HTML document with styled-components SSR and color scheme defaults +- `_error.tsx` - Error page that reports to Failbot on server-side errors +- `index.tsx` - Homepage, re-exports from `@/landings/pages/home` +- `[versionId]/[productId]/index.tsx` - Product/category pages, re-exports from `@/landings/pages/product` + +## Setup & Usage + +### File-system routing + +Next.js uses file-system routing where file paths map to URLs: +- `pages/index.tsx` → `/` +- `pages/search.tsx` → `/search` +- `pages/[versionId]/index.tsx` → `/free-pro-team@latest`, `/enterprise-server@3.11`, etc. +- `pages/[versionId]/[productId]/index.tsx` → `/free-pro-team@latest/actions`, etc. + +Dynamic segments use brackets: `[versionId]`, `[productId]` + +### Page delegation pattern + +Most files in this directory are simple re-exports: + +```typescript +// pages/index.tsx +export { default, getServerSideProps } from '@/landings/pages/home' +``` + +This keeps routing logic in `src/pages/` while page implementation stays with its subject. + +### Special Next.js files + +- `_app.tsx` - Wraps every page, initializes global state, imports styles +- `_document.tsx` - Customizes HTML structure, handles styled-components SSR +- `_error.tsx` - Renders error pages, reports server-side errors to Failbot + +### Adding a new route + +1. Determine the URL structure +2. Create file in `src/pages/` matching the route +3. Implement page component in appropriate subject directory +4. Re-export from `src/pages/` file: + ```typescript + export { default, getServerSideProps } from '@/my-subject/pages/my-page' + ``` + +## Data & External Dependencies + +### Dependencies +- Next.js pages router (being migrated to app router) +- Subject page implementations (`@/landings`, `@/rest`, `@/search`, etc.) +- `@/frame` - Application wrapper and global styles +- styled-components - CSS-in-JS for server-side rendering + +### Route resolution +1. Next.js matches incoming URL to file in `src/pages/` +2. Imports re-exported component from subject directory +3. Calls `getServerSideProps` if present +4. 
Renders page with data + +## Cross-links & Ownership + +### Related subjects +- [`src/frame`](../frame/README.md) - Provides `_app` implementation and global infrastructure +- [`src/landings`](../landings/README.md) - Homepage and product/category pages +- [`src/rest`](../rest/README.md) - REST API documentation pages +- [`src/graphql`](../graphql/README.md) - GraphQL API documentation pages +- [`src/webhooks`](../webhooks/README.md) - Webhooks documentation pages +- [`src/search`](../search/README.md) - Search pages +- All subjects with `pages/` directories + +### Ownership +- Team: Docs Engineering + +## Current State & Next Steps + +### Migration in progress +We are migrating from Next.js pages router to app router: +- New pages should use app router in `src/app/` +- Pages router in `src/pages/` is legacy +- Migration tracked in internal issue + +### Known limitations +- Pages router is deprecated by Next.js in favor of app router +- Some code still exists in `_error.tsx` and `_document.tsx` that should be moved +- Route structure tightly coupled to content hierarchy + +### When to edit files here + +Edit `_app.tsx`: +- Never (it's a re-export from `@/frame/pages/app`) + +Edit `_document.tsx`: +- Only for global HTML document changes +- styled-components SSR configuration +- Default color scheme values + +Edit `_error.tsx`: +- Only for global error handling changes +- Failbot reporting configuration + +Add new route files: +- When defining new URL structures +- Usually just re-export from subject directory + +### App router migration + +For new features, use app router: +- Routes defined in `src/app/` instead of `src/pages/` +- Layouts instead of `_app.tsx` +- Error boundaries instead of `_error.tsx` +- New routing conventions with `page.tsx`, `layout.tsx`, etc. + +See Next.js documentation for app router migration guide. + +### Testing route changes + +```bash +npm run dev +# Access routes in browser to verify they work +``` + +Routes should load without errors and render correct content from subject directories. -TODO migrate code out of `_error.tsx`, `_document.tsx`, and `404.tsx`. diff --git a/src/secret-scanning/README.md b/src/secret-scanning/README.md index 27cfb63b00d6..98fe1823dd53 100644 --- a/src/secret-scanning/README.md +++ b/src/secret-scanning/README.md @@ -1,18 +1,153 @@ # Secret scanning -This secret scanning pipeline automates a table displayed on the [Supported secret scanning patterns](https://docs.github.com/code-security/secret-scanning/introduction/supported-secret-scanning-patterns#supported-secrets) page. +The secret-scanning subject automates the generation and maintenance of the "Supported secret scanning patterns" table on docs.github.com. It fetches data from upstream sources, transforms it, and renders it as a Liquid-powered Markdown table. -Each day a workflow checks if the [data](src/secret-scanning/data/public-docs.yml) is up-to-date. When there are changes, the workflow automatically creates a pull request to update the `src/secret-scanning/data/public-docs.yml` file. The workflow runs `npm run sync-secret-scanning` to check for updates. +## Purpose & Scope -This pipeline uses middleware to check if the path of the URL matches the page that contains the table. The middleware decorates the context with the data, which is displayed on the page using a Markdown table and Liquid. 
For example:
+This subject is responsible for:
+- Syncing secret scanning pattern data from upstream sources
+- Storing pattern data in YAML files by version
+- Middleware that injects data into the supported patterns page
+- Rendering the patterns table with Liquid in Markdown
+- An automated daily workflow that checks for and applies pattern changes
+
+The table appears on: [Supported secret scanning patterns](https://docs.github.com/code-security/secret-scanning/introduction/supported-secret-scanning-patterns#supported-secrets)
+
+## Architecture & Key Assets
+
+### Key capabilities and their locations
+
+- `middleware/secret-scanning.ts` - Middleware that loads the YAML data and adds it to `req.context.secretScanningData`
+- `scripts/sync.ts` - Script that syncs pattern data from upstream sources and updates the YAML files
+- `lib/config.json` - Configuration specifying which page gets the data (`targetFilename`)
+- `data/pattern-docs/*.yml` - YAML files containing pattern data per version
+
+## Setup & Usage
+
+### Daily automated sync
+
+A GitHub Actions workflow runs daily to check for pattern updates:
+1. Runs `npm run sync-secret-scanning`
+2. If changes are detected, creates a PR to update the YAML files
+3. The team reviews and merges the PR
+
+### Manual sync
+
+To manually sync pattern data:
+
+```bash
+npm run sync-secret-scanning
+```
+
+This fetches the latest pattern data and updates the YAML files in `data/pattern-docs/`.
+
+### How the table is rendered
+
+1. Middleware checks whether the current page matches `targetFilename` from the config
+2. Loads the appropriate YAML file based on the version (FPT/GHEC/GHES)
+3. Adds the data to `req.context.secretScanningData`
+4. The Markdown uses Liquid to render the table rows
+
+Example Markdown with Liquid:

```markdown
-{% ifversion fpt %}
| Provider | Token | Partner | User | Push protection | Base64 |
|----|:----|:----:|:----:|:----:|:----:|
{%- for entry in secretScanningData %}
-| {{ entry.provider }} | {{ entry.secretType }} | {% if entry.isPublic %}{% octicon "check" aria-label="Supported" %}{% else %}{% octicon "x" aria-label="Unsupported" %}{% endif %} | {% if entry.isPrivateWithGhas %}{% octicon "check" aria-label="Supported" %}{% else %}{% octicon "x" aria-label="Unsupported" %}{% endif %} | {% if entry.hasPushProtection %}{% octicon "check" aria-label="Supported" %}{% else %}{% octicon "x" aria-label="Unsupported" %}{% endif %} | {% if entry.base64Supported %}{% octicon "check" aria-label="Supported" %}{% else %}{% octicon "x" aria-label="Unsupported" %}{% endif %} |
+| {{ entry.provider }} | {{ entry.secretType }} | {% if entry.isPublic %}{% octicon "check" %}{% else %}{% octicon "x" %}{% endif %} | ...
{%- endfor %} ``` + +### Data structure + +Each pattern entry includes: +- `provider` - Service provider name +- `secretType` - Type of secret/token +- `isPublic` - Available on public repos +- `isPrivateWithGhas` - Available on private repos with GHAS +- `hasPushProtection` - Has push protection enabled +- `hasValidityCheck` - Has validity checking +- `base64Supported` - Supports base64-encoded secrets + +## Data & External Dependencies + +### Data inputs +- Upstream secret scanning pattern sources (internal APIs) +- Existing YAML files in `data/pattern-docs/` +- Version information from `@/versions/lib/all-versions` + +### Dependencies +- `js-yaml` - YAML parsing and generation +- `@/content-render` - Liquid rendering for table +- `@/versions` - Version detection and mapping +- GitHub Actions workflow for automated sync + +### Data outputs +- Updated YAML files in `data/pattern-docs/` +- `req.context.secretScanningData` - Array of pattern objects +- Rendered Markdown table on docs page + +## Cross-links & Ownership + +### Related subjects +- [`src/content-render`](../content-render/README.md) - Liquid rendering for table +- [`src/versions`](../versions/README.md) - Version detection for loading correct data file +- Content page: `content/code-security/secret-scanning/introduction/supported-secret-scanning-patterns.md` + +### Internal documentation +For upstream data source details and API access, see internal Docs Engineering documentation. + +### Ownership +- Team: Docs Engineering +- Content: Code Security team + +## Current State & Next Steps + +### Automated workflow + +GitHub Actions workflow (`.github/workflows/sync-secret-scanning.yml`) runs daily: +- Checks for pattern updates +- Creates PR if changes found +- Runs `npm run sync-secret-scanning` + +### Version-specific data + +Different data files per version: +- `dotcom.yml` - Free, Pro, Team (FPT) +- `ghec.yml` - GitHub Enterprise Cloud +- `ghes-{version}.yml` - GitHub Enterprise Server versions + +Middleware automatically selects correct file based on `req.context.currentVersion`. + +### Known limitations +- Manual review required for auto-generated PRs +- Pattern data schema must match between upstream and our YAML +- Changes to upstream API may break sync script +- Table only appears on one specific page (configured in `config.json`) + +### Expanding to more pages + +To display secret scanning data on additional pages: +1. Update `config.json` with new target filenames (as array) +2. Update middleware to handle multiple pages +3. Add Liquid table rendering to those pages + +### Troubleshooting sync issues + +**Sync fails:** +- Check upstream API access and credentials +- Verify YAML file permissions +- Check for schema changes in upstream data + +**Table not rendering:** +- Verify page path matches `targetFilename` in `config.json` +- Check that `secretScanningData` is in context +- Verify Liquid syntax in Markdown + +**Wrong data version:** +- Check version detection logic in middleware +- Verify correct YAML file exists for version +- Check version mapping in middleware + diff --git a/src/shielding/README.md b/src/shielding/README.md index b7b5cfbed529..48af08909ffd 100644 --- a/src/shielding/README.md +++ b/src/shielding/README.md @@ -1,41 +1,168 @@ # Shielding -## Overview +The shielding subject protects docs.github.com from junk requests, abuse, and unnecessary server load. It implements various middleware to detect and handle suspicious traffic patterns, invalid requests, and rate limiting. 
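+
+For a concrete flavor of the pattern, here is a minimal sketch of the kind of middleware this subject contains, normalizing unrecognized query string keys into a redirect so the CDN can serve a cached response. The names `RECOGNIZED_PARAMS` and `normalizeQueryStrings` are illustrative assumptions, not the actual implementation:
+
+```typescript
+// Sketch only: redirect requests carrying junk query string keys to a
+// normalized URL. The real handlers live in src/shielding/middleware/.
+import type { NextFunction, Request, Response } from 'express'
+
+// Hypothetical allowlist of query string keys the app actually uses
+const RECOGNIZED_PARAMS = new Set(['query', 'platform', 'tool', 'apiVersion'])
+
+export default function normalizeQueryStrings(req: Request, res: Response, next: NextFunction) {
+  const keys = Object.keys(req.query)
+  if (keys.every((key) => RECOGNIZED_PARAMS.has(key))) return next()
+
+  // Keep only recognized keys so the eventual 200 lands on one
+  // canonical, cache-friendly URL.
+  const params = new URLSearchParams()
+  for (const key of keys) {
+    if (RECOGNIZED_PARAMS.has(key)) params.set(key, String(req.query[key]))
+  }
+  const queryString = params.toString()
+  res.redirect(302, queryString ? `${req.path}?${queryString}` : req.path)
+}
+```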
-Essentially code in our server that controls the prevention of "junk requests" is scripted HTTP requests to endpoints that are _not_ made by regular browser users. +## Purpose & Scope -For example, there's middleware code that sees if a `GET` request -comes in with a bunch of random looking query strings keys. This would cause a PASS on the CDN but would not actually matter to the rendering. In this -case, we spot this early and return a redirect response to the same URL -without the unrecognized query string keys so that if the request follows -redirects, the eventual 200 would be normalized by a common URL so the CDN -can serve a HIT. +This subject is responsible for: +- Detecting and handling invalid or suspicious requests +- Rate limiting suspicious traffic patterns +- Normalizing URLs to improve CDN cache hit rates +- Preventing abuse from scripted/bot traffic +- Redirecting malformed requests +- Protecting backend servers from unnecessary work -Here's an in-time discussion post that summaries the _need_ and much of the -recent things we've done to fortify our backend servers to avoid unnecessary -work loads: +Shielding code controls the prevention of "junk requests" - scripted HTTP requests that are not made by regular browser users. -**[How we have fortified Docs for better resiliency and availability (June 2023)](https://github.com/github/docs-engineering/discussions/3262)** +## Architecture & Key Assets -## How it works +### Key capabilities and their locations -At its root, the `src/shielding/frame/middleware/index.ts` is injected into our -Express server. From there, it loads all its individual middleware handlers. +- `middleware/index.ts` - Main entry point that orchestrates all shielding middleware and rate limiting +- Individual middleware files - Each focuses on a single abuse pattern identified from log analysis +- Rate limiting logic - Uses `createRateLimiter()` for suspicious and API routes -Each middleware is one file that focuses on a single use-case. The -use-cases are borne from studying log files to -spot patterns of request abuse. +## Setup & Usage -> [!NOTE] -> Some shielding "tricks" appear in other places throughout the code -> base such as controlling the 404 response for `/assets/*` URLs. +### How it works -## Rate limiting +1. `src/shielding/middleware/index.ts` is injected into the Express server +2. Loads all individual middleware handlers +3. Each middleware focuses on a single use-case/abuse pattern +4. Abuse patterns discovered by studying log files -We rate limit at multiple levels: +### Rate limiting + +Three levels of rate limiting: + +1. **CDN (Fastly)** - First line of defense +2. **Suspicious routes** - Via shielding middleware + - Only rate limited if deemed suspicious based on checked parameters + - Implemented in `middleware/index.ts` with `createRateLimiter()` +3. 
**API routes** - Via API declaration + - Limited to certain # of requests per minute, regardless of request characteristics + - Implemented in `src/frame/middleware/api.ts` + +### Common shielding patterns + +**Invalid query strings:** +- Request: `GET /path?random=abc&weird=xyz` +- Action: Redirect to `/path` (normalized URL) +- Benefit: CDN can serve cached response for normalized URL + +**Malformed URLs:** +- Invalid characters or patterns in URL +- Action: Return 400 or redirect to corrected URL +- Benefit: Prevent errors propagating to application code + +**Invalid paths:** +- Suspicious path patterns (probes, exploits) +- Action: Reject with appropriate status code +- Benefit: Prevent unnecessary processing + +### Running tests + +```bash +npm run test -- src/shielding/tests +``` + +## Data & External Dependencies + +### Data inputs +- HTTP request metadata (path, query strings, headers) +- Known good/bad patterns from log analysis +- CDN cache behavior data + +### Dependencies +- Express middleware +- Rate limiting library (likely `express-rate-limit` or similar) +- `@/frame` - Express server integration +- CDN configuration (Fastly) + +### Data outputs +- HTTP responses (redirects, 400s, 429s for rate limit) +- Cache-friendly normalized URLs +- Reduced backend server load + +## Cross-links & Ownership + +### Related subjects +- [`src/frame`](../frame/README.md) - Express middleware pipeline integration +- [`src/observability`](../observability/README.md) - Logging suspicious traffic patterns +- CDN configuration - Fastly edge rules + +### Internal documentation +For detailed discussion on resilience and availability improvements, see: +- [How we have fortified Docs for better resiliency and availability (June 2023)](https://github.com/github/docs-engineering/discussions/3262) + +### Ownership +- Team: Docs Engineering + +## Current State & Next Steps + +### Shielding strategies + +Each middleware implements a specific strategy based on observed abuse: +- Query string normalization for CDN optimization +- Path validation to reject probes/exploits +- Header validation to detect bot traffic +- Next.js path handling for framework-specific patterns + +### Known limitations +- Shielding is reactive (based on observing abuse patterns) +- Some legitimate traffic may be affected if patterns overlap with abuse +- Rate limits are tuned based on historical data +- Some shielding logic exists outside this subject (e.g., `/assets/*` 404 handling) + +### Adding new shielding middleware + +1. Identify abuse pattern from logs +2. Create new middleware file in `src/shielding/middleware/` +3. Implement detection and handling logic +4. Add to orchestrator in `index.ts` +5. Add tests in `tests/` +6. Monitor impact on CDN cache hit rate and server load + +### Monitoring shielding effectiveness + +Key metrics: +- CDN cache hit rate (should increase) +- Backend server load (should decrease) +- 4xx/5xx error rates (monitor for false positives) +- Rate limit triggers (logged in observability) + +Check #docs-ops and monitoring dashboards for ongoing effectiveness. 
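+
+To make the configuration notes below concrete, here is a hedged sketch of what a `createRateLimiter()` factory could look like, assuming the `express-rate-limit` package (the dependency list above only says "or similar"); the real implementation in `middleware/index.ts` may differ:
+
+```typescript
+// Sketch only: a per-minute limiter matching the "# of requests per
+// minute" behavior described for API routes above.
+import rateLimit from 'express-rate-limit'
+
+export function createRateLimiter(max = 100) {
+  return rateLimit({
+    windowMs: 60 * 1000, // one-minute window
+    max, // requests allowed per client per window
+    standardHeaders: true, // emit RateLimit-* headers so clients can back off
+    legacyHeaders: false, // skip the deprecated X-RateLimit-* headers
+  })
+}
+```
+
+Express mounts a limiter like this on suspicious or API routes, returning 429 once a client exceeds the window.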
+ +### Configuration + +Rate limit configuration: +- Thresholds tuned based on traffic patterns +- Different limits for different route types +- Suspicious request detection parameters + +CDN integration: +- Works with Fastly configuration +- Ensures normalized URLs maximize cache hits +- Some shielding happens at CDN edge +- Dashboard for real-time shielding metrics + +### Troubleshooting + +**Legitimate traffic blocked:** +- Check shielding logs in Splunk +- Identify which middleware triggered +- Adjust pattern matching or rate limits +- Consider allowlist for specific use cases + +**Abuse still getting through:** +- Analyze logs for new patterns +- Add new middleware to handle pattern +- Adjust existing middleware thresholds +- Consider CDN-level blocking + +**CDN cache hit rate not improving:** +- Verify URL normalization is working +- Check that redirects are followed +- Analyze cache miss patterns +- Coordinate with CDN configuration -1. CDN (Fastly) -2. All routes via [src/shielding/frame/index.ts](./middleware/index.ts) and the `createRateLimiter()` middleware. - - These routes are _only_ rate limited if they are deemed suspicious based on parameters we check. -3. API routes via their declaration in [src/frame/middleware/api.ts](../frame/middleware/api.ts) using the `createRateLimiter()` middleware. - - These routes are limited to a certain # of requests per minute, regardless of what the request looks like. diff --git a/src/tests/README.md b/src/tests/README.md index d144226b2f2f..d5f931d5b48a 100644 --- a/src/tests/README.md +++ b/src/tests/README.md @@ -1,128 +1,228 @@ # Tests -This directory contains utilities to support our automated testing efforts. +The tests subject contains utilities, helpers, and infrastructure to support automated testing across docs.github.com. This includes test helpers, mock servers, schema validation, and shared testing patterns. -**This directory should not include test suites.** Please use the best subject folder available. +> [!NOTE] +> This directory should not include test suites. Test files belong in their respective subject directories (e.g., `src/search/tests/`, `src/frame/tests/`). -It's not strictly necessary to run tests locally while developing. You can -always open a pull request and rely on the CI service to run tests for you, -but it's helpful to run tests locally before pushing your changes to -GitHub. +## Purpose & Scope -Tests are written using [vitest](https://vitest.dev/). +This subject is responsible for: +- Test utilities and helper functions shared across subjects +- Mock server infrastructure for integration tests +- Schema validation utilities (AJV) +- Test data and fixtures management +- Vitest configuration and setup +- TypeScript declarations for test tooling +- Shared testing patterns and conventions -`vitest` runs tests and handles assertions. +Tests are written using [Vitest](https://vitest.dev/) for unit and integration tests, and [Playwright](https://playwright.dev/) for end-to-end browser tests. -## Install optional dependencies +## Architecture & Key Assets -We typically rely on CI to run our tests, so some large test-only -dependencies are considered **optional**. 
To run the tests locally, you'll -need to make sure optional dependencies are installed by running: +### Key capabilities and their locations -```shell +- `lib/validate-json-schema.ts` - AJV validator factory for JSON schema validation +- `mocks/start-mock-server.ts` - Creates mock HTTP server for integration tests +- `helpers/e2etest.ts` - Utilities for end-to-end testing scenarios +- `vitest.setup.ts` - Global Vitest configuration and hooks + +## Setup & Usage + +### Installing test dependencies + +Test-only dependencies are optional to keep standard installs faster: + +```bash npm ci --include=optional ``` -## Running all the tests - -Once you've followed the development instructions above, you can run the entire -test suite locally: +### Running all tests -```shell +```bash npm test ``` -## Watching all the tests +### Running tests in watch mode -You can run a script that continually watches for changes and -re-runs the tests whenever a change is made. This command notifies you -when tests change to and from a passing or failing state, and it prints -out a test coverage report so you can see what files need testing. +Continuously re-runs tests on file changes: -```shell +```bash npm run test-watch ``` -## Running individual tests +### Running specific tests -You can run specific tests in two ways: - -```shell -# The TEST_NAME can be a filename, partial filename, or path to a file or directory +```bash +# By filename or path npm test -- -vitest path/to/tests/directory -``` +# By directory +vitest src/search/tests -## Allowing logging in tests +# Single test file +vitest src/versions/tests/versions.ts +``` -If you set up a `console.log` in the code and want to see the output, simply append the `--silent false` flag to your test to see console output. +### Viewing console output -## Failed Local Tests +By default, console.log is suppressed. To see output: -If the tests fail locally with an error like this: +```bash +npm test -- --silent=false +``` -`Could not find a production build in the '/Users/username/repos/docs-internal/.next' directory.` +### Building before tests -You may need to run this before every test run: +Some tests require a production build: -```shell +```bash npx next build +npm test ``` -## Linting +Error: `Could not find a production build` means you need to run `next build`. -To validate all your JavaScript code (and auto-format some easily reparable mistakes), -run the linter: +## Data & External Dependencies -```shell -npm run lint -``` - -## Keeping the server running +### Dependencies +- **Vitest** - Test runner and assertion library +- **Playwright** - Browser automation for E2E tests +- **AJV** - JSON schema validation +- Mock server libraries for HTTP mocking -When you run `vitest` tests that depend on making real HTTP requests -to `localhost:4000`, the `vitest` tests have a hook that starts the -server before running all/any tests and stops the server when done. +### Test data +- Fixture content in `src/fixtures/` +- Schema files in `helpers/schemas/` +- Mock responses in `mocks/` -You can disable this, which might make it easier when debugging tests -since the server won't need to start and stop every time you run tests. 
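+
+As an illustration of the schema-validation tooling, here is a hedged sketch of validating test data with AJV directly; the actual export of `lib/validate-json-schema.ts` may differ, and the schema below is a made-up example:
+
+```typescript
+// Sketch only: compile a JSON schema with AJV and assert a payload matches.
+import Ajv from 'ajv'
+import { expect, test } from 'vitest'
+
+const ajv = new Ajv({ allErrors: true }) // collect every mismatch, not just the first
+
+// Hypothetical schema, not one of the real files in helpers/schemas/
+const pageSchema = {
+  type: 'object',
+  properties: {
+    title: { type: 'string' },
+    versions: { type: 'object' },
+  },
+  required: ['title', 'versions'],
+}
+
+test('payload matches the page schema', () => {
+  const validate = ajv.compile(pageSchema)
+  const valid = validate({ title: 'Example', versions: {} })
+  if (!valid) console.error(validate.errors) // AJV reports each failing keyword and path
+  expect(valid).toBe(true)
+})
+```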
+### Server management -In one terminal, type: +Tests that make HTTP requests to `localhost:4000`: +- Vitest automatically starts/stops server via hooks +- Disable with `START_VITEST_SERVER=false` for manual server control -```shell +Manual server for debugging: +```bash +# Terminal 1 NODE_ENV=test PORT=4000 tsx src/frame/server.ts + +# Terminal 2 +START_VITEST_SERVER=false vitest src/versions/tests ``` -In another terminal, type: +## Cross-links & Ownership + +### Related subjects +- [`src/fixtures`](../fixtures/README.md) - Fixture-based testing with minimal content +- All subjects with `/tests/` directories - Test consumers +- CI workflows in `.github/workflows/` - Automated test execution + +### Testing documentation +- [Fixture content](../fixtures/README.md) - Fixture-based testing patterns +- [Playwright E2E tests](../fixtures/PLAYWRIGHT.md) - Headless browser testing + +### Ownership +- Team: Docs Engineering + +## Current State & Next Steps + +### Testing patterns -```shell -START_VITEST_SERVER=false vitests src/versions/tests +**Unit tests** - Test individual functions/modules: +```typescript +import { describe, test, expect } from 'vitest' + +describe('myFunction', () => { + test('returns expected value', () => { + expect(myFunction('input')).toBe('output') + }) +}) ``` -Or whatever the testing command you use is. +**Integration tests** - Test HTTP endpoints: +```typescript +import { get } from '@/tests/helpers/e2etest' -The `START_VITEST_SERVER` environment variable needs to be set to `false`, -or else `vitest` will try to start a server on `:4000` too. +test('GET /search returns results', async () => { + const res = await get('/search?query=test') + expect(res.statusCode).toBe(200) +}) +``` -## Debugging middleware errors +**Playwright tests** - Browser automation: +```typescript +test('search works in browser', async ({ page }) => { + await page.goto('/search') + await page.fill('input[name="query"]', 'test') + // ...assertions +}) +``` -By default, errors handled by the middleware are dealt with just like -any error in production. It's common to have end-to-end tests that expect -a page to throw a 500 Internal Server Error response. +### Debugging middleware errors -If you don't expect that and you might struggle to see exactly where the -error is happening, set `$DEBUG_MIDDLEWARE_TESTS` to `true`. For example: +Middleware errors are suppressed by default in tests. To see full errors: -```shell +```bash export DEBUG_MIDDLEWARE_TESTS=true -vitest src/shielding/tests -b +vitest src/shielding/tests ``` -## Fixture based testing +### Linting tests + +```bash +npm run lint +``` + +### Known limitations +- Optional dependencies must be installed for local testing +- Some tests require production build (`next build`) +- Server startup/shutdown adds overhead to test runs +- Fixtures may lag behind actual content structure + +### Test organization + +Tests should be co-located with their subject: +- ✅ `src/search/tests/api-search.ts` +- ✅ `src/versions/tests/middleware.ts` +- ❌ `src/tests/search-tests.ts` (wrong - not in subject) + +Shared utilities belong in `src/tests/`: +- Helper functions used across subjects +- Mock servers and fixtures +- Schema validation utilities +- Test infrastructure + +### Adding test helpers + +1. Create a file in `src/tests/helpers/` +2. Export reusable functions +3. Document usage with JSDoc +4. Add tests for the helper itself +5. 
Import in test files across subjects + +### CI integration + +Tests run automatically in GitHub Actions: +- On pull requests +- On pushes to main +- Various test suites in parallel for speed + +See `.github/workflows/` for CI configuration. + +### Troubleshooting + +**Tests fail with missing build:** +Run `npx next build` before tests. + +**Tests hang or timeout:** +Check if server started correctly. Use `DEBUG_MIDDLEWARE_TESTS=true`. -See [Fixture content](src/fixtures/README.md). +**Flaky tests:** +- Check for race conditions +- Ensure proper test isolation +- Verify mocks are properly reset -## Headless tests with Playwright +**Mock server issues:** +Check `src/tests/mocks/start-mock-server.ts` is running and configured correctly. -See [Headless tests with Playwright](src/fixtures/PLAYWRIGHT.md) diff --git a/src/types/README.md b/src/types/README.md new file mode 100644 index 000000000000..2104f5f92a1c --- /dev/null +++ b/src/types/README.md @@ -0,0 +1,118 @@ +# Types + +The **types** subject provides centralized TypeScript type definitions used throughout the docs.github.com codebase. This includes both application-specific types and TypeScript declaration files (`.d.ts`) for third-party libraries that lack native TypeScript support. + +## Purpose & Scope + +This subject is responsible for: +- Defining core types for the application (`Context`, `Page`, `ExtendedRequest`, etc.) +- Providing TypeScript definitions for third-party libraries without official types +- Maintaining frontmatter schema types that align with our validation logic +- Exporting shared types for consistent use across all subjects + +**Note**: The types defined here are consumed by nearly every subject in the codebase. Changes to core types like `Context` or `Page` can have wide-reaching impacts. + +## Architecture & Key Assets + +### Key capabilities and their locations + +- **`types.ts`**: The primary file containing all application-specific TypeScript types and interfaces. This is manually maintained and includes: + * `Context` - Request context object extended throughout middleware + * `Page` - Page object with content, metadata, and rendering methods + * `ExtendedRequest` - Express Request with custom properties + * `PageFrontmatter` - Frontmatter schema type aligned with validation + * `Site`, `Tree`, `SiteTree` - Site structure and navigation types + * `Version`, `AllVersions` - Version-related types + * Many more domain-specific types + +- **`index.ts`**: Simple re-export module for backward compatibility. Imports should use `@/types` which resolves to this file. + +- **`.d.ts` files**: TypeScript declaration files for third-party libraries that don't provide their own types. These allow TypeScript to type-check usage of these libraries throughout the codebase. + +## Setup & Usage + +### Importing types + +Use the absolute import path with the `@/types` alias: + +```typescript +import type { Context, Page, ExtendedRequest } from '@/types' +``` + +### Adding a new application type + +1. Add the type definition to `types.ts` +2. Export it if it should be available to other subjects +3. Add JSDoc comments to explain complex types +4. Consider if the type should be co-located with a specific subject instead + +### Adding a declaration file for a third-party library + +1. Create a new `.d.ts` file named after the package (e.g., `package-name.d.ts`) +2. Declare the module and its exports with appropriate types +3. Use `any` sparingly, but it's acceptable when the library structure is truly dynamic +4. 
Add comments explaining why types are using `any` if necessary + +Example: +```typescript +declare module 'some-package' { + export function someFunction(param: string): void + export interface SomeType { + property: string + } +} +``` + +## Data & External Dependencies + +### Type sources +- **Frontmatter schema**: `PageFrontmatter` type is manually maintained to align with the AJV schema in `src/frame/lib/frontmatter.ts` +- **Third-party libraries**: Declaration files provide types for libraries without native TypeScript support +- **Domain models**: Types reflect the structure of content files, site tree, version data, etc. + +### Dependencies +- **TypeScript compiler**: All types are processed during the TypeScript compilation step +- **Subject imports**: Types import from specific subjects (e.g., `@/landings/types`, `@/versions/lib/enterprise-server-releases.d`) +- **Express types**: `ExtendedRequest` extends Express's `Request` type + +### Type consumers +Nearly every subject in `src/` imports types from this directory. Common consumers include: +- Middleware (frame, versions, languages, landings, etc.) +- Rendering logic (content-render, landings) +- Content linter rules +- API routes and scripts + +## Cross-links & Ownership + +### Related subjects +- **[`src/frame`](../frame/README.md)**: Defines frontmatter validation schema that aligns with `PageFrontmatter` type +- **[`src/content-render`](../content-render/README.md)**: Uses `Context`, `Page` types extensively for rendering +- **[`src/content-linter`](../content-linter/README.md)**: Uses declaration files for markdownlint libraries +- **All subjects**: Nearly every subject imports types from this directory + +### Ownership +- **Team**: Docs Engineering + +## Current State & Next Steps + +### Known limitations +- **Manual maintenance**: `PageFrontmatter` type must be manually kept in sync with `src/frame/lib/frontmatter.ts` schema + * We don't auto-generate from the schema because: (1) it's dynamically constructed with version-specific properties, (2) build tooling complexity, (3) manual control provides better documentation +- **Wide-reaching changes**: Modifications to core types like `Context` or `Page` affect many subjects +- **Third-party types**: Declaration files require updates when upgrading the corresponding packages + +### Type coverage goals +- Continue adding declaration files as new third-party libraries are introduced +- Consider moving subject-specific types to their respective subject directories (e.g., journey types could move to `src/journeys/types.ts`) +- Improve JSDoc comments on complex types for better IDE experience + +### Testing approach +- Types are validated during `npm run tsc` (TypeScript compilation) +- No runtime tests exist for types themselves +- Breaking type changes are caught by TypeScript errors in consuming code + +### Contribution guidance +- **For new types**: Consider whether the type belongs here (shared across subjects) or in a specific subject directory +- **For type changes**: Search for usage across the codebase (`grep -r "TypeName" src/`) to assess impact +- **For declaration files**: Match the package name and version you're typing +- **Style**: Use `type` for simple aliases, `interface` for objects that may be extended diff --git a/src/versions/README.md b/src/versions/README.md index 24f3e01ed740..8644a5cb96f7 100644 --- a/src/versions/README.md +++ b/src/versions/README.md @@ -1,5 +1,156 @@ # Versions -Product oriented code to handle versions of the Docs, such as Enterprise 
Cloud and Enterprise Server.
+The versions subject handles product versioning for GitHub Docs, including Free/Pro/Team (FPT), Enterprise Cloud (GHEC), and Enterprise Server (GHES). It provides version detection, resolution, feature flags, and version-aware content rendering.
+
+## Purpose & Scope
+
+This subject is responsible for:
+- Defining all available product versions (plans and releases)
+- Detecting the current version from URL paths
+- Providing version-aware Liquid conditionals (e.g., `{% ifversion ghes %}`)
+- Managing feature flags that vary by version
+- Version resolution for content applicability
+- The version picker UI component
+- Deprecation banners for old versions
+
+Related subjects:
+- `src/archives/` - Handles archived versions of documentation
+- `src/ghes-releases/` - Manages GHES release and deprecation processes
+
+## Architecture & Key Assets
+
+### Key capabilities and their locations
+
+- `lib/all-versions.ts` - `allVersions` object: defines all version plans (fpt, ghec, ghes) with releases
+- `lib/enterprise-server-releases.ts` - GHES version data: supported, deprecated, latest releases
+- `lib/get-applicable-versions.ts` - `getApplicableVersions()`: determines which versions apply to content based on frontmatter
+- `middleware/short-versions.ts` - Adds version shortcuts (e.g., `ghes`, `fpt`) to `req.context`
+- `middleware/features.ts` - Loads feature flags from `data/features/` and adds them to context
+- `components/VersionPicker.tsx` - UI component for switching between versions
+
+## Setup & Usage
+
+### Version structure
+
+Versions follow the format `plan@release`:
+- FPT: `free-pro-team@latest` (stripped from URLs)
+- GHEC: `enterprise-cloud@latest`
+- GHES: `enterprise-server@3.11`, `enterprise-server@3.10`, etc.
+
+### Using versions in Liquid
+
+Middleware adds version shortcuts to the context, which the `ifversion` tag checks in Liquid templates:
+
+```liquid
+{% ifversion fpt %}
+This content only appears for Free/Pro/Team.
+{% endif %}
+
+{% ifversion ghes %}
+This content appears for all GHES versions.
+{% endif %}
+
+{% ifversion ghes > 3.9 %}
+This content appears for GHES 3.10 and later.
+{% endif %}
+```
+
+### Feature flags
+
+Feature flags in `data/features/*.yml` control content visibility:
+
+```yaml
+# data/features/my-feature.yml
+versions:
+  fpt: '*'
+  ghec: '*'
+  ghes: '>= 3.10'
+```
+
+In Liquid:
+```liquid
+{% ifversion my-feature %}
+This content only shows when my-feature is enabled.
+{% endif %} +``` + +### Version frontmatter + +Content files specify applicable versions in frontmatter: + +```yaml +versions: + fpt: '*' + ghec: '*' + ghes: '>= 3.8' +``` + +### Running tests + +```bash +npm run test -- src/versions/tests +``` + +## Data & External Dependencies + +### Data inputs +- `data/features/*.yml` - Feature flag definitions +- `lib/enterprise-server-releases.ts` - GHES version data +- Content frontmatter - `versions` field specifies applicable versions +- URL paths - Version extracted from path (e.g., `/enterprise-server@3.11/`) + +### Dependencies +- [`@/frame`](../frame/README.md) - Path utilities for version extraction +- [`@/data-directory`](../data-directory/README.md) - Loads feature flag data +- [`@/archives`](../archives/README.md) - Archived version handling +- [`@/ghes-releases`](../ghes-releases/README.md) - GHES release management + +### Data outputs +- `req.context.currentVersion` - String like `enterprise-server@3.11` +- `req.context.currentVersionObj` - Full version object with metadata +- `req.context[shortName]` - Boolean flags: `fpt`, `ghec`, `ghes` +- `req.context[featureName]` - Boolean flags for each feature +- `req.context.allVersions` - All available versions + +## Cross-links & Ownership + +### Related subjects +- [`src/archives`](../archives/README.md) - Handles archived/deprecated version proxying +- [`src/ghes-releases`](../ghes-releases/README.md) - GHES release notes and deprecation +- [`src/frame`](../frame/README.md) - Path utilities used for version detection +- [`src/redirects`](../redirects/README.md) - Version-aware redirects + +### Ownership +- Team: Docs Engineering + +## Current State & Next Steps + +### Version fallback hierarchy + +When no version in URL, fallback order: FPT → GHEC → GHES latest. Implemented in `lib/redirects/permalinks.ts`. + +### GHES versioning + +- Supported versions defined in `enterprise-server-releases.ts` +- New GHES releases added via scripts in `src/ghes-releases/scripts/` +- Deprecated versions archived via `src/archives/` + +### Known limitations +- FPT version is stripped from URLs but exists internally +- Feature flag data loaded on every request (cached per version) +- Version comparison logic only supports GHES numbered releases +- REST API versions handled separately via config in `src/rest/lib/config.json` + +### Adding a new GHES version + +1. Update `src/versions/lib/enterprise-server-releases.ts` +2. Run GHES release scripts (see `src/ghes-releases/scripts/`) +3. Update REST API config if needed +4. Create release notes in `data/release-notes/` + +### Deprecating a GHES version + +1. Move version from `supported` to `deprecated` in `enterprise-server-releases.ts` +2. Run archive scripts to freeze content (see `src/archives/`) +3. Update redirects as needed -The directory `archives/` handles archived version of the docs. `ghes-releases/` handles version releases and deprecations processes.
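+
+To make the GHES-only comparison limitation above concrete, here is a hypothetical sketch of how a feature flag range like `'>= 3.10'` could be evaluated against a GHES release; the helper names are invented for illustration, and the real logic lives elsewhere in `lib/`:
+
+```typescript
+// Sketch only: evaluate a version range against a numbered GHES release.
+// fpt and ghec have no numeric release, which is why ranges are GHES-only.
+function ghesSatisfies(release: string, range: string): boolean {
+  const match = range.match(/^(>=|<=|>|<|=)?\s*([\d.]+)$/)
+  if (!match) throw new Error(`Unsupported range: ${range}`)
+  const [, op = '=', target = ''] = match
+  const cmp = compareReleases(release, target)
+  if (op === '>=') return cmp >= 0
+  if (op === '<=') return cmp <= 0
+  if (op === '>') return cmp > 0
+  if (op === '<') return cmp < 0
+  return cmp === 0
+}
+
+function compareReleases(a: string, b: string): number {
+  // Segment-wise numeric comparison, so '3.10' correctly sorts after '3.9'
+  const as = a.split('.').map(Number)
+  const bs = b.split('.').map(Number)
+  for (let i = 0; i < Math.max(as.length, bs.length); i++) {
+    const diff = (as[i] ?? 0) - (bs[i] ?? 0)
+    if (diff !== 0) return Math.sign(diff)
+  }
+  return 0
+}
+
+// ghesSatisfies('3.11', '>= 3.10') → true; ghesSatisfies('3.9', '>= 3.10') → false
+```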