@tylerbutler tylerbutler commented Oct 30, 2025

Fluid Build Cache Implementation

This PR implements a shared file system build cache for FluidFramework to significantly improve build times by caching and reusing build artifacts across different repository clones on the same machine.

Overview

The shared build cache allows build tasks to:

  • Cache outputs to a shared file system location (remote to the specific repo but local to the machine)
  • Restore cached outputs when inputs haven't changed
  • Skip redundant builds across different repository clones on the same machine
  • Track time saved through cache hits

Note: While the cache is "remote" from the perspective of an individual repository, it currently uses a shared file system cache on the local machine. Future expansion may include truly remote storage options like Azure Blob Storage.

Key Features

Cache Infrastructure

  • Cache configuration validation - Ensures proper setup before using the cache
  • Debug logging and build output - Comprehensive logging for troubleshooting
  • Cache management commands - CLI commands for cache operations (clean, stats, etc.)
  • Global cache key components - Centralized cache key generation based on file hashes, dependencies, and environment

Task Support

Added cache support for multiple task types:

  • BiomeTask (linting)
  • ApiExtractorTask (API extraction)
  • TypeScript compilation tasks
  • Generate entrypoints task

User Experience

  • Visual status indicators - Clear symbols showing cache hit/miss/upload status
  • Interactive legend - Explains cache status symbols during builds
  • Time saved tracking - Displays cumulative time saved from cache hits in a human-readable format (e.g., '2h 15m 30s')
  • Detailed failure reasons - Specific messages for cache write failures
  • Ctrl-C handling - Graceful shutdown on interruption
  • Smart warnings - Suppresses unnecessary warnings for tasks with no expected outputs

Performance

  • Always includes donefile in remote cache outputs for consistency
  • Optimized cache key generation
  • Efficient artifact storage and retrieval

Benchmarks

Benchmark results are included showing significant build time improvements with cache enabled across various packages.

Testing

The implementation has been tested with:

  • Multiple build scenarios (clean builds, incremental builds, cache hits)
  • Different task types
  • Error handling and edge cases
  • Cache invalidation scenarios

Implements Task 4.2 of the shared cache feature, adding comprehensive
configuration validation for cache directory setup.

New module: configValidation.ts provides:
- validateCacheDirectory(): checks path validity, system directory protection
- ensureCacheDirectoryExists(): creates cache directory with recursive option
- validateCacheDirectoryPermissions(): verifies read/write/execute access
- validateDiskSpace(): warns on low disk space (Unix systems)
- validateCacheConfiguration(): orchestrates all validation checks
- formatValidationMessage(): produces user-friendly error messages
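
As a rough illustration of the kind of check validateCacheDirectory() is described as performing, here is a minimal sketch. The PR text does not show the implementation, so the result shape, the function name validateCacheDirectorySketch, and the POSIX-only list of protected directories are all assumptions made for this example.

```typescript
import { isAbsolute, resolve } from "node:path";

// Hypothetical result shape; the PR's actual types are not shown.
interface ValidationResultSketch {
	valid: boolean;
	error?: string;
}

// Assumed (POSIX-only) list of system directories that must not be used as a cache root.
const PROTECTED_DIRS = new Set(["/", "/bin", "/etc", "/usr", "/var"]);

function validateCacheDirectorySketch(cacheDir: string): ValidationResultSketch {
	if (cacheDir.trim() === "") {
		return { valid: false, error: "Cache directory path is empty" };
	}
	if (!isAbsolute(cacheDir)) {
		return { valid: false, error: `Cache directory must be an absolute path: ${cacheDir}` };
	}
	// resolve() normalizes the path and drops any trailing separator,
	// so "/etc/" and "/etc" are treated the same.
	const resolved = resolve(cacheDir);
	if (PROTECTED_DIRS.has(resolved)) {
		return { valid: false, error: `Refusing to use system directory as cache: ${resolved}` };
	}
	return { valid: true };
}
```

A failed result would be reported to the user and the cache left disabled, which matches the stated behavior that validation errors prevent cache initialization without breaking the build.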

Integration:
- SharedCacheManager.initialize() now validates before setup
- Validation errors prevent cache initialization but don't break builds
- Warnings logged but don't prevent operation

Testing:
- 26 comprehensive unit tests in configValidation.test.ts
- Platform-specific tests skip appropriately
- Tests cover path validation, permissions, directory creation, error formatting

Updates:
- Added BuildResult.CachedSuccess case to buildResultString()
- Updated IMPLEMENTATION_STATUS.md: Phase 4 now 33% complete (26/38 tasks)

Add comprehensive debug logging and build output messages for the shared
cache implementation.

Debug Logging:
- Implemented 6 debug trace namespaces using debug package
  - fluid-build:cache:init - Initialization and validation
  - fluid-build:cache:lookup - Cache lookups with hit/miss reasons
  - fluid-build:cache:store - Storage operations with timing
  - fluid-build:cache:restore - Restoration operations
  - fluid-build:cache:stats - Statistics tracking
  - fluid-build:cache:error - Error logging with context
- All operations include timing, file counts, and sizes
- Short cache keys (first 12 chars) for readability
- Detailed mismatch reasons (platform, Node version, lockfile)

Build Output:
- Cache hits display with magenta ↻ symbol (BuildResult.CachedSuccess)
- Cache statistics summary at end of build showing:
  - Hit/miss counts and hit rate percentage
  - Total cache entries and size in MB
- Summary only shown if cache was used (totalLookups > 0)
- Integrated via BuildGraph.cacheStatsSummary property
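
The summary described above could be produced along these lines. This is only a sketch: the field names, the function name formatCacheStatsSummary, and the exact output wording are assumptions, not taken from the PR.

```typescript
// Hypothetical statistics shape for illustration only.
interface CacheStatsSketch {
	hitCount: number;
	missCount: number;
	totalEntries: number;
	totalSizeBytes: number;
}

// Returns undefined when the cache was never consulted, so the caller can skip the summary.
function formatCacheStatsSummary(stats: CacheStatsSketch): string | undefined {
	const totalLookups = stats.hitCount + stats.missCount;
	if (totalLookups === 0) {
		return undefined;
	}
	const hitRate = ((stats.hitCount / totalLookups) * 100).toFixed(1);
	const sizeMb = (stats.totalSizeBytes / (1024 * 1024)).toFixed(2);
	return (
		`Cache: ${stats.hitCount} hits, ${stats.missCount} misses (${hitRate}% hit rate), ` +
		`${stats.totalEntries} entries, ${sizeMb} MB`
	);
}
```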

Documentation:
- Created DEBUG_LOGGING.md with usage examples, example output,
  performance analysis guidance, and troubleshooting tips

Tasks completed: 4.3 (Debug Logging), 4.4 (Build Output Messages)

Add comprehensive cache management functionality with four new commands
for maintaining and troubleshooting the shared cache.

Cache Management Commands:
- --cache-stats: Display cache statistics (hits, misses, size, timing)
- --cache-clean: Remove all cache entries and reset statistics
- --cache-prune: LRU-based pruning with size/age thresholds
- --cache-verify: Verify cache integrity with optional auto-fix

Implementation:
- Added displayStatistics() method showing hit rate, cache size, and
  performance metrics
- Added cleanCache() method for complete cache reset
- Added pruneCache() method with configurable thresholds:
  - Default max size: 5000 MB (5 GB)
  - Default max age: 30 days
  - Sorts by last access time and removes oldest first
- Added verifyCache() method for integrity checks:
  - Validates manifests and file hashes
  - Optional fix mode removes corrupted entries
  - Reports total/valid/corrupted/fixed counts
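
To make the pruning policy concrete, the following sketch selects entries for removal oldest-first until both thresholds are satisfied. It is not the PR's pruneCache() code: the entry shape, the function name selectEntriesToPrune, and the interpretation of 5000 MB as 5000 × 1024 × 1024 bytes are assumptions.

```typescript
// Hypothetical view of a cache entry; the real manifest format is not shown in the PR.
interface CacheEntryInfo {
	key: string;
	sizeBytes: number;
	lastAccessedMs: number;
}

const DEFAULT_MAX_SIZE_BYTES = 5000 * 1024 * 1024; // assumed meaning of "5000 MB"
const DEFAULT_MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

function selectEntriesToPrune(
	entries: CacheEntryInfo[],
	nowMs: number,
	maxSizeBytes: number = DEFAULT_MAX_SIZE_BYTES,
	maxAgeMs: number = DEFAULT_MAX_AGE_MS,
): CacheEntryInfo[] {
	// Least recently used first.
	const sorted = [...entries].sort((a, b) => a.lastAccessedMs - b.lastAccessedMs);
	let totalSize = sorted.reduce((sum, entry) => sum + entry.sizeBytes, 0);
	const toRemove: CacheEntryInfo[] = [];
	for (const entry of sorted) {
		const tooOld = nowMs - entry.lastAccessedMs > maxAgeMs;
		const overBudget = totalSize > maxSizeBytes;
		// Every remaining entry is newer, and the total only shrinks, so we can stop here.
		if (!tooOld && !overBudget) {
			break;
		}
		toRemove.push(entry);
		totalSize -= entry.sizeBytes;
	}
	return toRemove;
}
```

Separating selection from deletion keeps the policy easy to unit test without touching the file system.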

CLI Integration:
- All commands exit immediately after execution
- Require --cache-dir to be specified
- Optional flags: --cache-prune-size, --cache-prune-age, --cache-verify-fix
- Comprehensive error handling with user-friendly messages

Documentation:
- Created CACHE_MANAGEMENT.md with usage examples, performance
  recommendations, CI/CD integration patterns, and troubleshooting guide

All code formatted with Biome and TypeScript compilation verified.

Tasks completed: 4.5 (Cache Management Commands)

Implements getCacheInputFiles() and getCacheOutputFiles() methods in base
task classes to enable shared cache support for multiple task types.

Changes:
- LeafWithFileStatDoneFileTask: Added cache methods that leverage existing
  getInputFiles()/getOutputFiles() methods. This automatically enables
  caching for BiomeTask, CopyfilesTask, TypeValidationTask, DepCruiseTask,
  and Ts2EsmTask.

- TscDependentTask: Added cache methods that include config files and
  TypeScript source files from dependent tsc tasks. Enables caching for
  ApiExtractorTask, EsLintTask, TsLintTask, and GenerateEntrypointsTask.

- ApiExtractorTask: Added output file detection for API extractor-specific
  outputs (*.api.md, *.api.json, *.d.ts files).

- DeclarativeTask: Removed duplicate cache implementation (44 lines) since
  it now inherits identical functionality from LeafWithFileStatDoneFileTask.

Impact:
- 14 out of 20 task types now support caching (70%)
- Net change: +97 lines added, -44 lines removed (+53 total)
- All tests pass (216 passing)

Tasks now cached:
- BiomeTask (NEW)
- ApiExtractorTask (NEW)
- EsLintTask, TsLintTask (NEW)
- CopyfilesTask, TypeValidationTask, DepCruiseTask, Ts2EsmTask (NEW)
- GenerateEntrypointsTask (NEW)
- DeclarativeLeafTask (already cached, now cleaner)
- TscTask (already cached)

- Centralized global cache key components into GlobalCacheKeyComponents interface
- Added cacheSchemaVersion (1) for cache format versioning
- Added arch (process.arch) to handle x64/arm64 differences
- Added nodeEnv (NODE_ENV) to distinguish dev/prod builds
- Added cacheBustVars to support FLUID_BUILD_CACHE_BUST* env vars for manual cache invalidation

Components are computed once at startup in fluidBuild.ts and accessed via
SharedCacheManager.getGlobalKeyComponents() throughout the codebase.

All cache validation now checks these fields for proper cache hit/miss detection.
Tests updated to include new required fields.
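
One way such global components might feed into a cache key is sketched below. The field names beyond those listed above (nodeVersion, platform, lockfileHash), the helper names, and the JSON-then-SHA-256 encoding are assumptions for illustration; the PR does not show the actual key derivation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape modeled on the components named in this PR.
interface GlobalCacheKeyComponentsSketch {
	cacheSchemaVersion: number;
	nodeVersion: string;
	platform: string;
	arch: string;
	nodeEnv: string;
	lockfileHash: string;
	cacheBustVars: Record<string, string>;
}

// Collects FLUID_BUILD_CACHE_BUST* variables in a stable (sorted) order.
function collectCacheBustVars(env: Record<string, string | undefined>): Record<string, string> {
	const result: Record<string, string> = {};
	for (const name of Object.keys(env).sort()) {
		const value = env[name];
		if (name.startsWith("FLUID_BUILD_CACHE_BUST") && value !== undefined) {
			result[name] = value;
		}
	}
	return result;
}

const byKey = ([a]: [string, string], [b]: [string, string]): number =>
	a < b ? -1 : a > b ? 1 : 0;

// Hashes the global components plus per-file input hashes into a hex key.
function computeCacheKeySketch(
	global: GlobalCacheKeyComponentsSketch,
	inputHashes: Record<string, string>,
): string {
	// Arrays with sorted entries keep the serialization independent of insertion order.
	const payload = JSON.stringify([
		global.cacheSchemaVersion,
		global.nodeVersion,
		global.platform,
		global.arch,
		global.nodeEnv,
		global.lockfileHash,
		Object.entries(global.cacheBustVars).sort(byKey),
		Object.entries(inputHashes).sort(byKey),
	]);
	return createHash("sha256").update(payload).digest("hex");
}
```

Because every component is part of the hashed payload, changing any one of them (for example, building on arm64 instead of x64, or setting a cache-bust variable) yields a different key and therefore a cache miss.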
- Change UpToDate symbol from - (dash) to ○ (empty circle, cyan)
- Change remote cache hit symbol from ↻ (reload) to ⇩ (down arrow, blue)
- Change cache write symbol from ✓+ to ⇧ (up arrow, green)
- Change local cache hit symbol from ⌂ (house) to ■ (filled square, green)
- Add status symbol legend displayed after build completion
- Legend shows all status symbols with descriptions
- Only displayed when tasks were actually built

- Create buildStatusSymbols.ts with STATUS_SYMBOLS constant
- Import and use constants in leafTask.ts and fluidBuild.ts
- Add JSDoc comments documenting each symbol's meaning
- Makes symbols easy to change in one place

- Add timeSavedMs field to CacheStatistics
- Track time saved on each cache hit using manifest.executionTimeMs
- Display time saved in cache stats summary
- Update legend to compact table format (3 rows, 2 columns)
- Update tests to include timeSavedMs field

- Add warning when no input files can be determined
- Add warning when no output files can be determined
- Add warning when no output files exist after execution
- Add warning when skipCacheWrite is enabled
- Improve error messages in store() with specific reasons:
  - Disk full (ENOSPC)
  - Permission denied (EACCES/EPERM)
  - Cache directory not found (ENOENT)
  - Directory write errors (EISDIR)
- Include package name in all cache write warnings

- Only warn when output files were expected but missing
- Lint tasks (eslint, tslint, biome) have no output files by design
- Prevents noise in build output for normal lint task behavior
- Still warns for build tasks that should produce files but don't

- Donefile is now cached along with regular output files
- Enables sharing lint/build results across workspaces
- ESLint results can now be cached remotely (donefile proves pass)
- Works uniformly for all tasks - no special cases
- Minimal overhead (donefiles are tiny)

This allows Workspace A to run eslint and upload the donefile,
then Workspace B can skip eslint entirely by downloading it.
Result: Remote cache hit (⇩) instead of local-only (■)

- Display time saved in human-readable format
- Shows hours if >= 1 hour (e.g., '2h 15m 30s')
- Shows minutes if >= 1 minute (e.g., '5m 42s')
- Shows only seconds if < 1 minute (e.g., '45s')

Example outputs:
  '12.5s saved' -> '12s saved'
  '125s saved' -> '2m 5s saved'
  '7325s saved' -> '2h 2m 5s saved'
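
The formatting rules and example outputs above are consistent with a helper like the following. The function name formatTimeSaved is hypothetical, and truncating fractional seconds is inferred from the '12.5s' -> '12s' example rather than confirmed by the PR.

```typescript
// Formats a millisecond duration as 'Xh Ym Zs', 'Ym Zs', or 'Zs'.
function formatTimeSaved(ms: number): string {
	// Fractional seconds are dropped, matching the '12.5s' -> '12s' example.
	const totalSeconds = Math.floor(ms / 1000);
	const hours = Math.floor(totalSeconds / 3600);
	const minutes = Math.floor((totalSeconds % 3600) / 60);
	const seconds = totalSeconds % 60;
	if (hours > 0) {
		return `${hours}h ${minutes}m ${seconds}s`;
	}
	if (minutes > 0) {
		return `${minutes}m ${seconds}s`;
	}
	return `${seconds}s`;
}
```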
- Add StoreResult type to communicate cache write success/failure with reasons
- Display cache skip reasons in task status output (e.g., 'cache not uploaded: disk full')
- Fix lint tasks (eslint/tslint) to properly declare empty output files instead of undefined
- Track whether cache lookup was performed to ensure accurate statistics
- Count tasks that execute without lookup as misses (hitCount + missCount now equals leafBuiltCount)
- Add lookupWasPerformed parameter to store() to handle tasks that can't perform lookup

This ensures cache statistics accurately reflect all built tasks and provides
better visibility into why cache uploads are skipped.
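
A result type of this kind, combined with the error-code reasons listed earlier (ENOSPC, EACCES/EPERM, ENOENT, EISDIR), might look roughly like the sketch below. The type and function names and the exact reason strings (other than the 'cache not uploaded: disk full' example) are assumptions.

```typescript
// Hypothetical result type; the PR's actual StoreResult definition is not shown.
type StoreResultSketch = { success: true } | { success: false; reason: string };

// Maps a Node.js file system error code to a short human-readable reason.
function describeStoreFailure(error: unknown): string {
	const code =
		typeof error === "object" && error !== null && "code" in error
			? String((error as { code: unknown }).code)
			: undefined;
	switch (code) {
		case "ENOSPC":
			return "disk full";
		case "EACCES":
		case "EPERM":
			return "permission denied";
		case "ENOENT":
			return "cache directory not found";
		case "EISDIR":
			return "directory write error";
		default:
			return "unknown error";
	}
}

// Wraps a write operation and converts thrown errors into a StoreResultSketch.
function tryStore(write: () => void): StoreResultSketch {
	try {
		write();
		return { success: true };
	} catch (error) {
		return { success: false, reason: describeStoreFailure(error) };
	}
}

// Produces the status suffix shown next to a task, or an empty string on success.
function formatStoreStatus(result: StoreResultSketch): string {
	return result.success ? "" : `cache not uploaded: ${result.reason}`;
}
```

Returning a value instead of throwing lets the build continue after a failed cache write while still surfacing the reason in the task status line.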
Copilot AI review requested due to automatic review settings October 30, 2025 19:30
@github-actions github-actions bot added area: build Build related issues base: main PRs targeted against main branch and removed area: build Build related issues labels Oct 30, 2025
@tylerbutler tylerbutler changed the title fluid build cache feat(fluid-build): add shared file system caching Oct 30, 2025
Copilot AI left a comment

Pull Request Overview

This pull request implements a shared cache system for fluid-build that enables caching and reusing task outputs across build invocations, dramatically reducing build times for repeated builds with identical inputs.

Key Changes:

  • Introduces a comprehensive shared cache system with cache key computation, file operations, manifest management, and statistics tracking
  • Adds support for storing and restoring task outputs with SHA-256 integrity verification
  • Integrates cache operations into the task execution flow with automatic lookup, restore, and storage
  • Implements test scripts and performance benchmarks showing ~3.26x speedup on container-runtime builds
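
For readers unfamiliar with the integrity check mentioned above, this sketch shows the general idea of comparing stored files against SHA-256 hashes recorded in a manifest. The manifest entry shape and the function names are assumptions; only createHash from node:crypto is a real API here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry: a relative path and the hash recorded at store time.
interface ManifestFileEntrySketch {
	path: string;
	sha256: string;
}

function sha256Hex(content: string | Uint8Array): string {
	return createHash("sha256").update(content).digest("hex");
}

// Returns the paths whose content is missing or no longer matches the manifest.
// `readFile` is injected so the check can be exercised without a real cache directory.
function findCorruptedFiles(
	manifestFiles: ManifestFileEntrySketch[],
	readFile: (path: string) => string | Uint8Array | undefined,
): string[] {
	const corrupted: string[] = [];
	for (const entry of manifestFiles) {
		const content = readFile(entry.path);
		if (content === undefined || sha256Hex(content) !== entry.sha256) {
			corrupted.push(entry.path);
		}
	}
	return corrupted;
}
```

An entry with any corrupted file would be treated as a cache miss on restore, or removed when a verify command runs in fix mode.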

Reviewed Changes

Copilot reviewed 65 out of 65 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `build-tools/scripts/test-cache-key-stability.ts` | Standalone test script to validate cache key determinism and consistency |
| `build-tools/scripts/baseline-metrics.sh` | Bash script for measuring baseline build performance metrics |
| `build-tools/packages/build-tools/src/test/sharedCache/*.test.ts` | Comprehensive test suites for cache components |
| `build-tools/packages/build-tools/src/fluidBuild/sharedCache/*.ts` | Core cache implementation modules |
| `build-tools/packages/build-tools/src/fluidBuild/tasks/**/*.ts` | Integration of cache operations into task execution |
| `build-tools/packages/build-tools/src/fluidBuild/*.ts` | Build result enhancements and context updates |
| `benchmark-results-container-runtime.md` | Performance benchmark results documentation |

async store(
	inputs: CacheKeyInputs,
	outputs: TaskOutputs,
	packageRoot: string, // eslint-disable-line @typescript-eslint/no-unused-vars

Copilot AI Oct 30, 2025

The packageRoot parameter is marked as unused with an eslint-disable comment. If this parameter is truly unused and reserved for future use, consider documenting why it's included in a comment, or remove it if it's not needed. Having unused parameters can be confusing for maintainers.

import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { join } from "node:path";

Copilot AI Oct 30, 2025

Unused import join.

for name, data in results.items():
    if 'without-cache' in name or 'cold' in name or 'full' in name:
        uncached = data
        uncached_name = name

Copilot AI Oct 30, 2025

Variable uncached_name is not used.

Suggested change
- uncached_name = name

        uncached_name = name
    elif 'with-cache' in name or 'warm' in name or 'no-change' in name:
        cached = data
        cached_name = name

Copilot AI Oct 30, 2025

Variable cached_name is not used.

Suggested change
- cached_name = name

Comment on lines +193 to +194
bars1 = ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
bars2 = ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')

Copilot AI Oct 30, 2025

Variable bars1 is not used.

Suggested change
- bars1 = ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
- bars2 = ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')
+ ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
+ ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')

Comment on lines +193 to +194
bars1 = ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
bars2 = ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')

Copilot AI Oct 30, 2025

Variable bars2 is not used.

Suggested change
- bars1 = ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
- bars2 = ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')
+ ax.bar(x - width/2, data[:, 0], width, label='With Cache', color='#2ecc71')
+ ax.bar(x + width/2, data[:, 1], width, label='Without Cache', color='#e74c3c')

"""

import json
import sys

Copilot AI Oct 30, 2025

Import of 'sys' is not used.

Suggested change
- import sys

@github-actions github-actions bot added the area: build Build related issues label Oct 30, 2025

github-actions bot commented Nov 5, 2025

🔗 Found some broken links! 💔

Run a link check locally to find them. See
https://github.com/microsoft/FluidFramework/wiki/Checking-for-broken-links-in-the-documentation for more information.

linkcheck output


> [email protected] ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> [email protected] serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> [email protected] check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

 ELIFECYCLE  Command failed with exit code 1.

Labels

area: build Build related issues base: main PRs targeted against main branch
