102 changes: 97 additions & 5 deletions shared-actions/setup-node-with-cache/README.md
@@ -10,6 +10,8 @@ Standardized Node.js setup with enhanced yarn caching and GitHub packages regist
- ✅ Restore-keys for fallback caching (85%+ hit rate)
- ✅ GitHub packages registry configuration
- ✅ Automatic dependency installation on cache miss
- ✅ **Cache integrity verification** to prevent stale cache issues
- ✅ **Comprehensive debug logging** for troubleshooting
- ✅ Support for local testing with `act`

## Usage
@@ -46,8 +48,18 @@ None
- `~/.cache/yarn` - Yarn global cache (always enabled for better performance)
- `~/.asdf/installs` - asdf tool installations (when using asdf)
5. **Uses restore-keys** for fallback caching when exact match not found
6. **Logs cache status** for visibility (HIT/MISS)
7. **Installs dependencies** automatically on cache miss
6. **Verifies cache integrity** on cache hit:
- Checks if `node_modules/` exists and has content
- Validates `.yarn-integrity` file presence
- Verifies workspace packages have `node_modules/`
- Forces fresh install if cache is incomplete or corrupted
7. **Logs cache status** with detailed debug information
8. **Installs dependencies** automatically on cache miss or verification failure
9. **Smart install detection** on cache hit (after verification):
- **Yarn workspaces**: ALWAYS runs install (needs workspace symlink creation)
- **Lerna monorepos**: Runs install (needs lerna bootstrap)
- **Postinstall hooks**: Runs install (needs hook execution)
- **Note**: Even Turbo monorepos need install for workspace linking

## Cache Strategy

@@ -68,17 +80,59 @@ This ensures:
- **Monorepo support**: `**/yarn.lock` pattern handles nested workspaces
- **Consistent keys**: Removed `.tool-versions` dependency for reliability
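
To reason about when the key changes, it can help to compute a rough stand-in locally. The sketch below is an approximation only: `approx_cache_key` is a hypothetical helper, it assumes GNU `find`, `sort`, `xargs`, and `sha256sum`, and it skips `node_modules/`. The real `hashFiles()` expression computes its digest differently, and `uname` output does not match `runner.os`/`runner.arch` exactly, so the value will not equal the actual key. It only changes under the same conditions (an edit to any `yarn.lock` or `package.json`).

```shell
# Hypothetical helper: approximate the cache key for a checkout.
# NOT the real key: hashFiles() in GitHub Actions hashes differently.
approx_cache_key() {
  local root="${1:-.}" digest
  # Concatenate every yarn.lock and package.json in a stable order, then hash.
  digest=$(
    find "$root" \( -name yarn.lock -o -name package.json \) \
      ! -path '*/node_modules/*' -type f -print0 |
      LC_ALL=C sort -z |
      xargs -0 cat 2>/dev/null |
      sha256sum | cut -d' ' -f1
  )
  # uname values stand in for runner.os and runner.arch (assumption).
  printf '%s-%s-yarn-%s\n' "$(uname -s)" "$(uname -m)" "$digest"
}
```

Running `approx_cache_key .` before and after a dependency change shows whether the change would invalidate the exact-match key and fall back to the restore-keys prefix.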

### Cache Integrity Verification

On cache hit, the action performs **automatic verification** to prevent stale cache issues:

1. **node_modules existence check**: Verifies directory exists and has packages
2. **Yarn integrity validation**: Checks for `.yarn-integrity` file
3. **Workspace structure validation**: Ensures workspace packages have `node_modules/`

If any verification fails, the action **forces a fresh install** to rebuild the cache correctly.

**Why this matters**: Verification catches cases where:
- The cache was restored but is incomplete (for example, a network interruption during save)
- Workspace dependencies were added but are missing from the cached `node_modules/`
- The cache is corrupted or was only partially restored
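
The three checks can be sketched as a single shell function. This is a minimal illustration, not the action's exact code: the function name `cache_looks_complete` is hypothetical, the threshold of 5 packages mirrors the value used in `action.yml`, and the `packages/*` and `apps/*` layout is the same assumption the action makes.

```shell
# Returns 0 when the restored cache looks usable, 1 when a fresh install is needed.
# Hypothetical helper; layout (packages/*, apps/*) and threshold are assumptions.
cache_looks_complete() {
  local root="${1:-.}" min_packages=5 count ws_packages ws_modules

  # 1. node_modules must exist and contain a plausible number of entries.
  [ -d "$root/node_modules" ] || return 1
  count=$(ls -1 "$root/node_modules" 2>/dev/null | wc -l | tr -d ' ')
  [ "$count" -ge "$min_packages" ] || return 1

  # 2. Yarn writes .yarn-integrity after a completed install.
  [ -f "$root/node_modules/.yarn-integrity" ] || return 1

  # 3. If workspace packages exist, at least one must have its own node_modules.
  ws_packages=$( { find "$root"/packages/*/package.json "$root"/apps/*/package.json \
    -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
  if [ "$ws_packages" -gt 0 ]; then
    ws_modules=$( { find "$root"/packages/*/node_modules "$root"/apps/*/node_modules \
      -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
    [ "$ws_modules" -gt 0 ] || return 1
  fi
  return 0
}
```

A caller would use it as `cache_looks_complete . || yarn install --frozen-lockfile`, so any failed check falls through to a fresh install.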

### Monorepo Handling

The action intelligently handles different monorepo types (after cache verification):

| Monorepo Type | Cache Hit Behavior | Reason |
|---------------|-------------------|---------|
| **Yarn workspaces** | ⚠️ ALWAYS runs install | Workspace symlinks NOT preserved in cache |
| **Lerna** (has `lerna.json`) | ⚠️ Runs install | Needs `lerna bootstrap` for package linking |
| **Postinstall hooks** | ⚠️ Runs install | Needs to execute postinstall scripts |
| **Failed verification** | ⚠️ Runs install | Cache incomplete or corrupted |

**Critical: Yarn Workspace Symlinks**

Even Turbo monorepos need `yarn install` on cache hit because:
- GitHub Actions cache does NOT preserve symlinks
- Yarn creates symlinks between workspace packages during install
- Without symlinks, workspace dependencies aren't found (e.g., "jest: not found")
- Turbo handles BUILD caching, not workspace linking

**Performance impact**:
- Adds ~10-20 seconds to cache hits for workspace symlink creation
- This is unavoidable for Yarn workspace monorepos
- The install is fast because packages are already cached
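
To check whether workspace links are actually present after a cache restore, a quick count of symlinks under `node_modules/` can help. This is a sketch under assumptions: `count_workspace_links` is a hypothetical helper, and it assumes a Yarn 1 style layout where workspace packages appear as symlinks at `node_modules/<name>` or `node_modules/@scope/<name>`. Entries in `node_modules/.bin/` are excluded because they are executable shims rather than workspace links.

```shell
# Hypothetical helper: count symlinks that look like workspace package links.
count_workspace_links() {
  local root="${1:-.}"
  # No node_modules at all means no links.
  [ -d "$root/node_modules" ] || { echo 0; return 0; }
  # Depth 1 covers unscoped packages, depth 2 covers @scope/name.
  find "$root/node_modules" -mindepth 1 -maxdepth 2 -type l \
    ! -path "$root/node_modules/.bin/*" | wc -l | tr -d ' '
}
```

If this prints `0` in a repository that declares `workspaces`, the links are missing and `yarn install` is needed to recreate them.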

## Performance Impact

| Scenario | Before | After | Improvement |
|----------|--------|-------|-------------|
| Cache hit (setup-node) | 2-3 min | 10-15 sec | 85% faster |
| Cache hit (asdf) | 2-3 min | 5-10 sec | 90% faster |
| Cache hit (setup-node) | 2-3 min | 30-40 sec | 75% faster |
| Cache hit (asdf) | 2-3 min | 25-35 sec | 80% faster |
| Cache miss | 2-3 min | 1-2 min | 40% faster (with yarn cache) |
| Cache hit rate | 60% | 85%+ | 42% better |
| asdf Node.js install | 1-2 min | 5 sec | 95% faster (when cached) |

**Note**: When using asdf-vm, the action caches both the asdf installations and node_modules, providing even better performance.
**Note**:
- Cache hits for Yarn workspaces include ~10-20s for workspace symlink creation
- When using asdf-vm, the action caches both the asdf installations and node_modules
- The install on cache hit is fast because packages are already cached
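
The "Improvement" column can be sanity-checked with simple arithmetic. The sketch below assumes the midpoint of each range (150 s for "2-3 min", 35 s for "30-40 sec", 30 s for "25-35 sec"); these midpoints are an assumption for illustration, not measurements, and `percent_faster` is a hypothetical helper that truncates to a whole percent.

```shell
# Hypothetical helper: integer percent reduction in duration (seconds).
percent_faster() {
  local before="$1" after="$2"
  echo $(( ($before - $after) * 100 / $before ))
}

percent_faster 150 35   # prints 76 (table rounds to 75% for setup-node)
percent_faster 150 30   # prints 80 (matches the asdf row)
```

The results land within a point or two of the table, which is as close as midpoint estimates of ranges can be expected to get.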

## Local Testing with act

@@ -157,13 +211,51 @@ If you see "❌ Cache MISS" on every run:
- Look for "✅ Cache HIT" or "❌ Cache MISS" in workflow logs
- Check the cache key being used

### Cache Hit But Install Still Runs?

If you see a cache hit but install still runs, check the debug logs:

1. **Cache verification failed**:
```
⚠️ node_modules appears empty - forcing install
⚠️ Missing .yarn-integrity file - forcing install
⚠️ Workspace packages exist but no workspace node_modules found - forcing install
```
This means the cache was incomplete. The action will rebuild it correctly.

2. **Monorepo requires install**:
```
📦 Detected Yarn workspaces - install needed for workspace linking
📦 Detected Lerna monorepo - install needed for lerna bootstrap
🔧 Detected postinstall hook - install needed to execute it
```
This is expected behavior for Yarn workspace monorepos (including Turbo-based ones).
The install recreates workspace symlinks that aren't preserved in cache.

3. **Review debug output**:
The action logs detailed cache information:
- Root `node_modules/` package count
- Yarn integrity file status
- Workspace package count
- Workspace `node_modules/` count

### Still Slow After Cache Hit?

If the cache hits but setup still takes more than 1 minute:

1. **Large node_modules**: Consider using artifacts instead of cache
2. **Slow runner disk**: Check runner performance
3. **Network latency**: Cache download may be slow
4. **Verification forcing install**: Check debug logs for verification failures

### Stale Cache Issues?

If builds fail with "Cannot find module" after a cache hit:

1. **Check debug logs** for cache verification results
2. **Manually delete old caches** at: `Settings → Actions → Caches`
3. **The action should auto-detect** incomplete caches and rebuild them
4. **If issue persists**, open an issue with the debug logs

## Related Actions

133 changes: 113 additions & 20 deletions shared-actions/setup-node-with-cache/action.yml
@@ -83,6 +83,44 @@ runs:
restore-keys: |
${{ runner.os }}-${{ runner.arch }}-yarn-

- name: Debug cache contents
id: debug-cache
if: ${{ !env.ACT && steps.yarn-cache.outputs.cache-hit == 'true' }}
shell: bash
run: |
set +e # Don't exit on error - we want to continue even if find fails

echo "=== 🔍 Cache Debug Info ==="
echo "Cache key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
echo ""

echo "=== 📦 Root node_modules Status ==="
if [ -d "node_modules" ]; then
MODULE_COUNT=$(ls -1 node_modules 2>/dev/null | wc -l | tr -d ' ')
echo "✅ node_modules exists"
echo "📊 Package count: $MODULE_COUNT"
echo "🔒 Yarn integrity: $([ -f 'node_modules/.yarn-integrity' ] && echo '✅ present' || echo '❌ missing')"
else
echo "❌ node_modules directory not found"
fi
echo ""

echo "=== 🏢 Workspace Structure ==="
# Group find with || true before piping: in `a || b | c`, the pipe binds tighter than ||,
# so the ungrouped form would store find's file list instead of a count
WORKSPACE_PACKAGES=$( { find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
WORKSPACE_MODULES=$( { find packages/*/node_modules apps/*/node_modules -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
echo "📦 Workspace packages: $WORKSPACE_PACKAGES"
echo "🔗 Workspace node_modules: $WORKSPACE_MODULES"

if [ "$WORKSPACE_PACKAGES" -gt 0 ]; then
echo ""
echo "Workspace packages found:"
{ find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | sed 's|/package.json||' | sed 's|^| - |'
fi
echo ""

set -e # Re-enable exit on error

- name: Check if yarn install needed on cache hit
id: check-install-needed
if: ${{ !env.ACT }}
@@ -94,48 +132,103 @@ runs:
# - Cache restores node_modules, but doesn't run postinstall hooks
# - Monorepos need workspace linking even with cached node_modules
# - Some projects have critical postinstall scripts (e.g., building native modules)
# - Cached node_modules might be incomplete or corrupted
#
# We detect three scenarios that require yarn install on cache hit:
# We detect scenarios that require yarn install on cache hit:

NEEDS_INSTALL=false
CACHE_HIT="${{ steps.yarn-cache.outputs.cache-hit }}"

# 1. Yarn workspaces: Need symlink creation between workspace packages
if grep -q '"workspaces"' package.json 2>/dev/null; then
echo "📦 Detected Yarn workspaces - install needed for workspace linking"
NEEDS_INSTALL=true
fi

# 2. Lerna monorepo: Need lerna bootstrap (usually in postinstall hook)
if [ -f "lerna.json" ]; then
echo "📦 Detected Lerna monorepo - install needed for lerna bootstrap"
NEEDS_INSTALL=true
# On cache hit, verify cache integrity before trusting it
if [ "$CACHE_HIT" == "true" ]; then
echo "=== 🔍 Verifying Cache Integrity ==="

# Verification 1: Check if node_modules exists and has content
if [ ! -d "node_modules" ]; then
echo "❌ node_modules directory missing - forcing install"
NEEDS_INSTALL=true
else
MODULE_COUNT=$(ls -1 node_modules 2>/dev/null | wc -l | tr -d ' ')
if [ "$MODULE_COUNT" -lt 5 ]; then
echo "⚠️ node_modules appears empty ($MODULE_COUNT packages) - forcing install"
NEEDS_INSTALL=true
fi
fi

# Verification 2: Check yarn integrity file
if [ "$NEEDS_INSTALL" == "false" ] && [ ! -f "node_modules/.yarn-integrity" ]; then
echo "⚠️ Missing .yarn-integrity file - cache may be incomplete, forcing install"
NEEDS_INSTALL=true
fi

# Verification 3: For workspaces, verify workspace packages have node_modules
if [ "$NEEDS_INSTALL" == "false" ]; then
WORKSPACE_PACKAGES=$( { find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
if [ "$WORKSPACE_PACKAGES" -gt 0 ]; then
WORKSPACE_MODULES=$( { find packages/*/node_modules apps/*/node_modules -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
if [ "$WORKSPACE_MODULES" -eq 0 ]; then
echo "⚠️ Workspace packages exist but no workspace node_modules found - forcing install"
NEEDS_INSTALL=true
fi
fi
fi

if [ "$NEEDS_INSTALL" == "false" ]; then
echo "✅ Cache integrity verified"
fi
fi

# 3. Postinstall hook: Any project with postinstall needs it executed
# Examples: building native modules, generating files, running setup scripts
if grep -q '"postinstall"' package.json 2>/dev/null; then
echo "🔧 Detected postinstall hook - install needed to execute it"
NEEDS_INSTALL=true
# Check for monorepo configurations that need install
if [ "$NEEDS_INSTALL" == "false" ]; then
# 1. Yarn workspaces: ALWAYS need symlink creation between workspace packages
# CRITICAL: Workspace symlinks are NOT preserved in GitHub Actions cache!
# Even Turbo monorepos need this - Turbo handles build caching, not workspace linking.
if grep -q '"workspaces"' package.json 2>/dev/null; then
echo "📦 Detected Yarn workspaces - install needed for workspace linking"
echo " ⚠️ Workspace symlinks are not preserved in cache"
echo " ⚠️ Skipping install will cause 'module not found' errors"
NEEDS_INSTALL=true
fi

# 2. Lerna monorepo: Need lerna bootstrap (usually in postinstall hook)
if [ -f "lerna.json" ]; then
echo "📦 Detected Lerna monorepo - install needed for lerna bootstrap"
NEEDS_INSTALL=true
fi

# 3. Postinstall hook: Any project with postinstall needs it executed
# Examples: building native modules, generating files, running setup scripts
if grep -q '"postinstall"' package.json 2>/dev/null; then
echo "🔧 Detected postinstall hook - install needed to execute it"
NEEDS_INSTALL=true
fi
fi

echo "needs-install=$NEEDS_INSTALL" >> "$GITHUB_OUTPUT"

- name: Log cache status
- name: Log cache status and decision
if: ${{ !env.ACT }}
shell: bash
run: |
echo "=== 📊 Cache Status Summary ==="
if [ "${{ steps.yarn-cache.outputs.cache-hit }}" == "true" ]; then
echo "✅ Cache HIT - Dependencies restored from cache"
echo "📦 Cache key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
echo ""
if [ "${{ steps.check-install-needed.outputs.needs-install }}" == "true" ]; then
echo "🔗 Will run yarn install for workspace linking and/or postinstall hooks"
echo "🔄 Decision: WILL run yarn install"
echo "Reasons: Cache verification failed, workspace linking needed, or postinstall hooks detected"
else
echo "⚡ Skipping yarn install - not a monorepo and no postinstall hooks"
echo "⚡ Decision: SKIP yarn install"
echo "Reason: Cache verified and no workspace linking or postinstall hooks required"
fi
else
echo "❌ Cache MISS - Installing dependencies"
echo "❌ Cache MISS - Fresh installation required"
echo "🔍 Looking for key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
echo ""
echo "🔄 Decision: WILL run yarn install"
fi
echo ""

- name: Install Node.js dependencies
if: ${{ !env.ACT && (steps.yarn-cache.outputs.cache-hit != 'true' || steps.check-install-needed.outputs.needs-install == 'true') }}