INFR: Migrate to github artifacts for gh-pages (deleting gh-pages branch) #282

@mmcky

GitHub Pages Deployment Strategy: Analysis and Migration Guide

Prepared for QuantEcon Infrastructure Team
Date: January 2026
Updated: March 2026 (with lessons from lecture-python-programming migration)
Related Issues: #261


Migration Status Tracker

| Repository | Status | PR | Size Before | Size After | Date |
|---|---|---|---|---|---|
| lecture-python-programming | Complete | #499 | 358 MB | 15 MB | 2026-03-26 |
| lecture-python.myst | Not started | | | | |
| lecture-julia.myst | Not started | | | | |
| lecture-python-intro | Not started | | | | |

Table of Contents

  1. Executive Summary
  2. Background
  3. Deployment Approaches Compared
  4. Pros and Cons Analysis
  5. Recommendation
  6. Technical Deep Dive
  7. Migration Guide
  8. Post-Migration Procedures
  9. Appendix: Complete Workflow Examples

1. Executive Summary

This report compares two approaches to GitHub Pages deployment:

  • Traditional gh-pages branch method (currently used by lecture-python.myst)
  • Modern artifact-based deployment with deployment environments (used by 2026-tom-course)

Decision: We recommend migrating to artifact-based deployment for all QuantEcon lecture repositories.

Key reasons:

  • Eliminates repository bloat from accumulated deployment history
  • Maintains fast clone times regardless of deployment frequency
  • Uses official GitHub-maintained actions with first-party support
  • Deployed site persists indefinitely (artifacts expiring does NOT affect the live site)

Migration impact: Zero downtime when following the documented procedure. The gh-pages branch and all its history can be safely deleted after migration, reclaiming significant repository space.

Proven results: lecture-python-programming migrated on 2026-03-26 — repo size reduced from 358 MB to 15 MB (96% reduction).
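
The headline percentage can be reproduced from the two sizes quoted above. The following is a minimal sketch; the function name percent_reduction is hypothetical and used only for illustration.

```shell
# Percentage reduction between two sizes given in the same unit (MB here).
# percent_reduction is an illustrative helper, not part of any workflow.
percent_reduction() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

percent_reduction 358 15   # prints 95.8, which rounds to the 96% quoted above
```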


2. Background

2.1 Current State

lecture-python.myst uses the traditional approach:

  • peaceiris/actions-gh-pages@v4 pushes built HTML to a gh-pages branch
  • GitHub Pages serves content from this branch
  • Each deployment creates a new commit, accumulating history
  • Repository: 727+ commits on main, 182 releases

2026-tom-course uses the modern approach:

  • actions/upload-pages-artifact and actions/deploy-pages
  • Content deployed via GitHub's deployment environment system
  • No persistent branch; artifacts stored separately
  • Clean repository structure

2.2 The Problem with gh-pages Branches

The gh-pages branch bloat is a well-documented issue across the GitHub ecosystem:

| Repository | Reported Size | Cause |
|---|---|---|
| Mozilla VPN Client | 1.5 GB+ | WASM binaries in gh-pages history |
| Eclipse Theia | 2.6 GB | API documentation history |
| Scratch GUI | 2 GB+ | Built JavaScript bundles |
| lecture-python-programming | 358 MB | Jupyter Book HTML deploy snapshots (153 commits) |

The problem compounds over time for lecture repositories that have:

  • Large built outputs (Jupyter Book HTML, CSS, JS)
  • Frequent updates
  • Binary files that don't delta compress well

Every contributor's clone operation pays the cost.


3. Deployment Approaches Compared

3.1 How Each Approach Works

gh-pages Branch Method:

Build → Commit to gh-pages branch → GitHub detects change → Serves content
        ↓
        Accumulates in git history forever

Artifact-Based Method:

Build → Upload artifact → Deploy to Pages infrastructure → Serves content
        ↓                         ↓
        Expires after 90 days     Persists indefinitely
        (configurable)            (until replaced)
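
The accumulation in the gh-pages method can be seen in a throwaway local repository. This is only a sketch under stated assumptions: it requires git to be installed, the repository is a temporary directory, and each loop iteration stands in for one deployment that overwrites the same file.

```shell
# Throwaway repository: each simulated deploy rewrites one file and commits it.
repo="$(mktemp -d)"
git init -q "$repo"

for n in 1 2 3; do
  echo "build $n" > "$repo/index.html"
  git -C "$repo" add index.html
  git -C "$repo" -c user.name="demo" -c user.email="demo@example.org" \
    commit -q -m "deploy $n"
done

# The working tree holds a single file, but history keeps every deployed version.
deploy_count="$(git -C "$repo" rev-list --count HEAD)"
file_count="$(git -C "$repo" ls-files | wc -l | tr -d ' ')"
echo "commits in history: $deploy_count, files in working tree: $file_count"
```

With real Jupyter Book output the same pattern applies to thousands of files per deploy, which is how a branch reaches hundreds of megabytes.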

3.2 Detailed Comparison

| Aspect | gh-pages Branch | Artifact Deployment |
|---|---|---|
| Repository Size | Grows with each deployment; can reach GB+ | No impact; artifacts stored separately |
| Clone Time | Degrades over time; gh-pages fetched by default | Always fast; no extra branches |
| Site Persistence | Permanent (in git history) | Permanent (in Pages infrastructure) |
| Deployment History | Full git history preserved | Limited to artifact retention (~90 days) |
| Security | Branch protection rules | Environment protection + OIDC verification |
| Workflow Complexity | Single action | Multi-step: configure, upload, deploy |
| GitHub Support | Third-party (peaceiris) | Official GitHub actions |
| Rollback Speed | Instant (git reset) | Minutes (rebuild or artifact redeploy) |
| Inspect Deployed Files | Yes (browse gh-pages branch) | No (only via artifact download) |
| Initial Setup | Minimal | Requires environment protection rule configuration (one-time) |

3.3 Critical Clarification: Artifact Expiration

Important: Artifact expiration does NOT affect the live site.

  • Workflow artifacts (downloadable from Actions tab) expire after retention period
  • Deployed site content persists indefinitely in GitHub Pages infrastructure

Once actions/deploy-pages succeeds, content is copied to GitHub's hosting. The original artifact is just a staging mechanism—its expiration has no effect on the live site.

You could deploy once and never touch the repository for years—the site would remain live.


4. Pros and Cons Analysis

4.1 gh-pages Branch Method

Advantages:

  • Complete deployment history preserved in git
  • Instant rollback via git operations (git reset, git revert)
  • Simpler workflow configuration (single action)
  • Well-documented with extensive community examples
  • Can inspect deployed files directly in branch

Disadvantages:

  • Repository size bloat: Each deployment adds commits; large sites grow to GB+
  • Clone time degradation: All contributors download gh-pages history
  • Binary files compress poorly: Built JS bundles, images don't delta well
  • Requires periodic maintenance: Force-pushing or orphan branches needed
  • Third-party action dependency

4.2 Artifact-Based Deployment

Advantages:

  • Zero repository bloat: Artifacts stored separately, expire automatically
  • Fast clones forever: Repository size constant regardless of deployment frequency
  • First-party GitHub support: Official actions maintained by GitHub
  • Enhanced security: OIDC token verification, environment protection rules
  • Modern best practice: Recommended by GitHub documentation
  • Cleaner repository structure (no orphan branch)
  • gh-pages branch can be deleted: All history removed, space reclaimed

Disadvantages:

  • Limited deployment history (artifact retention period)
  • More complex workflow setup (multiple steps/jobs)
  • Cannot directly inspect deployed files in repository
  • Requires repository settings change and environment protection rule configuration
  • Rollback requires rebuilding or dedicated workflow

5. Recommendation

5.1 Decision

We recommend adopting artifact-based deployment as the standard for all QuantEcon lecture repositories.

5.2 Justification

  1. Sustainable infrastructure: Repository size remains constant regardless of how many times you deploy

  2. Better contributor experience: New contributors always get fast clones

  3. Official support: First-party GitHub actions ensure long-term maintenance and compatibility

  4. The trade-off is acceptable: Limited deployment history is mitigated by:

    • Source code history remains complete
    • Artifact retention (90-180 days) covers typical rollback needs
    • Any version can be rebuilt from source
    • Deployed site persists indefinitely

  5. Clean break: Deleting gh-pages branch removes all accumulated bloat permanently

  6. Proven results: lecture-python-programming achieved 96% size reduction (358 MB → 15 MB)

5.3 Implementation Strategy

Phase 1 — New Projects:

  • Use artifact-based deployment for all new repositories
  • 2026-tom-course serves as reference implementation

Phase 2 — Existing Projects:

  • Evaluate current gh-pages branch size
  • Migrate following the guide below
  • Delete gh-pages branch after successful migration
  • First migration complete: lecture-python-programming (March 2026)

6. Technical Deep Dive

6.1 Rollback Strategies

With gh-pages Branch (Old Method)

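# Instant rollback to previous deployment
git checkout gh-pages
git reset --hard HEAD~1
git push --force origin gh-pages

# Rollback to the last deployment committed before a specific date
# (the gh-pages@{date} form reads the local reflog, which a fresh clone lacks)
git checkout gh-pages
git reset --hard "$(git rev-list -1 --before=2025-01-01 gh-pages)"
git push --force origin gh-pages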

With Artifact Deployment (New Method)

Option A: Re-run Previous Workflow

# Find the run ID
gh run list --workflow=deploy.yml --limit=10

# Re-run (rebuilds from that commit's source)
gh run rerun <run-id>

Option B: Deploy from Tag

# Trigger workflow on specific tag
gh workflow run deploy.yml --ref publish-2025dec15

Option C: Dedicated Rollback Workflow

See Section 9.2 for complete workflow file.

# Rollback to specific run's artifact
gh workflow run rollback.yml -f run_id=12345678

6.2 Can We Safely Delete gh-pages After Migration?

Yes. Once migrated to artifact-based deployment:

  1. The deployed site content lives in GitHub's Pages infrastructure
  2. It is completely independent of any repository branch
  3. Deleting gh-pages removes all accumulated history
  4. Repository size decreases (after GitHub garbage collection)
  5. The live site continues serving without interruption

The only thing you lose: the ability to roll back instantly via git. But you retain:

  • Full source code history (rebuild any version)
  • Artifact history (deployments still within the configured artifact retention period)
  • The rollback workflow option

7. Migration Guide

7.1 Pre-Migration Checklist

# Check gh-pages branch size
git fetch origin gh-pages
git rev-list --count origin/gh-pages  # Number of commits

# Estimate size impact
git clone --single-branch --branch gh-pages \
  https://github.com/QuantEcon/your-repo.git gh-pages-only
du -sh gh-pages-only/.git

Document current configuration:

  • Custom domain (e.g., python.quantecon.org)
  • HTTPS enforcement setting
  • Any special build requirements (GPU runners, etc.)
  • Existing workflow permissions (does it upload release assets, sync notebooks, etc.?)

7.2 Migration Sequence

Critical: Follow this exact order to avoid downtime.

Step 1: Update workflow file (replace peaceiris with native deploy steps)
           ↓
Step 2: Merge workflow changes to main
           ↓
Step 3: Change Pages source to "GitHub Actions" in Settings → Pages
           ↓
Step 4: Configure environment protection rules for publish tags  ← IMPORTANT
           ↓
Step 5: Run new workflow (push a publish tag) → deploys via artifacts
           ↓
Step 6: Verify site works correctly
           ↓
Step 7: Verify linkcheck / other dependent workflows still work
           ↓
Step 8: Delete gh-pages branch (reclaims space)

7.3 Step-by-Step Instructions

Step 1: Update Workflow File

Replace the peaceiris/actions-gh-pages deployment step with the native Pages steps. Key changes:

Add workflow-level permissions:

permissions:
  contents: write    # write if also uploading release assets; read otherwise
  actions: read      # needed if other workflows download artifacts
  pages: write
  id-token: write    # required for OIDC-based Pages deployment

Add concurrency group:

concurrency:
  group: "pages"
  cancel-in-progress: false  # don't cancel in-flight deploys

Add environment to the job:

jobs:
  publish:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

Replace the deploy step:

# REMOVE:
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v4
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: _build/html/
    cname: python.quantecon.org

# REPLACE WITH:
- name: Add CNAME for custom domain
  run: echo "python.quantecon.org" > _build/html/CNAME
- name: Setup Pages
  uses: actions/configure-pages@v5
- name: Upload Pages artifact
  uses: actions/upload-pages-artifact@v4
  with:
    path: _build/html/
- name: Deploy to GitHub Pages
  id: deployment
  uses: actions/deploy-pages@v4

Note on single vs multi-job workflows: The guide in Section 9.1 shows a two-job pattern (build + deploy). For repositories using GPU runners (like lecture-python-programming), a single-job workflow avoids transferring large build artifacts between jobs. Choose the pattern that fits your runner setup.

Step 2: Merge Workflow Changes

Create a PR, get review, merge to main.

Step 3: Change GitHub Pages Source Setting

  1. Go to repository Settings → Pages
  2. Under "Build and deployment":
    • Change Source from "Deploy from a branch" to "GitHub Actions"
  3. Click Save

Note: This creates a github-pages environment automatically.

Step 4: Configure Environment Protection Rules

⚠️ This step is critical. The auto-created github-pages environment only allows the main branch by default. If your workflow triggers on tags (e.g., publish*), the deploy will fail with: "Tag is not allowed to deploy to github-pages due to environment protection rules."

  1. Go to Settings → Environments → github-pages
  2. Under "Deployment branches and tags", click "Add deployment branch or tag rule"
  3. Select Tag (not Branch)
  4. Enter pattern: publish*
  5. Save

After configuration, you should see: "1 branch and N tags allowed"
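
To reason about which tag names the publish* rule would admit, a shell glob gives a rough local approximation. This is a sketch, assuming simple tag names without slashes; GitHub's own pattern matching may differ in edge cases, and tag_allowed is a hypothetical helper.

```shell
# Returns 0 if the tag name matches the publish* pattern from the rule above.
# Assumption: shell glob matching approximates GitHub's matching for
# simple tag names that contain no slashes.
tag_allowed() {
  case "$1" in
    publish*) return 0 ;;
    *)        return 1 ;;
  esac
}

for tag in publish-2025dec15 publish-test v1.0.0; do
  if tag_allowed "$tag"; then
    echo "$tag: allowed"
  else
    echo "$tag: blocked"
  fi
done
```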

Step 5: Test New Workflow

git checkout main
git pull origin main
git tag publish-YYYY-MMM-DD-test
git push origin publish-YYYY-MMM-DD-test

Monitor at: https://github.com/QuantEcon/<repo>/actions

Step 6: Verify Site

  • Workflow completed successfully
  • github-pages environment shows the deployment
  • Homepage loads correctly (e.g., https://python-programming.quantecon.org/)
  • Spot-check 3+ lecture pages
  • Custom domain works with HTTPS
  • Release assets attached (tar.gz, checksum, manifest) if applicable
  • Notebook sync completed (if applicable)
  • Pages API reports build_type as workflow:
    gh api repos/QuantEcon/<repo>/pages --jq '.build_type'
    # Should return: workflow

Step 7: Verify Dependent Workflows

If you have a linkcheck or other workflow that previously checked out gh-pages:

  • Trigger it manually (workflow_dispatch)
  • Verify it correctly downloads from the new source (release assets, not gh-pages branch)

Step 8: Delete gh-pages Branch

Via Command Line:

# Delete remote
git push origin --delete gh-pages

# Delete local (if it exists)
git branch -D gh-pages 2>/dev/null || true

# Clean up references
git remote prune origin

Notify team members to clean up their local clones:

git fetch --prune
git branch -D gh-pages 2>/dev/null || true
git gc --prune=now

7.4 Custom Domain Handling

If you have a custom domain (e.g., python.quantecon.org):

Recommended: Include CNAME in Build Output

- name: Add CNAME for custom domain
  run: echo "python.quantecon.org" > _build/html/CNAME

This ensures the CNAME is included in every artifact upload. The Settings → Pages custom domain setting alone is not sufficient since each artifact deployment overwrites the Pages content.
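
A guard step before the upload can catch a missing or wrong CNAME early. The following is a minimal sketch; check_cname is a hypothetical helper, and the temporary directory stands in for _build/html.

```shell
# Hypothetical helper: fail early if the build output lacks the expected CNAME.
check_cname() {
  build_dir="$1"
  domain="$2"
  if [ ! -f "$build_dir/CNAME" ]; then
    echo "missing $build_dir/CNAME" >&2
    return 1
  fi
  # The file should contain exactly the domain name (whitespace ignored).
  [ "$(tr -d '[:space:]' < "$build_dir/CNAME")" = "$domain" ]
}

# Example run against a throwaway directory standing in for _build/html.
demo_dir="$(mktemp -d)"
echo "python.quantecon.org" > "$demo_dir/CNAME"
check_cname "$demo_dir" "python.quantecon.org" && echo "CNAME ok"
```

In a workflow, a call like this would sit between the CNAME step and the upload step so that a bad build stops before anything is deployed.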


8. Post-Migration Procedures

8.1 Verify Migration Success

# Check deployment source
gh api repos/:owner/:repo/pages --jq '.build_type'
# Should return: workflow

# Check site is live
curl -sI https://your-domain.quantecon.org/ | head -5

# Verify no gh-pages branch
git fetch --prune
git branch -r | grep gh-pages  # Should return nothing

# Check fresh clone size
cd /tmp && git clone https://github.com/QuantEcon/repo.git size-check
du -sh size-check/.git
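
For scripted checks, the status code can be pulled out of the curl -sI header output instead of reading it by eye. This is a sketch; http_status is a hypothetical helper, and the canned headers below stand in for a real response so the example runs offline.

```shell
# Extract the numeric status code from the first line of HTTP response headers.
# Carriage returns are stripped first because header lines end in CRLF.
http_status() {
  tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# Canned headers stand in for: curl -sI https://your-domain.quantecon.org/
printf 'HTTP/2 200 \r\nserver: GitHub.com\r\n' | http_status   # prints 200
```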

8.2 Monitor Repository Size

GitHub runs garbage collection periodically. Size reduction is typically immediate for fresh clones but the API-reported size may take up to 24 hours to update.

# Check repository size via API
gh api repos/:owner/:repo --jq '.size'

# Compare with fresh clone
git clone --bare https://github.com/QuantEcon/repo.git /tmp/bare-check
du -sh /tmp/bare-check
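
The size field returned by the repository API is expressed in kilobytes, so it needs converting before comparison with du -sh output. A minimal sketch follows; kb_to_mb is a hypothetical helper and 15360 is an illustrative value, not a measured one.

```shell
# Convert a kilobyte count (as reported by the API's size field) to megabytes.
kb_to_mb() {
  LC_ALL=C awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 }'
}

kb_to_mb 15360   # prints 15.0
```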

8.3 Rollback If Needed

If migration fails and gh-pages still exists:

  1. Settings → Pages → Source → "Deploy from a branch"
  2. Select gh-pages
  3. Site immediately serves from gh-pages again

If gh-pages already deleted:

  1. Keep/restore old workflow with peaceiris/actions-gh-pages
  2. Change Pages source to "Deploy from a branch"
  3. Run old workflow (recreates gh-pages branch)

For lecture-python-programming specifically:
Rollback commit documented in #500: fc6487c (March 20, 2026 deploy)


9. Appendix: Complete Workflow Examples

9.1 Full Production Workflow

Adapted for QuantEcon lecture repositories:

name: Build & Deploy Lectures

on:
  push:
    tags:
      - 'publish*'
  workflow_dispatch:
    inputs:
      debug:
        description: 'Enable debug mode'
        required: false
        default: 'false'

permissions:
  contents: write   # write needed for release asset uploads
  actions: read     # needed for artifact downloads by other workflows
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    # For GPU builds, use custom runner:
    # runs-on: "runs-on=${{ github.run_id }}/family=g4dn.2xlarge/image=quantecon_ubuntu2404"
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Miniconda
        uses: conda-incubator/setup-miniconda@v3
        with:
          auto-update-conda: true
          auto-activate-base: true
          miniconda-version: 'latest'
          python-version: "3.11"
          environment-file: environment.yml
          activate-environment: quantecon

      - name: Build HTML
        shell: bash -l {0}
        run: |
          jupyter-book build lectures --path-output ./

      - name: Build Download Notebooks
        shell: bash -l {0}
        run: |
          jupyter-book build lectures --path-output ./ \
            --builder=custom --custom-builder=jupyter
          mkdir -p _build/html/_notebooks
          cp _build/jupyter/*.ipynb _build/html/_notebooks/

      - name: Configure Custom Domain
        run: echo "python.quantecon.org" > _build/html/CNAME

      - name: Setup Pages
        uses: actions/configure-pages@v5

      - name: Upload Pages Artifact
        uses: actions/upload-pages-artifact@v4
        with:
          path: '_build/html'
          retention-days: 180

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

9.2 Rollback Workflow

Save as .github/workflows/rollback.yml:

name: Rollback Deployment

on:
  workflow_dispatch:
    inputs:
      run_id:
        description: 'Workflow run ID to rollback to (find via: gh run list)'
        required: true
        type: string

permissions:
  contents: read
  pages: write
  id-token: write
  actions: read

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Download artifact from previous run
        uses: actions/download-artifact@v4
        with:
          name: github-pages
          path: ./rollback-artifact
          github-token: ${{ secrets.GITHUB_TOKEN }}
          run-id: ${{ inputs.run_id }}

      - name: Extract artifact
        run: |
          mkdir -p ./pages-content
          tar -xvf ./rollback-artifact/artifact.tar -C ./pages-content

      - name: Upload for deployment
        uses: actions/upload-pages-artifact@v4
        with:
          path: ./pages-content

      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Usage:

# Find available runs
gh run list --workflow=deploy.yml --limit=20

# Trigger rollback
gh workflow run rollback.yml -f run_id=12345678

Summary

| Question | Answer |
|---|---|
| Should we migrate? | Yes: artifact-based deployment is the modern best practice |
| Will the site go down? | No: zero downtime if following the migration sequence |
| Can we delete gh-pages? | Yes: safely delete after migration; site persists independently |
| What do we lose? | Instant git-based rollback (mitigated by rollback workflow) |
| What do we gain? | Sustainable repo size, fast clones, official GitHub support |
| Real-world savings? | 96% reduction for lecture-python-programming: 358 MB → 15 MB |

Final recommendation: Proceed with migration for all lecture repositories. Use lecture-python-programming (PR #499) as the reference implementation.

Lessons Learned from lecture-python-programming Migration

  1. Environment protection rules are critical — the github-pages environment auto-creates with only main branch allowed. You must add a tag rule (not branch) for publish* before triggering the first deploy.
  2. Use release assets for linkcheck, not Pages artifacts — release tarballs are permanent and don't expire, making them more reliable for scheduled link checking.
  3. CNAME must be in the build output — add it as a build step before upload-pages-artifact, since each deploy overwrites the full Pages content.
  4. Single-job works for GPU runners — avoid the two-job (build+deploy) pattern when using expensive GPU runners to prevent large artifact transfers between jobs.
  5. GitHub GC is fast — after deleting gh-pages, a fresh clone immediately reflected the reduced size (15 MB vs 358 MB). No need to contact GitHub Support for manual GC.
