Add Kubernetes manifest generator #120

Draft · wants to merge 1 commit into main

Conversation

fnerdman

Builder Playground Kubernetes Integration

Design Analysis and Implementation Report

This report summarizes our discussions, design analysis, and implementation approach for enabling Kubernetes support in builder-playground. It serves as a comprehensive guide to continue development of the Kubernetes operator.

1. Context and Background

Builder Playground is a tool to deploy end-to-end environments for blockchain builder testing. Currently, it uses Docker Compose for local deployments, but there's a need to support Kubernetes for more scalable and production-ready deployments.

Current Architecture Overview

Builder Playground follows a clear architecture:

  • Recipes: Define environments to deploy (L1, OpStack, BuilderNet)
  • Artifacts: Files and configurations needed by services
  • Manifest: Intermediate representation of services and their relationships
  • Runner: Responsible for executing the manifest (currently Docker Compose)

Integration Goals

  1. Enable Kubernetes deployment option for builder-playground
  2. Maintain the simple developer experience
  3. Support both local development and testing/staging environments
  4. Allow for customization before deployment
  5. Provide a path toward a Kubernetes operator

2. Design Approaches Considered

We analyzed two primary approaches for Kubernetes integration:

Approach 1: K8s Runner Implementation

This approach would add a new runner to the existing architecture:

// Register the kubernetes runner
factory.Register("kubernetes", func(out *output, manifest *Manifest, overrides map[string]string, interactive bool) (Runner, error) {
    namespace := "default"
    if ns, ok := overrides["namespace"]; ok {
        namespace = ns
        delete(overrides, "namespace")
    }
    return NewK8sRunner(out, manifest, namespace)
})

Pros:

  • Tightly integrated with builder-playground
  • Extends existing architecture
  • Full control over resource lifecycle
  • Easier debugging (single process)

Cons:

  • Less Kubernetes-native
  • More complex logic to track resource status
  • Places burden of k8s resource management on CLI
  • More complex error handling

Approach 2: Kubernetes Operator

This approach creates a dedicated Kubernetes operator that watches for custom resources describing a Builder Playground deployment:

apiVersion: playground.flashbots.io/v1alpha1
kind: BuilderPlayground
metadata:
  name: demo-builder
spec:
  latestFork: true
  genesisDelay: 30
  dryRun: false
  storageMethod: local-path
  # other configuration...

Pros:

  • More Kubernetes-native (follows best practices)
  • Better scalability for future features
  • Cleaner separation of concerns
  • Better for production deployments

Cons:

  • More complex implementation (requires operator)
  • Requires managing two codebases
  • Debugging may be more difficult across system boundaries
  • Higher learning curve for contributors

Hybrid Approach (Selected)

After careful analysis, we chose a hybrid approach, focusing on:

  1. Creating a manifest generator that outputs Kubernetes CRDs
  2. Designing a CRD structure that preserves the builder-playground service model
  3. Setting the stage for a future operator implementation

This approach enables users to:

  • Generate Kubernetes manifests with --k8s flag
  • Optionally inspect and modify the manifests
  • Apply them to Kubernetes with standard tools
  • Eventually use a purpose-built operator for enhanced functionality

3. CRD Design

We designed a comprehensive CRD structure that captures all aspects of the builder-playground manifest:

apiVersion: playground.flashbots.io/v1alpha1
kind: BuilderPlaygroundDeployment
metadata:
  name: l1-environment
spec:
  # Recipe information
  recipe: l1
  
  # Storage configuration
  storage:
    type: local-path
    path: /data/builder-playground
  
  # Services configuration
  services:
    - name: beacon
      image: sigp/lighthouse
      tag: v7.0.0-beta.0
      entrypoint: ["lighthouse"]
      args:
        - "bn"
        - "--datadir"
        - "{{.Dir}}/data_beacon_node"
        # ...more args
      ports:
        - name: http
          containerPort: 3500
      dependencies:
        - name: el
          condition: healthy

Key design decisions:

  1. Self-Contained Definition: All services are contained within a single CRD, similar to how docker-compose works, making the resource easier to understand and manage.

  2. Template Preservation: The manifest preserves templating expressions like {{.Dir}} and {{Service "el" "authrpc"}} to be resolved by the operator.

  3. Service Relationships: Dependencies between services are explicitly defined, allowing the operator to manage deployment order.

  4. Storage Abstraction: Two storage options (local-path for development, PVC for production) provide flexibility for different environments.

  5. No Status Field: The status field is omitted from the generated CRD as it should be managed by the Kubernetes controller.

4. Implementation Details

Integration Approach

The implementation adds a --k8s flag to the existing cook command that triggers generation of a Kubernetes manifest alongside the regular artifacts:

$ playground cook l1 --dry-run --k8s --output ./output --storage-type local-path

Key Components

  1. K8sGenerator struct:
type K8sGenerator struct {
    Manifest     *Manifest
    RecipeName   string
    StorageType  string
    StoragePath  string
    StorageClass string
    StorageSize  string
    NetworkName  string
    OutputDir    string
}
  2. Strongly Typed CRD Structures (a sketch of the nested types follows this list):
type BuilderPlaygroundDeployment struct {
    APIVersion string                    `yaml:"apiVersion"`
    Kind       string                    `yaml:"kind"`
    Metadata   BuilderPlaygroundMetadata `yaml:"metadata"`
    Spec       BuilderPlaygroundSpec     `yaml:"spec"`
}
  3. Service Conversion Logic:
func convertServiceToK8s(svc *service) (BuilderPlaygroundService, error) {
    // Validate required fields
    if svc.image == "" {
        return BuilderPlaygroundService{}, fmt.Errorf("service %s missing required image", svc.Name)
    }
    
    k8sService := BuilderPlaygroundService{
        Name:  svc.Name,
        Image: svc.image,
        Tag:   svc.tag,
    }
    
    // Convert other fields...
    
    return k8sService, nil
}
  4. Label Filtering with Map:
var internalLabels = map[string]bool{
    "service":            true,
    "playground":         true,
    "playground.session": true,
}

// Copy only user-facing labels; internal playground labels are dropped.
serviceLabels := map[string]string{}
for k, v := range labels {
    if !internalLabels[k] {
        serviceLabels[k] = v
    }
}
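
As referenced in item 2, here is a hedged sketch of the nested types behind BuilderPlaygroundSpec and BuilderPlaygroundService, with field names inferred from the example CRD in section 3; the exact shapes may differ in the final implementation:

type BuilderPlaygroundMetadata struct {
    Name string `yaml:"name"`
}

type BuilderPlaygroundSpec struct {
    Recipe   string                     `yaml:"recipe"`
    Storage  BuilderPlaygroundStorage   `yaml:"storage"`
    Services []BuilderPlaygroundService `yaml:"services"`
}

type BuilderPlaygroundStorage struct {
    Type string `yaml:"type"`           // "local-path" or "pvc"
    Path string `yaml:"path,omitempty"` // used with local-path
}

type BuilderPlaygroundService struct {
    Name         string                        `yaml:"name"`
    Image        string                        `yaml:"image"`
    Tag          string                        `yaml:"tag,omitempty"`
    Entrypoint   []string                      `yaml:"entrypoint,omitempty"`
    Args         []string                      `yaml:"args,omitempty"`
    Ports        []BuilderPlaygroundPort       `yaml:"ports,omitempty"`
    Dependencies []BuilderPlaygroundDependency `yaml:"dependencies,omitempty"`
}

type BuilderPlaygroundPort struct {
    Name          string `yaml:"name"`
    ContainerPort int    `yaml:"containerPort"`
}

type BuilderPlaygroundDependency struct {
    Name      string `yaml:"name"`
    Condition string `yaml:"condition,omitempty"` // e.g. "healthy"
}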

Error Handling

The implementation includes proper validation and error handling:

// Validate required fields
if svc.image == "" {
    return BuilderPlaygroundService{}, fmt.Errorf("service %s missing required image", svc.Name)
}

// Propagate errors with context
k8sService, err := convertServiceToK8s(svc)
if err != nil {
    return crd, fmt.Errorf("failed to convert service %s: %w", svc.Name, err)
}

5. Current Limitations and Future Development

Current Limitations

Our current CRD implementation doesn't yet address several important aspects:

  1. Host Execution: There's currently no concrete solution for running services on the host machine when deploying to Kubernetes. The CRD includes a useHostExecution flag, but implementing this in Kubernetes requires careful consideration. Possible approaches include:

    • Using DaemonSets with privileged containers
    • Running services outside of Kubernetes with network exposure
    • Using tools like Kubevirt for virtualization
  2. Complex Networking: The current design doesn't fully address complex networking scenarios like exposing specific ports to external clients or handling cross-service communication with specific protocols.

  3. Resource Limits: The CRD doesn't yet include configurations for CPU/memory limits and requests.

  4. Security Considerations: JWT tokens and other sensitive information aren't properly handled through Kubernetes secrets yet.

  5. Node Affinity and Placement: For testing/staging deployments, node selection and affinity rules would be needed.

Decision to Prioritize Core Functionality

We've deliberately chosen to focus first on covering the core functionality:

  • Service definitions and relationships
  • Storage configuration
  • Basic networking
  • Service dependencies

This approach allows us to get a working implementation more quickly, with a plan to address the more complex aspects in future iterations. The current implementation provides a solid foundation that captures the essential structure of builder-playground environments, while leaving room for enhancements.

6. Operator Implementation Guidelines

For developing the Kubernetes operator that will consume these CRDs, consider the following key aspects in the context of our single-pod approach:

1. Template Resolution in Single-Pod Context

Template resolution becomes more straightforward in a single-pod architecture:

  • {{.Dir}} → the common volume mount path (e.g., /artifacts)
  • {{Service "el" "authrpc"}} → localhost:8551 (direct container-to-container communication)
  • {{Port "http" 8545}} → the actual port number as defined in the container

This simplifies the implementation as services can use localhost networking rather than requiring Kubernetes DNS for service discovery.
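
As an illustration, resolution could be built on Go's text/template package, since the expressions follow its syntax. This is a minimal sketch; the port lookup table and the /artifacts mount path are assumptions, not part of the current implementation:

import (
	"fmt"
	"strings"
	"text/template"
)

// resolveTemplate renders a single builder-playground template expression for
// the single-pod case: {{.Dir}} becomes the shared mount path, {{Service ...}}
// becomes localhost:<port>, and {{Port ...}} keeps the declared port number.
func resolveTemplate(arg string, ports map[string]int) (string, error) {
	funcs := template.FuncMap{
		"Service": func(name, portName string) string {
			// e.g. Service "el" "authrpc" -> "localhost:8551" (assumed lookup table)
			return fmt.Sprintf("localhost:%d", ports[name+"/"+portName])
		},
		"Port": func(name string, def int) int {
			// e.g. Port "http" 8545 -> 8545
			return def
		},
	}
	tmpl, err := template.New("arg").Funcs(funcs).Parse(arg)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := struct{ Dir string }{Dir: "/artifacts"} // assumed shared mount path
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}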

2. Resource Creation Strategy

Given our single-pod approach, the operator should create:

  • One Pod with Multiple Containers: Each service becomes a container within the same pod
  • Services: For network access to exposed ports
  • ConfigMap/Secret: For shared configuration files
  • Single PVC: For persistent storage (based on storage configuration)

This simplifies resource management compared to a multi-pod approach while maintaining the relationship structure between services.
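
A rough sketch of how the operator might assemble that pod from the CRD, using the upstream k8s.io/api/core/v1 types; the shared mount path is an assumption, and buildVolumeSource is sketched in the storage section below:

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildPod turns every BuilderPlaygroundService into a container of a single
// pod, all sharing one artifacts volume mounted at the same path.
func buildPod(crd BuilderPlaygroundDeployment) corev1.Pod {
	containers := make([]corev1.Container, 0, len(crd.Spec.Services))
	for _, svc := range crd.Spec.Services {
		image := svc.Image
		if svc.Tag != "" {
			image += ":" + svc.Tag
		}
		containers = append(containers, corev1.Container{
			Name:    svc.Name,
			Image:   image,
			Command: svc.Entrypoint,
			Args:    svc.Args, // template expressions resolved beforehand
			VolumeMounts: []corev1.VolumeMount{
				{Name: "artifacts", MountPath: "/artifacts"}, // assumed shared path
			},
		})
	}
	return corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: crd.Metadata.Name},
		Spec: corev1.PodSpec{
			Containers: containers,
			Volumes: []corev1.Volume{
				// buildVolumeSource: see the storage sketch below
				{Name: "artifacts", VolumeSource: buildVolumeSource(crd.Spec.Storage)},
			},
		},
	}
}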

3. Dependency Management

With the single-pod approach, dependency management becomes primarily an initialization concern:

  • Init Containers: Can be used to ensure services start in the correct order
  • Readiness Probes: Services can wait for dependencies to be ready
  • Shared Volume: All containers have access to the same storage, simplifying file-based dependencies

This is simpler than managing dependencies across multiple independent deployments, as all containers are guaranteed to run on the same node and can communicate directly.
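
One way to get in-pod ordering is Kubernetes' sidecar-container mechanism (init containers with restartPolicy: Always, available since Kubernetes 1.28): sidecars start in declaration order and later containers only start once their startup probes pass. A hedged sketch, assuming that mechanism and illustrative probe values:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// asSidecar converts a dependency service into a sidecar-style init container:
// it keeps running for the pod's lifetime, and containers declared after it
// only start once its startup probe succeeds.
func asSidecar(c corev1.Container, probePort int, probePath string) corev1.Container {
	always := corev1.ContainerRestartPolicyAlways
	c.RestartPolicy = &always // requires Kubernetes 1.28+ (SidecarContainers)
	c.StartupProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: probePath,
				Port: intstr.FromInt(probePort),
			},
		},
		PeriodSeconds:    1,
		FailureThreshold: 60, // give the dependency up to ~60s to come up
	}
	return c
}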

4. Readiness Checks

Transform the readyCheck definitions into Kubernetes readiness probes:

# From CRD
readyCheck:
  queryURL: http://localhost:3500/eth/v1/node/syncing
  interval: 1s
  timeout: 30s

# To Kubernetes
readinessProbe:
  httpGet:
    path: /eth/v1/node/syncing
    port: 3500
  periodSeconds: 1
  timeoutSeconds: 30
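
A sketch of that conversion in Go, using the upstream corev1 types; the readyCheck fields mirror the snippet above, and the URL-parsing details are an assumption:

import (
	"net/url"
	"strconv"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// toReadinessProbe maps a CRD readyCheck onto a Kubernetes readiness probe.
func toReadinessProbe(queryURL string, interval, timeout time.Duration) (*corev1.Probe, error) {
	u, err := url.Parse(queryURL)
	if err != nil {
		return nil, err
	}
	port, err := strconv.Atoi(u.Port())
	if err != nil {
		return nil, err
	}
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: u.Path,               // e.g. /eth/v1/node/syncing
				Port: intstr.FromInt(port), // e.g. 3500
			},
		},
		PeriodSeconds:  int32(interval / time.Second), // interval: 1s -> 1
		TimeoutSeconds: int32(timeout / time.Second),  // timeout: 30s -> 30
	}, nil
}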

5. Storage Implementation

With the single-pod approach, storage becomes simpler:

  • Single Shared Volume: All containers in the pod mount the same volume
  • Common Access Path: Each container can access the same files at the same path
  • Storage Options:
    • local-path: Use hostPath volumes for development environments
    • pvc: Use a single PersistentVolumeClaim with ReadWriteOnce access mode for production

This eliminates the need for ReadWriteMany storage classes that would be required in a multi-pod architecture.
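
A sketch of how the operator could map the two storage options onto the pod's shared volume (used by buildPod above); the hostPath type and PVC name are illustrative assumptions:

import (
	corev1 "k8s.io/api/core/v1"
)

// buildVolumeSource picks the backing volume for the shared artifacts mount:
// a hostPath directory for local development, or a single RWO PVC otherwise.
func buildVolumeSource(storage BuilderPlaygroundStorage) corev1.VolumeSource {
	if storage.Type == "local-path" {
		hostPathType := corev1.HostPathDirectoryOrCreate
		return corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: storage.Path, // e.g. /data/builder-playground
				Type: &hostPathType,
			},
		}
	}
	// "pvc": a single ReadWriteOnce claim suffices because all containers
	// run in the same pod (no ReadWriteMany needed).
	return corev1.VolumeSource{
		PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
			ClaimName: "builder-playground-artifacts", // assumed claim name
		},
	}
}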

6. Host Execution in Single-Pod Context

For services with useHostExecution: true, the single-pod approach presents challenges:

  • Container vs. Host: By definition, these services need to run outside the pod
  • Potential Solutions:
    • A coordinating controller that runs some services as pods and others as external processes
    • A hybrid deployment where the operator manages both Kubernetes resources and external processes
    • Using a privileged container with access to the host's process namespace

This remains one of the more challenging aspects to solve and may require further architectural discussions.
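
For reference, the third option can be expressed directly in the pod spec. This is a hedged sketch rather than a recommendation, since host PID/network access plus a privileged container has significant security implications:

import (
	corev1 "k8s.io/api/core/v1"
)

// hostExecutionPodSpec shares the node's PID and network namespaces and runs
// the service in a privileged container, approximating host execution.
func hostExecutionPodSpec(c corev1.Container) corev1.PodSpec {
	privileged := true
	c.SecurityContext = &corev1.SecurityContext{Privileged: &privileged}
	return corev1.PodSpec{
		HostPID:     true, // see the host's processes
		HostNetwork: true, // bind directly to the host's network interfaces
		Containers:  []corev1.Container{c},
	}
}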

7. Testing Recommendations

When testing the operator, focus on:

  1. Local Development Environments:

    • Test with minikube, k3s, and kind
    • Verify all services start in the correct order
    • Check template resolution works correctly
  2. Recipe Compatibility:

    • Test all recipes (l1, opstack, buildernet)
    • Verify service interactions work as expected
  3. Storage Options:

    • Test local-path for development environments
    • Test PVC for testing/staging environments
  4. Template Processing:

    • Test resolution of all template expressions
    • Verify service discovery works correctly

8. Conclusion and Next Steps

The implemented Kubernetes manifest generator provides a solid foundation for building a complete Kubernetes operator for Builder Playground. By generating CRDs that preserve all the necessary information about services and their relationships, we enable a clear path forward for Kubernetes integration.

Current Focus: Matching the Docker Compose Workflow

It's important to emphasize that our current goal is to match the existing workflow with --dry-run and docker-compose.yml, not to build a fully automated deployment system yet. The current approach allows users to:

  1. Generate Kubernetes manifests
  2. Inspect and potentially modify them
  3. Manually deploy to Kubernetes when ready

This matches the developer workflow already familiar to builder-playground users, where the dry-run enables inspection and customization before deployment.

Future Deployment Options

For the longer term, we've considered two main approaches for the operator deployment:

  1. Standalone Operator: Deploy the operator separately, then submit BuilderPlayground manifests. The operator would deploy a one-shot builder-playground container that generates the final service manifests for the operator to apply.

  2. Integrated Operator: Include the operator code directly in the builder-playground binary, with an option to run it as the operator itself. This would create a more integrated solution but add complexity to the codebase.

Either approach would eventually enable a more automated workflow, but they are intentionally not part of the current implementation.

Next Steps

  1. Develop Operator Skeleton (see the controller-runtime sketch after this list):

    • Register the CRD type
    • Create basic controller structure
    • Define reconciliation logic
  2. Implement Template Handling:

    • Create a template processor for builder-playground templates
    • Test with different service configurations
  3. Resource Management:

    • Implement single pod with multiple containers
    • Use init containers for proper ordering
    • Manage shared PVC for storage
  4. Testing:

    • Begin with simple recipes and progressively test more complex ones
    • Verify all features work as with docker-compose
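
As referenced in step 1, a minimal controller-runtime skeleton could look like the following; scheme wiring and API-group registration are omitted, and it assumes the CRD type is (re)generated as a client.Object (e.g. with kubebuilder):

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// BuilderPlaygroundReconciler reconciles BuilderPlaygroundDeployment resources
// into a pod, services, config maps and a PVC as described above.
type BuilderPlaygroundReconciler struct {
	client.Client
}

func (r *BuilderPlaygroundReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the BuilderPlaygroundDeployment named in req.
	// 2. Resolve template expressions in service args.
	// 3. Create or update the pod, services, config maps and PVC.
	// 4. Requeue if dependencies are not yet ready.
	return ctrl.Result{}, nil
}

func (r *BuilderPlaygroundReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&BuilderPlaygroundDeployment{}). // requires a kubebuilder-style API type
		Complete(r)
}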

This hybrid approach maintains the simplicity of builder-playground while enabling Kubernetes deployment, providing a flexible solution that supports both development and production use cases, with a clear path for evolution toward more automated deployment in the future.

- Created k8s_generator.go with Kubernetes manifest generation logic
- Added CLI flags for Kubernetes manifest generation
- Integrated manifest generation into recipe workflow
- Added CRD definition to examples folder

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@@ -173,6 +178,11 @@ func main() {
recipeCmd.Flags().BoolVar(&bindExternal, "bind-external", false, "bind host ports to external interface")
recipeCmd.Flags().BoolVar(&withPrometheus, "with-prometheus", false, "whether to gather the Prometheus metrics")
recipeCmd.Flags().StringVar(&networkName, "network", "", "network name")
recipeCmd.Flags().BoolVar(&k8sFlag, "k8s", false, "Generate Kubernetes manifests")
Collaborator

These were the things I wanted to avoid for the integration. K8s must be a separate lifecycle from the normal playground. Playground should generate an artifacts folder which gets consumed by the k8s generator in a second step.

Author

I'm not sure I follow. The idea of the --k8s flag is to use it only with dry run, i.e.

$ playground cook l1 --dry-run --k8s --output ./output --storage-type local-path

What the flag does is create a k8s-manifest.yaml, which will then be consumed by the k8s generator/operator.
We need to convert the internal playground manifest to the k8s manifest. We shouldn't work with the docker-compose.yml as that has already abstracted away some of the core information needed.

Author

Oh I see, just saw your manifest.json patch. You would like the k8s impl to use this manifest.json and not include the k8s manifest logic in playground at all?
