A high-performance alternative to spark-submit for launching Spark applications via the Spark Operator in Kubernetes clusters. This plugin eliminates the JVM spin-up overhead associated with traditional spark-submit commands, providing faster application startup times.
This repository contains a native submit plugin that runs as a sidecar container alongside the Spark Operator controller. The plugin provides a gRPC service that the Spark Operator can use to submit Spark applications without the overhead of JVM startup.
- Native Go implementation bypassing JVM overhead
- Faster Spark application startup
- gRPC service for Spark Operator integration
- Secure execution environment with non-root user
- Health checks and metrics endpoints
- Support for various Spark application types (Java, Scala, Python, R)
- Containerized deployment as sidecar
The plugin is designed to run as a sidecar container alongside the Spark Operator controller:
┌───────────────────────────────────────────────────────────┐
│                    Spark Operator Pod                     │
├───────────────────────────────────────────────────────────┤
│ ┌─────────────────────┐  ┌──────────────────────────────┐ │
│ │ Spark Operator      │  │ Native Submit Plugin         │ │
│ │ Controller          │  │ (Sidecar Container)          │ │
│ │                     │  │                              │ │
│ │ - Watches CRDs      │  │ - gRPC Server (port 50051)   │ │
│ │ - Manages lifecycle │  │ - Health checks (port 9090)  │ │
│ │ - Calls gRPC        │  │ - Native submit logic        │ │
│ └─────────────────────┘  └──────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
- gRPC Service: Runs on port 50051, provides the RunAltSparkSubmit method
- Health Checks: HTTP endpoints on port 9090 (/healthz, /readyz)
- Native Logic: Go implementation for Spark application submission
- Security: Runs as non-root user (UID: 185, GID: 185)
- Kubernetes cluster
- Spark Operator installed in the cluster
- kubectl configured to access the cluster
- Docker (for building the container image)
# Build the Docker image
docker build -t native-submit:latest .
# Tag for your registry (example)
docker tag native-submit:latest your-registry/native-submit:latest
docker push your-registry/native-submit:latest
The plugin is deployed as a sidecar container with the Spark Operator. The Spark Operator controller is configured to use the gRPC service:
# Example deployment configuration
containers:
  - name: spark-operator-controller
    image: ghcr.io/kubeflow/spark-operator/controller:2.2.1
    args:
      - --submitter-type=grpc
      - --grpc-server-address=localhost:50051
      - --grpc-submit-timeout=10s
      # ... other args
  - name: native-submit
    image: your-registry/native-submit:latest
    ports:
      - containerPort: 50051 # gRPC
      - containerPort: 9090 # Health checks
# Check if the sidecar is running
kubectl get pods -n spark-operator
# Check logs
kubectl logs -n spark-operator deployment/spark-operator-controller -c native-submit
# Test health endpoint (port-forward runs in the foreground, so run curl from a second terminal)
kubectl port-forward -n spark-operator deployment/spark-operator-controller 9090:9090
curl http://localhost:9090/healthz
The plugin provides a gRPC service with the following method:
service SparkSubmitService {
  rpc RunAltSparkSubmit(RunAltSparkSubmitRequest) returns (RunAltSparkSubmitResponse);
}

message RunAltSparkSubmitRequest {
  SparkApplication spark_application = 1;
  string submission_id = 2;
}

message RunAltSparkSubmitResponse {
  bool success = 1;
  string error_message = 2;
}
- Health Check: GET /healthz - Service health status
- Readiness Check: GET /readyz - Service readiness status
- GRPC_PORT: gRPC server port (default: 50051)
- HEALTH_PORT: Health check port (default: 9090)
The Spark Operator controller must be configured with:
args:
- --submitter-type=grpc
- --grpc-server-address=localhost:50051
- --grpc-submit-timeout=10s
# Build the binary
go build -o native-submit ./main
# Build Docker image
docker build -t native-submit:latest .
# Run tests
go test -v ./...
# Run unit tests
go test -v ./...
# Run tests with coverage
go test -cover ./...
# Test gRPC service locally
go run test_grpc_client.go
The plugin is typically deployed as part of the Spark Operator Helm chart:
# Install Spark Operator with native submit plugin
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace \
--set controller.image.tag=2.2.1 \
--set controller.args.submitter-type=grpc \
--set controller.args.grpc-server-address=localhost:50051 \
--set controller.args.grpc-submit-timeout=10s \
--set controller.sidecars.native-submit.enabled=true \
--set controller.sidecars.native-submit.image=your-registry/native-submit:latest
For manual deployment, update the Spark Operator deployment to include the sidecar container and configure the controller to use the gRPC service.
The service includes:
- Liveness Probe: GET /healthz on port 9090
- Readiness Probe: GET /readyz on port 9090
- Docker Health Check: Built into the container
# View plugin logs
kubectl logs -n spark-operator deployment/spark-operator-controller -c native-submit
# View controller logs
kubectl logs -n spark-operator deployment/spark-operator-controller -c spark-operator-controller
The container runs with:
- Non-root user (UID: 185, GID: 185)
- Read-only root filesystem
- Dropped capabilities
- Security context with minimal privileges
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.