# KEP-XXXX: Add REST API Support for SparkApplication CRD

<!--
A lightweight REST API proxy for SparkApplication CRDs in the Spark Operator.
-->

## Summary

Expose a RESTful HTTP interface alongside the Spark Operator to streamline the creation, retrieval, update, listing, and deletion of `SparkApplication` Custom Resources. By bundling a minimal Go-based HTTP server that proxies JSON payloads directly to the Kubernetes API (using `client-go`), users and external systems (CI/CD pipelines, web UIs, custom dashboards) can manage Spark jobs without `kubectl` or deep Kubernetes expertise.

## Motivation

Currently, submitting Spark jobs via the Spark Operator requires crafting and applying Kubernetes manifests with `kubectl` or invoking client libraries directly. This creates friction for non-Kubernetes-native workflows and requires boilerplate integration code in external tools.

### Goals

- Provide HTTP endpoints for CRUD operations on `SparkApplication` CRs.
- Allow cluster administrators to configure and integrate the authentication and authorization mechanisms of their choice.
- Package the REST proxy as a container alongside the Spark Operator in Helm charts or manifests.
- Ensure minimal resource overhead and operational complexity.

### Non-Goals

- Replacing general-purpose CLI tools like `kubectl` for arbitrary resources.
- Implementing extensive admission logic or API aggregation capabilities beyond basic proxying.
- Managing non-Spark CRDs or core Kubernetes objects in this phase.

## Proposal

Deploy a companion HTTP server with the Spark Operator that:

1. **Listens** on a configurable port (default 8080) inside the same pod or as a sidecar.
2. **Maps HTTP routes** to Kubernetes operations using `client-go`, operating only within a configured namespace scope:
   - `POST /sparkapplications` → Create
   - `GET /sparkapplications/{namespace}/{name}` → Get
   - `PUT /sparkapplications/{namespace}/{name}` → Update
   - `DELETE /sparkapplications/{namespace}/{name}` → Delete
   - `GET /sparkapplications?namespace={ns}` → List
3. **Accepts and returns** only JSON representations of the CRD, so that a manifest applied via `kubectl` and the same manifest submitted via this REST API produce identical results.
4. **Leverages in-cluster config** for authentication, mounting a namespaced ServiceAccount token bound to a Role granting access to `sparkapplications.sparkoperator.k8s.io` within that namespace.
5. **Supports TLS termination** via mounted certificates (cert-manager or manual).
6. **Emits** structured logs and exposes Prometheus metrics for request counts and latencies.

### User Stories (Optional)

#### Story 1
As a data engineer, I want to submit Spark jobs by sending a single HTTP request from my CI pipeline, so I don’t need to install or configure `kubectl` on my build agents.

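A minimal sketch of the CI-side call, assuming the proxy is reachable as a Service named `spark-rest-proxy` (hypothetical) and trimming the `SparkApplication` manifest down to a few fields:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newSubmitRequest builds the POST request a CI job would send to the
// proxy. The body is the same JSON document the CRD accepts via kubectl.
func newSubmitRequest(baseURL string, manifest []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/sparkapplications", bytes.NewReader(manifest))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	manifest := []byte(`{"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"name":"spark-pi","namespace":"default"}}`)
	req, err := newSubmitRequest("http://spark-rest-proxy:8080", manifest)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// In CI, http.DefaultClient.Do(req) would perform the actual submission.
}
```
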
#### Story 2
As a platform operator, I want to integrate Spark job submission into our internal web portal using REST calls, so that users can launch jobs without learning Kubernetes details.

#### Story 3
As a user without Kubernetes expertise, I want to use a familiar HTTP API to submit Spark jobs, so I don’t need direct cluster access or knowledge of `kubectl` commands.

### Notes/Constraints/Caveats (Optional)

- This proxy does not implement Kubernetes API aggregation; it is a user-space proxy translating HTTP requests into Kubernetes API calls.
- All CRD validation and defaulting are still handled by the CRD’s OpenAPI schema and the Spark Operator’s admission logic.
- TLS and authentication configurations must be explicitly managed by the cluster administrator.

### Risks and Mitigations

| Risk | Mitigation |
|-----------------------------------------|----------------------------------------------------------------------------|
| Exposed HTTP endpoint could be abused | Enforce RBAC, require ServiceAccount tokens, support TLS. |
| Additional component to maintain | Keep proxy logic minimal, reuse `client-go`, align with Operator releases. |
| Single point of failure for submissions | Deploy as a sidecar or with HA replica sets. |

## Design Details

- **Server implementation**: Go HTTP server using `gorilla/mux` or the standard `net/http` package, calling the Kubernetes API via `client-go`.
- **Deployment**: Update the Spark Operator Helm chart to include a new Deployment (or sidecar) for the REST proxy, with ServiceAccount and RBAC definitions limited to a namespace.
- **Configuration**: Helm values for port, TLS cert paths, namespace scope filter, and resource limits.

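As a sketch of how those Helm values could reach the server, the proxy might expose them as command-line flags that the chart renders into the container args; the flag names and `proxyConfig` fields here are illustrative, not a committed values schema.

```go
package main

import (
	"flag"
	"fmt"
)

// proxyConfig mirrors the proposed Helm values; field names are
// illustrative assumptions.
type proxyConfig struct {
	Port      int
	Namespace string
	TLSCert   string
	TLSKey    string
}

// parseConfig reads configuration from command-line flags, which the
// Helm chart would render from values.yaml into container args.
func parseConfig(args []string) (proxyConfig, error) {
	var c proxyConfig
	fs := flag.NewFlagSet("spark-rest-proxy", flag.ContinueOnError)
	fs.IntVar(&c.Port, "port", 8080, "listen port")
	fs.StringVar(&c.Namespace, "namespace", "default", "namespace scope for SparkApplications")
	fs.StringVar(&c.TLSCert, "tls-cert", "", "path to TLS certificate (optional)")
	fs.StringVar(&c.TLSKey, "tls-key", "", "path to TLS private key (optional)")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, _ := parseConfig([]string{"-namespace", "spark-jobs"})
	fmt.Printf("%+v\n", c)
}
```
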
### Test Plan

- **Unit Tests**: Mock `client-go` interactions to verify request-to-API mappings and error handling.
- **Integration Tests**: Deploy in a test cluster; execute CRUD operations via HTTP and assert correct CRD states.
- **E2E Tests**: Use the existing Spark Operator E2E framework to submit jobs via the proxy and verify job completion.

## Graduation Criteria

- Alpha: Basic CRUD endpoints implemented, tested in one real cluster, enabled by a feature flag in Helm.
- Beta: TLS support, metrics, and documentation completed; rolling upgrades tested.
- Stable: No feature flag; production-grade documentation and test coverage ≥ 90%; promoted in Spark Operator release notes.

## Implementation History

- 2025-04-27: KEP created (provisional).

## Drawbacks

- Introduces an extra deployment and potential attack surface.
- May duplicate future Kubernetes API aggregation capabilities.
- Slight increase in operational complexity for cluster administrators.

## Alternatives

- **Standalone Spark Cluster**: Deploy Spark in standalone mode, which natively includes a REST submission server, eliminating the need for an additional proxy component and leveraging Spark’s built-in submission API.