Added comprehensive Kerberos authentication support for Spark 3.1+ #2630
@ChenYi015 I think this PR is ready for review! Appreciate any feedback 🚀
…plications

This commit implements complete Kerberos authentication support for Spark applications running on Kubernetes, providing secure access to Hadoop ecosystem services including HDFS, Hive, HBase, and other Kerberos-enabled components.

Key Features:
- Native Kerberos configuration in SparkApplication CRD
- Automatic keytab and krb5.conf secret mounting
- Spark 4.0+ compatibility with delegation token management
- Configurable credential renewal strategies (keytab/ccache)
- Service-specific Kerberos credential control
- Comprehensive documentation and examples

Implementation Details:
- New KerberosSpec API with principal, keytab, and config options
- SecretTypeKerberosKeytab for automatic environment variable setup
- Automatic secret mounting to driver and executor pods
- Spark configuration generation for Hadoop and Kerberos settings
- Environment variable configuration for KRB5_KEYTAB_FILE and KRB5_PRINCIPAL
- Support for custom keytab/config file names and mount paths

Configuration Options:
- principal: Kerberos principal name
- keytabSecret/configSecret: Secret names containing keytab and krb5.conf
- renewalCredentials: Credential renewal strategy (keytab/ccache)
- enabledServices: Configurable Hadoop services for delegation tokens
- keytabFile/configFile: Custom file names within secrets

Files Modified:
- API types and generated code for new Kerberos fields
- Spark submission logic with automatic Kerberos configuration
- Helm chart with new Kerberos values and updated CRDs
- Comprehensive documentation with setup guide and examples
- Unit tests covering all Kerberos configuration scenarios

The implementation automatically handles Spark 4.0's validation requirements while using user-provided secrets for actual authentication, ensuring compatibility with existing Kubernetes secret management practices.

Signed-off-by: josecsotomorales <[email protected]>
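To make the option list in this commit message concrete, here is a minimal Go sketch of what such a spec could look like, with defaulting and validation. The struct, the function names, the default values (taken from the examples later in this thread), and the rule that both secrets are always required are assumptions for illustration; the actual KerberosSpec type in the PR may differ. The principal is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// KerberosSpec is a hypothetical mirror of the fields listed in the commit
// message. The real API type in the PR may differ.
type KerberosSpec struct {
	Principal          string
	KeytabSecret       string
	KeytabFile         string
	ConfigSecret       string
	ConfigFile         string
	RenewalCredentials string
	EnabledServices    []string
}

// withDefaults fills optional fields. The default values are assumptions
// borrowed from the examples in this thread (krb5.keytab, krb5.conf, keytab).
func withDefaults(s KerberosSpec) KerberosSpec {
	if s.KeytabFile == "" {
		s.KeytabFile = "krb5.keytab"
	}
	if s.ConfigFile == "" {
		s.ConfigFile = "krb5.conf"
	}
	if s.RenewalCredentials == "" {
		s.RenewalCredentials = "keytab"
	}
	return s
}

// validate rejects specs that are missing required fields or that name an
// unknown renewal strategy. Requiring both secrets is an assumption.
func validate(s KerberosSpec) error {
	if s.Principal == "" {
		return errors.New("principal is required")
	}
	if s.KeytabSecret == "" || s.ConfigSecret == "" {
		return errors.New("keytabSecret and configSecret are required")
	}
	if s.RenewalCredentials != "keytab" && s.RenewalCredentials != "ccache" {
		return fmt.Errorf("unsupported renewalCredentials %q", s.RenewalCredentials)
	}
	return nil
}

func main() {
	spec := withDefaults(KerberosSpec{
		Principal:    "spark@EXAMPLE.COM", // placeholder principal
		KeytabSecret: "spark-kerberos-keytab",
		ConfigSecret: "spark-kerberos-config",
	})
	fmt.Println(spec.KeytabFile, spec.ConfigFile, spec.RenewalCredentials, validate(spec))
}
```

Keeping defaulting separate from validation means the validation step only ever sees fully populated specs, which keeps each check simple.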
Thanks for the great contribution! I have left some comments.
internal/controller/sparkapplication/submission_kerberos_test.go
// Set driver and executor JVM options for krb5.conf
args = append(args, "--conf", fmt.Sprintf("spark.driver.extraJavaOptions=-Djava.security.krb5.conf=%s", configPath))
args = append(args, "--conf", fmt.Sprintf("spark.executor.extraJavaOptions=-Djava.security.krb5.conf=%s", configPath))
Does this affect .spec.[driver|executor].javaOptions, or spark.driver.extraJavaOptions defined in .spec.sparkConf? I ask because this property can end up being passed to spark-submit multiple times.
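One way to address this concern is to merge the krb5.conf property into whatever JVM options the user already supplied and emit a single --conf entry per key. The Go sketch below illustrates that idea; mergeJavaOptions is a hypothetical helper, not code from this PR, and it assumes that a repeated key on the spark-submit command line results in one value replacing the other rather than the values being concatenated.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeJavaOptions appends the krb5.conf system property to the JVM options
// the user already set, unless the user set that property themselves.
// Hypothetical helper for illustration only.
func mergeJavaOptions(existing, krb5ConfPath string) string {
	const prop = "-Djava.security.krb5.conf="
	if strings.Contains(existing, prop) {
		return existing // the user-supplied value wins
	}
	return strings.TrimSpace(existing + " " + prop + krb5ConfPath)
}

func main() {
	user := "-XX:+UseG1GC -Xms1g"
	merged := mergeJavaOptions(user, "/etc/kerberos/conf/krb5.conf")
	// Emit exactly one --conf per key so a later duplicate cannot replace it.
	args := []string{"--conf", "spark.driver.extraJavaOptions=" + merged}
	fmt.Println(strings.Join(args, " "))
}
```

The same merge would apply to spark.executor.extraJavaOptions, with the user's options coming from either javaOptions or sparkConf.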
/assign @vara-bonthu @nabuskey @jacobsalway
@josecsotomorales Does this feature also work for Spark 3?
Yes, Kerberos support was added in Spark 3.0.0, and keytab auth has been supported since 3.1.0. However, this PR has only been tested with Spark 4.
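If the operator wanted to guard the feature by version, a check along these lines would be enough. This is only a sketch: supportsKeytabAuth is a hypothetical function, and the 3.1 threshold is taken from the comment above rather than verified independently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsKeytabAuth reports whether sparkVersion is at least 3.1, the
// release cited above for keytab-based authentication.
func supportsKeytabAuth(sparkVersion string) (bool, error) {
	parts := strings.SplitN(sparkVersion, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("cannot parse Spark version %q", sparkVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return major > 3 || (major == 3 && minor >= 1), nil
}

func main() {
	for _, v := range []string{"3.0.3", "3.1.0", "4.0.0"} {
		ok, err := supportsKeytabAuth(v)
		fmt.Println(v, ok, err)
	}
}
```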
Co-authored-by: Yi Chen <[email protected]> Signed-off-by: Jose Soto <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED
- Correct Chart version from 2.4.0 to 2.3.0 to match VERSION file
- Fix version inconsistencies in spark-pi-kerberos.yaml example
- Update serviceAccount name in Kerberos example to use proper Helm chart service account
- Regenerate Helm chart README.md with updated version and Kerberos documentation
- Document JavaOptions interaction with Kerberos configuration

Signed-off-by: Jose Soto <[email protected]>
Signed-off-by: josecsotomorales <[email protected]>
Here's the doc for adding it to the Kubeflow website:

Kerberos Authentication Support

The Spark Operator now supports Kerberos authentication for secure access to Hadoop clusters, optimized for Apache Spark. This enables Spark applications to authenticate with Kerberos-enabled services such as HDFS, Hive, HBase, and other components in a secure Hadoop ecosystem using the latest Spark security features.

Overview

Kerberos is a network authentication protocol designed to provide strong authentication for client/server applications. In Hadoop clusters, Kerberos is commonly used to secure access to services such as HDFS, Hive, and HBase.

The Spark Operator's Kerberos support automates the configuration of Kerberos authentication for Spark applications, making it easier to run secure Spark jobs.

Prerequisites

Before using Kerberos authentication with Spark Operator, ensure:

Configuration

1. Create Kerberos Secrets

First, create Kubernetes secrets containing your keytab and Kerberos configuration files:

# Create secret with keytab file
kubectl create secret generic spark-kerberos-keytab \
  --from-file=krb5.keytab=/path/to/your/spark.keytab
# Create secret with Kerberos configuration
kubectl create secret generic spark-kerberos-config \
  --from-file=krb5.conf=/path/to/your/krb5.conf

2. Configure SparkApplication

Add the Kerberos configuration to your SparkApplication specification:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-kerberos
spec:
  # ... other configuration ...
  # Kerberos authentication configuration
  kerberos:
    principal: "[email protected]"          # Kerberos principal
    realm: "EXAMPLE.COM"                    # Kerberos realm (optional)
    kdc: "kdc.example.com:88"               # KDC address (optional)
    keytabSecret: "spark-kerberos-keytab"   # Secret containing keytab
    configSecret: "spark-kerberos-config"   # Secret containing krb5.conf
  # Hadoop configuration for Kerberos
  hadoopConf:
    "hadoop.security.authentication": "kerberos"
    "hadoop.security.authorization": "true"
    # Add other Hadoop-specific configuration as needed
  driver:
    # Mount Kerberos secrets
    secrets:
      - name: "spark-kerberos-keytab"
        path: "/etc/kerberos/keytab"
        secretType: "KerberosKeytab"
      - name: "spark-kerberos-config"
        path: "/etc/kerberos/conf"
        secretType: "Generic"
  executor:
    # Mount Kerberos secrets
    secrets:
      - name: "spark-kerberos-keytab"
        path: "/etc/kerberos/keytab"
        secretType: "KerberosKeytab"
      - name: "spark-kerberos-config"
        path: "/etc/kerberos/conf"
        secretType: "Generic"

Configuration Options

KerberosSpec Fields

SecretType

The operator supports a new secret type, KerberosKeytab.

Environment Variables

The Kerberos implementation uses standard Kerberos environment variables:

These are automatically configured when using the KerberosKeytab secret type.

Spark Configuration

Automatic Configuration

The Kerberos configuration automatically adds the following Spark/Hadoop configurations:

Integration with Existing Spark Configuration

JavaOptions Interaction:

Option 1: Use sparkConf instead of javaOptions (Recommended)

spec:
  sparkConf:
    # Your custom JVM options - will be merged with Kerberos options
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC -Xms1g"
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC -Xms512m"
  kerberos:
    # Kerberos configuration

Option 2: Include Kerberos options in your javaOptions

spec:
  driver:
    # Include both your options AND the Kerberos krb5.conf option
    javaOptions: "-XX:+UseG1GC -Xms1g -Djava.security.krb5.conf=/etc/krb5-config/krb5.conf"
  executor:
    javaOptions: "-XX:+UseG1GC -Xms512m -Djava.security.krb5.conf=/etc/krb5-config/krb5.conf"
  kerberos:
    # Other Kerberos configuration

SparkConf Interaction:

Credential Renewal Strategies

Keytab-based Renewal (Recommended):

kerberos:
  renewalCredentials: "keytab"  # Default for long-running applications

Ticket Cache Renewal:

kerberos:
  renewalCredentials: "ccache"  # Requires external ticket management

Service-Specific Configuration

Control which services have Kerberos credentials enabled:

kerberos:
  enabledServices: ["hadoopfs", "hbase", "hive", "yarn"]

Troubleshooting

Common Issues

Debug Commands

# Check if secrets are properly mounted
kubectl exec -it <spark-driver-pod> -- ls -la /etc/kerberos/keytab/
kubectl exec -it <spark-driver-pod> -- ls -la /etc/kerberos/conf/
# Test Kerberos authentication
kubectl exec -it <spark-driver-pod> -- kinit -kt /etc/kerberos/keytab/krb5.keytab [email protected]
# View current Kerberos tickets
kubectl exec -it <spark-driver-pod> -- klist

Examples

See

Security Considerations

Migration from Manual Configuration

If you were previously configuring Kerberos manually via Spark configuration, you can migrate to the new native support:

Before (Manual Configuration)

sparkConf:
  "spark.hadoop.hadoop.security.authentication": "kerberos"
  "spark.hadoop.hadoop.kerberos.principal": "[email protected]"
  "spark.hadoop.hadoop.kerberos.keytab": "/etc/secrets/keytab"
# ... manual secret mounting ...

After (Native Support)

kerberos:
  principal: "[email protected]"
  keytabSecret: "spark-kerberos-keytab"
  configSecret: "spark-kerberos-config"

The native support automatically handles the underlying Spark and Hadoop configurations, secret mounting, and environment variable setup.

Content for Kubeflow Website Kerberos Documentation

This file contains the content that should be added to the Kubeflow website documentation after the Kerberos support PR is merged. This content is based on docs/kerberos-support.md but streamlined for the Kubeflow website format.

Kerberos Authentication Support

The Spark Operator supports Kerberos authentication for secure access to Hadoop clusters and services. This guide explains how to configure Kerberos authentication for Spark applications.

Prerequisites

Before using Kerberos authentication:

Configuration

1. Create Kerberos Secrets

Create Kubernetes secrets containing your keytab and Kerberos configuration:

# Create secret with keytab file
kubectl create secret generic spark-kerberos-keytab \
  --from-file=krb5.keytab=/path/to/your/spark.keytab
# Create secret with Kerberos configuration
kubectl create secret generic spark-kerberos-config \
  --from-file=krb5.conf=/path/to/your/krb5.conf

2. Configure SparkApplication

Add Kerberos configuration to your SparkApplication:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos-app
  namespace: default
spec:
  sparkVersion: "4.0.0"
  type: Scala
  mode: cluster
  image: "apache/spark:4.0.0"
  mainClass: org.apache.spark.examples.SparkPi
  # Spark 4.0.0 is built against Scala 2.13, so the examples jar carries the _2.13 suffix
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0.jar"
  # Kerberos configuration
  kerberos:
    principal: "[email protected]"
    keytabSecret: "spark-kerberos-keytab"
    keytabFile: "krb5.keytab"
    configSecret: "spark-kerberos-config"
    configFile: "krb5.conf"
    renewalCredentials: "keytab"
    enabledServices: ["hadoopfs", "hive", "hbase"]
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: "4.0.0"
    serviceAccount: spark-operator-spark
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: "4.0.0"

Configuration Options

Core Kerberos Settings
Advanced Options
JavaOptions Integration
spec:
  sparkConf:
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC -Xms1g"
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC -Xms512m"
  kerberos:
    # Kerberos configuration will be merged with the above options

Troubleshooting

For more examples and detailed configuration, see the spark-operator examples.
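As a rough illustration of how the kerberos block above might translate into spark-submit arguments, the Go sketch below maps the fields onto Spark's documented Kerberos properties (spark.kerberos.principal, spark.kerberos.keytab, spark.kerberos.renewal.credentials, and spark.security.credentials.<service>.enabled). Whether the operator emits exactly these keys is an assumption; kerberosConfArgs is a hypothetical helper, the principal is a placeholder, and the mount path comes from the examples in this thread.

```go
package main

import (
	"fmt"
	"path"
)

// kerberosConfArgs builds spark-submit --conf arguments from Kerberos
// settings. Hypothetical helper; the operator's real mapping may differ.
func kerberosConfArgs(principal, keytabDir, keytabFile, renewal string, services []string) []string {
	conf := [][2]string{
		{"spark.kerberos.principal", principal},
		{"spark.kerberos.keytab", path.Join(keytabDir, keytabFile)},
		{"spark.kerberos.renewal.credentials", renewal},
	}
	// One toggle per service that should obtain delegation tokens.
	for _, svc := range services {
		conf = append(conf, [2]string{"spark.security.credentials." + svc + ".enabled", "true"})
	}
	args := make([]string, 0, 2*len(conf))
	for _, kv := range conf {
		args = append(args, "--conf", kv[0]+"="+kv[1])
	}
	return args
}

func main() {
	args := kerberosConfArgs(
		"spark@EXAMPLE.COM", // placeholder principal
		"/etc/kerberos/keytab", "krb5.keytab",
		"keytab",
		[]string{"hadoopfs", "hive", "hbase"},
	)
	for i := 0; i < len(args); i += 2 {
		fmt.Println(args[i], args[i+1])
	}
}
```

Building the key/value pairs in an ordered slice rather than a map keeps the generated argument list deterministic, which makes the output easier to test.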
Signed-off-by: josecsotomorales <[email protected]>
Some questions:
It's been a long time since I've worked with Kerberos, so it's going to take me some time to review this. It also concerns me that we are adding support for something as critical as auth that we may not be able to support well in the future.
Kerberos today: You can make Kerberos work with the operator today by hand (mounting keytabs/krb5, injecting Spark/Hadoop confs), but it’s brittle and undocumented. Multiple issues report failures or complexity around token acquisition, renewal, and path wiring. This PR makes that path first-class in the CRD and examples, so users aren’t reinventing it each time.

How painful is it without this? Fairly high: lots of bespoke YAML, custom images, init scripts, and easy foot-guns around delegation tokens. Upstream Spark already defines the security model on K8s; we’re just wiring to it in a supported way.

Other auth methods we might need? For object stores, users rely on cloud identities (IRSA, Workload Identity, etc.), which are orthogonal to Kerberos and already work with Spark on K8s. For HiveServer2, LDAP/SIMPLE/PAM exist but are configured at the Hive level; the operator just passes through configs. So adding Kerberos here doesn’t preclude future methods.

Maintenance risk: The feature is opt-in and leverages Spark’s built-in token/renewal mechanisms. The operator’s surface is limited to CRD fields + secret mounting + conf plumbing, keeping ongoing support modest.

@nabuskey here's the official Spark docs for Kerberos: https://spark.apache.org/docs/latest/security.html#kerberos
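To give a sense of how small the "secret mounting + conf plumbing" surface can be, here is a hedged Go sketch that derives the environment variables named in this PR (KRB5_KEYTAB_FILE, KRB5_CONFIG) from two secret mounts. The mount type and the kerberosEnv function are illustrative inventions, not the operator's code; the paths and secret names come from the examples in this thread.

```go
package main

import (
	"fmt"
	"path"
)

// mount describes one secret mounted into a pod (illustrative type only).
type mount struct {
	Secret string // Kubernetes secret name
	Dir    string // directory the secret is mounted at
	File   string // key (file name) inside the secret
}

// kerberosEnv derives the environment variables mentioned in this PR from
// the keytab and krb5.conf mounts.
func kerberosEnv(keytab, conf mount) map[string]string {
	return map[string]string{
		"KRB5_KEYTAB_FILE": path.Join(keytab.Dir, keytab.File),
		"KRB5_CONFIG":      path.Join(conf.Dir, conf.File),
	}
}

func main() {
	env := kerberosEnv(
		mount{Secret: "spark-kerberos-keytab", Dir: "/etc/kerberos/keytab", File: "krb5.keytab"},
		mount{Secret: "spark-kerberos-config", Dir: "/etc/kerberos/conf", File: "krb5.conf"},
	)
	fmt.Println(env)
}
```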
This commit introduces native Kerberos authentication support for secure Hadoop cluster access, optimized for Apache Spark 3.1+ and Kubernetes deployments.
Key Features
API Changes
- Added KerberosSpec to SparkApplicationSpec with comprehensive configuration
- SecretType.KerberosKeytab for automatic keytab file handling
- keytab and ccache credential renewal strategies
- Configurable services (hadoopfs, hbase, hive, yarn)
Implementation Details
- Environment variables (KRB5_KEYTAB_FILE, KRB5_CONFIG)
Configuration Example
Files Changed