Commit 91476ae

Add comprehensive Kerberos authentication support for Apache Spark applications
This commit implements complete Kerberos authentication support for Spark applications running on Kubernetes, providing secure access to Hadoop ecosystem services including HDFS, Hive, HBase, and other Kerberos-enabled components.

Key Features:
- Native Kerberos configuration in SparkApplication CRD
- Automatic keytab and krb5.conf secret mounting
- Spark 4.0+ compatibility with delegation token management
- Configurable credential renewal strategies (keytab/ccache)
- Service-specific Kerberos credential control
- Comprehensive documentation and examples

Implementation Details:
- New KerberosSpec API with principal, keytab, and config options
- SecretTypeKerberosKeytab for automatic environment variable setup
- Automatic secret mounting to driver and executor pods
- Spark configuration generation for Hadoop and Kerberos settings
- Environment variable configuration for KRB5_KEYTAB_FILE and KRB5_PRINCIPAL
- Support for custom keytab/config file names and mount paths

Configuration Options:
- principal: Kerberos principal name
- keytabSecret/configSecret: Secret names containing keytab and krb5.conf
- renewalCredentials: Credential renewal strategy (keytab/ccache)
- enabledServices: Configurable Hadoop services for delegation tokens
- keytabFile/configFile: Custom file names within secrets

Files Modified:
- API types and generated code for new Kerberos fields
- Spark submission logic with automatic Kerberos configuration
- Helm chart with new Kerberos values and updated CRDs
- Comprehensive documentation with setup guide and examples
- Unit tests covering all Kerberos configuration scenarios

The implementation automatically handles Spark 4.0's validation requirements while using user-provided secrets for actual authentication, ensuring compatibility with existing Kubernetes secret management practices.

Signed-off-by: josecsotomorales <[email protected]>
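The "Spark configuration generation" item in the commit message can be pictured with a minimal Go sketch. The property names used here (`spark.kerberos.principal`, `spark.kerberos.keytab`, `spark.kerberos.renewal.credentials`, `spark.security.credentials.<service>.enabled`) are Spark's documented Kerberos settings. Everything else is an assumption for illustration: the `kerberosSettings` type, the `sparkConf` helper, and the `/etc/kerberos/keytab` mount path are hypothetical, and the submission logic changed by this commit is not shown on this page and may emit different keys.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// kerberosSettings is a hypothetical, flattened stand-in for the options
// listed in the commit message; it is not the KerberosSpec type itself.
type kerberosSettings struct {
	Principal          string
	KeytabFile         string   // file name inside the keytab secret
	RenewalCredentials string   // "keytab" or "ccache"
	EnabledServices    []string // e.g. "hadoopfs", "hbase", "hive"
}

// keytabMountPath is an assumed mount point, chosen only for this example.
const keytabMountPath = "/etc/kerberos/keytab"

// sparkConf maps the settings onto Spark's documented Kerberos properties.
func sparkConf(k kerberosSettings) map[string]string {
	conf := map[string]string{
		"spark.kerberos.principal":           k.Principal,
		"spark.kerberos.keytab":              path.Join(keytabMountPath, k.KeytabFile),
		"spark.kerberos.renewal.credentials": k.RenewalCredentials,
	}
	// One toggle per service that should obtain delegation tokens.
	for _, svc := range k.EnabledServices {
		conf[fmt.Sprintf("spark.security.credentials.%s.enabled", svc)] = "true"
	}
	return conf
}

func main() {
	conf := sparkConf(kerberosSettings{
		Principal:          "spark@EXAMPLE.COM",
		KeytabFile:         "krb5.keytab",
		RenewalCredentials: "keytab",
		EnabledServices:    []string{"hadoopfs", "hive"},
	})

	// Sort the keys so the printed arguments are stable across runs.
	keys := make([]string, 0, len(conf))
	for k := range conf {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("--conf %s=%s\n", k, conf[k])
	}
}
```

Returning a map rather than a slice of arguments keeps the sketch easy to test; a real submission path would still need to decide ordering and how user-supplied `sparkConf` entries take precedence.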
1 parent 44972be commit 91476ae

16 files changed: +1203 -6 lines changed

api/v1beta2/sparkapplication_types.go

Lines changed: 44 additions & 0 deletions
@@ -142,6 +142,9 @@ type SparkApplicationSpec struct {
 	// scheduler backend since Spark 3.0.
 	// +optional
 	DynamicAllocation *DynamicAllocation `json:"dynamicAllocation,omitempty"`
+	// Kerberos configures Kerberos authentication for Hadoop access.
+	// +optional
+	Kerberos *KerberosSpec `json:"kerberos,omitempty"`
 }
 
 // SparkApplicationStatus defines the observed state of SparkApplication
@@ -604,6 +607,9 @@ const (
 	// SecretTypeHadoopDelegationToken is for secrets from an Hadoop delegation token that needs the
 	// environment variable HADOOP_TOKEN_FILE_LOCATION.
 	SecretTypeHadoopDelegationToken SecretType = "HadoopDelegationToken"
+	// SecretTypeKerberosKeytab is for secrets from a Kerberos keytab file that needs the
+	// environment variable KRB5_KEYTAB_FILE.
+	SecretTypeKerberosKeytab SecretType = "KerberosKeytab"
 	// SecretTypeGeneric is for secrets that needs no special handling.
 	SecretTypeGeneric SecretType = "Generic"
 )
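As a rough illustration of what "needs the environment variable KRB5_KEYTAB_FILE" could mean for a mounted secret, the sketch below builds that variable from a mount path and a file name. The `envVar` type, the `keytabEnv` helper, and the `/mnt/secrets/keytab` path are hypothetical; only the variable name and the `krb5.keytab` default file name come from this diff, and the operator's actual handling of the secret type is not shown here.

```go
package main

import (
	"fmt"
	"path"
)

// envVar is a minimal stand-in for a container environment variable.
type envVar struct {
	Name  string
	Value string
}

// keytabEnvVar is the variable named in the SecretTypeKerberosKeytab comment.
const keytabEnvVar = "KRB5_KEYTAB_FILE"

// keytabEnv points KRB5_KEYTAB_FILE at the keytab inside a mounted secret.
// An empty fileName falls back to "krb5.keytab", the default documented
// for the KeytabFile field.
func keytabEnv(mountPath, fileName string) envVar {
	if fileName == "" {
		fileName = "krb5.keytab"
	}
	return envVar{Name: keytabEnvVar, Value: path.Join(mountPath, fileName)}
}

func main() {
	// "/mnt/secrets/keytab" is a hypothetical mount path.
	e := keytabEnv("/mnt/secrets/keytab", "")
	fmt.Printf("%s=%s\n", e.Name, e.Value)
}
```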
@@ -717,3 +723,41 @@ type DynamicAllocation struct {
 	// +optional
 	ShuffleTrackingTimeout *int64 `json:"shuffleTrackingTimeout,omitempty"`
 }
+
+// KerberosSpec defines the Kerberos authentication configuration for Hadoop access.
+type KerberosSpec struct {
+	// Principal is the Kerberos principal name for authentication.
+	// +optional
+	Principal *string `json:"principal,omitempty"`
+	// KeytabSecret is the name of the secret containing the Kerberos keytab file.
+	// +optional
+	KeytabSecret *string `json:"keytabSecret,omitempty"`
+	// KeytabFile is the path to the keytab file within the keytab secret.
+	// Defaults to "krb5.keytab" if not specified.
+	// +optional
+	KeytabFile *string `json:"keytabFile,omitempty"`
+	// ConfigSecret is the name of the secret containing the Kerberos configuration file (krb5.conf).
+	// +optional
+	ConfigSecret *string `json:"configSecret,omitempty"`
+	// ConfigFile is the path to the krb5.conf file within the config secret.
+	// Defaults to "krb5.conf" if not specified.
+	// +optional
+	ConfigFile *string `json:"configFile,omitempty"`
+	// Realm is the Kerberos realm. This is optional and can be inferred from the principal.
+	// +optional
+	Realm *string `json:"realm,omitempty"`
+	// KDC is the Key Distribution Center address.
+	// +optional
+	KDC *string `json:"kdc,omitempty"`
+	// RenewalCredentials specifies the credential renewal strategy.
+	// Valid values are "keytab" (default) and "ccache".
+	// "keytab" enables automatic renewal using the provided keytab.
+	// "ccache" uses existing ticket cache (requires manual ticket management).
+	// +optional
+	// +kubebuilder:validation:Enum={keytab,ccache}
+	RenewalCredentials *string `json:"renewalCredentials,omitempty"`
+	// EnabledServices specifies which Hadoop services should have Kerberos credentials enabled.
+	// Defaults to ["hadoopfs", "hbase", "hive"] if not specified.
+	// +optional
+	EnabledServices []string `json:"enabledServices,omitempty"`
+}
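The field comments describe several defaults and one inference rule (the realm "can be inferred from the principal"). A minimal sketch of how a consumer of `KerberosSpec` might resolve them follows. The struct is a trimmed local copy, and `valueOr`, `realmOf`, and `servicesOf` are hypothetical helpers; splitting the principal at its last `@` is an assumption about what "inferred" means, since the commit's own resolution code is not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// KerberosSpec repeats only the fields this sketch needs.
type KerberosSpec struct {
	Principal          *string
	KeytabFile         *string
	ConfigFile         *string
	Realm              *string
	RenewalCredentials *string
	EnabledServices    []string
}

// valueOr returns *p, or def when p is nil or empty.
func valueOr(p *string, def string) string {
	if p == nil || *p == "" {
		return def
	}
	return *p
}

// realmOf prefers an explicit Realm and otherwise takes the part of the
// principal after its last "@". It returns "" when neither is available.
func realmOf(s KerberosSpec) string {
	if s.Realm != nil && *s.Realm != "" {
		return *s.Realm
	}
	if s.Principal == nil {
		return ""
	}
	if i := strings.LastIndex(*s.Principal, "@"); i >= 0 {
		return (*s.Principal)[i+1:]
	}
	return ""
}

// servicesOf applies the documented default service list.
func servicesOf(s KerberosSpec) []string {
	if len(s.EnabledServices) == 0 {
		return []string{"hadoopfs", "hbase", "hive"}
	}
	return s.EnabledServices
}

func main() {
	principal := "spark@EXAMPLE.COM"
	spec := KerberosSpec{Principal: &principal}

	fmt.Println("keytabFile:", valueOr(spec.KeytabFile, "krb5.keytab"))
	fmt.Println("configFile:", valueOr(spec.ConfigFile, "krb5.conf"))
	fmt.Println("renewalCredentials:", valueOr(spec.RenewalCredentials, "keytab"))
	fmt.Println("realm:", realmOf(spec))
	fmt.Println("enabledServices:", servicesOf(spec))
}
```

Pointer fields make "unset" distinguishable from "set to empty", which is why the helpers check for `nil` before falling back to a default.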

api/v1beta2/zz_generated.deepcopy.go

Lines changed: 65 additions & 0 deletions
Some generated files are not rendered by default.

charts/spark-operator-chart/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -20,9 +20,9 @@ name: spark-operator
 
 description: A Helm chart for Spark on Kubernetes operator.
 
-version: 2.3.0
+version: 2.4.0
 
-appVersion: 2.3.0
+appVersion: 2.4.0
 
 keywords:
 - apache spark

charts/spark-operator-chart/README.md

Lines changed: 7 additions & 1 deletion
@@ -1,6 +1,6 @@
 # spark-operator
 
-![Version: 2.3.0](https://img.shields.io/badge/Version-2.3.0-informational?style=flat-square) ![AppVersion: 2.3.0](https://img.shields.io/badge/AppVersion-2.3.0-informational?style=flat-square)
+![Version: 2.4.0](https://img.shields.io/badge/Version-2.4.0-informational?style=flat-square) ![AppVersion: 2.4.0](https://img.shields.io/badge/AppVersion-2.4.0-informational?style=flat-square)
 
 A Helm chart for Spark on Kubernetes operator.
 
@@ -173,6 +173,12 @@ See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command docum
 | spark.serviceAccount.automountServiceAccountToken | bool | `true` | Auto-mount service account token to the spark applications pods. |
 | spark.rbac.create | bool | `true` | Specifies whether to create RBAC resources for spark applications. |
 | spark.rbac.annotations | object | `{}` | Optional annotations for the spark application RBAC resources. |
+| spark.kerberos.enabled | bool | `false` | Enable Kerberos authentication support for Spark applications. |
+| spark.kerberos.defaultPrincipal | string | `""` | Default Kerberos principal for authentication (can be overridden per application). Example: [email protected] |
+| spark.kerberos.defaultRealm | string | `""` | Default Kerberos realm (can be overridden per application). Example: EXAMPLE.COM |
+| spark.kerberos.defaultKDC | string | `""` | Default Kerberos KDC address (can be overridden per application). Example: kdc.example.com:88 |
+| spark.kerberos.defaultKeytabSecret | string | `""` | Name of the secret containing the default Kerberos keytab file. This secret should contain a file named 'krb5.keytab' |
+| spark.kerberos.defaultConfigSecret | string | `""` | Name of the secret containing the default Kerberos configuration (krb5.conf). This secret should contain a file named 'krb5.conf' |
 | prometheus.metrics.enable | bool | `true` | Specifies whether to enable prometheus metrics scraping. |
 | prometheus.metrics.port | int | `8080` | Metrics port. |
 | prometheus.metrics.portName | string | `"metrics"` | Metrics port name. |

charts/spark-operator-chart/crds/sparkoperator.k8s.io_scheduledsparkapplications.yaml

Lines changed: 51 additions & 0 deletions
@@ -10342,6 +10342,57 @@ spec:
                     items:
                       type: string
                     type: array
+                  kerberos:
+                    description: Kerberos configures Kerberos authentication for Hadoop
+                      access.
+                    properties:
+                      configFile:
+                        description: |-
+                          ConfigFile is the path to the krb5.conf file within the config secret.
+                          Defaults to "krb5.conf" if not specified.
+                        type: string
+                      configSecret:
+                        description: ConfigSecret is the name of the secret containing
+                          the Kerberos configuration file (krb5.conf).
+                        type: string
+                      enabledServices:
+                        description: |-
+                          EnabledServices specifies which Hadoop services should have Kerberos credentials enabled.
+                          Defaults to ["hadoopfs", "hbase", "hive"] if not specified.
+                        items:
+                          type: string
+                        type: array
+                      kdc:
+                        description: KDC is the Key Distribution Center address.
+                        type: string
+                      keytabFile:
+                        description: |-
+                          KeytabFile is the path to the keytab file within the keytab secret.
+                          Defaults to "krb5.keytab" if not specified.
+                        type: string
+                      keytabSecret:
+                        description: KeytabSecret is the name of the secret containing
+                          the Kerberos keytab file.
+                        type: string
+                      principal:
+                        description: Principal is the Kerberos principal name for
+                          authentication.
+                        type: string
+                      realm:
+                        description: Realm is the Kerberos realm. This is optional
+                          and can be inferred from the principal.
+                        type: string
+                      renewalCredentials:
+                        description: |-
+                          RenewalCredentials specifies the credential renewal strategy.
+                          Valid values are "keytab" (default) and "ccache".
+                          "keytab" enables automatic renewal using the provided keytab.
+                          "ccache" uses existing ticket cache (requires manual ticket management).
+                        enum:
+                        - keytab
+                        - ccache
+                        type: string
+                    type: object
                   mainApplicationFile:
                     description: MainFile is the path to a bundled JAR, Python, or
                       R file of the application.

charts/spark-operator-chart/crds/sparkoperator.k8s.io_sparkapplications.yaml

Lines changed: 50 additions & 0 deletions
@@ -10260,6 +10260,56 @@ spec:
                 items:
                   type: string
                 type: array
+              kerberos:
+                description: Kerberos configures Kerberos authentication for Hadoop
+                  access.
+                properties:
+                  configFile:
+                    description: |-
+                      ConfigFile is the path to the krb5.conf file within the config secret.
+                      Defaults to "krb5.conf" if not specified.
+                    type: string
+                  configSecret:
+                    description: ConfigSecret is the name of the secret containing
+                      the Kerberos configuration file (krb5.conf).
+                    type: string
+                  enabledServices:
+                    description: |-
+                      EnabledServices specifies which Hadoop services should have Kerberos credentials enabled.
+                      Defaults to ["hadoopfs", "hbase", "hive"] if not specified.
+                    items:
+                      type: string
+                    type: array
+                  kdc:
+                    description: KDC is the Key Distribution Center address.
+                    type: string
+                  keytabFile:
+                    description: |-
+                      KeytabFile is the path to the keytab file within the keytab secret.
+                      Defaults to "krb5.keytab" if not specified.
+                    type: string
+                  keytabSecret:
+                    description: KeytabSecret is the name of the secret containing
+                      the Kerberos keytab file.
+                    type: string
+                  principal:
+                    description: Principal is the Kerberos principal name for authentication.
+                    type: string
+                  realm:
+                    description: Realm is the Kerberos realm. This is optional and
+                      can be inferred from the principal.
+                    type: string
+                  renewalCredentials:
+                    description: |-
+                      RenewalCredentials specifies the credential renewal strategy.
+                      Valid values are "keytab" (default) and "ccache".
+                      "keytab" enables automatic renewal using the provided keytab.
+                      "ccache" uses existing ticket cache (requires manual ticket management).
+                    enum:
+                    - keytab
+                    - ccache
+                    type: string
+                type: object
               mainApplicationFile:
                 description: MainFile is the path to a bundled JAR, Python, or R file
                   of the application.

charts/spark-operator-chart/values.yaml

Lines changed: 25 additions & 0 deletions
@@ -409,6 +409,31 @@ spark:
     # -- Optional annotations for the spark application RBAC resources.
     annotations: {}
 
+  # Kerberos configuration for Spark applications
+  kerberos:
+    # -- Enable Kerberos authentication support for Spark applications.
+    enabled: false
+
+    # -- Default Kerberos principal for authentication (can be overridden per application).
+    # Example: [email protected]
+    defaultPrincipal: ""
+
+    # -- Default Kerberos realm (can be overridden per application).
+    # Example: EXAMPLE.COM
+    defaultRealm: ""
+
+    # -- Default Kerberos KDC address (can be overridden per application).
+    # Example: kdc.example.com:88
+    defaultKDC: ""
+
+    # -- Name of the secret containing the default Kerberos keytab file.
+    # This secret should contain a file named 'krb5.keytab'
+    defaultKeytabSecret: ""
+
+    # -- Name of the secret containing the default Kerberos configuration (krb5.conf).
+    # This secret should contain a file named 'krb5.conf'
+    defaultConfigSecret: ""
+
 prometheus:
   metrics:
     # -- Specifies whether to enable prometheus metrics scraping.
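The chart values are described as defaults that "can be overridden per application". One way to picture that precedence is the Go sketch below, where a per-application value wins whenever it is set. The `chartDefaults` and `appKerberos` types and the `resolve` helper are hypothetical, and the example values are made up; how the operator actually consumes `spark.kerberos.*` is not visible in the hunks on this page.

```go
package main

import "fmt"

// chartDefaults mirrors the spark.kerberos.default* chart values.
type chartDefaults struct {
	Principal    string
	Realm        string
	KDC          string
	KeytabSecret string
	ConfigSecret string
}

// appKerberos holds per-application overrides; nil means "not set".
type appKerberos struct {
	Principal, Realm, KDC, KeytabSecret, ConfigSecret *string
}

// pick returns the override when it is set and non-empty, else the default.
func pick(override *string, def string) string {
	if override != nil && *override != "" {
		return *override
	}
	return def
}

// resolve merges per-application settings over the chart defaults.
func resolve(d chartDefaults, app appKerberos) chartDefaults {
	return chartDefaults{
		Principal:    pick(app.Principal, d.Principal),
		Realm:        pick(app.Realm, d.Realm),
		KDC:          pick(app.KDC, d.KDC),
		KeytabSecret: pick(app.KeytabSecret, d.KeytabSecret),
		ConfigSecret: pick(app.ConfigSecret, d.ConfigSecret),
	}
}

func main() {
	defaults := chartDefaults{
		Realm:        "EXAMPLE.COM",
		KDC:          "kdc.example.com:88",
		KeytabSecret: "default-keytab", // hypothetical secret name
		ConfigSecret: "default-krb5",   // hypothetical secret name
	}
	keytab := "team-a-keytab" // hypothetical per-application secret
	effective := resolve(defaults, appKerberos{KeytabSecret: &keytab})
	fmt.Printf("%+v\n", effective)
}
```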

config/crd/bases/sparkoperator.k8s.io_scheduledsparkapplications.yaml

Lines changed: 51 additions & 0 deletions
@@ -10342,6 +10342,57 @@ spec:
                     items:
                       type: string
                     type: array
+                  kerberos:
+                    description: Kerberos configures Kerberos authentication for Hadoop
+                      access.
+                    properties:
+                      configFile:
+                        description: |-
+                          ConfigFile is the path to the krb5.conf file within the config secret.
+                          Defaults to "krb5.conf" if not specified.
+                        type: string
+                      configSecret:
+                        description: ConfigSecret is the name of the secret containing
+                          the Kerberos configuration file (krb5.conf).
+                        type: string
+                      enabledServices:
+                        description: |-
+                          EnabledServices specifies which Hadoop services should have Kerberos credentials enabled.
+                          Defaults to ["hadoopfs", "hbase", "hive"] if not specified.
+                        items:
+                          type: string
+                        type: array
+                      kdc:
+                        description: KDC is the Key Distribution Center address.
+                        type: string
+                      keytabFile:
+                        description: |-
+                          KeytabFile is the path to the keytab file within the keytab secret.
+                          Defaults to "krb5.keytab" if not specified.
+                        type: string
+                      keytabSecret:
+                        description: KeytabSecret is the name of the secret containing
+                          the Kerberos keytab file.
+                        type: string
+                      principal:
+                        description: Principal is the Kerberos principal name for
+                          authentication.
+                        type: string
+                      realm:
+                        description: Realm is the Kerberos realm. This is optional
+                          and can be inferred from the principal.
+                        type: string
+                      renewalCredentials:
+                        description: |-
+                          RenewalCredentials specifies the credential renewal strategy.
+                          Valid values are "keytab" (default) and "ccache".
+                          "keytab" enables automatic renewal using the provided keytab.
+                          "ccache" uses existing ticket cache (requires manual ticket management).
+                        enum:
+                        - keytab
+                        - ccache
+                        type: string
+                    type: object
                   mainApplicationFile:
                     description: MainFile is the path to a bundled JAR, Python, or
                       R file of the application.
