Skip to content

Monitoring API: Add Metric server config #2322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 152 additions & 0 deletions config/v1alpha1/types_cluster_monitoring.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ limitations under the License.
package v1alpha1

import (
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

Expand Down Expand Up @@ -77,6 +79,13 @@ type ClusterMonitoringSpec struct {
// userDefined set the deployment mode for user-defined monitoring in addition to the default platform monitoring.
// +required
UserDefined UserDefinedMonitoring `json:"userDefined"`

// metricsServer defines the configuration for the Metrics Server component.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this field allow a user to configure about the metrics server component? (i.e why would I, as a user, care to do anything with this field?)

// The Metrics Server provides container resource metrics for use in autoscaling pipelines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understood what most of this jargon meant, if you are looking to explain what the metrics server does, I would recommend being a bit more verbose. From the metrics-server doc site, this is a pretty good and succinct summary IMO:

Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and Vertical Pod Autoscaler.

This gives me, as a user, a bit more context as to why I might care about doing some configuration of this field

// When omitted, this means no opinion and the platform is left to choose a default,
// which is subject to change over time.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit mention that the field is optional in the GoDoc. "When omitted" implies that, but I generally prefer being explicit.

MetricsServerConfig MetricsServerConfig `json:"metricsServer,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you want to continue with discoverability as in #2148 - you'll want to drop the omitempty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies throughout the APIs for wherever you want discoverability.

}

// UserDefinedMonitoring config for user-defined projects.
Expand All @@ -101,3 +110,146 @@ const (
// UserDefinedNamespaceIsolated enables monitoring for user-defined projects with namespace-scoped tenancy. This ensures that metrics, alerts, and monitoring data are isolated at the namespace level.
UserDefinedNamespaceIsolated UserDefinedMode = "NamespaceIsolated"
)

// The MetricsServerConfig resource defines settings for the Metrics Server component.
// This configuration allows users to customize how the Metrics Server is deployed
// and how it operates within the cluster.
type MetricsServerConfig struct {
// audit defines the audit configuration used by the Metrics Server instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "audit configuration"? Why should I care about configuring this?

// When omitted, this means no opinion and the platform is left to choose a default,
// which is subject to change over time. The current default is "metadata".
// The audit field is optional.
// +optional
Audit *Audit `json:"audit,omitempty"`

// nodeSelector is the node selector applied to metrics server pods.
// nodeSelector is optional.
//
// When omitted, this means the user has no opinion and the platform is left
// to choose reasonable defaults. These defaults are subject to change over time.
// The current default is `kubernetes.io/os: linux` so that Pods can be scheduled onto any available node.
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty"`

// tolerations is a list of tolerations applied to metrics server components
// tolerations is optional.
//
// When omitted, this means the user has no opinion and the platform is left
// to choose reasonable defaults. These defaults are subject to change over time.
Comment on lines +137 to +138
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actually going to be assigning defaults here?

If we aren't ever going to be configuring defaults for a field, we don't need to put this "no opinion" blurb here.

// Maximum length for this list is 10
// +kubebuilder:validation:MaxItems=10
// +optional
Tolerations []v1.Toleration `json:"tolerations,omitempty"`

// resources defines the compute resource requests and limits for the Metrics Server container.
// This includes CPU, memory and HugePages constraints to help control scheduling and resource usage.
// When not specified, defaults are used by the platform. Requests cannot exceed limits.
// Resources is optional.
// More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
// This is a simplified API that maps to Kubernetes ResourceRequirements.
// +optional
Resources *MetricServerConfigContainerResources `json:"resources,omitempty"`

// topologySpreadConstraints defines rules for how Metrics Server Pods should be distributed
// across topology domains such as zones, nodes, or other user-defined labels.
// topologySpreadConstraints is optional.
// This helps improve high availability and resource efficiency by avoiding placing
// too many replicas in the same failure domain.
//
// When omitted, this means no opinion and the platform is left to choose a default, which is subject to change over time.
// This field maps directly to the `topologySpreadConstraints` field in the Pod spec.
// Maximum length for this list is 10
// +kubebuilder:validation:MaxItems=10
// +optional
TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}

// Audit defines the configuration for Metrics Server audit logging.
type Audit struct {
// profile specifies the audit log level to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we elaborate here a bit more? To use for what? Why might I care to modify this as a user?

// Valid values are:
// - "metadata" - log metadata about requests (default)
// - "request" - log metadata and request payloads
// - "requestresponse" - log metadata, requests, and responses
// - "none" - don't log requests
Comment on lines +170 to +174
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with OpenShift APIs, I encourage following the format of:

// Allowed values are Metadata, Request, RequestResponse, None, and omitted
// When set to Metadata, ...
// When set to RequestResponse, ...
// ...
// When omitted, ...

// The default audit log level is "metadata"
//
// See: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#audit-policy
// for more details about audit logging.
Comment on lines +177 to +178
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we generally encourage to not link to external documentation wherever possible. Is there information that could be taken from this documentation that we think is important for users to understand?

//
// +kubebuilder:validation:Enum=metadata;request;requestresponse;none
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All values should be PascalCase:

Suggested change
// +kubebuilder:validation:Enum=metadata;request;requestresponse;none
// +kubebuilder:validation:Enum=Metadata;Request;RequestResponse;None

// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit mention of optional in the GoDoc.

Profile AuditProfileType `json:"profile,omitempty"`
}

// AuditProfileType defines the audit policy profile type.
// +kubebuilder:validation:Enum=none;metadata;request;requestresponse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put the enum marker on both the type and the field - pick one or the other. Using both causes an allOf in the OpenAPI schema for the field (see https://github.com/openshift/api/pull/2322/files#diff-2a283f1c7a9ed75159eeb51f06ea88be271d34a335475d4155ea8d8ee364fcb5R66-R76 ) which can end up causing weird behaviors if any drift happens.

type AuditProfileType string

const (
// None - don't log events that match this rule.
AuditProfileTypeNone AuditProfileType = "none"

// Metadata - log events with metadata (requesting user, timestamp, resource, verb, etc.) but not request or response body.
AuditProfileTypeMetadata AuditProfileType = "metadata"

// Request - log events with request metadata and body but not response body. This does not apply for non-resource requests.
AuditProfileTypeRequest AuditProfileType = "request"

// RequestResponse - log events with request metadata, request body and response body. This does not apply for non-resource requests.
AuditProfileTypeRequestResponse AuditProfileType = "requestresponse"
Comment on lines +191 to +200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PascalCase

)

// ResourceSpec defines the requested and limited value of a resource.
type ResourceSpec struct {
// request is the minimum amount of the resource required (e.g. "2Mi", "1Gi").
// This field is optional.
// +optional
Request resource.Quantity `json:"request,omitempty"`

// limit is the maximum amount of the resource allowed (e.g. "2Mi", "1Gi").
// This field is optional.
// +optional
Limit resource.Quantity `json:"limit,omitempty"`
}

// MetricServerConfigContainerResources defines simplified resource requirements for a container.
type MetricServerConfigContainerResources struct {
// cpu defines the CPU resource limits and requests.
// This filed is optional
// +optional
CPU *ResourceSpec `json:"cpu,omitempty"`

// memory defines the memory resource limits and requests.
// This filed is optional
// +optional
Memory *ResourceSpec `json:"memory,omitempty"`

// hugepages is a list of hugepage resource specifications by page size.
// defines an optional list of unique configurations identified by their `size` field.
// A maximum of 10 items is allowed.
// The list is treated as a map, using `size` as the key
// +optional
// +listType=map
// +listMapKey=size
// +kubebuilder:validation:MaxItems=10
HugePages []HugePageResource `json:"hugepages,omitempty"`
}

// HugePageResource describes hugepages resources by page size (e.g. 2Mi, 1Gi).
type HugePageResource struct {
// size of the hugepage (e.g. "2Mi", "1Gi").
// This field is required.
// +required
Size resource.Quantity `json:"size"`

// request amount for this hugepage size.
// This filed is optional
// +optional
Request resource.Quantity `json:"request,omitempty"`

// limit amount for this hugepage size.
// This filed is optional
// +optional
Limit resource.Quantity `json:"limit,omitempty"`
}
Loading