revisit timeout handling #254

Merged
merged 11 commits into from
Mar 25, 2025

@cbarbian-sap commented Mar 18, 2025

This PR revisits and improves the timeout handling of components.

Logic of the processing/timeout flow

Every component has a processing timeout. Components can specify the timeout value by implementing the component.TimeoutConfiguration interface. Otherwise (or if a zero timeout is specified), the timeout defaults to the effective requeue interval, which itself defaults to 10 minutes.

Note that a component can be in a 'processing' or 'non-processing' state (which is not directly related to status.state being Processing). Here, 'processing' means that status.processingSince is set (non-initial). Whenever a component is reconciled, a component digest is calculated from the component's annotations, its spec, and the references in its spec (see below for more details about references). If this component digest differs from the current status.processingDigest, then status.processingSince is set to the current time and status.processingDigest is set to the new component digest.
Roughly speaking, this starts a new timeout countdown.

In addition to being 'processing', a component can be in a 'timeout' state; this is the case if the status.processingSince timestamp lies more than the effective timeout duration in the past. If a component gets into the 'timeout' state

  • in non-error situations, the component status (that is, status.state) is set to Error with condition reason Timeout;
  • in error situations, the component status is set according to the error (that is, Error or Pending), and the condition reason is set to Timeout.

This means that a timeout can always be detected reliably by checking whether the condition reason equals Timeout.

A 'processing' component is set to 'non-processing' (that is, status.processingSince is cleared) as soon as the component becomes ready; in that case, one immediate requeue is triggered in addition.

Calculation of the component digest

At the beginning of the reconciliation of a component, a (component) digest is calculated that considers

  • the metadata.annotations of the component
  • the metadata.generation of the component, that is, its spec
  • the loaded content of all spec fields having one of the following types: ConfigMapReference, ConfigMapKeyReference, SecretReference, SecretKeyReference, Reference.
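The following is a minimal sketch of how such a digest could be computed, assuming SHA-256 over a length-prefixed encoding of the inputs. The PR does not specify the actual hashing scheme; componentDigest and writeField are hypothetical helpers. Annotation keys are sorted because Go map iteration order is randomized, and the reference digests are expected in a stable order (for example, spec field order).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"hash"
	"sort"
	"strconv"
)

// componentDigest hashes annotations, generation and reference digests into a
// single hex-encoded SHA-256 value (hypothetical scheme, for illustration only).
func componentDigest(annotations map[string]string, generation int64, referenceDigests []string) string {
	h := sha256.New()

	// Sort annotation keys so the result does not depend on map iteration order.
	keys := make([]string, 0, len(annotations))
	for k := range annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Prefix with the number of annotations so field boundaries stay unambiguous.
	writeField(h, strconv.Itoa(len(keys)))
	for _, k := range keys {
		writeField(h, k)
		writeField(h, annotations[k])
	}
	writeField(h, strconv.FormatInt(generation, 10))
	for _, d := range referenceDigests {
		writeField(h, d)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// writeField writes s as "<len>:<s>" so that adjacent fields cannot run together.
func writeField(h hash.Hash, s string) {
	h.Write([]byte(strconv.Itoa(len(s)) + ":" + s))
}
```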

Such references are loaded automatically at the beginning of the reconcile iteration. For the built-in ConfigMap and Secret reference types, the logic is part of the framework; for types implementing the

```go
type Reference[T Component] interface {
	// Load resolves the referenced content for the given component.
	Load(ctx context.Context, clnt client.Client, component T) error
	// Digest returns a digest of the loaded content.
	Digest() string
}
```

interface, the loading and digest logic has to be provided by the implementation. Besides being used in the timeout handling as status.processingDigest, the component digest

  • is used when calculating event annotations
  • is passed to generators in their context
  • is used when calculating the object digest of dependent objects with an effective reconcile policy of OnObjectOrComponentChange.

Roughly speaking, the component digest should identify the result of reconciling the component as exactly as possible; that is, applying two components with identical digests should produce the same cluster state.

Incompatible changes

Besides the changes outlined above (which should not have a negative impact), this PR contains the following incompatible changes:

  • previously, if a retriable error occurred, status.state was set to Pending with reason Pending, or to DeletionPending with reason DeletionPending, respectively; these reason values are changed to Retrying and DeletionRetrying, respectively
  • a new reason Restarting was added; it is used with status.state Pending if the processing state of a component is reset due to a component digest change.
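The state/reason combinations listed above can be summarized in a small sketch; the constants mirror the values named in this PR, while retryStatus and restartStatus are hypothetical helpers.

```go
package main

// State and reason values named in this PR; the helpers below are illustrative only.
const (
	StatePending         = "Pending"
	StateDeletionPending = "DeletionPending"

	ReasonRetrying         = "Retrying"
	ReasonDeletionRetrying = "DeletionRetrying"
	ReasonRestarting       = "Restarting"
)

// retryStatus returns the state and reason set after a retriable error.
func retryStatus(deleting bool) (state, reason string) {
	if deleting {
		return StateDeletionPending, ReasonDeletionRetrying
	}
	return StatePending, ReasonRetrying
}

// restartStatus returns the state and reason set when the processing state is
// reset because the component digest changed.
func restartStatus() (state, reason string) {
	return StatePending, ReasonRestarting
}
```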

@cbarbian-sap merged commit ab08dfb into main Mar 25, 2025
8 checks passed
@cbarbian-sap deleted the revisit-timeout branch March 25, 2025 19:36