Auto restart all instance types if there is no activity #3474

Open
@ramonsmits

Description

Describe the feature.

Is your feature related to a problem? Please describe.

Sporadically, users report that ingestion has halted. The logs provide no insight and show no errors, but ingestion resumes once the user restarts the instance.

Users usually detect the problem only because their queues reach their quotas, which has other side effects.

Describe the requested feature

The feature could keep track of how much time has passed since the last ingested message. If there has been no activity for, say, 5 minutes, the instance should trigger a termination sequence so that the host restarts it.

It could be that there are actually no messages in the queue and the restart was not required, but then at least the instance is running as a fresh process.

As an alternative, the restart could be triggered after a fixed uptime, although that can already be handled in the environment via a scheduled task (for example, restart the service every day at 02:00).
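The idle tracking described above could be sketched roughly as follows. This is only an illustration in Python, not the product's implementation: `IdleWatchdog`, its method names, and the injectable `clock` are hypothetical, and the 300-second default simply mirrors the 5-minute example.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Tracks time since the last ingested message (hypothetical helper)."""

    def __init__(self, idle_limit_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # A monotonic clock avoids false triggers when the wall clock is adjusted.
        self._idle_limit = idle_limit_seconds
        self._clock = clock
        self._last_activity = clock()

    def record_message(self) -> None:
        """Call this every time a message has been ingested."""
        self._last_activity = self._clock()

    def idle_seconds(self) -> float:
        """Seconds elapsed since the last recorded message (or since startup)."""
        return self._clock() - self._last_activity

    def should_terminate(self) -> bool:
        """True once the idle period has reached the configured limit."""
        return self.idle_seconds() >= self._idle_limit
```

A periodic timer in the instance would call `should_terminate()` and, when it returns true, start the normal shutdown sequence so that the host (service manager, container runtime) restarts the process.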

Optionally, expose the "last message received" timestamp in the JSON result of a /health API.
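As a rough idea of what such a result could look like, here is a minimal Python sketch. The field name `lastMessageReceived` and the function `health_payload` are assumptions made for illustration; no existing /health contract is implied.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def health_payload(last_message_received: Optional[datetime]) -> str:
    """Build a JSON body for a hypothetical /health endpoint."""
    body = {
        # ISO 8601 in UTC, or null if nothing has been ingested since startup.
        "lastMessageReceived": (
            last_message_received.astimezone(timezone.utc).isoformat()
            if last_message_received is not None
            else None
        ),
    }
    return json.dumps(body)
```

External monitoring could then poll the endpoint and alert, or restart the instance, when the timestamp is older than expected.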

Queue monitoring

This logic could be enhanced by querying the age of the oldest message in the queue (or, alternatively, the length of the queue). If the queue is empty, no activity is expected. Otherwise, the lack of activity indicates that the message pump is no longer working, that the instance is in an unrecoverable state, and that it should terminate.
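The combined check could look something like the sketch below. The function name and parameters are hypothetical, and how the oldest message age is obtained depends on the transport, so it is passed in as a plain value. Requiring the oldest message to be at least as old as the idle limit is an extra assumption, added so that a message that has only just arrived does not count as evidence of a stuck pump.

```python
from typing import Optional


def pump_appears_stuck(idle_seconds: float,
                       oldest_message_age_seconds: Optional[float],
                       idle_limit_seconds: float = 300.0) -> bool:
    """Decide whether inactivity looks like a stalled message pump.

    oldest_message_age_seconds is None when the queue is empty.
    """
    if oldest_message_age_seconds is None:
        # Empty queue: no activity is expected, so do not terminate.
        return False
    # Messages are waiting, and have been for at least the idle limit,
    # while nothing was ingested during that time.
    return (idle_seconds >= idle_limit_seconds
            and oldest_message_age_seconds >= idle_limit_seconds)
```

With only a queue length available, the same shape applies: treat a length of zero as the empty case and any positive length as waiting messages.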

Workaround

  1. Run a script at a fixed interval that stops and starts each instance.
  2. Run a script or application that inspects the relevant queues as described above and then stops and starts the corresponding instance.
