Create thresholds response time limits for 'degraded' and 'failed' #837

@mxkaske

Allow the user to select the response time limits for when a monitor is degraded and failed.

Add two new db columns for monitor:

  • limitDegraded or thresholdDegraded (e.g.)
  • limitFailed or thresholdFailed (e.g.)

Both should be numbers that the user sets (via a select or an input). We should enforce a maximum allowed value, e.g. 45 s (45,000 ms).
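A minimal sketch of how the two limits could be validated before they are stored. The names thresholdDegraded and thresholdFailed are just one of the candidate namings above, validateThresholds is a hypothetical helper, and the rule that the degraded limit must be lower than the failed limit is an assumption, not something decided in this issue.

```typescript
// Cap taken from the example above: 45 s expressed in milliseconds.
const MAX_THRESHOLD_MS = 45000;

// Hypothetical shape; the final column names are still open.
interface MonitorThresholds {
  thresholdDegraded: number; // ms
  thresholdFailed: number; // ms
}

// Returns a list of human-readable errors; an empty list means valid.
function validateThresholds(t: MonitorThresholds): string[] {
  const errors: string[] = [];
  const fields: [string, number][] = [
    ["thresholdDegraded", t.thresholdDegraded],
    ["thresholdFailed", t.thresholdFailed],
  ];
  for (const [name, value] of fields) {
    if (!Number.isInteger(value) || value <= 0) {
      errors.push(`${name} must be a positive integer (ms)`);
    } else if (value > MAX_THRESHOLD_MS) {
      errors.push(`${name} must not exceed ${MAX_THRESHOLD_MS} ms`);
    }
  }
  // Assumption: "degraded" has to trigger before "failed".
  if (t.thresholdDegraded >= t.thresholdFailed) {
    errors.push("thresholdDegraded must be lower than thresholdFailed");
  }
  return errors;
}
```

The same check could run both in the form and on the server, so the cap is enforced regardless of whether a select or a free input is used.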

Right now, we have export const monitorStatus = ["active", "error"] as const; which is used within the monitor_status and monitor schemas.

We could extend it with "degraded" to include that information within a specific monitor or within a specific monitor_status (based on the region).
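As a rough sketch, the extended tuple and a per-response classification might look like the following. statusFromLatency is a hypothetical helper, and mapping a response above the failed limit (or a missing response) to the existing "error" value is an assumption.

```typescript
// The existing tuple with the proposed "degraded" value added.
// (The real schema exports this constant; export is omitted here.)
const monitorStatus = ["active", "degraded", "error"] as const;
type MonitorStatus = (typeof monitorStatus)[number];

// Hypothetical helper: derive a status from one response time.
// latencyMs is null when the request got no response at all.
function statusFromLatency(
  latencyMs: number | null,
  thresholdDegraded: number,
  thresholdFailed: number,
): MonitorStatus {
  if (latencyMs === null || latencyMs >= thresholdFailed) return "error";
  if (latencyMs >= thresholdDegraded) return "degraded";
  return "active";
}
```

Because the type is derived from the tuple, any exhaustive switch over the status becomes a compile error until the new "degraded" case is handled.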

Within a monitor overview, extend the cards:

Image

Important

Right now, the displayed data comes from Tinybird.

How should we calculate the 'degraded' value? Multiple options:

  1. extend our ping_response Tinybird schema with a 'status' column and count the number of rows per status (the status is then fixed at write time, since it is stored with each ping)
  2. within our metrics_endpoints, pass the limits as props and count the rows per status based on them (dynamic: if the user changes a threshold, the counts are recalculated with the new values)

Image

I like the second option as it is dynamic - and we don't have to extend the schema.
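To illustrate the second option, here is an in-memory sketch of the counting logic. In practice this would run inside the Tinybird endpoint query with the limits passed as parameters, which is not shown here; the Ping shape and countByThreshold are hypothetical names.

```typescript
// Hypothetical shape of one stored ping; null means no response.
interface Ping {
  latency: number | null; // ms
}

interface StatusCounts {
  ok: number;
  degraded: number;
  failed: number;
}

// Counts pings per bucket using limits supplied at query time,
// so nothing about the status is stored with the ping itself.
function countByThreshold(
  pings: Ping[],
  limitDegraded: number,
  limitFailed: number,
): StatusCounts {
  const counts: StatusCounts = { ok: 0, degraded: 0, failed: 0 };
  for (const ping of pings) {
    if (ping.latency === null || ping.latency >= limitFailed) {
      counts.failed += 1;
    } else if (ping.latency >= limitDegraded) {
      counts.degraded += 1;
    } else {
      counts.ok += 1;
    }
  }
  return counts;
}

const samplePings: Ping[] = [
  { latency: 100 },
  { latency: 2500 },
  { latency: 12000 },
  { latency: null },
  { latency: 1999 },
];

// The same data yields different counts when the user changes a limit.
const withDefaultLimits = countByThreshold(samplePings, 2000, 10000);
const withStricterLimit = countByThreshold(samplePings, 50, 10000);
```

The trade-off is that historical counts shift whenever a threshold changes, whereas the first option would freeze each ping's status at write time.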

Open questions:

  • How should we notify the users about degraded services? (again, only if >50% of regions are degraded?) We might wanna include an additional boolean db column to allow alerts?
  • How do we extend the Tracker class to get the current status of the monitors, like "degraded", and how do we display it on the status page?
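For the first open question, one possible rule is sketched below: notify only when strictly more than half of the regions report "degraded" and the monitor has alerts enabled. shouldNotifyDegraded, the alertsEnabled flag, and the region codes in the example are all hypothetical, and the 50% cutoff is only the value floated above, not a decision.

```typescript
type RegionStatus = "active" | "degraded" | "error";

// Hypothetical rule: alert when more than 50% of regions are degraded
// and the (proposed) boolean column allows degraded alerts.
function shouldNotifyDegraded(
  statusByRegion: Record<string, RegionStatus>,
  alertsEnabled: boolean,
): boolean {
  if (!alertsEnabled) return false;
  const regions = Object.keys(statusByRegion);
  if (regions.length === 0) return false;
  const degraded = regions.filter(
    (region) => statusByRegion[region] === "degraded",
  ).length;
  return degraded / regions.length > 0.5;
}

// Two of three regions degraded: above the cutoff.
const majorityDegraded = shouldNotifyDegraded(
  { ams: "degraded", iad: "degraded", syd: "active" },
  true,
);
```

With an even number of regions, exactly half being degraded does not trigger a notification under this rule, which may or may not be the desired behavior.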
