The monitoring and telemetry system has the following high-level structure:
flowchart TD
A[Threshold Dashboard] -->|Deposit data| B(Second track deposit verifier)
B --> |Status| A
A --> |Telemetry| C(Sentry)
D[TBTCv2 minters and guardians] --> |Telemetry| C
C -->|Alerts| E(PagerDuty)
C --> |Alerts| F(Discord)
G[TBTCv2 monitoring] --> |Alerts| C
G --> |Notifications| F
There are several sources of monitoring and telemetry data:
- Threshold Dashboard
- TBTCv2 minters and guardians
- TBTCv2 monitoring
Monitoring and telemetry data are gathered by the Sentry hub. The collected data are used to produce and send alerts to:
- PagerDuty
- Discord channels
Each source produces a different kind of monitoring and telemetry data. Here is a detailed description of each data source.
The dashboard instances collect the following telemetry data, which are sent directly to the Sentry hub:
- generated deposit data,
- unhandled errors.
Each time a new deposit address is generated, the dashboard collects the following data:
- BTC address of the deposit
- JSON recovery file content
- Second track deposit verification result
Note that these data are sent to telemetry regardless of whether the user actually funded and revealed the deposit. They are collected to help users with deposit reveals that did not occur automatically and with deposit recovery in case of wallet misbehavior. Last but not least, all deposit addresses are checked against the second track deposit verifier to make sure the address generation logic used by the dashboard is correct and will not lead to a loss of funds.
Note: The second track deposit verifier is actually a Google Cloud Function.
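
The deposit data described above can be pictured as a small record. The following is only a minimal sketch under stated assumptions: the names `DepositTelemetry` and `build_deposit_telemetry`, the field names, and the placeholder values are hypothetical and are not taken from the dashboard code. It assumes the verification result is a simple equality check between the address generated by the dashboard and the one returned by the second track deposit verifier.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DepositTelemetry:
    """Hypothetical record mirroring the three collected fields."""
    btc_address: str       # BTC address of the deposit
    recovery_file: str     # JSON recovery file content, serialized
    verification_ok: bool  # second track deposit verification result


def build_deposit_telemetry(dashboard_address: str,
                            recovery_data: dict,
                            verifier_address: str) -> DepositTelemetry:
    # Assumption: verification passes only when both derivations agree.
    return DepositTelemetry(
        btc_address=dashboard_address,
        recovery_file=json.dumps(recovery_data, sort_keys=True),
        verification_ok=(dashboard_address == verifier_address),
    )


# Placeholder values only; these are not real addresses or recovery fields.
record = build_deposit_telemetry(
    "address-from-dashboard",
    {"placeholder": "value"},
    "address-from-verifier",
)
print(asdict(record))
```

A record with `verification_ok` set to `False` corresponds to the mismatch case that the address check is meant to catch.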
The minters and guardians instances collect the following telemetry data, which are sent directly to the Sentry hub:
- handled processing and validation errors,
- unhandled errors.
During their work, the minters and guardians may encounter several errors related to the processing logic and deposit validation. Most of them are explicitly sent to the telemetry as errors or warnings. Examples of such errors are:
- Revert of an Ethereum transaction created by the bot
- Failed validation of a revealed deposit
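
The difference between handled and unhandled errors can be illustrated with a short sketch. Everything here is hypothetical: `InMemoryTelemetry` is a stand-in that only records events in a list (it is not a Sentry client), and `DepositValidationError`, `process_deposit`, and `run_once` are invented names that do not come from the minters and guardians code.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTelemetry:
    """Stand-in for a telemetry client; it only records what it is given."""
    events: list = field(default_factory=list)

    def capture(self, level: str, message: str) -> None:
        self.events.append((level, message))


class DepositValidationError(Exception):
    """Hypothetical handled error: a revealed deposit failed validation."""


def process_deposit(deposit_is_valid: bool) -> str:
    if not deposit_is_valid:
        raise DepositValidationError("revealed deposit failed validation")
    return "processed"


def run_once(telemetry: InMemoryTelemetry, deposit_is_valid: bool):
    try:
        return process_deposit(deposit_is_valid)
    except DepositValidationError as err:
        # Handled error: reported explicitly with a chosen level.
        telemetry.capture("warning", str(err))
    except Exception as err:
        # Unhandled error: caught at the top level so it is still reported.
        telemetry.capture("error", f"unhandled: {err}")
    return None
```

Calling `run_once(telemetry, False)` records a single warning, while a successful run records nothing.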
The monitoring component inspects the tBTC v2 system on-chain contracts and
produces different kinds of system events that are sent to Sentry and Discord
based on their type. A general rule of thumb is that notifications
(i.e. informational system events) are sent directly to Discord *-notifications
channels, while alerts requiring an action (i.e. warning/critical system events)
are propagated to the Sentry hub, which decides on the next steps. The specific
system events produced by the monitoring component are:
- deposit revealed,
- redemption requested,
- wallet registered,
- DKG result submitted,
- DKG result approved,
- DKG result challenged,
- large deposit revealed,
- large redemption requested,
- stale redemption,
- optimistic minting canceled,
- optimistic minting requested too early,
- optimistic minting requested for undetermined Bitcoin transaction,
- optimistic minting not requested by designated minter,
- optimistic minting not finalized by designated minter,
- optimistic minting not requested by any minter,
- optimistic minting not finalized by any minter,
- high TBTC token total supply change.
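
The rule of thumb for routing system events can be sketched as a small function. This is an illustration only: the `Severity` enum, the `destination` function, and the returned labels are hypothetical names, and the three sample events are classified according to the per-event descriptions in this document.

```python
from enum import Enum


class Severity(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    CRITICAL = "critical"


def destination(severity: Severity) -> str:
    """Return where a system event goes under the rule of thumb."""
    if severity is Severity.INFORMATIONAL:
        # Notifications go straight to Discord *-notifications channels.
        return "discord-notifications"
    # Warning and critical events are alerts and go to the Sentry hub.
    return "sentry"


# A few sample events with the severities this document assigns to them.
SAMPLE_EVENTS = {
    "deposit revealed": Severity.INFORMATIONAL,
    "large deposit revealed": Severity.WARNING,
    "DKG result challenged": Severity.CRITICAL,
}

for name, severity in SAMPLE_EVENTS.items():
    print(f"{name}: {destination(severity)}")
```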
Deposit revealed: An informational system event indicating that a new deposit was revealed to the on-chain Bridge contract. This event is sent directly to Discord as a notification that does not require any action.
Redemption requested: An informational system event indicating that a new redemption was requested from the on-chain Bridge contract. This event is sent directly to Discord as a notification that does not require any action.
Wallet registered: An informational system event indicating that a new wallet was registered on the on-chain Bridge contract. This event is sent directly to Discord as a notification that does not require any action.
DKG result submitted: An informational system event indicating that a new DKG result was submitted to the on-chain WalletRegistry contract. This event is sent directly to Discord as a notification that does not require any action.
DKG result approved: An informational system event indicating that the submitted DKG result was approved on the on-chain WalletRegistry contract. This event is sent directly to Discord as a notification that does not require any action.
DKG result challenged: A critical system event indicating that the submitted DKG result was challenged on the on-chain WalletRegistry contract. This event is sent to the Sentry hub and requires immediate action from the team. The default action is checking the reason for the challenge, as this event may indicate a malicious wallet operator or a serious bug in the off-chain client code.
Large deposit revealed: A warning system event indicating that a large deposit was revealed to the on-chain Bridge contract. This event is sent to the Sentry hub and should get the team's attention. The default action is making sure that the deposit is handled correctly by the system.
Large redemption requested: A warning system event indicating that a large redemption was requested from the on-chain Bridge contract. This event is sent to the Sentry hub and should get the team's attention. The default action is making sure that the redemption is not a result of a malicious action and, if it is not, that the redemption is handled correctly by the system.
Stale redemption: A warning system event indicating that a redemption request became stale, i.e. was not handled within the expected time. This event is sent to the Sentry hub and should get the team's attention. The default action is investigating the cause of the extended processing time, as this alert may be an early sign of a malfunctioning wallet or may indicate a problem with the maintainer bot.
Optimistic minting canceled: A warning system event indicating that an optimistic minting request was canceled by a guardian. This event is sent to the Sentry hub and should get the team's attention. The default action is checking the reason for the cancellation, as this event may indicate a malicious minter or guardian that should be evicted from the system.
Optimistic minting requested too early: A critical system event indicating that an optimistic minting request was issued too early with respect to the confirmation state of its BTC funding transaction. This event is sent to the Sentry hub and requires immediate action from the team. The default action is checking the reason for the early request, as this event may indicate a malicious minter that should be evicted from the system.
Optimistic minting requested for undetermined Bitcoin transaction: A critical system event indicating that an optimistic minting request was made for an undetermined Bitcoin transaction. This event is sent to the Sentry hub and requires immediate action from the team. The default action is checking why the Bitcoin transaction cannot be determined, as this event may indicate problems with the underlying Bitcoin client used by the monitoring component or flag a malicious minter that should be evicted from the system.
Optimistic minting not requested by designated minter: A warning system event indicating that an optimistic minting request was not issued by the designated minter and another minter did that job. This event is sent to the Sentry hub and should get the team's attention. The default action is investigating the cause of the designated minter's idleness, as the designated minter may be unhealthy or malicious, or there may be a bug in the minters bot code.
Optimistic minting not finalized by designated minter: A warning system event indicating that an optimistic minting request was not finalized by the designated minter and another minter did that job. This event is sent to the Sentry hub and should get the team's attention. The default action is investigating the cause of the designated minter's idleness, as the designated minter may be unhealthy or malicious, or there may be a bug in the minters bot code.
Optimistic minting not requested by any minter: A warning system event indicating that an optimistic minting request was not issued by any minter. This event is sent to the Sentry hub and should get the team's attention. The default action is investigating the cause of the minters' idleness, as the underlying deposit may be invalid, the minters may be unhealthy or malicious, or there may be a bug in the minters bot code.
Optimistic minting not finalized by any minter: A warning system event indicating that an optimistic minting request was not finalized by any minter. This event is sent to the Sentry hub and should get the team's attention. The default action is investigating the cause of the minters' idleness, as the underlying deposit may be invalid, the minters may be unhealthy or malicious, or there may be a bug in the minters bot code.
High TBTC token total supply change: A critical system event indicating that a high change (i.e. >=10%) of the total TBTC v2 token supply took place in the last 12 hours. This event is sent to the Sentry hub and requires immediate action from the team. The default action is checking the root cause of the supply change and making sure that its source is actually a proper deposit or redemption and that there are no signs of any malicious action.
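
The 10% threshold for the total supply change can be expressed as a short check. This is a sketch under assumptions that the text does not confirm: it assumes the change is measured relative to the supply at the start of the 12-hour window, and that both increases and decreases count. The function name `is_high_supply_change` and its parameters are hypothetical.

```python
from fractions import Fraction


def is_high_supply_change(reference_supply: int,
                          current_supply: int,
                          threshold: Fraction = Fraction(10, 100)) -> bool:
    """Check whether the supply moved by at least `threshold`.

    `reference_supply` is assumed to be the total supply 12 hours ago,
    expressed in the token's smallest unit so integers can be used.
    """
    if reference_supply <= 0:
        raise ValueError("reference supply must be positive")
    # Exact rational arithmetic avoids rounding errors near the threshold.
    change = Fraction(abs(current_supply - reference_supply), reference_supply)
    return change >= threshold


print(is_high_supply_change(1000, 1100))
print(is_high_supply_change(1000, 1099))
```

Using exact fractions keeps the boundary case, a change of exactly 10%, on the alerting side, which matches the ">=10%" wording.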
The monitoring and telemetry system uses Sentry as a hub for the relevant monitoring and telemetry data that require an action from the team. Here is a detailed description of this component.
The Sentry application has been configured in the following way:
- There is a Keep organization that groups all invited members under the #Keep team
- There are projects corresponding to specific monitoring and telemetry data sources:
  - prod-threshold-dashboard that collects telemetry from the production (mainnet) Threshold dashboard as well as from production previews
  - test-threshold-dashboard that collects telemetry from the test (Goerli) Threshold dashboard as well as from test previews
  - prod-tbtc-v2-minters-guardians that collects telemetry from production (mainnet) TBTCv2 minters and guardians instances
  - test-tbtc-v2-minters-guardians that collects telemetry from test (Goerli) TBTCv2 minters and guardians instances
  - prod-tbtc-v2-monitoring that collects alerts (i.e. warning/critical system events) from the production (mainnet) TBTCv2 monitoring instance
  - test-tbtc-v2-monitoring that collects alerts (i.e. warning/critical system events) from the test (Goerli) TBTCv2 monitoring instance
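
The project names above follow an environment-plus-source pattern, which can be sketched as follows. The helper `sentry_project` and the constant names are hypothetical and only illustrate the naming scheme. They are not part of any Sentry API.

```python
# Environment prefixes and the networks they correspond to in this document.
ENVIRONMENTS = {"prod": "mainnet", "test": "Goerli"}

# Data source suffixes used in the project names.
SOURCES = (
    "threshold-dashboard",
    "tbtc-v2-minters-guardians",
    "tbtc-v2-monitoring",
)


def sentry_project(environment: str, source: str) -> str:
    """Build a project name such as 'prod-tbtc-v2-monitoring'."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    return f"{environment}-{source}"


ALL_PROJECTS = [sentry_project(env, src) for env in ENVIRONMENTS for src in SOURCES]
print(ALL_PROJECTS)
```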
As mentioned earlier, Sentry uses the collected monitoring and telemetry data
to raise alerts that are propagated to PagerDuty and Discord *-alerts
channels.
Here is the exact summary of configured alert rules:
| Alert name | Project | Firing conditions | Notified entities |
|---|---|---|---|
| | | When deposit address returned by the second track deposit verifier is different from the address generated by the dashboard | PagerDuty and Discord |
| | | When deposit address returned by the second track deposit verifier is different from the address generated by the dashboard | Discord |
| | | When a new alert (i.e. warning/critical system event) is received from the TBTCv2 monitoring component | Discord |
| | | When a new critical alert (i.e. critical system event) is received from the TBTCv2 monitoring component | PagerDuty |
| | | When a new alert (i.e. warning/critical system event) is received from the TBTCv2 monitoring component | Discord |
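
For the TBTCv2 monitoring rules, the effect of combining an "any alert" rule with a "critical alert" rule can be sketched as below. This is a hedged illustration: it assumes a project where both rules are configured, and the function name `notified_entities` is hypothetical. It does not reproduce the actual Sentry alert rule configuration.

```python
def notified_entities(level: str) -> set[str]:
    """Return who is notified for a monitoring alert of the given level."""
    if level not in ("warning", "critical"):
        raise ValueError("only warning and critical events are alerts")
    # Every alert (warning or critical) is forwarded to Discord.
    targets = {"Discord"}
    # Critical alerts additionally page the team through PagerDuty.
    if level == "critical":
        targets.add("PagerDuty")
    return targets


print(sorted(notified_entities("warning")))
print(sorted(notified_entities("critical")))
```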