-
Notifications
You must be signed in to change notification settings - Fork 4.1k
feat: OpenTelemetry configuration and BaseApp instrumentation #25516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
server/api/server.go
Outdated
| return s.listener.Close() | ||
| } | ||
|
|
||
| // Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before merge, lets give the deprecate notices some more context, and encourage them to use otel
telemetry/config.go
Outdated
| cfgJson, err := json.Marshal(cfg) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to marshal telemetry config file: %w", err) | ||
| } | ||
| fmt.Printf("\nInitializing telemetry with config:\n%s\n\n", cfgJson) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assume this is just for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I think that it is somewhat useful to have some debugging indicating whether opentelemetry is getting initialized or not. Maybe just a single line without dumping the config?
telemetry/config.go
Outdated
| } | ||
| } | ||
|
|
||
| func doInit() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quite a lot going on here - do you see this simplifying in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream, I think otelconf will add the file exporters so most of the cosmos extra will go away although we probably want to keep the runtime instrumentation stuff. And we may also want some other cosmos extra config options
| return sdkerrors.QueryResult(errorsmod.Wrap(sdkerrors.ErrUnknownRequest, "no query path provided"), app.trace), nil | ||
| } | ||
|
|
||
| // TODO: propagate context with span into the sdk.Context used for queries so that we can trace queries properly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a tracking issue for this?
| # The fallback is the db_backend value set in CometBFT's config.toml. | ||
| app-db-backend = "{{ .BaseConfig.AppDBBackend }}" | ||
| ############################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this actually break existing telemetry users? like without this its not exactly deprecated but fully broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it breaks the testnet command which is using this to generate config files. So I can revert. I was just thinking that we shouldn't encourage users to use it by generating this in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can just put a deprecation notice comment here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did more testing, if we started a new app with this removed, then we'd lose access to all cosmos SDK metrics currently wired with the legacy go-metrics system.
Co-authored-by: Tyler <[email protected]>
|
if you don't have the grafana/otel-lgtm stack running, and you run the SDK with a valid otel configuration, there is quite a delay from pressing ctrl-c and killing the app, edit: and it displays a nil pointer exception error |
|
Some metrics I would like to have (maybe some of them exist)
As we keep debugging, my assumption is that we'll find more things, so I would also like good docs on how to contribute after this is merged in. |
|
Thanks for that input @gjermundgaraba! So generally, for each metric type, it would be good to know what sort of data type we want. In otel, we can basically choose from:
In this PR, I've leaned towards trace/span instrumentation of the ABCI lifecycle because we get a detailed execution trace and we can also analyze all the timing and count info from that. But we can really do any types of metrics you specify. To contribute, I'm suggesting that we follow the official otel go examples: https://opentelemetry.io/docs/languages/go/instrumentation/. But note, that we don't need to do any provider/exporter setup (this PR covers that), we just need to do the instrumentation. The roll dice example from the otel docs looks like a perfect minimal example: package main
import (
"fmt"
"io"
"log"
"math/rand"
"net/http"
"strconv"
"go.opentelemetry.io/contrib/bridges/otelslog"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/metric"
)
const name = "go.opentelemetry.io/otel/example/dice"
var (
tracer = otel.Tracer(name)
meter = otel.Meter(name)
logger = otelslog.NewLogger(name)
rollCnt metric.Int64Counter
)
func init() {
var err error
rollCnt, err = meter.Int64Counter("dice.rolls",
metric.WithDescription("The number of rolls by roll value"),
metric.WithUnit("{roll}"))
if err != nil {
panic(err)
}
}
func rolldice(w http.ResponseWriter, r *http.Request) {
ctx, span := tracer.Start(r.Context(), "roll")
defer span.End()
roll := 1 + rand.Intn(6)
var msg string
if player := r.PathValue("player"); player != "" {
msg = fmt.Sprintf("%s is rolling the dice", player)
} else {
msg = "Anonymous player is rolling the dice"
}
logger.InfoContext(ctx, msg, "result", roll)
rollValueAttr := attribute.Int("roll.value", roll)
span.SetAttributes(rollValueAttr)
rollCnt.Add(ctx, 1, metric.WithAttributes(rollValueAttr))
resp := strconv.Itoa(roll) + "\n"
if _, err := io.WriteString(w, resp); err != nil {
log.Printf("Write failed: %v\n", err)
}
} |
Description
This PR:
telemetry/package via https://pkg.go.dev/go.opentelemetry.io/contrib/otelconf/v0.3.0telemetry/which are based on https://pkg.go.dev/github.com/hashicorp/go-metrics which is under-maintained and requires mutex locks and map lookups on every telemetry methodBaseAppwith OpenTelemetry spans and block and tx counter metricsTestingMainfunction for tests to export telemetry datalog/slogdefault logger to send logs to OpenTelemetry and allows otelslog bridged to be used for logging. NOTE: this leaves a bit of a disconnect between the SDK's existing logging infrastructure which currently just writes to stdout - this can either be default with in this PR or dealt with in a follow upThe OpenTelemetry go libraries are very actively maintained, most vendors in the space are adding OpenTelemetry support and generally it seems like the industry is headed in this direction. Much of our existing telemetry code is to configure basic telemetry exporting, but with otelconf declarative config, we don't need to maintain any of this ourselves and the out of the box experience is quite simple even for usage in testing.