Add features flag API proposal #63
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,139 @@ | ||||||
| ## Feature Flags API | ||||||
|
|
||||||
| * **Owners:** | ||||||
| * `@roidelapluie` | ||||||
|
|
||||||
| * **Implementation Status:** Not implemented | ||||||
|
|
||||||
| * **Related Issues and PRs:** | ||||||
| * [Feature request](https://github.com/prometheus/prometheus/issues/10022) | ||||||
| * [Example use case](https://github.com/grafana/grafana/issues/33487) | ||||||
|
|
||||||
| This design document proposes introducing a "features list" API within the Prometheus ecosystem. This API would allow Prometheus-like endpoints to advertise which features they support and have enabled. By exposing this information, clients can determine in advance what functionality is available on a given endpoint, leading to more efficient API usage, optimized PromQL queries, and clearer expectations about endpoint capabilities. | ||||||
|
|
||||||
| The primary objectives are to create a solution that is broadly applicable across various targets, encouraging wide adoption, and to address practical needs and optimizations that arise when such capability information is easily accessible. | ||||||
|
|
||||||
| ## Why | ||||||
|
|
||||||
| Over time, the Prometheus APIs have undergone numerous optimizations, such as supporting POST in addition to GET requests and allowing filtering on certain API endpoints. Additionally, new APIs, PromQL functions, and capabilities are regularly introduced. Some of these features are optional and can be enabled or disabled by users. | ||||||
|
|
||||||
| Without a "features API," new advancements are often underutilized because API clients are hesitant to adopt them before widespread support exists among users. By creating an API that clearly communicates available and enabled features, clients can take advantage of new capabilities as soon as they are released. For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021), illustrating a three-year delay caused by limited visibility of feature availability. | ||||||
|
|
||||||
| ### Pitfalls of the current solution | ||||||
|
|
||||||
| Currently, there is no proper solution for feature discovery. While users can retrieve configs or version flags, these APIs are tightly coupled to Prometheus, not machine-friendly, and unsuitable for third-party or generic integrations. | ||||||
|
|
||||||
| There are client-side workarounds. In Grafana, users can configure datasources like this: | ||||||
|
|
||||||
| ```yaml | ||||||
| prometheusType: Prometheus # Options: Cortex | Mimir | Prometheus | Thanos | ||||||
| prometheusVersion: 2.40.0 | ||||||
| ``` | ||||||
|
|
||||||
| Grafana infers compatibility from these values and selects endpoints accordingly. For instance, all of the following support label matchers in the Labels API: | ||||||
|
|
||||||
| - Prometheus >= 2.24.0 | ||||||
| - Mimir >= 2.0.0 | ||||||
| - Cortex >= 1.11.0 | ||||||
| - Thanos >= 0.18.0 | ||||||
|
|
||||||
| If the criteria are met, Grafana chooses more efficient label endpoints (`/api/v1/labels`, `/api/v1/label/<name>/values` with `match[]`). Otherwise, it falls back to the less efficient `/api/v1/series` for label queries. | ||||||
|
|
||||||
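The inference described above can be sketched roughly as follows. This is a hypothetical illustration, not Grafana's actual code: the function and table names are invented here, and only the minimum versions come from the list above. It shows why the approach is coarse: the decision depends entirely on what the user typed into the datasource configuration.

```python
# Hypothetical sketch of type/version-based inference.
# Names are illustrative and do not mirror Grafana's implementation.

# Minimum versions supporting label matchers in the Labels API (from the list above).
MIN_LABEL_MATCHER_VERSION = {
    "Prometheus": (2, 24, 0),
    "Mimir": (2, 0, 0),
    "Cortex": (1, 11, 0),
    "Thanos": (0, 18, 0),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_label_matchers(prometheus_type: str, prometheus_version: str) -> bool:
    """Guess support from the configured type and version only."""
    minimum = MIN_LABEL_MATCHER_VERSION.get(prometheus_type)
    if minimum is None:
        # Unknown backend: the client has to assume the feature is missing.
        return False
    return parse_version(prometheus_version) >= minimum


def label_values_endpoint(prometheus_type: str, prometheus_version: str) -> str:
    """Pick the efficient label endpoint, or fall back to the series endpoint."""
    if supports_label_matchers(prometheus_type, prometheus_version):
        return "/api/v1/label/<name>/values"
    return "/api/v1/series"
```

A backend that is missing from the table always gets the slow path, even if it fully supports matchers, until the client ships a release that adds it.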
| Key limitations of this approach: | ||||||
|
|
||||||
| 1. Configuration errors (wrong type or version) can lead to incompatible or missing features. | ||||||
| 2. Backend upgrades alone do not enable new features in clients—client logic must also be updated and released. | ||||||
| 3. New Prometheus-compatible backends require explicit code changes in Grafana, slowing adoption. | ||||||
| 4. Type and version checks are coarse; they do not reflect actual enabled features, which may depend on flags or configuration. | ||||||
|
|
||||||
| Workarounds already exist in some downstream projects and demonstrate the need for this kind of API. However, they extend the [`buildinfo` endpoint](https://prometheus.io/docs/prometheus/latest/querying/api/#build-information) with a [`features` field](https://github.com/grafana/mimir/blob/9fccbacdabdd236cb7ff97cf154643b409078178/pkg/util/version/info_handler.go#L11-L30), which is very vendor-specific. Grafana already uses this approach for some [Alertmanager features](https://github.com/grafana/grafana/blob/8863ed9d6f8395808196b5d81d436fb637a43d37/public/app/features/alerting/unified/api/buildInfo.ts#L137-L145). | ||||||
|
|
||||||
| ## Goals | ||||||
|
|
||||||
| - Provide a machine-readable API to report enabled features. | ||||||
| - Ensure the solution is lightweight to encourage broad adoption in the ecosystem. | ||||||
| - Cover a comprehensive and relevant subset of Prometheus features. | ||||||
| - Design the API to be extensible, allowing third-party projects to declare their own features. | ||||||
|
|
||||||
| ### Audience | ||||||
|
|
||||||
| The intended audience for this proposal includes: | ||||||
|
|
||||||
| - Developers creating software that exposes the Prometheus API | ||||||
| - Consumers of the Prometheus API | ||||||
|
|
||||||
| ## Non-Goals | ||||||
|
|
||||||
| Implementing a unified feature gate in the code is out of scope. | ||||||
|
|
||||||
| ## How | ||||||
|
|
||||||
| The `/api/v1/features` endpoint returns a JSON object with top-level categories inspired by Prometheus package organization. Each category key contains a map of unique feature names (strings) to `true`/`false` booleans indicating whether the feature is enabled. | ||||||
|
|
||||||
| Initial categories: | ||||||
|
|
||||||
| - `api` - API endpoint features and capabilities | ||||||
| - `otlp_receiver` - OTLP receiver features | ||||||
| - `prometheus` - Prometheus-specific features | ||||||
| - `promql` - PromQL language features (syntax, modifiers) | ||||||
| - `promql_functions` - Individual PromQL functions | ||||||
| - `promql_operators` - PromQL operators and aggregations | ||||||
| - `rules` - Rule evaluation features | ||||||
| - `scrape` - Scraping capabilities | ||||||
| - `service_discovery_providers` - Service discovery mechanisms | ||||||
| - `templating_functions` - Template functions for alerts and rules | ||||||
| - `tsdb` - Time series database features | ||||||
| - `ui` - Web UI capabilities | ||||||
|
|
||||||
| Example response: | ||||||
|
|
||||||
| ```json | ||||||
| { | ||||||
| "status": "success", | ||||||
| "data": { | ||||||
|
Member
What would the requirement "Allow for potential future inclusion of Alertmanager, even though it is currently out of scope." look like in this API? I can imagine Alertmanager simply returning similar JSON, so I'm not sure what inclusion means.
Member
Author
I removed the requirement. I expect that if Alertmanager introduces this, they will naturally follow the same principles. |
||||||
| "api": { | ||||||
| "exemplars": true, | ||||||
| "labels_matchers": true, | ||||||
| "query_post": true | ||||||
| }, | ||||||
| "promql": { | ||||||
| "negative_offset": true, | ||||||
| "at_modifier": true, | ||||||
| "subqueries": true | ||||||
| }, | ||||||
| "promql_functions": { | ||||||
| "last_over_time": true, | ||||||
| "limitk": true | ||||||
| }, | ||||||
| "prometheus": { | ||||||
| "stringlabels": true | ||||||
| } | ||||||
| } | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| Naming conventions: | ||||||
|
|
||||||
| - All names MUST use `snake_case` | ||||||
| - Each category value is a map from unique feature name to a boolean | ||||||
| - Clients MUST ignore unknown feature names and categories | ||||||
|
Member
Could we mention the proposed stability of those categories and features? (e.g. can we rename a feature within the same major version? Can we remove a feature or category?) |
||||||
| - The response follows standard Prometheus API conventions with `status` and `data` fields | ||||||
| - The endpoint returns HTTP 200 OK, like other Prometheus APIs | ||||||
| - Vendors MAY add vendor-specific categories (e.g., `prometheus`, `mimir`, `cortex`, or other categories such as `clustering`) to expose the vendor's unique capabilities. Vendors implementing custom PromQL functions SHOULD register them under a vendor-specific category (e.g. `vendor_functions`, `metricsql_functions`), in case a future Prometheus function is implemented under the same name with a different signature. | ||||||
|
|
||||||
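A minimal sketch of how an implementer might check a `data` object against the conventions above before serving it. The proposal does not define `snake_case` formally, so the regular expression below is an assumption, and `validate_features` is a hypothetical helper name.

```python
import re

# Assumed definition of snake_case: lowercase letters and digits, separated by
# single underscores, starting with a letter. The proposal does not spell this out.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def validate_features(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the data conforms."""
    problems = []
    for category, features in data.items():
        if not SNAKE_CASE.fullmatch(category):
            problems.append(f"category {category!r} is not snake_case")
        if not isinstance(features, dict):
            problems.append(f"category {category!r} is not a map")
            continue
        for name, enabled in features.items():
            if not SNAKE_CASE.fullmatch(name):
                problems.append(f"feature {category}.{name} is not snake_case")
            if not isinstance(enabled, bool):
                problems.append(f"feature {category}.{name} is not a boolean")
    return problems
```

For example, `validate_features({"api": {"query_post": True}})` returns an empty list, while a category named `promqlFunctions` or a feature value of `1` is reported as a problem.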
| Some items might exist in multiple categories. | ||||||
|
Won't this make it harder for clients to integrate with the endpoint? They'd need to search for the same feature in multiple categories and handle cases where the feature appears in one category but not the other.
Member
Author
What I meant is that I did not want to restrict some items to some categories. But I can remove this sentence from the design doc, so as not to encourage duplication of elements. |
||||||
|
|
||||||
| We do not differentiate between a feature that is simply disabled and one that is missing because it was not compiled in. There is no separate "build" category. Instead, if a feature depends on a compile-time flag, it appears under its relevant category. If it is not built in or is disabled, it should be set to `false`. Implementations MAY omit features set to `false`, and clients MUST treat absent features as equivalent to `false`. | ||||||
|
|
||||||
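On the client side, these rules reduce to a lookup that defaults to `false`. The sketch below assumes the response shape shown in the example above; `feature_enabled`, `label_values_endpoint`, and `EXAMPLE_BODY` are hypothetical names introduced for illustration.

```python
import json

# Example body in the shape proposed above (trimmed to two categories).
EXAMPLE_BODY = """
{
  "status": "success",
  "data": {
    "api": {"labels_matchers": true, "query_post": true},
    "promql_functions": {"limitk": false}
  }
}
"""


def feature_enabled(response_body: str, category: str, name: str) -> bool:
    """Return True only if the feature is present and set to true.

    Absent features and absent categories are treated as false, and anything
    the client does not ask about is simply ignored.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return False
    features = payload.get("data", {}).get(category, {})
    if not isinstance(features, dict):
        return False
    return features.get(name, False) is True


def label_values_endpoint(response_body: str) -> str:
    """Choose the label endpoint from advertised features instead of type/version."""
    if feature_enabled(response_body, "api", "labels_matchers"):
        return "/api/v1/label/<name>/values"
    return "/api/v1/series"
```

With this shape, a client needs no per-vendor version table: a backend that advertises `api.labels_matchers` gets the efficient path, and one that omits it falls back to `/api/v1/series`.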
| ## Alternatives | ||||||
|
|
||||||
| - Flat list: rejected because categories make it easier to group things like PromQL functions. | ||||||
| - No booleans (only enabled features listed): rejected because clients might use `false` to hint to the user that a feature could be enabled. | ||||||
| - Richer information than booleans (limits, etc.): rejected, primarily to keep things simple. | ||||||
|
|
||||||
| ## Action Plan | ||||||
|
|
||||||
| The package will be located in the prometheus/prometheus repository. | ||||||
|
|
||||||
| Instead of actively collecting features from other packages, this package will allow other components to register their supported features with it. | ||||||
|
Member
👍🏽 I'd do this even without this proposal, great to see |
||||||
|
|
||||||
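The registration model described above can be illustrated with a small registry that components push into. The actual package would live in prometheus/prometheus and its API is not defined in this proposal, so the Python class and method names below are purely hypothetical.

```python
import threading
from collections import defaultdict


class FeatureRegistry:
    """Hypothetical push-style registry: components register, the API handler reads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._features: dict[str, dict[str, bool]] = defaultdict(dict)

    def register(self, category: str, name: str, enabled: bool) -> None:
        """Called by a component (PromQL engine, scrape manager, ...) at startup."""
        with self._lock:
            self._features[category][name] = enabled

    def response(self) -> dict:
        """Build the body served at /api/v1/features, sorted for stable output."""
        with self._lock:
            data = {
                category: dict(sorted(features.items()))
                for category, features in sorted(self._features.items())
            }
        return {"status": "success", "data": data}


# Usage: each component declares its own features; the registry never
# reaches into other packages to discover them.
registry = FeatureRegistry()
registry.register("promql_functions", "limitk", True)
registry.register("api", "query_post", True)
```

Because the registry only receives data, adding a feature to a component does not require touching the features package itself.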
| For the initial launch, I plan to include a substantial set of already existing features. | ||||||
Do we have any evidence supporting this statement? I find it hard to believe that this is what blocks user adoption of advanced features. Also, we don't even really measure adoption, but that's a separate problem (:
For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021) => that is the evidence :)