-
Notifications
You must be signed in to change notification settings - Fork 21
Add features flag API proposal #63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9997ef7 to
84cf2d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like this, will review in details later on
|
Here's what I get so far with PromQL: {
"status": "success",
"data": {
"promql": {
"at_modifier": true,
"delayed_name_removal": false,
"experimental_duration_expr": false,
"extended_range_selectors": false,
"negative_offset": true,
"per_query_lookback_delta": true,
"per_step_stats": false,
"subqueries": true,
"type_and_unit_labels": false
},
"promql_functions": {
"abs": true,
"absent": true,
"absent_over_time": true,
"acos": true,
"acosh": true,
"asin": true,
"asinh": true,
"atan": true,
"atanh": true,
"avg_over_time": true,
"ceil": true,
"changes": true,
"clamp": true,
"clamp_max": true,
"clamp_min": true,
"cos": true,
"cosh": true,
"count_over_time": true,
"day_of_month": true,
"day_of_week": true,
"day_of_year": true,
"days_in_month": true,
"deg": true,
"delta": true,
"deriv": true,
"double_exponential_smoothing": false,
"exp": true,
"first_over_time": false,
"floor": true,
"histogram_avg": true,
"histogram_count": true,
"histogram_fraction": true,
"histogram_quantile": true,
"histogram_stddev": true,
"histogram_stdvar": true,
"histogram_sum": true,
"hour": true,
"idelta": true,
"increase": true,
"info": false,
"irate": true,
"label_join": true,
"label_replace": true,
"last_over_time": true,
"ln": true,
"log10": true,
"log2": true,
"mad_over_time": false,
"max_over_time": true,
"min_over_time": true,
"minute": true,
"month": true,
"pi": true,
"predict_linear": true,
"present_over_time": true,
"quantile_over_time": true,
"rad": true,
"rate": true,
"resets": true,
"round": true,
"scalar": true,
"sgn": true,
"sin": true,
"sinh": true,
"sort": true,
"sort_by_label": false,
"sort_by_label_desc": false,
"sort_desc": true,
"sqrt": true,
"stddev_over_time": true,
"stdvar_over_time": true,
"sum_over_time": true,
"tan": true,
"tanh": true,
"time": true,
"timestamp": true,
"ts_of_first_over_time": false,
"ts_of_last_over_time": false,
"ts_of_max_over_time": false,
"ts_of_min_over_time": false,
"vector": true,
"year": true
},
"promql_operators": {
"!=": true,
"!~": true,
"%": true,
"*": true,
"+": true,
"-": true,
"/": true,
"<": true,
"<=": true,
"==": true,
"=~": true,
">": true,
">=": true,
"@": true,
"^": true,
"and": true,
"atan2": true,
"avg": true,
"bottomk": true,
"count": true,
"count_values": true,
"group": true,
"limit_ratio": false,
"limitk": false,
"max": true,
"min": true,
"or": true,
"quantile": true,
"stddev": true,
"stdvar": true,
"sum": true,
"topk": true,
"unless": true
}
}
} |
I like the proposal, but I feel that list is too granular. I don't think it's useful to list features and functions which have existed in PromQL for years. So |
To avoid drifting from the actual code, the list is auto-generated. Note that I do not expect users of the API to check if "rate" is supported before using it. |
proposals/0063-feature-flag.md
Outdated
| 3. New Prometheus-compatible backends require explicit code changes in Grafana, slowing adoption. | ||
| 4. Type and version checks are coarse; they do not reflect actual enabled features, which may depend on flags or configuration. | ||
|
|
||
| There is no real alternative to this for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what we use in mimir is the buildinfo endpoint by adding a features field to the response.
Grafana already uses that for some alertmanager features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I added that to the design doc.
| - The response follows standard Prometheus API conventions with `status` and `data` fields | ||
| - The endpoint returns HTTP 200 OK, like other Prometheus APIs | ||
|
|
||
| Some items might exist in multiple categories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this make it harder for clients to integrate with the endpoint? They'd need to search for the same feature in multiple categories and handle cases where the feature appears in one category but not the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that I did not want to restrict some items to some categories. But I can remove this sentence from the design doc, not to encourage duplication of elements.
9ebb830 to
964e418
Compare
|
Here's the content of the features API currently: |
| ```json | ||
| { | ||
| "status": "success", | ||
| "data": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the requirement "Allow for potential future inclusion of Alertmanager, even though it is currently out of scope." look like in this API?
I mean I can imagine alertmanager simply returning a similar JSON like this, so I'm not sure what inclusion means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the requirement. I expect that if Alertmanager introduces this they will follow the same principles naturally.
proposals/0063-feature-flag.md
Outdated
| - `rule` - Rule evaluation features | ||
| - `ui` - Web UI capabilities | ||
| - `promql` - PromQL language features (syntax, modifiers, operators) | ||
| - `promql_functions` - Individual PromQL functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTLP is first class, so I'd already add otlp_receiver
proposals/0063-feature-flag.md
Outdated
| - Clients MUST ignore unknown feature names and categories | ||
| - The response follows standard Prometheus API conventions with `status` and `data` fields | ||
| - The endpoint returns HTTP 200 OK, like other Prometheus APIs | ||
| - Vendors MAY add vendor-specific categories (e.g., `prometheus`, `mimir`, `cortex`) to expose implementation-specific features such as build tags or vendor-unique capabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would a vendor put a feature like a vendor specific PromQL function? Under "promql_functions" or "vendor" ? Probably saying explicitly that vendors MUST put vendor specific flags into their own section would clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated: Vendors MAY add vendor-specific categories (e.g., prometheus, mimir, cortex, or other categories such as clustering) to expose vendor's unique abilities. Vendors implementing custom PromQL functions SHOULD register them under a vendor-specific category (e.g. vendor_functions, metricsql_functions), in case a future prometheus function gets implemented with a different signature.
proposals/0063-feature-flag.md
Outdated
| - Clients MUST ignore unknown feature names and categories | ||
| - The response follows standard Prometheus API conventions with `status` and `data` fields | ||
| - The endpoint returns HTTP 200 OK, like other Prometheus APIs | ||
| - Vendors MAY add vendor-specific categories (e.g., `prometheus`, `mimir`, `cortex`) to expose implementation-specific features such as build tags or vendor-unique capabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you allow sub categories for vendors? I don't know if vendors will have a tonne of features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated wording: Vendors MAY add vendor-specific categories (e.g., prometheus, mimir, cortex, or other categories such as clustering) to expose vendor's unique abilities. Vendors implementing custom PromQL functions SHOULD register them under a vendor-specific category (e.g. vendor_functions, metricsql_functions), in case a future prometheus function gets implemented with a different signature.
Ideally no subcategories so this is always map[string]map[string]bool
964e418 to
2a249d6
Compare
Signed-off-by: Julien Pivotto <[email protected]>
2a249d6 to
00e9264
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks. LGTM from my side.
For consistency it would be fun to enable some ff to be dynamically settable (via runtime config/HTTP), but that's a separate discussion. I'd love feature flags to be a separate package in Go code so we can consistently used it everywhere too, likely an after effect of the implementation of this PROM-63.
I found categorization very odd at the first glance, similar to @bboreham. Like why we have a mix in this API feature flags, build tags (stringlabels), PromQL functions, PromQL operators and even mode (agent). Then we already have endpoints that give some of this info like build info, flags.
What got me sold is the simplified way of accessing the info of what's the current Prometheus process is "capable of" which is a mix of those flags, Prometheus exact version, the tag it was used with and even build options like what SD providers were build with. Then it's also great for 4.x and 5.x where we might kill some features (improving feature lifecycle maintenance). So this is super useful 👍🏽
| } | ||
| ``` | ||
|
|
||
| Naming conventions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Naming conventions: | |
| Semantics: |
| Naming conventions: | ||
| - All names MUST use `snake_case` | ||
| - Each category value is a map from unique feature name to a boolean | ||
| - Clients MUST ignore unknown feature names and categories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we mention proposed stability of those categories and features? (e.g. can we rename feature in same major version? Can we remove a feature or category?)
|
|
||
| Over time, the Prometheus APIs have undergone numerous optimizations, such as supporting POST in addition to GET requests and allowing filtering on certain API endpoints. Additionally, new APIs, PromQL functions, and capabilities are regularly introduced. Some of these features are optional and can be enabled or disabled by users. | ||
|
|
||
| Without a "features API," new advancements are often underutilized because API clients are hesitant to adopt them before widespread support exists among users. By creating an API that clearly communicates available and enabled features, clients can take advantage of new capabilities as soon as they are released. For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021), illustrating a three-year delay caused by limited visibility of feature availability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a "features API," new advancements are often underutilized because API clients are hesitant to adopt them before widespread support exists among users. By creating an API that clearly communicates available and enabled features, clients can take advantage of new capabilities as soon as they are released.
Do we have some evidences supporting this statement? I find this hard to believe that this is what blocks user adoption for advanced feature. Also we don't even really measure adoption, but that's a separate problem (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, HTTP POST support was added to Prometheus in version 2.1.0 (2018) but was not adopted as the default in Grafana until version 8.0 (2021) => that is the evidence :)
|
|
||
| The package will be located in the prometheus/prometheus repository. | ||
|
|
||
| Instead of actively collecting features from other packages, this package will allow other components to register their supported features with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽 I'd do this without this proposal to, great to see
I can review this and filter down to only have the layer of "useful" things that can be used externally (e.g. remove agent and TSDB features). Keeping PromQL and API-related things.
The goal is NOT to expose Prometheus exact version, so that it does not matter if you query Prometheus, Thanos, Cortex, ... |
Sorry for confusion, I was explaining my train of thoughts 🙈 While initially odd, I think I like the low level details here.
Yup yes! That's why I approved (: |
No description provided.