-
Notifications
You must be signed in to change notification settings - Fork 170
Metrics framework integration with ml-commons #3661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
9195bdf
9b388d5
e23a9b5
5edce6e
8b87540
cf7ec3d
ce1a948
039392d
dfaa932
fc756d8
b50d5d0
9433c1a
78056ff
2eaeaaa
88c7477
f24fae6
9e3c409
9661993
c3e677c
8c48068
995b7fe
62f5df3
9fc2925
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -342,4 +342,12 @@ private MLCommonsSettings() {} | |
/** This setting sets the remote metadata service name */ | ||
public static final Setting<String> REMOTE_METADATA_SERVICE_NAME = Setting | ||
.simpleString("plugins.ml_commons." + REMOTE_METADATA_SERVICE_NAME_KEY, Setting.Property.NodeScope, Setting.Property.Final); | ||
|
||
// Feature flag for enabling telemetry metric collection via metrics framework | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the difference between static metric vs metric? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. static metric refers to the MLStatsJobProcessor that runs every 5 mins to collect state data. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From the name There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Any suggestion on what it should be? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what happens if we don't enable this : |
||
public static final Setting<Boolean> ML_COMMONS_METRIC_COLLECTION_ENABLED = Setting | ||
.boolSetting("plugins.ml_commons.metrics_collection_enabled", false, Setting.Property.NodeScope, Setting.Property.Dynamic); | ||
|
||
// Feature flag for enabling telemetry static metric collection job -- MLStatsJobProcessor | ||
public static final Setting<Boolean> ML_COMMONS_STATIC_METRIC_COLLECTION_ENABLED = Setting | ||
.boolSetting("plugins.ml_commons.metrics_static_collection_enabled", false, Setting.Property.NodeScope, Setting.Property.Dynamic); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,10 +12,13 @@ | |
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_CONTROLLER_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_LOCAL_MODEL_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_MCP_SERVER_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_METRIC_COLLECTION_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_MULTI_TENANCY_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_OFFLINE_BATCH_INFERENCE_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_OFFLINE_BATCH_INGESTION_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_RAG_PIPELINE_FEATURE_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_REMOTE_INFERENCE_ENABLED; | ||
import static org.opensearch.ml.common.settings.MLCommonsSettings.ML_COMMONS_STATIC_METRIC_COLLECTION_ENABLED; | ||
|
||
import java.util.ArrayList; | ||
import java.util.List; | ||
|
@@ -43,6 +46,11 @@ public class MLFeatureEnabledSetting { | |
|
||
private volatile Boolean isMcpServerEnabled; | ||
|
||
private volatile Boolean isRagSearchPipelineEnabled; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need to add flag for each feature? like mlInferenceProcessor and agent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this PR, I refactored the JobScheduler plugin to be more extensible. This involved extending the JobSchedulerPlugin from MachineLearningPlugin. JobSchedulerPlugin requires an empty constructor, however, our constructor was accepting isRagSearchPipelineEnabled. So, I had to refactor the way this setting works and pass it via MLFeatureEnabledSetting. This change is specific to that. No, we don't need to add flags for each feature. Just a backlog item, there is also a todo in the old constructor. |
||
|
||
private volatile Boolean isMetricCollectionEnabled; | ||
private volatile Boolean isStaticMetricCollectionEnabled; | ||
|
||
private final List<SettingsChangeListener> listeners = new ArrayList<>(); | ||
|
||
public MLFeatureEnabledSetting(ClusterService clusterService, Settings settings) { | ||
|
@@ -55,6 +63,9 @@ public MLFeatureEnabledSetting(ClusterService clusterService, Settings settings) | |
isBatchInferenceEnabled = ML_COMMONS_OFFLINE_BATCH_INFERENCE_ENABLED.get(settings); | ||
isMultiTenancyEnabled = ML_COMMONS_MULTI_TENANCY_ENABLED.get(settings); | ||
isMcpServerEnabled = ML_COMMONS_MCP_SERVER_ENABLED.get(settings); | ||
isRagSearchPipelineEnabled = ML_COMMONS_RAG_PIPELINE_FEATURE_ENABLED.get(settings); | ||
isMetricCollectionEnabled = ML_COMMONS_METRIC_COLLECTION_ENABLED.get(settings); | ||
isStaticMetricCollectionEnabled = ML_COMMONS_STATIC_METRIC_COLLECTION_ENABLED.get(settings); | ||
|
||
clusterService | ||
.getClusterSettings() | ||
|
@@ -74,6 +85,15 @@ public MLFeatureEnabledSetting(ClusterService clusterService, Settings settings) | |
.getClusterSettings() | ||
.addSettingsUpdateConsumer(ML_COMMONS_OFFLINE_BATCH_INFERENCE_ENABLED, it -> isBatchInferenceEnabled = it); | ||
clusterService.getClusterSettings().addSettingsUpdateConsumer(ML_COMMONS_MCP_SERVER_ENABLED, it -> isMcpServerEnabled = it); | ||
clusterService | ||
.getClusterSettings() | ||
.addSettingsUpdateConsumer(MLCommonsSettings.ML_COMMONS_RAG_PIPELINE_FEATURE_ENABLED, it -> isRagSearchPipelineEnabled = it); | ||
clusterService | ||
.getClusterSettings() | ||
.addSettingsUpdateConsumer(ML_COMMONS_METRIC_COLLECTION_ENABLED, it -> isMetricCollectionEnabled = it); | ||
clusterService | ||
.getClusterSettings() | ||
.addSettingsUpdateConsumer(ML_COMMONS_STATIC_METRIC_COLLECTION_ENABLED, it -> isStaticMetricCollectionEnabled = it); | ||
} | ||
|
||
/** | ||
|
@@ -148,6 +168,22 @@ public void addListener(SettingsChangeListener listener) { | |
listeners.add(listener); | ||
} | ||
|
||
/** | ||
* Whether the rag search pipeline feature is enabled. If disabled, APIs in ml-commons will block rag search pipeline. | ||
* @return whether the feature is enabled. | ||
*/ | ||
public boolean isRagSearchPipelineEnabled() { | ||
return isRagSearchPipelineEnabled; | ||
} | ||
|
||
public boolean isMetricCollectionEnabled() { | ||
return isMetricCollectionEnabled; | ||
} | ||
|
||
public boolean isStaticMetricCollectionEnabled() { | ||
return isStaticMetricCollectionEnabled; | ||
} | ||
|
||
@VisibleForTesting | ||
public void notifyMultiTenancyListeners(boolean isEnabled) { | ||
for (SettingsChangeListener listener : listeners) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have multiple system index. To avoid adding too many system index, can we reuse .plugins-ml-task index ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the job scheduler monitors a particular index for its documents. If a new document is added, it starts a new job using certain parameters in document defined in MLJobParameter. If the index has different documents with different format, I'm not sure how this will react. At the same time, how will the existing tasks work if job scheduler documents are present in it? I can test this out and get back to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any update on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not feasible to use the existing task-index and job scheduler requires its own index with documents following the structure of MLJobParameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other plugins that have extended this plugin also have a separate index for the jobs.