opensearch-project · natebower · Oct 13, 2025 · Oct 2, 2025 · Oct 2, 2025 · Oct 7, 2025
@@ -18,3 +18,5 @@ An _agent_ orchestrates and runs ML models and tools. For a list of supported ag
 
 A _tool_ performs a set of specific tasks. Some examples of tools are the [`VectorDBTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/vector-db-tool/), which supports vector search, and the [`ListIndexTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/list-index-tool/), which executes the List Indices API. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/).
 
+You can modify and transform tool outputs using [output processors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). Output processors allow you to chain multiple data transformations that execute sequentially on any tool's output.
+
diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md
@@ -0,0 +1,362 @@
+---
+layout: default
+title: Output processors
+parent: Agents and tools
+grand_parent: ML Commons APIs
+nav_order: 30
+---
+
+# Output processors
+**Introduced 3.3**
+{: .label .label-purple }
+
+Output processors allow you to modify and transform the output of any tool before it's returned to the agent or user. You can chain multiple output processors together to create complex data transformation pipelines that execute sequentially.
+
+## Overview
+
+Use output processors to do the following:
+
+- **Transform data formats**: Convert between different data structures (strings, JSON, arrays)
+- **Extract specific information**: Use JSONPath or regex patterns to pull out relevant data
+- **Clean and filter content**: Remove unwanted fields or apply formatting rules
+- **Standardize outputs**: Ensure consistent data formats across different tools
+
+Each tool can have multiple output processors that execute in the order they are defined. The output of one processor becomes the input for the next processor in the chain.
+
+### Sequential execution
+
+Output processors execute in the order they appear in the array. Each processor receives the output from the previous processor (or the original tool output for the first processor):
+
+```
+Tool Output → Processor 1 → Processor 2 → Processor 3 → Final Output
+```
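The chaining model above can be sketched in a few lines of Python. This is an illustration only, not the ML Commons implementation: `run_chain`, `drop_header`, and `to_upper` are hypothetical names, and each processor is modeled as a plain function that maps an input to an output.

```python
from functools import reduce
from typing import Any, Callable, List

Processor = Callable[[Any], Any]

def run_chain(tool_output: Any, processors: List[Processor]) -> Any:
    """Feed the tool output through each processor in list order."""
    return reduce(lambda value, processor: processor(value), processors, tool_output)

# Two hypothetical processors used only for this illustration.
def drop_header(text: str) -> str:
    # Remove everything up to and including the first newline.
    return text.split("\n", 1)[1] if "\n" in text else text

def to_upper(text: str) -> str:
    return text.upper()

result = run_chain("health,index\ngreen,my-index", [drop_header, to_upper])
print(result)  # GREEN,MY-INDEX
```

An empty processor list returns the tool output unchanged, which matches the idea that the first processor receives the original tool output.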
+
+## Configuration
+
+Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registration.
+
+For a complete example, see [Example usage with agents](#example-usage-with-agents).
+
+## Supported output processor types
+
+The following processors are available for transforming tool outputs:
+
+### to_string
+
+Converts the input to a JSON string representation.
+
+**Parameters:**
+- `escape_json` (boolean, optional): Whether to escape JSON characters. Default: `false`
+
+**Example Configuration:**
+```json
+{
+  "type": "to_string",
+  "escape_json": true
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"name": "test", "value": 123}
+Output: "{\"name\":\"test\",\"value\":123}"
+```
+
+### regex_replace
+
+Replaces text using regular expression patterns. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/).
+
+**Parameters:**
+- `pattern` (string, required): Regular expression pattern to match
+- `replacement` (string, optional): Replacement text. Default: `""`
+- `replace_all` (boolean, optional): Whether to replace all matches or just the first. Default: `true`
+
+**Example Configuration:**
+```json
+{
+  "type": "regex_replace",
+  "pattern": "^.*?\n",
+  "replacement": ""
+}
+```
+
+**Example Input/Output:**
+```
+Input: "row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index"
+Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index"
+```
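To try a pattern before registering an agent, the replacement can be approximated locally. The following is a minimal sketch that assumes Python's `re` module behaves like the plugin's regex engine for this pattern; the two engines differ in some features, and the function name `regex_replace` here is only a stand-in for the processor.

```python
import re

def regex_replace(text: str, pattern: str, replacement: str = "", replace_all: bool = True) -> str:
    """Approximate the regex_replace processor: replace all matches, or only the first."""
    # In re.sub, count=0 means "replace every match"; count=1 stops after the first.
    return re.sub(pattern, replacement, text, count=0 if replace_all else 1)

csv_text = "row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index"
print(regex_replace(csv_text, r"^.*?\n"))
```

Even with `replace_all` left at `true`, only the header row is removed in this sketch, because `^` matches only at the start of the input when multiline mode is not enabled.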
+
+### regex_capture
+
+Captures specific groups from regex matches. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/).
+
+**Parameters:**
+- `pattern` (string, required): Regular expression pattern with capture groups
+- `groups` (string or array, optional): Group numbers to capture. Specify a single number, such as `"1"`, or an array, such as `"[1, 2, 4]"`. Default: `"1"`
+
+**Example Configuration:**
+```json
+{
+  "type": "regex_capture",
+  "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)",
+  "groups": "[1, 4]"
+}
+```
+
+**Example Input/Output:**
+```
+Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0"
+Output: ["1", ".plugins-ml-model-group"]
+```
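The capture in this example can be reproduced locally as a quick check of the pattern. This sketch is an approximation in Python, not the plugin's code: it handles only the first match, and returning `None` when nothing matches is an assumption of the sketch. In the JSON configuration the backslashes are doubled (`\\d`) because of JSON string escaping; the raw Python string below uses single backslashes.

```python
import re
from typing import List, Optional, Sequence

def regex_capture(text: str, pattern: str, groups: Sequence[int] = (1,)) -> Optional[List[str]]:
    """Approximate the regex_capture processor for the first match only."""
    match = re.search(pattern, text)
    if match is None:
        return None  # Behavior on no match is an assumption of this sketch.
    return [match.group(g) for g in groups]

line = "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0"
print(regex_capture(line, r"(\d+),(\w+),(\w+),([^,]+)", groups=[1, 4]))
# ['1', '.plugins-ml-model-group']
```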
+
+### jsonpath_filter
+
+Extracts data using JSONPath expressions.
+
+**Parameters:**
+- `path` (string, required): JSONPath expression to extract data
+- `default` (any, optional): Default value if path is not found
+
+**Example Configuration:**
+```json
+{
+  "type": "jsonpath_filter",
+  "path": "$.data.items[*].name",
+  "default": []
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}}
+Output: ["item1", "item2"]
+```
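Python's standard library has no JSONPath evaluator, so the following sketch hand-codes the traversal that `$.data.items[*].name` describes and falls back to a default when the path is missing. The function name `item_names` is hypothetical, and how the processor treats partially matching documents is not stated in this page; skipping elements without a `name` key is an assumption.

```python
from typing import Any

def item_names(document: Any, default: Any = None) -> Any:
    """Hand-written equivalent of the JSONPath $.data.items[*].name."""
    try:
        items = document["data"]["items"]
    except (KeyError, TypeError):
        return default  # Path not found: return the configured default.
    if not isinstance(items, list):
        return default
    # [*] visits every array element; .name selects that key where present.
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]

doc = {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}}
print(item_names(doc, default=[]))  # ['item1', 'item2']
```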
+
+### extract_json
+
+Extracts JSON objects or arrays from text strings.
+
+**Parameters:**
+- `extract_type` (string, optional): Type of JSON to extract - `"object"`, `"array"`, or `"auto"`. Default: `"auto"`
+- `default` (any, optional): Default value if JSON extraction fails
+
+**Example Configuration:**
+```json
+{
+  "type": "extract_json",
+  "extract_type": "object",
+  "default": {}
+}
+```
+
+**Example Input/Output:**
+```
+Input: "The result is: {\"status\": \"success\", \"count\": 5} - processing complete"
+Output: {"status": "success", "count": 5}
+```
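One way to picture the `"object"` extraction is to scan the text for an opening brace and try to parse a JSON value from that position. The sketch below does this with Python's `json.JSONDecoder.raw_decode`; it is an illustration under that assumption, not the algorithm the plugin uses, and `extract_json_object` is a hypothetical name.

```python
import json
from typing import Any

def extract_json_object(text: str, default: Any = None) -> Any:
    """Return the first JSON object embedded in text, or default if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # Not valid JSON here; try the next brace.
    return default

message = 'The result is: {"status": "success", "count": 5} - processing complete'
print(extract_json_object(message, default={}))
# {'status': 'success', 'count': 5}
```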
+
+### remove_jsonpath
+
+Removes fields from JSON objects using JSONPath.
+
+**Parameters:**
+- `path` (string, required): JSONPath expression identifying fields to remove
+
+**Example Configuration:**
+```json
+{
+  "type": "remove_jsonpath",
+  "path": "$.sensitive_data"
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"name": "user1", "sensitive_data": "secret", "public_info": "visible"}
+Output: {"name": "user1", "public_info": "visible"}
+```
+
+### conditional
+
+Applies different processor chains based on conditions.
+
+**Parameters:**
+- `path` (string, optional): JSONPath to extract value for condition evaluation
+- `routes` (array, required): Array of condition-processor mappings
+- `default` (array, optional): Default processors if no conditions match
+
+**Supported conditions:**
+- Exact value match: `"value"`
+- Numeric comparisons: `">10"`, `"<5"`, `">="`, `"<="`, `"==5"`
+- Existence checks: `"exists"`, `"null"`, `"not_exists"`
+- Regex matching: `"regex:pattern"`
+- Contains text: `"contains:substring"`
+
+**Example Configuration:**
+```json
+{
+  "type": "conditional",
+  "path": "$.status",
+  "routes": [
+    {
+      "green": [
+        {"type": "regex_replace", "pattern": "status", "replacement": "healthy"}
+      ]
+    },
+    {
+      "red": [
+        {"type": "regex_replace", "pattern": "status", "replacement": "unhealthy"}
+      ]
+    }
+  ],
+  "default": [
+    {"type": "regex_replace", "pattern": "status", "replacement": "unknown"}
+  ]
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"index": "test-index", "status": "green", "docs": 100}
+Output: {"index": "test-index", "healthy": "green", "docs": 100}
+```
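The condition syntax listed above can be read as a small matching function. The sketch below is one possible interpretation, written to make the rules concrete; the plugin's exact semantics (for example, whether `exists` is true for a null value, or whether `regex:` must match the whole value) are not specified on this page, so those choices are assumptions. `condition_matches` and `_MISSING` are hypothetical names.

```python
import re
from typing import Any

_MISSING = object()  # Sentinel meaning "the path did not resolve to anything".

def condition_matches(condition: str, value: Any = _MISSING) -> bool:
    """One possible reading of the condition syntax for conditional routes."""
    if condition == "exists":
        return value is not _MISSING and value is not None  # Assumption: null does not count.
    if condition == "not_exists":
        return value is _MISSING
    if condition == "null":
        return value is None
    if value is _MISSING or value is None:
        return False
    if condition.startswith("regex:"):
        return re.search(condition[len("regex:"):], str(value)) is not None
    if condition.startswith("contains:"):
        return condition[len("contains:"):] in str(value)
    # Check two-character operators first so ">=" is not read as ">" followed by "=".
    for op in (">=", "<=", "==", ">", "<"):
        if condition.startswith(op):
            try:
                left, right = float(value), float(condition[len(op):])
            except (TypeError, ValueError):
                return False
            return {">=": left >= right, "<=": left <= right, "==": left == right,
                    ">": left > right, "<": left < right}[op]
    return str(value) == condition  # Fallback: exact value match.

print(condition_matches("green", "green"))  # True
print(condition_matches(">10", 15))         # True
```

In the example configuration, the value at `$.status` is `"green"`, so the first route matches by exact value and its `regex_replace` processor runs.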
+
+### process_and_set
+
+Applies a chain of processors to the input and sets the result at a specified JSONPath location.
+
+**Parameters:**
+- `path` (string, required): JSONPath expression specifying where to set the processed result
+- `processors` (array, required): List of processor configurations to apply sequentially
+
+**Path behavior:**
+- If the path exists, it will be updated with the processed value
+- If the path doesn't exist, attempts to create it (works for simple nested fields)
+- Parent path must exist for new field creation to succeed
+
+**Example Configuration:**
+```json
+{
+  "type": "process_and_set",
+  "path": "$.summary.clean_name",
+  "processors": [
+    {
+      "type": "to_string"
+    },
+    {
+      "type": "regex_replace",
+      "pattern": "[^a-zA-Z0-9]",
+      "replacement": "_"
+    }
+  ]
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"name": "Test Index!", "status": "active"}
+Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "Test_Index_"}}
+```
+
+### set_field
+
+Sets a field to a specified static value or copies a value from another field.
+
+**Parameters:**
+- `path` (string, required): JSONPath expression specifying where to set the value
+- `value` (any, conditionally required): Static value to set. Either `value` or `source_path` must be provided
+- `source_path` (string, conditionally required): JSONPath to copy value from. Either `value` or `source_path` must be provided
+- `default` (any, optional): Default value when `source_path` doesn't exist. Only used with `source_path`
+
+**Path behavior:**
+- If the path exists, it will be updated with the new value
+- If the path doesn't exist, attempts to create it (works for simple nested fields)
+- Parent path must exist for new field creation to succeed
+
+**Example Configuration (static value):**
+```json
+{
+  "type": "set_field",
+  "path": "$.metadata.processed_at",
+  "value": "2024-03-15T10:30:00Z"
+}
+```
+
+**Example Configuration (copy field):**
+```json
+{
+  "type": "set_field",
+  "path": "$.userId",
+  "source_path": "$.user.id",
+  "default": "unknown"
+}
+```
+
+**Example Input/Output:**
+```
+Input: {"user": {"id": 123}, "name": "John"}
+Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}}
+```
+
+## Example usage with agents
+
+**Step 1: Register a flow agent with output processors**
+
+```json
+POST /_plugins/_ml/agents/_register
+{
+  "name": "Index Summary Agent",
+  "type": "flow",
+  "description": "Agent that provides clean index summaries",
+  "tools": [
+    {
+      "type": "ListIndexTool",
+      "parameters": {
+        "output_processors": [
+          {
+            "type": "regex_replace",
+            "pattern": "^.*?\n",
+            "replacement": ""
+          },
+          {
+            "type": "regex_capture",
+            "pattern": "(\\d+,\\w+,\\w+,([^,]+))"
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+
+**Step 2: Execute the agent**
+
+Using the `agent_id` returned in the previous step:
+
+```json
+POST /_plugins/_ml/agents/{agent_id}/_execute
+{
+  "parameters": {
+    "question": "List the indices"
+  }
+}
+```
+
+**Without output processors, `ListIndexTool` returns the following raw output:**
+```
+row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size
+1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb
+2,green,open,.plugins-ml-memory-message,6qVpepfRSCi9bQF_As_t2A,1,0,7,0,53kb,53kb
+3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb
+```
+
+**With output processors, the agent returns:**
+```
+1,green,open,.plugins-ml-model-group
+2,green,open,.plugins-ml-memory-message
+3,green,open,.plugins-ml-memory-meta
+```
+
+The output processors transform the verbose CSV output into a clean, readable format by:
+1. **`regex_replace`**: Removing the CSV header row
+2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name)