RFC: Create OpenSearch Direct Query Plugin
Overview
This RFC proposes a new OpenSearch plugin repository, direct-query, that lets OpenSearch interact with external data sources beyond native OpenSearch indices. The plugin will provide a unified interface for both querying data and managing resources in data sources such as Prometheus and Amazon S3, and it will be extensible so that custom data source implementations can be added. It will support full CRUD (Create, Read, Update, Delete) operations on datasource-specific resources such as alerts, metrics, and configurations.
Motivation
Currently, the OpenSearch SQL plugin contains mixed responsibilities: SQL query engine functionality and data source connectivity. This coupling creates several challenges:
- Tight Coupling: Data source implementations are tightly coupled with the SQL engine, making it difficult to maintain and extend
- Limited Extensibility: Adding new data sources requires modifying the core SQL plugin
- Code Complexity: The SQL plugin has grown large and complex with multiple concerns
- Reusability: Data source connectivity cannot be easily reused by other OpenSearch components
- Limited Resource Management: Current implementation focuses only on querying data, not managing datasource resources like alerts, rules, or configurations
- No Unified API: Each datasource requires custom implementations for resource operations
Proposed Solution
Summary
Solution Overview: This RFC proposes creating a new OpenSearch direct-query plugin by extracting data source connectivity from the SQL plugin, implementing a handler-based architecture, and establishing a complete migration plan to transform OpenSearch's external data interaction capabilities.
Proposed changes:
- New repository: Create opensearch-project/direct-query repository with extracted modules (direct-query-core, async-query, datasources, connectors)
- Handler-based architecture: Three specialized interfaces - QueryHandler (data access), ReadResourcesHandler (resource reading), WriteResourcesHandler (resource management)
- Comprehensive functionality: Query execution plus full CRUD operations on external resources (Prometheus alerts, S3 bucket policies, database configurations)
- Clean separation: SQL plugin focuses solely on SQL/PPL parsing/execution, delegates all external operations to direct-query plugin
- Extensible connector system: Simple API for third-party developers to create connectors with unified query and resource management capabilities
- REST API framework: Complete API endpoints for data source management, query execution, and resource operations
Migration Plan (5 Phases):
- Repository Setup: Create new repo, CI/CD pipelines, build system
- Code Migration: Extract direct-query, async-query, and datasources modules from SQL plugin with backward compatibility
- Refactoring: Implement handler interfaces, create connector abstraction layer, establish plugin integration
- Integration: Update SQL plugin to use direct-query plugin, comprehensive testing and validation
- Release: Beta release, community feedback, GA release
Key Benefits:
- Separation of Concerns: Clear boundary between query engine and data source connectivity
- Unified Resource Management: Single API for managing resources across all data sources
- Extensibility: Easy addition of new data sources and resource types without modifying core plugins
- Maintainability: Smaller, focused codebases easier to maintain and evolve
- Reusability: Direct query and resource management capabilities available to other OpenSearch components
- Community Contributions: Lower barrier for contributing new connectors with both query and resource support
- Performance: Optimized execution for both queries and resource operations specific to each data source
- Operational Efficiency: Manage external resources directly from OpenSearch without switching tools
Repository Structure
Create a new opensearch-project/direct-query repository that will:
- Extract from SQL Plugin:
  - Move direct-query and direct-query-core modules from the SQL plugin
  - Migrate async-query and async-query-core modules
  - Transfer datasources module for data source management
  - Include recent Prometheus integration work (PRs #3440 and #3441)
- Core Components:
direct-query/
├── direct-query-core/ # Core query interfaces and abstractions
├── async-query-core/ # Async query execution framework
├── async-query/ # OpenSearch-specific async implementations
├── datasources/ # Data source management
├── connectors/ # Built-in connectors
│ ├── prometheus/
│ ├── s3/
│ └── ...
└── plugin/ # OpenSearch plugin integration
Architecture
Core Interfaces
- DataSourceEngine Interface:
public interface DataSourceEngine {
    // Connect to external data source
    DataSourceConnection connect(DataSourceConfig config);
    // Get handler for query operations
    QueryHandler getQueryHandler();
    // Get handler for read resource operations
    ReadResourcesHandler getReadResourcesHandler();
    // Get handler for write resource operations
    WriteResourcesHandler getWriteResourcesHandler();
    // Schema discovery
    Schema discoverSchema(DataSourceConfig config);
    // Health check
    HealthStatus checkHealth();
}
- QueryHandler Interface:
public interface QueryHandler {
    // Execute synchronous query
    QueryResult executeQuery(Query query, DataSourceConnection connection);
    // Execute asynchronous query
    CompletableFuture<QueryResult> executeAsyncQuery(Query query, DataSourceConnection connection);
    // Validate query syntax
    ValidationResult validateQuery(Query query);
    // Get query capabilities
    QueryCapabilities getCapabilities();
}
- ReadResourcesHandler Interface:
public interface ReadResourcesHandler {
    // List available resource types
    Set<ResourceType> getSupportedResourceTypes();
    // List resources of a specific type
    ResourceList listResources(ResourceType type, ResourceFilter filter, DataSourceConnection connection);
    // Get a specific resource
    Resource getResource(ResourceType type, String resourceId, DataSourceConnection connection);
    // Search resources with advanced filters
    ResourceSearchResult searchResources(ResourceType type, SearchQuery query, DataSourceConnection connection);
    // Get resource metadata
    ResourceMetadata getResourceMetadata(ResourceType type, String resourceId, DataSourceConnection connection);
}
- WriteResourcesHandler Interface:
public interface WriteResourcesHandler {
    // Create a new resource
    Resource createResource(ResourceType type, ResourceDefinition definition, DataSourceConnection connection);
    // Update an existing resource
    Resource updateResource(ResourceType type, String resourceId, ResourceDefinition definition, DataSourceConnection connection);
    // Delete a resource
    void deleteResource(ResourceType type, String resourceId, DataSourceConnection connection);
    // Bulk operations
    BulkOperationResult bulkCreate(ResourceType type, List<ResourceDefinition> definitions, DataSourceConnection connection);
    BulkOperationResult bulkUpdate(ResourceType type, Map<String, ResourceDefinition> updates, DataSourceConnection connection);
    BulkOperationResult bulkDelete(ResourceType type, List<String> resourceIds, DataSourceConnection connection);
    // Validate resource definition before write
    ValidationResult validateResourceDefinition(ResourceType type, ResourceDefinition definition);
}
- ResourceType Enum (Examples):
public enum ResourceType {
    // Prometheus resources
    ALERT_RULE,
    RECORDING_RULE,
    SILENCE,
    // S3 resources
    BUCKET_POLICY,
    LIFECYCLE_RULE,
    // Generic
    CONFIGURATION,
    PERMISSION,
    CUSTOM
}
- Query Interface:
public interface Query {
    String getQueryString();
    Map<String, Object> getParameters();
    QueryType getType(); // SQL, PPL, NATIVE, etc.
    TimeRange getTimeRange();
}
- Extensibility API:
public interface DataSourceConnector {
    // Unique identifier for the connector
    String getType();
    // Create engine instance
    DataSourceEngine createEngine(ConnectorConfig config);
    // Supported query types
    Set<QueryType> getSupportedQueryTypes();
    // Supported resource operations
    Set<ResourceType> getSupportedResourceTypes();
    // Configuration schema
    ConfigSchema getConfigurationSchema();
}
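To make the connector and resource-handler contracts above more concrete, the following is a minimal sketch of how a connector could be registered by its getType() value and how create, update, read, and delete calls might behave. Everything here is an assumption for illustration: ConnectorRegistry and InMemoryResources are hypothetical names, the interfaces are reduced stand-ins for the ones proposed above, and resource definitions are plain strings rather than ResourceDefinition objects.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch: reduced stand-ins for the RFC interfaces, not the proposed API. */
final class ConnectorRegistrySketch {

    enum ResourceType { ALERT_RULE, RECORDING_RULE, SILENCE, BUCKET_POLICY }

    /** Reduced DataSourceConnector: only the identity and capability methods. */
    interface DataSourceConnector {
        String getType();
        Set<ResourceType> getSupportedResourceTypes();
    }

    /** Hypothetical registry that resolves a connector by its getType() value. */
    static final class ConnectorRegistry {
        private final Map<String, DataSourceConnector> byType = new ConcurrentHashMap<>();

        void register(DataSourceConnector connector) {
            if (byType.putIfAbsent(connector.getType(), connector) != null) {
                throw new IllegalArgumentException("duplicate connector type: " + connector.getType());
            }
        }

        Optional<DataSourceConnector> lookup(String type) {
            return Optional.ofNullable(byType.get(type));
        }
    }

    /** Hypothetical in-memory resource store; definitions are plain strings here. */
    static final class InMemoryResources {
        private final Set<ResourceType> supported;
        private final Map<String, String> store = new ConcurrentHashMap<>();

        InMemoryResources(Set<ResourceType> supported) {
            this.supported = supported;
        }

        // Builds the storage key and rejects resource types the connector does not support.
        private String key(ResourceType type, String id) {
            if (!supported.contains(type)) {
                throw new UnsupportedOperationException("unsupported resource type: " + type);
            }
            return type + "/" + id;
        }

        void createResource(ResourceType type, String id, String definition) {
            if (store.putIfAbsent(key(type, id), definition) != null) {
                throw new IllegalStateException("resource already exists: " + id);
            }
        }

        void updateResource(ResourceType type, String id, String definition) {
            if (store.replace(key(type, id), definition) == null) {
                throw new IllegalStateException("resource not found: " + id);
            }
        }

        boolean deleteResource(ResourceType type, String id) {
            return store.remove(key(type, id)) != null;
        }

        Optional<String> getResource(ResourceType type, String id) {
            return Optional.ofNullable(store.get(key(type, id)));
        }
    }

    /** Example connector advertising the Prometheus resource types from the enum above. */
    static DataSourceConnector prometheusConnector() {
        return new DataSourceConnector() {
            @Override public String getType() { return "prometheus"; }
            @Override public Set<ResourceType> getSupportedResourceTypes() {
                return EnumSet.of(ResourceType.ALERT_RULE, ResourceType.RECORDING_RULE, ResourceType.SILENCE);
            }
        };
    }

    public static void main(String[] args) {
        ConnectorRegistry registry = new ConnectorRegistry();
        registry.register(prometheusConnector());

        DataSourceConnector connector =
            registry.lookup("prometheus").orElseThrow(IllegalStateException::new);
        InMemoryResources resources = new InMemoryResources(connector.getSupportedResourceTypes());

        resources.createResource(ResourceType.ALERT_RULE, "high-latency", "expr: latency > 0.5");
        resources.updateResource(ResourceType.ALERT_RULE, "high-latency", "expr: latency > 1");
        System.out.println(resources.getResource(ResourceType.ALERT_RULE, "high-latency").orElse("missing"));
    }
}
```

In this sketch the supported resource types reported by the connector gate every resource call, which is one possible way to use getSupportedResourceTypes(); the RFC itself does not specify how unsupported types should be rejected.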
Key Features
- Plugin Architecture:
  - Independent OpenSearch plugin deployable alongside SQL plugin
  - RESTful APIs for data source management, query execution, and resource operations
  - Integration with OpenSearch security and access control
  - Unified interface for both data access and resource management
- Built-in Connectors with Resource Management:
  - Prometheus: Time-series queries (PromQL) + Alert rules, recording rules, and silences management
  - Amazon S3: Query structured data (Parquet, JSON, CSV) + Bucket policies and lifecycle rules
  - JDBC: Generic database connectivity with configuration management (future)
- Comprehensive Resource Operations:
  - Full CRUD operations on datasource-specific resources
  - Bulk operations for efficient resource management
  - Resource filtering and search capabilities
  - Resource versioning and change tracking
- Async Query Support:
  - Long-running query execution
  - Result caching and pagination
  - Query status tracking and cancellation
  - Background resource synchronization
- Developer Experience:
  - Simple connector development API with resource management interfaces
  - Maven/Gradle artifacts for third-party connector development
  - Comprehensive documentation with examples for both queries and resources
  - Type-safe resource definitions and operations
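The "query status tracking and cancellation" item under Async Query Support could be sketched roughly as follows. This is only an illustration under stated assumptions: AsyncQueryTrackerSketch, its Status values, and the use of a random UUID as the query id are all hypothetical, results are plain strings, and state is kept in memory rather than in whatever store the plugin would actually use.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative sketch: hypothetical in-memory tracker, not the plugin's implementation. */
final class AsyncQueryTrackerSketch {

    enum Status { RUNNING, SUCCEEDED, FAILED, CANCELLED, UNKNOWN }

    private final Map<String, CompletableFuture<String>> queries = new ConcurrentHashMap<>();

    /** Starts the work on the common pool and returns an id the caller can poll with. */
    String submit(Supplier<String> work) {
        String queryId = UUID.randomUUID().toString();
        queries.put(queryId, CompletableFuture.supplyAsync(work));
        return queryId;
    }

    /** Maps the future's state to a status; cancellation is checked before failure. */
    Status status(String queryId) {
        CompletableFuture<String> future = queries.get(queryId);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (future.isCancelled()) {
            return Status.CANCELLED;
        }
        if (future.isCompletedExceptionally()) {
            return Status.FAILED;
        }
        return future.isDone() ? Status.SUCCEEDED : Status.RUNNING;
    }

    /**
     * Marks the query as cancelled. CompletableFuture.cancel does not interrupt the
     * thread running the work, so a real connector would also need to abort the
     * remote call itself.
     */
    boolean cancel(String queryId) {
        CompletableFuture<String> future = queries.get(queryId);
        return future != null && future.cancel(true);
    }

    /** Blocks until the query finishes and returns its result. */
    String awaitResult(String queryId) {
        CompletableFuture<String> future = queries.get(queryId);
        if (future == null) {
            throw new IllegalArgumentException("unknown query id: " + queryId);
        }
        return future.join();
    }

    public static void main(String[] args) {
        AsyncQueryTrackerSketch tracker = new AsyncQueryTrackerSketch();
        String id = tracker.submit(() -> "placeholder result");
        System.out.println(tracker.awaitResult(id));
        System.out.println(tracker.status(id));
    }
}
```

The three methods line up loosely with the POST, GET, and DELETE async endpoints in the API design, although the mapping between endpoints and handler calls is not defined in this RFC.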
API Design
REST Endpoints
# Data source management
PUT /_plugins/_direct_query/datasource/{name}
GET /_plugins/_direct_query/datasource/{name}
DELETE /_plugins/_direct_query/datasource/{name}
GET /_plugins/_direct_query/datasource

# Query execution
POST /_plugins/_direct_query/_execute
{
  "datasource": "my-prometheus",
  "query": "rate(http_requests_total[5m])",
  "format": "json"
}

# Async query
POST /_plugins/_direct_query/_async_execute
GET /_plugins/_direct_query/_async_query/{query_id}
DELETE /_plugins/_direct_query/_async_query/{query_id}

# Schema discovery
GET /_plugins/_direct_query/datasource/{name}/_schema

# Resource management operations
GET /_plugins/_direct_query/datasource/{name}/resources/{type}
POST /_plugins/_direct_query/datasource/{name}/resources/{type}/_search
GET /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}
PUT /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}
POST /_plugins/_direct_query/datasource/{name}/resources/{type}
DELETE /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}

# Bulk resource operations
POST /_plugins/_direct_query/datasource/{name}/resources/{type}/_bulk
{
  "operations": [
    {"action": "create", "definition": {...}},
    {"action": "update", "id": "resource1", "definition": {...}},
    {"action": "delete", "id": "resource2"}
  ]
}
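As a rough illustration of how a client might call the proposed endpoints, the sketch below builds (but does not send) requests with the JDK's java.net.http API. The class name, the base URL, the "alert_rule" and "high-latency" path values, and the restriction of path segments to simple identifiers are all assumptions made for the example; the RFC does not define how resource types are spelled in URLs or how authentication is supplied.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative sketch: builds requests for the proposed endpoints without sending them. */
final class DirectQueryRequestSketch {

    private static final String BASE_PATH = "/_plugins/_direct_query";

    /** JSON body for POST /_plugins/_direct_query/_execute, mirroring the example above. */
    static String executeBody(String datasource, String query) {
        return "{\"datasource\":\"" + escape(datasource)
            + "\",\"query\":\"" + escape(query)
            + "\",\"format\":\"json\"}";
    }

    static HttpRequest executeQuery(String baseUrl, String datasource, String query) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + BASE_PATH + "/_execute"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(executeBody(datasource, query)))
            .build();
    }

    /** GET /_plugins/_direct_query/datasource/{name}/resources/{type}/{id} */
    static HttpRequest getResource(String baseUrl, String name, String type, String id) {
        String path = BASE_PATH + "/datasource/" + segment(name)
            + "/resources/" + segment(type) + "/" + segment(id);
        return HttpRequest.newBuilder().uri(URI.create(baseUrl + path)).GET().build();
    }

    // Assumption: names, types, and ids are simple URL-safe identifiers.
    private static String segment(String value) {
        if (!value.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("not a URL-safe identifier: " + value);
        }
        return value;
    }

    // Minimal JSON string escaping (backslash and double quote only), enough for this example.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        HttpRequest execute = executeQuery(
            "http://localhost:9200", "my-prometheus", "rate(http_requests_total[5m])");
        System.out.println(execute.method() + " " + execute.uri());

        HttpRequest read = getResource(
            "http://localhost:9200", "my-prometheus", "alert_rule", "high-latency");
        System.out.println(read.method() + " " + read.uri());
    }
}
```

Sending either request would be a matter of passing it to an HttpClient; that step is left out because it needs a running cluster with the plugin installed.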
Integration with SQL Plugin
The SQL plugin will be refactored to:
- Remove data source-specific code
- Depend on direct-query plugin for external data source queries
- Focus on SQL/PPL parsing, planning, and execution
- Delegate external queries to direct-query plugin via well-defined interfaces
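// SQL Plugin integration
public class DirectQueryStorageEngine implements StorageEngine {
    // Client for the direct-query plugin, supplied by the caller
    private final DirectQueryClient client;

    public DirectQueryStorageEngine(DirectQueryClient client) {
        this.client = client;
    }

    @Override
    public Table getTable(DataSourceSchemaName dataSourceSchemaName, String tableName) {
        // Delegate table resolution to the direct-query plugin
        return client.getTable(dataSourceSchemaName, tableName);
    }
}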
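One consequence of delegating through a client interface is that the SQL side can be exercised without a running direct-query plugin. The sketch below shows that idea with a fake client. It is self-contained, so Table, StorageEngine, and DirectQueryClient are reduced stand-ins declared locally (the data source is passed as a plain string rather than DataSourceSchemaName), and the shape of DirectQueryClient is an assumption, since the RFC does not define it.

```java
import java.util.Objects;

/** Illustrative sketch: local stand-ins for SQL plugin types, not the real classes. */
final class SqlDelegationSketch {

    /** Stand-in for the SQL plugin's table abstraction. */
    interface Table {
        String name();
    }

    /** Stand-in for the SQL plugin's StorageEngine contract, reduced to one method. */
    interface StorageEngine {
        Table getTable(String dataSource, String tableName);
    }

    /** Hypothetical boundary between the SQL plugin and the direct-query plugin. */
    interface DirectQueryClient {
        Table getTable(String dataSource, String tableName);
    }

    /** Storage engine that holds no data source logic and only delegates. */
    static final class DirectQueryStorageEngine implements StorageEngine {
        private final DirectQueryClient client;

        DirectQueryStorageEngine(DirectQueryClient client) {
            this.client = Objects.requireNonNull(client, "client");
        }

        @Override
        public Table getTable(String dataSource, String tableName) {
            return client.getTable(dataSource, tableName);
        }
    }

    public static void main(String[] args) {
        // Fake client: resolves every table to a qualified name, no network involved.
        DirectQueryClient fake = (dataSource, tableName) -> () -> dataSource + "." + tableName;

        StorageEngine engine = new DirectQueryStorageEngine(fake);
        System.out.println(engine.getTable("my-prometheus", "http_requests_total").name());
    }
}
```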
Migration Plan
Phase 1: Repository Setup
- Create new repository opensearch-project/direct-query
- Set up CI/CD pipelines
- Establish code structure and build system
Phase 2: Code Migration
- Extract direct-query modules from SQL plugin
- Move async-query modules
- Migrate datasources module
- Ensure backward compatibility
Phase 3: Refactoring
- Define clean interfaces and APIs
- Refactor existing code to new architecture
- Create connector abstraction layer
- Implement plugin integration
Phase 4: Integration
- Update SQL plugin to use direct-query plugin
- Testing and validation
- Documentation updates
- Performance optimization
Phase 5: Release
- Beta release with core functionality
- Gather feedback from community
- Address issues and improvements
- GA release
Risks and Mitigation
| Risk | Mitigation |
|------|------------|
| Breaking changes for existing users | Maintain backward compatibility layer during transition |
| Increased deployment complexity | Provide clear migration guides and tooling |
| Performance overhead from plugin communication | Optimize inter-plugin communication, consider native integration |
| Connector quality variance | Establish certification program and quality standards |
Open Questions
- Should the direct-query plugin be required for SQL plugin operation or optional?
- How to handle version compatibility between SQL and direct-query plugins?
- Should we support federated queries across multiple data sources?
- What level of SQL/PPL support should each connector provide?
Conclusion
The proposed OpenSearch Direct Query Plugin represents a significant architectural improvement that will transform how OpenSearch interacts with external data sources. By introducing a handler-based architecture with separate QueryHandler, ReadResourcesHandler, and WriteResourcesHandler interfaces, this plugin will provide a unified, extensible framework for both data querying and comprehensive resource management across diverse data sources.
This separation of concerns will not only simplify the SQL plugin's architecture but also create new opportunities for the OpenSearch ecosystem. The plugin will enable developers to build connectors that go beyond simple data access and provide full lifecycle management of external resources such as Prometheus alerts and S3 bucket policies. The interface definitions in this RFC outline how the architecture could be applied to real-world data sources.
The direct-query plugin will serve as a foundation for OpenSearch's evolution into a unified data platform that can seamlessly integrate with and manage resources across the entire data infrastructure landscape, while maintaining the simplicity and extensibility that developers expect from the OpenSearch ecosystem.
This RFC is open for community feedback. Please comment with your thoughts, concerns, and suggestions.