Description
Is your feature request related to a problem? Please describe.
When using Data Prepper to prepare machine learning batch job input files, many AWS AI services such as Bedrock and SageMaker require the input files to be in JSON Lines format with a .jsonl extension. Data Prepper's S3 sink currently writes JSON Lines output only with the .ndjson extension, so an additional file renaming step is needed. A new codec option in the S3 sink that writes the same format with a .jsonl extension would improve compatibility with AWS AI/ML services. See https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-data.html.
Currently, Data Prepper offers only the .ndjson extension for JSON Lines output. We should add a new codec that saves files in the same format but with the .jsonl extension. https://docs.opensearch.org/latest/data-prepper/pipelines/configuration/sinks/s3/#codec
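To illustrate the renaming step this currently forces on users, here is a minimal sketch of the key mapping a post-processing job might apply. The function name `to_jsonl_key` and the sample keys are hypothetical, and the actual copy or rename of the S3 object is deliberately left out; only the extension swap is shown.

```python
def to_jsonl_key(key: str) -> str:
    """Return the object key with a trailing .ndjson replaced by .jsonl.

    Hypothetical helper for illustration; keys that do not end in
    .ndjson are returned unchanged.
    """
    old_ext = ".ndjson"
    new_ext = ".jsonl"
    if key.endswith(old_ext):
        return key[: -len(old_ext)] + new_ext
    return key


if __name__ == "__main__":
    # Sample keys are made up for this example.
    for key in ["batch/input/events-0001.ndjson", "batch/input/readme.txt"]:
        print(key, "->", to_jsonl_key(key))
```

With the requested codec, the sink would write the .jsonl key directly and this extra pass over the bucket would not be needed.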

Describe the solution you'd like
Allow the following codec in the S3 sink configuration, which saves the file with the .jsonl extension:
sink:
  - s3:
      codec:
        jsonl: {}
Additional context
#5509