From df6ee79fca6d7df90222da1a295fe16d2c9859b3 Mon Sep 17 00:00:00 2001 From: burnalting Date: Mon, 21 Aug 2023 15:40:09 +1000 Subject: [PATCH 1/2] Create SPLIT Event Data pipelines --- source/template-pipelines/README.md | 17 ++ ....1b7f5d6e-5db6-44e8-a45e-e9315070b940.meta | 7 + ....1b7f5d6e-5db6-44e8-a45e-e9315070b940.node | 4 + ...e.1b7f5d6e-5db6-44e8-a45e-e9315070b940.xml | 185 +++++++++++++++ ....39ac6789-2655-4040-96f2-276509ace0ae.meta | 7 + ....39ac6789-2655-4040-96f2-276509ace0ae.node | 4 + ...e.39ac6789-2655-4040-96f2-276509ace0ae.xml | 220 ++++++++++++++++++ ....27971b76-6462-4eff-b63f-a7282df265f5.meta | 6 + ....27971b76-6462-4eff-b63f-a7282df265f5.node | 4 + ...e.27971b76-6462-4eff-b63f-a7282df265f5.xml | 167 +++++++++++++ 10 files changed, 621 insertions(+) create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.meta create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.node create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.xml create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.meta create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.node create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.xml create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.meta create mode 100644 
source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.node create mode 100644 source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.xml diff --git a/source/template-pipelines/README.md b/source/template-pipelines/README.md index 6197e6e..b7f791f 100644 --- a/source/template-pipelines/README.md +++ b/source/template-pipelines/README.md @@ -13,6 +13,9 @@ The following represents the folder structure and content that will be imported * [Event Data (JSON)](#event-data-json) `Pipeline` * [Event Data (Text)](#event-data-text) `Pipeline` * [Event Data (XML)](#event-data-xml) `Pipeline` + * [Event Data Split (JSON)](#event-data-split-json) `Pipeline` + * [Event Data Split (Text)](#event-data-split-text) `Pipeline` + * [Event Data Split (XML)](#event-data-split-xml) `Pipeline` * [Indexing](#indexing) `Pipeline` * [JSON](#json-xslt) `XSLT` * [Reference Data](#reference-data) `Pipeline` @@ -50,6 +53,20 @@ Inherits from Event Data Base, adding a Data Splitter parser in front of it. Thi Inherits from Event Data Base, adding a Data Splitter parser in front of it. This pipeline can be used as a template for pipelines processing text format data (e.g. Apache logs) into event-logging format XML. Pipelines inheriting from this will need to supply as a minimum a Text Converter to convert the text into XML, an XSLT translation to convert this XML into event-logging form and an XSLT translation to decorate the event-logging XML with any additional data (e.g. IP -> hostname lookups). +## Event Data Split (JSON) + +This pipeline can be used as a template for pipelines processing JSON format data into event-logging format XML where the events can be split in half for storage in different streamAppender event feeds. Typically this is used to break event data into one stream that should be held for a different time period to that of another. 
For example, all user attributed events could go to one output stream held for 7 years and all others could go to another output stream that could be held for say 1 year. Pipelines inheriting from this will need to supply as a minimum a JSON Parser (`jsonParser`) to convert JSON fragments into XML, one or more XSLT translations (`preTranslationFilter`, `initialTranslationFilter`) to convert this XML into event-logging form and then two XSLT translations (`translationFilterA`, `translationFilterB`) to split the event-logging XML form into differing sub-pipelines that can individually decorate and then store the events in a given event feed. + +## Event Data Split (Text) + +This pipeline can be used as a template for pipelines processing text format data (e.g. Apache logs) into event-logging format XML where the events can be split in half for storage in different streamAppender event feeds. Typically this is used to break event data into one stream that should be held for a different time period to that of another. For example, all user attributed events could go to one output stream held for 7 years and all others could go to another output stream that could be held for say 1 year. Pipelines inheriting from this will need to supply as a minimum a Text Converter (`dsParser`) to convert text into XML, one or more XSLT translations (`preTranslationFilter`, `initialTranslationFilter`) to convert this XML into event-logging form and then two XSLT translations (`translationFilterA`, `translationFilterB`) to split the event-logging XML form into differing sub-pipelines that can individually decorate and then store the events in a given event feed. + + +## Event Data Split (XML) + +This pipeline can be used as a template for pipelines processing XML format data into event-logging format XML where the events can be split in half for storage in different streamAppender event feeds. 
Typically this is used to break event data into one stream that should be held for a different time period to that of another. For example, all user attributed events could go to one output stream held for 7 years and all others could go to another output stream that could be held for say 1 year. Pipelines inheriting from this will need to supply as a minimum an XML Fragment parser (`xmlFragmentParser`), one or more XSLT translations (`preTranslationFilter`, `initialTranslationFilter`) to convert this XML into event-logging form and then two XSLT translations (`translationFilterA`, `translationFilterB`) to split the event-logging XML form into differing sub-pipelines that can individually decorate and then store the events in a given event feed. + + ## Indexing diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.meta b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.meta new file mode 100644 index 0000000..84b4c10 --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.meta @@ -0,0 +1,7 @@ +{ + "type" : "Pipeline", + "uuid" : "1b7f5d6e-5db6-44e8-a45e-e9315070b940", + "name" : "Event Data Split (JSON)", + "version" : "ff930205-4eff-4ff2-bc3e-4d251e7a65bc", + "description" : "This pipeline is designed to allow one to split data into two output streams. Typically this is used to break event data into one stream that should be held for a different time period to that of another. For example, all user attributed events could go to one output stream held for 7 years and all others could go to another output stream that could be held for say 1 year." 
+} diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.node b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.node new file mode 100644 index 0000000..44eaac5 --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.node @@ -0,0 +1,4 @@ +name=Event Data Split (JSON) +path=Template Pipelines +type=Pipeline +uuid=1b7f5d6e-5db6-44e8-a45e-e9315070b940 diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.xml b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.xml new file mode 100644 index 0000000..3c573de --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__JSON_.Pipeline.1b7f5d6e-5db6-44e8-a45e-e9315070b940.xml @@ -0,0 +1,185 @@ + + + + + + Source + Source + + + jsonParser + JSONParser + + + readRecordCountFilter + RecordCountFilter + + + splitFilter + SplitFilter + + + preTranslationFilter + XSLTFilter + + + initialTranslationFilter + XSLTFilter + + + translationFilterA + XSLTFilter + + + translationFilterB + XSLTFilter + + + decorationFilterA + XSLTFilter + + + decorationFilterB + XSLTFilter + + + schemaFilterA + SchemaFilter + + + schemaFilterB + SchemaFilter + + + recordOutputFilterA + RecordOutputFilter + + + recordOutputFilterB + RecordOutputFilter + + + writeRecordCountFilterA + RecordCountFilter + + + writeRecordCountFilterB + RecordCountFilter + + + xmlWriterA + XMLWriter + + + xmlWriterB + XMLWriter + + + streamAppenderA + StreamAppender + + + streamAppenderB + StreamAppender + + + + + + + writeRecordCountFilterA + countRead + + false + + + + writeRecordCountFilterB + countRead + + false + + 
+ + + + + + Source + jsonParser + + + jsonParser + readRecordCountFilter + + + readRecordCountFilter + splitFilter + + + splitFilter + preTranslationFilter + + + preTranslationFilter + initialTranslationFilter + + + initialTranslationFilter + translationFilterA + + + initialTranslationFilter + translationFilterB + + + translationFilterA + decorationFilterA + + + translationFilterB + decorationFilterB + + + decorationFilterA + schemaFilterA + + + decorationFilterB + schemaFilterB + + + schemaFilterA + recordOutputFilterA + + + schemaFilterB + recordOutputFilterB + + + recordOutputFilterA + writeRecordCountFilterA + + + recordOutputFilterB + writeRecordCountFilterB + + + writeRecordCountFilterA + xmlWriterA + + + writeRecordCountFilterB + xmlWriterB + + + xmlWriterA + streamAppenderA + + + xmlWriterB + streamAppenderB + + + + diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.meta b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.meta new file mode 100644 index 0000000..1240a41 --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.meta @@ -0,0 +1,7 @@ +{ + "type" : "Pipeline", + "uuid" : "39ac6789-2655-4040-96f2-276509ace0ae", + "name" : "Event Data Split (Text)", + "version" : "183e6941-6a31-4ce0-ad99-a98da3c156a0", + "description" : "This pipeline is designed to allow one to split data into two output streams. Typically this is used to break event data into one stream that should be held for a different time period to that of another. For example, all user attributed events could go to one output stream held for 7 years and all others could go to another output stream that could be held for say 1 year." 
+} diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.node b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.node new file mode 100644 index 0000000..cbfc87c --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.node @@ -0,0 +1,4 @@ +name=Event Data Split (Text) +path=Template Pipelines +type=Pipeline +uuid=39ac6789-2655-4040-96f2-276509ace0ae diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.xml b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.xml new file mode 100644 index 0000000..f60277f --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__Text_.Pipeline.39ac6789-2655-4040-96f2-276509ace0ae.xml @@ -0,0 +1,220 @@ + + + + + + Source + Source + + + dsParser + DSParser + + + readRecordCountFilter + RecordCountFilter + + + splitFilter + SplitFilter + + + preTranslationFilter + XSLTFilter + + + initialTranslationFilter + XSLTFilter + + + translationFilterA + XSLTFilter + + + decorationFilterA + XSLTFilter + + + schemaFilterA + SchemaFilter + + + recordOutputFilterA + RecordOutputFilter + + + writeRecordCountFilterA + RecordCountFilter + + + xmlWriterA + XMLWriter + + + streamAppenderA + StreamAppender + + + translationFilterB + XSLTFilter + + + decorationFilterB + XSLTFilter + + + schemaFilterB + SchemaFilter + + + recordOutputFilterB + RecordOutputFilter + + + writeRecordCountFilterB + RecordCountFilter + + + xmlWriterB + XMLWriter + + + streamAppenderB + StreamAppender + + + + + + + splitFilter + splitCount + + 1000 + + + + schemaFilterA + schemaGroup + + EVENTS + + + + schemaFilterB + 
schemaGroup + + EVENTS + + + + writeRecordCountFilterA + countRead + + false + + + + writeRecordCountFilterB + countRead + + false + + + + streamAppenderA + streamType + + Events + + + + streamAppenderB + streamType + + Events + + + + + + + + Source + dsParser + + + dsParser + readRecordCountFilter + + + readRecordCountFilter + splitFilter + + + splitFilter + preTranslationFilter + + + preTranslationFilter + initialTranslationFilter + + + initialTranslationFilter + translationFilterA + + + initialTranslationFilter + translationFilterB + + + translationFilterA + decorationFilterA + + + translationFilterB + decorationFilterB + + + decorationFilterA + schemaFilterA + + + decorationFilterB + schemaFilterB + + + schemaFilterA + recordOutputFilterA + + + schemaFilterB + recordOutputFilterB + + + recordOutputFilterA + writeRecordCountFilterA + + + recordOutputFilterB + writeRecordCountFilterB + + + writeRecordCountFilterA + xmlWriterA + + + writeRecordCountFilterB + xmlWriterB + + + xmlWriterA + streamAppenderA + + + xmlWriterB + streamAppenderB + + + + diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.meta b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.meta new file mode 100644 index 0000000..4410e6c --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.meta @@ -0,0 +1,6 @@ +{ + "type" : "Pipeline", + "uuid" : "27971b76-6462-4eff-b63f-a7282df265f5", + "name" : "Event Data Split (XML)", + "version" : "fa99e8ff-41bb-4238-93f0-381588d08370" +} diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.node 
b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.node new file mode 100644 index 0000000..64d9562 --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.node @@ -0,0 +1,4 @@ +name=Event Data Split (XML) +path=Template Pipelines +type=Pipeline +uuid=27971b76-6462-4eff-b63f-a7282df265f5 diff --git a/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.xml b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.xml new file mode 100644 index 0000000..62ae52e --- /dev/null +++ b/source/template-pipelines/stroomContent/Template_Pipelines/Event_Data_Split__XML_.Pipeline.27971b76-6462-4eff-b63f-a7282df265f5.xml @@ -0,0 +1,167 @@ + + + + + + Source + Source + + + xmlFragmentParser + XMLFragmentParser + + + readRecordCountFilter + RecordCountFilter + + + splitFilter + SplitFilter + + + preTranslationFilter + XSLTFilter + + + initialTranslationFilter + XSLTFilter + + + translationFilterA + XSLTFilter + + + translationFilterB + XSLTFilter + + + decorationFilterA + XSLTFilter + + + decorationFilterB + XSLTFilter + + + schemaFilterA + SchemaFilter + + + schemaFilterB + SchemaFilter + + + recordOutputFilterA + RecordOutputFilter + + + recordOutputFilterB + RecordOutputFilter + + + writeRecordCountFilterA + RecordCountFilter + + + writeRecordCountFilterB + RecordCountFilter + + + xmlWriterA + XMLWriter + + + xmlWriterB + XMLWriter + + + streamAppenderA + StreamAppender + + + streamAppenderB + StreamAppender + + + + + + + Source + xmlFragmentParser + + + xmlFragmentParser + readRecordCountFilter + + + readRecordCountFilter + splitFilter + + + splitFilter + preTranslationFilter + + + preTranslationFilter + initialTranslationFilter + + + initialTranslationFilter + 
translationFilterA + + + initialTranslationFilter + translationFilterB + + + translationFilterA + decorationFilterA + + + translationFilterB + decorationFilterB + + + decorationFilterA + schemaFilterA + + + decorationFilterB + schemaFilterB + + + schemaFilterA + recordOutputFilterA + + + schemaFilterB + recordOutputFilterB + + + recordOutputFilterA + writeRecordCountFilterA + + + recordOutputFilterB + writeRecordCountFilterB + + + writeRecordCountFilterA + xmlWriterA + + + writeRecordCountFilterB + xmlWriterB + + + xmlWriterA + streamAppenderA + + + xmlWriterB + streamAppenderB + + + + From a464f95c7e678120384afcecf0230c4d190de3ad Mon Sep 17 00:00:00 2001 From: burnalting Date: Mon, 21 Aug 2023 17:59:08 +1000 Subject: [PATCH 2/2] Updated source/template-pipelines/CHANGELOG.md to indicate change --- source/template-pipelines/CHANGELOG.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/source/template-pipelines/CHANGELOG.md b/source/template-pipelines/CHANGELOG.md index c6c3ec8..507091f 100644 --- a/source/template-pipelines/CHANGELOG.md +++ b/source/template-pipelines/CHANGELOG.md @@ -13,6 +13,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/). ### Removed +## [template-pipelines-v0.5.1] + +### Added + +* `Event Data Split (JSON)`, `Event Data Split (Text)` and `Event Data Split (XML)` pipelines to support multiple feed destinations dependent on additional filtering XSLTs + ## [template-pipelines-v0.5] ### Changed @@ -63,6 +69,7 @@ Stroom up to and including `v7.0`. 
[Unreleased]: https://github.com/gchq/stroom-content/compare/template-pipelines-v0.5...HEAD +[template-pipelines-v0.5.1]: https://github.com/gchq/stroom-content/compare/template-pipelines-v0.5...template-pipelines-v0.5.1 [template-pipelines-v0.5]: https://github.com/gchq/stroom-content/compare/template-pipelines-v0.4.1...template-pipelines-v0.5 [template-pipelines-v0.4.1]: https://github.com/gchq/stroom-content/compare/template-pipelines-v0.3...template-pipelines-v0.4.1 [template-pipelines-v0.4]: https://github.com/gchq/stroom-content/compare/template-pipelines-v0.3...template-pipelines-v0.4