Dataflow: add batch mode #108

@mgolosova

Description

Batch mode allows processing more than one message at a time.
It is essential for Sink Connector stages (like Stage 060 or Stage 069), as most storages support bulk data loading, which is much faster than uploading records one at a time.
It might also be useful for processing stages (see PR #84).

Since we currently use the common library only for processing stages, not connectors, we can add batch mode for them first (and later, possibly, reuse it for Sink Connectors).

What is needed:

  • add batch processing machinery to the pyDKB library.
    Parameters:
    • batch processing on/off;
    • max batch size;
    • max batch time (how long to wait before starting to process fewer than max_batch_size messages);
  • make it reusable (keep Sink Connectors in mind);
  • make it optional, so that if batch mode is not supported (as in already developed stages), nothing breaks; at most, a warning message is produced at startup;
  • (possibly) update existing stages (if still necessary in spite of the previous item);
  • implement it for Stage 95 for testing.
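The accumulation logic behind the parameters above could look roughly like this. A minimal sketch, not actual pyDKB API: the class and parameter names (`MessageBatcher`, `max_size`, `max_time`, `enabled`) are hypothetical, chosen to mirror the parameter list in this issue.

```python
import time


class MessageBatcher:
    """Hypothetical sketch (not the pyDKB API): collect messages until
    either `max_size` messages are buffered or `max_time` seconds have
    passed since the first message in the current batch, then flush."""

    def __init__(self, process_batch, max_size=100, max_time=1.0, enabled=True):
        self.process_batch = process_batch  # callable taking a list of messages
        self.max_size = max_size            # max batch size
        self.max_time = max_time            # max batch time, seconds
        self.enabled = enabled              # batch processing on/off
        self._batch = []
        self._first_ts = None

    def put(self, message):
        if not self.enabled:
            # Batch mode off: fall back to one-message-at-a-time processing,
            # so stages without batch support keep working unchanged.
            self.process_batch([message])
            return
        if self._first_ts is None:
            self._first_ts = time.time()
        self._batch.append(message)
        if (len(self._batch) >= self.max_size
                or time.time() - self._first_ts >= self.max_time):
            self.flush()

    def flush(self):
        """Process whatever is buffered (e.g. on shutdown or end of input)."""
        if self._batch:
            self.process_batch(self._batch)
            self._batch = []
            self._first_ts = None


# Usage: batches of up to 3 messages; a long max_time so only size triggers.
batches = []
b = MessageBatcher(batches.append, max_size=3, max_time=60)
for msg in range(7):
    b.put(msg)
b.flush()  # leftover partial batch
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

A real implementation inside the library would also need the time-based flush to fire while waiting for input (e.g. via a consumer poll timeout), not only when a new message arrives; the sketch checks the timer on `put` for simplicity.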
