Redpanda output with idempotent_writes makes unordered results #4387

@pserwacinski

Hi Team

I am getting unordered results with the redpanda input/output. The source topic has been replayed 3 times, so it contains the same block of messages three times. For some reason, the 3rd block came out disordered, as shown in the screenshot below.

[Image: screenshot of the output topic showing the disordered 3rd block]

My input and output configuration is shown below:

output:
    label: kafka_output
    processors:
        - label: remove_build_in_meta_added
          mapping: |-
            let blacklist = ["kafka_topic", "kafka_partition", "kafka_offset",
            "kafka_lag", "kafka_timestamp_unix", "kafka_tombstone_message",
            "backoff_duration", "retry_count", "schema_id"]

            meta = metadata().filter(kv -> !$blacklist.contains(kv.key))
    redpanda:
        seed_brokers:
            - output-kafka-bootstrap.svc.cluster.local:9092
        topic: ${! meta("topic") }
        key: ${! meta("kafka_key") }
        timestamp_ms: ${! meta("kafka_timestamp_ms") }
        partitioner: murmur2_hash
        max_message_bytes: 5MB
        compression: zstd
        metadata:
            include_patterns:
                - .*
        sasl:
            - username: streams
              password: ${sasl_password}
              mechanism: SCRAM-SHA-512
        tls:
            enabled: true
            skip_cert_verify: true
        idempotent_write: true
        max_in_flight: 1
input:
    label: kafka_input
    redpanda:
        seed_brokers:
            - input-kafka-bootstrap.svc.cluster.local:9092
        topics:
            - source-topic
        consumer_group: source-topic
        auto_replay_nacks: true
        sasl:
            - username: streams
              password: ${sasl_password}
              mechanism: SCRAM-SHA-512
        tls:
            enabled: true
            skip_cert_verify: true
        fetch_max_partition_bytes: 5MB
        fetch_max_bytes: 50MB
        transaction_isolation_level: read_committed
        max_yield_batch_bytes: 5MB

I am using Redpanda Connect v4.61.0.

I use idempotent writes and max_in_flight: 1. I understand there might be duplicates, as this is not the same as exactly-once semantics, but to my understanding this should ensure the topic is effectively ordered for any given key, meaning that in my example the last record should have value 4529.49 and not 2203.57.
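
To make "effectively ordered for a given key" concrete, here is a minimal Python sketch of the check I have in mind. It is not part of the pipeline. The function name `out_of_order_keys` and the `(key, source_offset)` record shape are hypothetical, and it assumes each output record can be traced back to its offset in the source partition. Duplicates (the same offset repeated) are tolerated; only an offset going backwards for a key counts as a violation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (message key, offset in the source partition).
Record = Tuple[str, int]


def out_of_order_keys(records: Iterable[Record]) -> List[str]:
    """Return keys whose source offsets go backwards in output order.

    Repeated offsets (duplicates) are allowed; only regressions are flagged.
    """
    highest_seen: Dict[str, int] = {}
    violations: List[str] = []
    for key, source_offset in records:
        previous = highest_seen.get(key)
        if previous is not None and source_offset < previous:
            if key not in violations:
                violations.append(key)
        else:
            # First sighting of the key, or offset is >= the highest so far.
            highest_seen[key] = source_offset
    return violations
```

For example, `out_of_order_keys([("a", 1), ("a", 3), ("a", 2)])` flags key `"a"`, while a sequence that only contains duplicates such as `[("a", 1), ("a", 2), ("a", 2)]` is considered ordered.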

Can you please tell me if this configuration should ensure data is effectively ordered within the scope of the same key?
