ValueError in KafkaEvent.decoded_headers when processing non-ASCII characters in headers #6862

Open
@ipt-jbu

Description

Expected Behaviour

record.decoded_headers should be able to correctly process header values that are represented as an array of signed bytes, without raising a ValueError. The signed bytes should be correctly converted to their unsigned representation before being decoded.

For a header {"test-header": "hello-world-ë"}, which AWS might deliver as {'test-header': [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, -61, -85]}, the decoded_headers property should return a dictionary like:

{'test-header': 'hello-world-ë'}

Current Behaviour

When the Lambda function is invoked with a Kafka record whose header contains a non-ASCII character, accessing record.decoded_headers fails with a ValueError. AWS delivers the header value as a list of signed bytes (e.g., [-61, -85] for ë), and Python's built-in bytes(), which Powertools calls on that list, only accepts integers in range(0, 256).

See the full traceback in the "Debugging logs" section.
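The failure can be reproduced with the standard library alone, without Lambda or Powertools. This sketch assumes only that the header list is passed to the built-in bytes(), as the error message suggests.

```python
# Signed-byte form of the UTF-8 encoding of "ë", as it appears in the event.
signed_value = [-61, -85]

try:
    bytes(signed_value)  # bytes() rejects any integer outside 0..255
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```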

Code snippet

from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.data_classes import KafkaEvent, event_source

@event_source(data_class=KafkaEvent)
def lambda_handler(event: KafkaEvent, context: LambdaContext):
    # This is the event structure we receive from AWS for a header like
    # "test-header": "hello-world-ë"
    # The header value is serialized as:
    # [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, -61, -85]

    for record in event.records:
        try:
            # The following line will raise the ValueError
            decoded_headers = record.decoded_headers
            print(f"Decoded headers: {decoded_headers}")
        except ValueError as e:
            print(f"Error decoding headers: {e}")
            # Log the raw header value that causes the issue
            for header in record.headers:
                if 'test-header' in header:
                    print(f"Raw header value: {header['test-header']}")

Possible Solution

A potential fix could be to convert the signed bytes to unsigned bytes before calling bytes().

For example:

# Example of converting signed to unsigned bytes for a value like [-61, -85]
signed_bytes = [-61, -85]
unsigned_bytes = [b & 255 for b in signed_bytes] # Results in [195, 171]
bytes(unsigned_bytes).decode('utf-8') # Correctly decodes to 'ë'

Steps to Reproduce

  1. Create a Lambda function with a Kafka trigger.
  2. Use the Powertools @event_source(data_class=KafkaEvent) decorator on the handler.
  3. Type-hint the event parameter as KafkaEvent: def lambda_handler(event: KafkaEvent, context: LambdaContext):.
  4. Send a Kafka message with a header containing a non-ASCII character (e.g., test-header: hello-world-ë).
  5. Inside the handler, try to access the decoded headers of a record: headers = event.records[0].decoded_headers.

Powertools for AWS Lambda (Python) version

3.13.0

AWS Lambda function runtime

3.12

Packaging format used

PyPI

Debugging logs

ValueError: bytes must be in range(0, 256)

at kafka_event.py: 83

Labels

bug, triage