Skip to content
This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Incorrect use of sys.getsizeof in trim_message #20

Open
roosmaa opened this issue May 3, 2017 · 2 comments
Open

Incorrect use of sys.getsizeof in trim_message #20

roosmaa opened this issue May 3, 2017 · 2 comments

Comments

@roosmaa
Copy link

roosmaa commented May 3, 2017

In platform_strategy.py:

    def trim_message(self, message):
        import sys
        trim_length = SCARFACE_DEFAULT_MESSAGE_TRIM_LENGTH
        if hasattr(settings, 'SCARFACE_MESSAGE_TRIM_LENGTH'):
            trim_length = settings.SCARFACE_MESSAGE_TRIM_LENGTH

        if sys.getsizeof(message) > trim_length:
            while sys.getsizeof(message) > trim_length:
                message = message[:-3]
            message += '...'
        return message

Usage of sys.getsizeof is fundamentally incorrect to measuring message length. getsizeof returns the internal size of the Python object, not the actual encoded size of the final message. TL;DR; it kind of works for ascii, but utterly breaks down for unicode strings.

Here's a quick demonstration from an interactive shell:

>>> sys.getsizeof('') # Basic empty string takes up 49 bytes
49
>>> sys.getsizeof('For ascii string it kind of works, but not really')
98
>>> sys.getsizeof('Let\'s add some ünicode to themix, 👌🏻')
220
>>> # Quickly the sizeof spikes, as it is the internal representation after all.
>>> # Let's try again by not relying on how the string is implemented behind the
>>> # scenes. Instead we encode the string to a byte array, which most likely
>>> # has less magic and grows in size more predictably.
>>> sys.getsizeof('For ascii string it kind of works, but not really'.encode('utf8'))
82
>>> sys.getsizeof('Let\'s add some ünicode to themix, 👌🏻'.encode('utf8'))
76
>>> # Much better, but in this context, getsizeof really shouldn't be used,
>>> # instead we should be counting the bytes that will be actually used up
>>> # when sending the pushes.
>>> len('For ascii string it kind of works, but not really'.encode('utf8'))
49
>>> len('Let\'s add some ünicode to themix, 👌🏻'.encode('utf8'))
43
@melbic
Copy link
Member

melbic commented May 3, 2017

Hey @roosmaa, thanks for pointing this out. Could you please create a pull request for this issue?

@roosmaa
Copy link
Author

roosmaa commented May 3, 2017

@melbic You can cherry-pick this commit villoid@277e7ba

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants