
Slow-ish #2

Open
r3m0t opened this issue Sep 9, 2014 · 6 comments

Comments


r3m0t commented Sep 9, 2014

It would be faster to use realloc instead of malloc for our strings (avail_out), thus putting that loop into C-space.
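
A minimal sketch of the idea (not lzmaffi's actual code): grow one C buffer with `realloc()` instead of allocating a fresh `avail_out` chunk per pass and joining the pieces in Python. Here `decompress_chunk` is a hypothetical stand-in for the liblzma call, and `ffi.dlopen(None)` assumes a Unix-style libc.

```python
import cffi

ffi = cffi.FFI()
ffi.cdef("""
    void *malloc(size_t size);
    void *realloc(void *ptr, size_t size);
    void free(void *ptr);
""")
libc = ffi.dlopen(None)  # load the C library; assumes a Unix-style platform

def read_all(decompress_chunk, initial_size=8192):
    # decompress_chunk(ptr, room) -> number of bytes written, or -1 at end
    # of stream (hypothetical callback standing in for the liblzma call).
    size = initial_size
    buf = libc.malloc(size)
    used = 0
    try:
        while True:
            written = decompress_chunk(ffi.cast("char *", buf) + used, size - used)
            if written < 0:           # stream finished
                break
            used += written
            if used == size:          # out of room: grow in place; the copy stays in C
                size *= 2
                buf = libc.realloc(buf, size)   # error handling omitted
        return ffi.buffer(ffi.cast("char *", buf), used)[:]
    finally:
        libc.free(buf)
```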

@dstromberg

Using this with pypy3 I'm finding that compression is faster than CPython's lzma module, but decompression is a little more than 30% slower than CPython's lzma module. I'd love to see a speedup.


r3m0t commented Sep 30, 2017

Huh, I have a user! What compression ratio do your files have? Try setting `file._decompressor._bufsiz = io.DEFAULT_BUFFER_SIZE * 5` if your compression ratio is 0.2, for example. Do that before your first call to `read()`.
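
For example (a hedged sketch: `lzmaffi.open` is assumed to mirror `lzma.open`, the filename is made up, and `_decompressor`/`_bufsiz` are the private attributes named above, so they may change between versions):

```python
import io
import lzmaffi

# If the file expands roughly 5x (compression ratio ~0.2), pre-sizing the
# decompressor's output buffer avoids repeated buffer growth during read().
with lzmaffi.open("backup.chunk.xz", "rb") as f:        # hypothetical file
    f._decompressor._bufsiz = io.DEFAULT_BUFFER_SIZE * 5
    data = f.read()
```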

You're the first user I know of outside of my then-employee, so I'm pretty chuffed. Are you using xz blocks, or did you just want a pypy-friendly xz library?

@dstromberg

I'm using lzma (xz) compression in a filesystem backup program: http://stromberg.dnsalias.org/~strombrg/backshift/
I have one 3+ terabyte backup repo for a half dozen machines, comprising my home backups. Some files are compressed with CPython's native lzma module, some are compressed with a ctypes-based xz module, and some are compressed with the lzma module that comes with pypy3, which I believe is your lzmaffi module.

Because I'm backing up many different file types, I believe my compression ratios are all over the place, but I'm not displeased with them. Here's a 3-year-old analysis of compression in my personal backshift use: http://stromberg.dnsalias.org/~strombrg/backshift/documentation/for-all/chunk-sizes.html. I believe I'm getting just-OK compression because much of what I'm backing up is DVD rips, which are of course already lossily compressed.

I'm not looking for xz blocks; I just want something that'll work on pypy3 faster than the xz+ctypes code I wrote and faster than CPython's lzma module, so I can switch to pypy3 for backups. Right now, your first backup of a given filesystem is faster with pypy3 and subsequent backups are faster with cpython3. I'd like to find a way to get to both being faster with pypy3.

As I said, your module appears to be faster for compression, but slower for decompression. Initial backups are compression-heavy with very little decompression, but subsequent backups are doing both compression and decompression.

Here's some code I've been using to performance-test two of the different lzma modules:
http://stromberg.dnsalias.org/svn/utime-performance-comparison/trunk

There was a memory leak in the lzma module that comes with pypy3; they fixed that. Did they let you know about it? Or even get it from you?

I don't see anything about bufsiz in /usr/local/pypy3-5.8.0-with-lzma-fixes/lib-python/3/lzma.py . I'm now starting to wonder if pypy3's lzma code has diverged significantly from yours.

Thanks!


r3m0t commented Oct 1, 2017

No, I didn't realise they had forked my project! I assumed pypy3 might want a more compatible copy of the CPython lzma module (without the extra features I added), but I'm very happy to see it there.

It hasn't received huge changes; the LZMADecompressor class is just implemented in the _lzma module rather than the lzma module. If I take your microbenchmark and change _bufsiz on the LZMADecompressor instance to the expected decompressed size (1000000), then pypy3 and CPython become practically the same speed (0.37s), while setting it to 1000000-1 makes it slow again (0.48s).
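
Roughly along these lines (a sketch, not the original microbenchmark: the payload is made up, and `_bufsiz` is the lzmaffi/pypy3 decompressor's private attribute, so the tuned runs only work there):

```python
import lzma          # on pypy3, this module is derived from lzmaffi
import time

payload = bytes(range(256)) * 3907          # ~1,000,000 bytes of stand-in data
compressed = lzma.compress(payload)

def decompress_once(bufsiz=None):
    d = lzma.LZMADecompressor()
    if bufsiz is not None:
        d._bufsiz = bufsiz                  # private attribute; pypy3/lzmaffi only
    return d.decompress(compressed)

# Time the default buffer growth, an exact-size buffer, and one byte short.
for bufsiz in (None, len(payload), len(payload) - 1):
    start = time.perf_counter()
    for _ in range(100):
        decompress_once(bufsiz)
    print(bufsiz, time.perf_counter() - start)
```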

The algorithm for growing the buffer that liblzma writes decompressed data into is quite bizarre: it calls realloc 2479 times in this case. If it simply grew the output buffer by 8KB each time, it would only call it 122 times. The algorithm is taken from the CPython module's source code, though.
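
For scale, a quick check of that count (assuming roughly 1,000,000 bytes of decompressed output, as in the benchmark above):

```python
# Starting from an 8 KB buffer and growing by a fixed 8 KB per realloc,
# covering ~1,000,000 bytes of output takes about 122 grow calls.
out_size = 1000000
step = 8 * 1024
size = step
grows = 0
while size < out_size:
    size += step
    grows += 1
print(grows)   # 122
```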

Anyway, I'm not sure this is a good benchmark when your actual decompressed data is only 10% bigger than the compressed data. Also, the usual caveats about benchmarking pypy apply: it's slow when it starts, but after a few thousand loops the JIT has kicked in and you'll see its real speed. I added another 10 or 100 calls to alt_lzma_decompression_test and the last call took 0.43 or 0.41 seconds instead of the usual 0.48.

Well, good luck finding the reason for the speed difference. :)


dstromberg commented Oct 2, 2017 via email

@dstromberg

In Pypy3 5.9, xz decompression is more than twice as slow as that found in CPython 3.6. So it's actually gotten worse.
