unwind/python-lzjb

python-lzjb: lzjb in pure Python

This is a port of Jeff Bonwick's lzjb compression algorithm to pure Python. This compression scheme is used in the ZFS filesystem.

One of its main features is very small memory requirements for decompression. This can make it a suitable choice when adding compression in memory-constrained environments, such as in embedded development.
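To illustrate why decompression needs so little memory, here is a minimal sketch of an LZJB-style decoder. It follows the structure of the publicly available C reference decoder and is not this package's own code; the name `lzjb_decompress_sketch` and the error handling are assumptions made for illustration. Note that the decoder keeps only a few integers of state and uses the output it has already produced as its dictionary.

```python
NBBY = 8                                   # bits per byte
MATCH_BITS = 6                             # bits used for the match length
MATCH_MIN = 3                              # shortest match that gets encoded
OFFSET_MASK = (1 << (16 - MATCH_BITS)) - 1 # low 10 bits hold the offset


def lzjb_decompress_sketch(src):
    """Decode an LZJB-style stream held in a bytearray (illustrative only)."""
    dst = bytearray()
    pos = 0
    copymap = 0
    copymask = 1 << (NBBY - 1)
    while pos < len(src):
        copymask <<= 1
        if copymask == (1 << NBBY):
            # Every 8 items a fresh flag byte says which items are matches.
            copymask = 1
            copymap = src[pos]
            pos += 1
            if pos >= len(src):
                break
        if copymap & copymask:
            # Match: 6 bits of length, 10 bits of backwards offset.
            if pos + 1 >= len(src):
                raise ValueError("truncated match")
            mlen = (src[pos] >> (NBBY - MATCH_BITS)) + MATCH_MIN
            offset = ((src[pos] << NBBY) | src[pos + 1]) & OFFSET_MASK
            pos += 2
            start = len(dst) - offset
            if offset == 0 or start < 0:
                raise ValueError("match offset points outside the output")
            # Copy byte by byte so that overlapping matches work.
            for i in range(mlen):
                dst.append(dst[start + i])
        else:
            # Literal: copy one byte straight through.
            dst.append(src[pos])
            pos += 1
    return dst
```

For example, the six-byte stream `08 61 62 63 00 03` (three literals followed by one match of length 3 at offset 3) decodes to `abcabc`.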

The name is perhaps not optimal. I didn't want to come up with a "fancy" name that has no meaning. I know of the pylzjb project, which provides Python bindings for a C implementation of lzjb.

Status

This code is starting to feel quite mature and polished. This feeling is helped by the fact that it's very short: the core functions occupy fewer than 150 lines, including docstrings. The only remaining work I can think of is more profiling/optimization, but it does seem to work already.

Installation

As with many Python packages, installation uses setup.py and is a two-step process:

  1. $ ./setup.py build
  2. $ ./setup.py install

Unlike pylzjb, the module install name for this project is simply lzjb. I think this makes sense: it should be fairly obvious that an imported module is for Python.

License

This is open source, distributed under the BSD 2-clause license.

Tests

To ensure compatibility with the public C code for LZJB compression, automatic testing is performed. A simple shell script runs python-lzjb against both the C code and itself, on a set of 30 files. The test script emits a simple matrix which quickly shows when something breaks.

This package is designed to work with both Python 2.x and 3.x from the same source. It has been tested on Python 2.7.17 and Python 3.7.5, by running the test script. The test "framework" is rather Unix-centric, apologies. It could/should probably be rewritten in Python to be more portable.

Performance

The main goals when implementing this have been correctness and (sort of) clarity, pursued by closely following the original C code. On my not-so-hot laptop (Intel® Core™ i5 M 480 @ 2.67GHz) it currently achieves around 1.1 MB/s when compressing.

API

The package's API is extremely simple. Data is managed as Python bytearray objects.

There are two groups of functions: size encoding/decoding, and data compression/decompression.

The size functions are mainly intended to help with creating suitable header data for compressed data. They support a simple variable-length integer encoding format which can be used to prefix compressed data with the size of the original, uncompressed data. The compression and decompression functions themselves neither produce nor expect any header data; providing one is up to the application.
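As a rough sketch of how an application might combine the two groups of functions, the helpers below build and parse a "size header plus compressed payload" blob. The names `pack` and `unpack` are hypothetical and not part of the package. The `codec` parameter is assumed to be any object exposing the four functions documented in this README with the behaviour described there, typically the imported `lzjb` module; passing it in keeps the sketch independent of how the package is installed.

```python
def pack(data, codec):
    """Return a size header followed by the compressed form of data.

    codec is assumed to expose size_encode() and compress() as
    documented in this README (for example, the lzjb module).
    """
    src = bytearray(data)
    # size_encode() without a dst creates a bytearray holding the header...
    blob = codec.size_encode(len(src))
    # ...and compress() appends the compressed bytes to that bytearray.
    return codec.compress(src, blob)


def unpack(blob, codec):
    """Inverse of pack(): strip the header, decompress, verify the size."""
    size, used = codec.size_decode(blob)
    # Everything after the consumed header bytes is compressed payload.
    data = codec.decompress(blob[used:])
    if len(data) != size:
        raise ValueError("expected %d bytes, got %d" % (size, len(data)))
    return data
```

With the package installed, this would be called as `pack(bytearray(b"some data"), lzjb)` after `import lzjb`. Checking the decoded length against the header is a design choice of this sketch: it gives the application a cheap integrity check on the payload.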

The text below is extracted from the source code's docstrings by the docbuilder.py program.

Size encoding

size_encode(size, dst = None)

Encodes the given size in little-endian variable-length encoding.

The dst argument can be an existing bytearray to which the encoded size is appended. If it's omitted (or None), a new bytearray is created and used.

Returns the destination bytearray.

size_decode(src)

Decodes a size (encoded with size_encode()) from the start of src.

Returns a tuple (size, len) where size is the size that was decoded, and len is the number of bytes from src that were consumed.
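To make the idea of a little-endian variable-length size encoding concrete, here is a minimal sketch of one such scheme. It assumes 7 payload bits per byte, least significant group first, with the high bit set on the final byte; the package's actual byte layout may differ, so consult the source before relying on it. The names `encode_size_sketch` and `decode_size_sketch` are hypothetical, chosen to avoid confusion with the real functions, but they mirror the documented calling conventions (optional dst, and a (size, len) return value).

```python
def encode_size_sketch(size, dst=None):
    """Append size to dst as 7-bit groups, low group first (assumed layout)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if dst is None:
        dst = bytearray()
    while True:
        dst.append(size & 0x7F)  # emit the next 7 bits
        size >>= 7
        if size == 0:
            break
    dst[-1] |= 0x80              # mark the final byte of the encoding
    return dst


def decode_size_sketch(src):
    """Return (size, consumed) decoded from the start of the bytearray src."""
    size = 0
    shift = 0
    for consumed, byte in enumerate(src, start=1):
        size |= (byte & 0x7F) << shift
        if byte & 0x80:          # high bit set: this was the last byte
            return size, consumed
        shift += 7
    raise ValueError("no terminating byte found")
```

Under this assumed layout, sizes below 128 take a single byte, and the second element of the returned tuple tells the caller where the payload starts.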

Data compression

compress(src, dst = None)

Compresses src, the source bytearray.

If dst is not None, it's assumed to be the output bytearray and bytes are appended to it using dst.append(). If it is None, a new bytearray is created.

The destination bytearray is returned.

decompress(src, dst = None)

Decompresses src, a bytearray of compressed data.

The optional dst argument can be an existing bytearray to which the output is appended. If it's None, a new bytearray is created.

The output bytearray is returned.

Inspiration

This was ported to Python based on:

Thanks of course to these authors for contributing their code as open source.
