This is a port of Jeff Bonwick's lzjb compression algorithm to pure Python. This compression scheme is used in the ZFS filesystem.
One of its main features is very small memory requirements for decompression. This can make it a suitable choice when adding compression in memory-constrained environments, such as in embedded development.
The name is perhaps not optimal. I didn't want to come up with a "fancy" name that has no meaning. I know of the pylzjb project, which provides Python bindings for a C implementation of lzjb.
This code is starting to feel quite mature and polished. That feeling is helped by the fact that it's very short: the core functions occupy less than 150 lines, including docstrings. The only remaining work I can think of is more profiling/optimization, but it already works well.
Like most Python packages, this one is installed using setup.py.
Installation is a two-step process:
$ ./setup.py build

$ ./setup.py install
Unlike pylzjb, the module install name for this project is simply lzjb.
I think this makes sense; it should be fairly obvious that the imported module is for Python.
This is open source, distributed under the BSD 2-clause license.
To ensure compatibility with the public C code for LZJB compression, automatic testing is performed. A simple shell script runs python-lzjb against both the C code and itself, on a set of 30 files. The test script emits a simple matrix which quickly shows when something breaks.
This package is designed to work with both Python 2.x and 3.x from the same source. It has been tested on Python 2.7.17 and Python 3.7.5, by running the test script. The test "framework" is rather Unix-centric, apologies. It could/should probably be rewritten in Python to be more portable.
The main goals when implementing this have been correctness and (a degree of) clarity, achieved by closely following the original C code. On my not-so-hot laptop (Intel® Core™ i5 M 480 @ 2.67GHz) it currently achieves around 1.1 MB/s when compressing.
The package's API is extremely simple.
Data is managed as Python bytearray objects.
There are two groups of functions: size encoding/decoding, and data compression/decompression.
The size functions are mainly intended to help with creating suitable header data for compressed data. They implement a simple variable-length integer encoding that can be used to prefix compressed data with the size of the original, uncompressed, data. The compression and decompression functions themselves neither emit nor expect any header; providing one is up to the application, as in the sketch below.
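As a minimal sketch of that framing pattern (using the functions documented below; the sample data is made up):

```python
import lzjb

original = bytearray(b"some data worth compressing " * 20)

# compress() appends to dst when one is given, so the bytearray returned
# by size_encode() can serve directly as the destination: the result is
# the encoded size immediately followed by the compressed payload.
blob = lzjb.compress(original, dst=lzjb.size_encode(len(original)))

# To unpack, decode the size prefix to learn where the payload begins.
size, header_len = lzjb.size_decode(blob)
restored = lzjb.decompress(blob[header_len:])
assert restored == original and size == len(restored)
```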
The text below is extracted from the source code's docstrings by the docbuilder.py program.
## Size encoding

- size_encode(size, dst = None)

  Encodes the given size in little-endian variable-length encoding.

  The dst argument can be an existing bytearray to append the size to. If it's omitted (or None), a new bytearray is created and used.

  Returns the destination bytearray.

- size_decode(src)

  Decodes a size (encoded with size_encode()) from the start of src.

  Returns a tuple (size, len) where size is the size that was decoded, and len is the number of bytes from src that were consumed.
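For example (a small sketch of the round trip; the magic string is invented for illustration):

```python
import lzjb

encoded = lzjb.size_encode(1000000)         # new bytearray holding the encoded size
size, consumed = lzjb.size_decode(encoded)
assert size == 1000000 and consumed == len(encoded)

# An existing bytearray can also be extended in place, e.g. when
# building a file header:
header = bytearray(b"MAGIC")                # hypothetical magic string
lzjb.size_encode(1000000, dst=header)       # returns the same bytearray, now longer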
## Compression

- compress(src, dst = None)

  Compresses src, the source bytearray.

  If dst is not None, it's assumed to be the output bytearray, and bytes are appended to it using dst.append(). If it is None, a new bytearray is created.

  The destination bytearray is returned.

- decompress(src, dst = None)

  Decompresses src, a bytearray of compressed data.

  The dst argument can be an optional bytearray to which the output will be appended. If it's None, a new bytearray is created.

  The output bytearray is returned.
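A simple round trip (a sketch based on the behaviour documented above; the sample data is made up):

```python
import lzjb

data = bytearray(b"the quick brown fox jumps over the lazy dog " * 10)

packed = lzjb.compress(data)        # returns a new bytearray of compressed bytes
unpacked = lzjb.decompress(packed)  # and back again
assert unpacked == data

# Both functions can append to an existing bytearray instead:
out = bytearray()
assert lzjb.compress(data, dst=out) is out
```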
This was ported to Python based on:
- The original C code
- The JavaScript port, which adds the uncompressed data size as a prefix
Thanks of course to these authors for contributing their code as open source.