Bzip3

bzip3 is a free and open-source data compressor based on the Burrows-Wheeler transform, developed by Kamila Szewczyk.
Features
bzip3 was designed to achieve better compression ratios than bzip2. It does not use compression levels as a performance setting; instead, the user explicitly specifies the compression block size, on the premise that a bigger block gives better compression. The bzip3 package includes parallel (multi-threaded) implementations of both compression and decompression. On text data, bzip3 generally outperforms most common data compressors, including bzip2, zstd, LZMA and RAR.
Like gzip or bzip2, bzip3 is only a data compressor. It is not an archiver like tar; it has no facilities for handling multiple files or encryption.
Implementation
bzip3 combines several data compression algorithms, applied in the following order:
* A special variant of run-length encoding (RLE) performed on the initial data, which first estimates the savings that collapsing every run of a byte would give and adjusts accordingly before performing the operation.
* The Lempel Ziv + Prediction algorithm (LZP), a special variant of the Reduced-Offset Lempel Ziv algorithm (ROLZ) with a long minimum match length (40 bytes).
* Burrows-Wheeler transform (BWT). The BWT is computed by first constructing a suffix array <math>SA</math> of the text <math>S</math> and then deducing the BWT string as <math>BWT[i]=S[SA[i]-1]</math>. This approach has linear time complexity, as opposed to the approach used by bzip2, which runs in <math>O(N^2 \log(N))</math> time.
* Arithmetic coding using a context modelling predictor.
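The relationship between the suffix array and the BWT string in the list above can be illustrated with a minimal sketch. This is not bzip3's implementation: the function names `suffix_array` and `bwt` are hypothetical, the sentinel byte is an assumption made so that every suffix has a unique rank, and the suffix array is built by naive sorting rather than by a linear-time construction.

```python
def suffix_array(s: bytes) -> list[int]:
    """Return the start positions of the suffixes of s in sorted order.

    Naive construction by sorting whole suffixes. It is NOT linear time;
    it only serves to illustrate the BWT formula below.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt(text: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Compute the BWT of text via BWT[i] = S[SA[i] - 1].

    The sentinel (an assumption of this sketch) must not occur in text
    and must compare lower than every byte of text.
    """
    if sentinel in text:
        raise ValueError("sentinel byte must not occur in the input")
    s = text + sentinel
    sa = suffix_array(s)
    # For SA[i] == 0 the index -1 wraps around to the sentinel at the end.
    return bytes(s[i - 1] for i in sa)


if __name__ == "__main__":
    print(bwt(b"banana"))  # b'annb\x00aa'
```

The output groups identical bytes together (here the two `n` and the trailing `a` bytes), which is what makes the later modelling and coding stages effective.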
bzip3's performance is generally symmetric: compression and decompression take a similar amount of time. As of version 1.1.4, blocks can be encoded in parallel, which gives almost linear speedup on multi-CPU and multi-core computers because each block is encoded independently.
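Why independent blocks parallelise well can be shown with a small sketch. It does not use bzip3 at all: `zlib` from the Python standard library stands in for the per-block compressor, and the names `split_blocks`, `compress_blocks` and `decompress_blocks`, as well as the default block size and worker count, are assumptions chosen for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data: bytes, block_size: int = 1 << 20,
                    workers: int = 4) -> list[bytes]:
    """Compress every block independently, several blocks at a time.

    zlib is only a stand-in compressor here. Because no block depends on
    another, the blocks can be handed to workers in any order.
    """
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(chunks: list[bytes], workers: int = 4) -> bytes:
    """Decompress independently compressed blocks and rejoin them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    payload = b"abc" * 100_000
    chunks = compress_blocks(payload, block_size=65_536)
    assert decompress_blocks(chunks) == payload
```

The trade-off in this sketch is that a compressor working on one block cannot exploit redundancy found in another block, which is consistent with the premise that larger blocks compress better.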
The reference implementation is written in C, and bindings exist for Python, PHP, Racket and Lua.
File format
Files produced by bzip3 start with ASCII "<code>BZ3v1</code>", followed by the 32-bit maximum block size.
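A reader for this header could look roughly like the following sketch. The function name `parse_header` is hypothetical, and the little-endian byte order of the 32-bit field is an assumption of this sketch, since the description above does not state the byte order.

```python
import struct

MAGIC = b"BZ3v1"  # ASCII signature at the start of the file


def parse_header(data: bytes) -> int:
    """Check the signature and return the maximum block size in bytes.

    Assumes the 32-bit field is unsigned and little-endian ("<I");
    this is an assumption of the sketch, not taken from the text above.
    """
    if len(data) < len(MAGIC) + 4:
        raise ValueError("input too short to contain a header")
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("missing BZ3v1 signature")
    (block_size,) = struct.unpack_from("<I", data, len(MAGIC))
    return block_size


if __name__ == "__main__":
    header = MAGIC + struct.pack("<I", 16 * 1024 * 1024)
    print(parse_header(header))  # 16777216
```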
 