TheDeveloperBlog.com


PAQ Compression Algorithms, Benchmark

Example. Archiving is boring. Often we compress a folder and store it in a directory in case a natural disaster occurs and our computer is decimated. With compression, the total size of these archives is reduced.

And: Dealing with one file at a time, rather than an entire directory of thousands of files, is much easier.

However, programmers like to optimize things—make them faster or smaller. Naturally a better compression algorithm would help with archiving. In the past I used 7-Zip, which does have good compression ratios.

7-Zip Command-Line

Enter PAQ. This algorithm is much slower than LZMA ones, but it works and gives much better ratios. An improvement of 20% over 7-Zip is common in archive output size. This depends on the file contents.

Content mixing: PAQ chooses models (compression codes) based on context. It even considers file types, like JPG or text.

All PAQ compressors use a context mixing algorithm.... A large number of models independently predict the next bit of input. The predictions are combined using a neural network and arithmetic coded.

Data Compression Programs: mattmahoney.net

Benchmark. I used a program called fp8_v2.exe. This is an implementation of the PAQ8 algorithm. To use it, drag a folder to the executable and an output file, with extension fp8, is created in the same folder.

PAQ8 Download Page
Example benchmark

Uncompressed     11,325 KB
7Z Ultra LZMA     4,899 KB        8 s
FP8               4,158 KB      131 s

Content: I compressed the files for this website, which come out to about 11 megabytes. There are 2,285 files.

JPEG: There are 118 JPEG images in the file set. With PAQ, special optimizations are applied to these files.

PNG: There are 808 PNG files. These are already heavily compressed with PNG-specific methods.

PNG

Time required. The PAQ algorithm was much slower than LZMA. It required 131 seconds to compress the directory, while 7-Zip required just 8 seconds. On larger archives, this cost would become prohibitive.


Script. Compression algorithms are complex. I have not implemented PAQ myself, but I have written Python code to invoke a PAQ8 compressor (as in a script). The program uses subprocess in Python.

PAQ, Python

Summary. If an archive will be stored for many years, a PAQ algorithm may be worth using. But the time cost for creating these archives, and expanding them, is otherwise prohibitive. PAQ presents a clear win in compression ratios.