TheDeveloperBlog.com

Home | Contact Us

CSharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript

PAQ Compression Algorithms, Benchmark

This article is about the PAQ compression algorithm. It provides a short description and a compression ratio benchmarks.

PAQ compression. Compression, like optimization, is undecidable.

A compression algorithm is only optimal until a better one is discovered. The PAQ algorithm, a content-mixing, collaborative approach, yields superb compression ratios (at the loss of time).

Example. Archiving is boring. Often we compress a folder and store it in a directory in case a natural disaster occurs and our computer is decimated. With compression, the total size of these archives is reduced.

And: Dealing with one file at a time, rather than an entire directory of thousands of files, is much easier.

However, programmers like to optimize things—make them faster or smaller. Naturally a better compression algorithm would help with archiving. In the past I used 7-Zip, which does have good compression ratios.

7-Zip Command-Line

 

Enter PAQ. This algorithm is much slower than LZMA ones, but it works and gives much better ratios. An improvement of 20% over 7-Zip is common in archive output size. This depends on the file contents.

Content mixing: PAQ chooses models (compression codes) based on context. It even considers file types, like JPG or text.

All PAQ compressors use a context mixing algorithm.... A large number of models independently predict the next bit of input. The predictions are combined using a neural network and arithmetic coded.

Data Compression Programs: mattmahoney.net

Benchmark. I used a program called fp8_v2.exe. This is an implementation of the PAQ8 algorithm. To use it, drag a folder to the executable and an output file, with extension fp8, is created in the same folder.

PAQ8 Download Page

Example benchmark

Uncompressed     11,325 KB
7Z Ultra LZMA     4,899 KB        8 s
FP8               4,158 KB      131 s

Content: I compressed the files for this website, which come out to about 11 megabytes. There are 2,285 files.

JPEG: There are 118 JPEG images in the file set. With PAQ, special optimizations are applied to these files.

PNG: There are 808 PNG files. These are already heavily compressed with PNG-specific methods.

PNG

Time required. The PAQ algorithm was much slower than LZMA. It required 131 seconds to compress the directory, while 7-Zip required just 8 seconds. On larger archives, this cost would become prohibitive.

Script. Compression algorithms are complex. I have not implemented PAQ myself, but I have written Python code to invoke a PAQ8 compressor (as in a script). The program uses subprocess in Python.

PAQ, Python

Summary. If an archive will be stored for many years, a PAQ algorithm may be worth using. But the time cost for creating these archives, and expanding them, is otherwise prohibitive. PAQ presents a clear win in compression ratios.