C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
A compression algorithm is only optimal until a better one is discovered. The PAQ algorithm, a content-mixing, collaborative approach, yields superb compression ratios (at the loss of time).
Example. Archiving is boring. Often we compress a folder and store it in a directory in case a natural disaster occurs and our computer is decimated. With compression, the total size of these archives is reduced.
And: Dealing with one file at a time, rather than an entire directory of thousands of files, is much easier.
However, programmers like to optimize things—make them faster or smaller. Naturally a better compression algorithm would help with archiving. In the past I used 7-Zip, which does have good compression ratios.
Enter PAQ. This algorithm is much slower than LZMA ones, but it works and gives much better ratios. An improvement of 20% over 7-Zip is common in archive output size. This depends on the file contents.
Content mixing: PAQ chooses models (compression codes) based on context. It even considers file types, like JPG or text.
All PAQ compressors use a context mixing algorithm.... A large number of models independently predict the next bit of input. The predictions are combined using a neural network and arithmetic coded.
Data Compression Programs: mattmahoney.net
Benchmark. I used a program called fp8_v2.exe. This is an implementation of the PAQ8 algorithm. To use it, drag a folder to the executable and an output file, with extension fp8, is created in the same folder.
Example benchmark Uncompressed 11,325 KB 7Z Ultra LZMA 4,899 KB 8 s FP8 4,158 KB 131 s
Content: I compressed the files for this website, which come out to about 11 megabytes. There are 2,285 files.
JPEG: There are 118 JPEG images in the file set. With PAQ, special optimizations are applied to these files.
PNG: There are 808 PNG files. These are already heavily compressed with PNG-specific methods.
Time required. The PAQ algorithm was much slower than LZMA. It required 131 seconds to compress the directory, while 7-Zip required just 8 seconds. On larger archives, this cost would become prohibitive.
Script. Compression algorithms are complex. I have not implemented PAQ myself, but I have written Python code to invoke a PAQ8 compressor (as in a script). The program uses subprocess in Python.
Summary. If an archive will be stored for many years, a PAQ algorithm may be worth using. But the time cost for creating these archives, and expanding them, is otherwise prohibitive. PAQ presents a clear win in compression ratios.