TheDeveloperBlog.com

Home | Contact Us

C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML

PAQ Compression Algorithms, Benchmark

This article is about the PAQ compression algorithm. It provides a short description and a compression ratio benchmarks.

PAQ compression. Compression, like optimization, is undecidable.

A compression algorithm is only optimal until a better one is discovered. The PAQ algorithm, a content-mixing, collaborative approach, yields superb compression ratios (at the loss of time).

Example. Archiving is boring. Often we compress a folder and store it in a directory in case a natural disaster occurs and our computer is decimated. With compression, the total size of these archives is reduced.

And: Dealing with one file at a time, rather than an entire directory of thousands of files, is much easier.

However, programmers like to optimize things—make them faster or smaller. Naturally a better compression algorithm would help with archiving. In the past I used 7-Zip, which does have good compression ratios.

7-Zip Command-Line

 

Enter PAQ. This algorithm is much slower than LZMA ones, but it works and gives much better ratios. An improvement of 20% over 7-Zip is common in archive output size. This depends on the file contents.

Content mixing: PAQ chooses models (compression codes) based on context. It even considers file types, like JPG or text.

All PAQ compressors use a context mixing algorithm.... A large number of models independently predict the next bit of input. The predictions are combined using a neural network and arithmetic coded.

Data Compression Programs: mattmahoney.net

Benchmark. I used a program called fp8_v2.exe. This is an implementation of the PAQ8 algorithm. To use it, drag a folder to the executable and an output file, with extension fp8, is created in the same folder.

PAQ8 Download Page

Example benchmark

Uncompressed     11,325 KB
7Z Ultra LZMA     4,899 KB        8 s
FP8               4,158 KB      131 s

Content: I compressed the files for this website, which come out to about 11 megabytes. There are 2,285 files.

JPEG: There are 118 JPEG images in the file set. With PAQ, special optimizations are applied to these files.

PNG: There are 808 PNG files. These are already heavily compressed with PNG-specific methods.

PNG

Time required. The PAQ algorithm was much slower than LZMA. It required 131 seconds to compress the directory, while 7-Zip required just 8 seconds. On larger archives, this cost would become prohibitive.

Script. Compression algorithms are complex. I have not implemented PAQ myself, but I have written Python code to invoke a PAQ8 compressor (as in a script). The program uses subprocess in Python.

PAQ, Python

Summary. If an archive will be stored for many years, a PAQ algorithm may be worth using. But the time cost for creating these archives, and expanding them, is otherwise prohibitive. PAQ presents a clear win in compression ratios.


Related Links

Adjectives Ado Ai Android Angular Antonyms Apache Articles Asp Autocad Automata Aws Azure Basic Binary Bitcoin Blockchain C Cassandra Change Coa Computer Control Cpp Create Creating C-Sharp Cyber Daa Data Dbms Deletion Devops Difference Discrete Es6 Ethical Examples Features Firebase Flutter Fs Git Go Hbase History Hive Hiveql How Html Idioms Insertion Installing Ios Java Joomla Js Kafka Kali Laravel Logical Machine Matlab Matrix Mongodb Mysql One Opencv Oracle Ordering Os Pandas Php Pig Pl Postgresql Powershell Prepositions Program Python React Ruby Scala Selecting Selenium Sentence Seo Sharepoint Software Spellings Spotting Spring Sql Sqlite Sqoop Svn Swift Synonyms Talend Testng Types Uml Unity Vbnet Verbal Webdriver What Wpf