C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
compression algorithm. It is effective at compressing text files containing natural language text. The 7-Zip open-source compression utility provides several compression options. It includes the PPMd algorithm.
Original file: URL: http://shakespeare.mit.edu/macbeth/full Compressed gzip on server: 55186 bytes Decompressed html on disk: 195747 bytes PPMd Test: Format: 7z Compression level: Ultra Dictionary size: 192 MB Word size: 32 Solid block size: 'solid' SIZE: 37211 bytes GZIP Test: Format: GZip Compression level: Ultra Dictionary size: 32 KB Word size: 258 bytes SIZE: 52530 bytes LZMA Test: Format: 7z Compression level: Ultra Dictionary size: 64 MB Word size: 273 Solid block size: 'solid' SIZE: 47921 bytes BZIP2 Test: Format: 7z Compression level: Ultra Dictionary size: 900 KB Solid block size: 'solid' SIZE: 39892 bytes
Intro. The acronym PPMd stands for Prediction by Partial Matching. It describes an algorithm that chooses how to compress data further in the stream by the data it has most recently encountered.
Also: Wikipedia describes this approach as an adaptive statistical data compression technique.
Prediction by partial matching
7-Zip. It is simple to use PPMd in recent versions of the 7-Zip open-source compression utility for Windows. To compress a file with the PPMd algorithm, right-click on the file in Windows explorer and select 7-Zip > Add to archive.
Then: When the archive format is specified as 7z, you can select "PPMd" for the compression method.
Results. The document tested was the Macbeth play hosted on the MIT campus network. This document is fairly large (around 200 KB) when uncompressed. On the MIT servers, the document is gzipped and is about 55000 bytes.
When the document was compressed with PPMd, the size was reduced to about 37000 bytes, which was a better ratio than GZIP, BZIP2, and LZMA yielded. PPMd here had an improvement of about 32.57% percent over the GZIP algorithm used at MIT.
Uses. The PPMd algorithm yields outstanding compression ratios for text files, which make up most of the world's important information. It could greatly decrease the bandwidth on many internet sites if it were implemented in all software.
However: This is not imminent. Its best use is to compress private text files or database dumps containing important information.
Summary. The PPMd compression algorithm implemented in 7-Zip yields excellent results on text files. The PPMd algorithms were used as the basis for the PAQ algorithms. These implement the best lossless compression.
Note: They received the Hutter Prize, which promotes artificial intelligence research.
And: The results show how PPMd can be useful in decreasing text file compression sizes.