In compression we apply algorithms that change data to require less physical memory. This slows down parts of programs. But it makes other parts faster: less data needs transferring.
Gzip. An important compression algorithm is GZIP. Many web pages are transferred with GZIP compression. This reduces the time required to load pages. In Python, we can use the gzip module.
And: We open the source file and then open an output file. We then apply the gzip open() method to write the compressed file.
Tip: The with statement is helpful here. This statement ensures that system resources are properly freed.
Based on: Python 3 Python program that uses gzip import gzip # Open source file. with open("C:\perls.txt", "rb") as file_in: # Open output file. with gzip.open("C:\perls.gz", "wb") as file_out: # Write output. file_out.writelines(file_in) Result perls.txt size: 4404 bytes perls.gz size: 2110 bytes
Decompress. The gzip module has two main functions in Python. It compresses data and decompresses data. In this next example we decompress the same file that was written in the previous program.
And: In the output, we find the original file length is the same. No data was lost. We also have the file, in string format, in memory.
So: Instead of written the decompressed file to disk and reading that in, we can directly use the string contents.
Python program that decompresses file import gzip # Use open method. with gzip.open("C:\perls.gz", "rb") as f: # Read in string. content = f.read() # Print length. print(len(content)) Output 4404
7-Zip. Python comes with built-in support for compression. But we do not need to use that support. We can use external programs, like the 7-Zip compression software, to compress files. We invoke an external binary file.
Here, we create a Windows command line that invokes the 7za.exe program. Please download 7za.exe from the 7-Zip website. I recommend 7-Zip: it is open source and has excellent compression ratios. We specify 7za.exe on the C volume.
Source: The source string is used to build up the command line. This is the uncompressed file. You will need to change this.
Target: The target is another location on the disk. The compression version is written there.
We use the subprocess module to call the executable. The subprocess.call method is an easy to way to invoke an external program. It works in all platforms that Python supports. More information on subprocess is available.
Python program that uses subprocess, 7-Zip import subprocess exe = "C:\\7za.exe" source = "C:\profiles\strong.bin" target = "C:\profiles\strong.7z" subprocess.call(exe + " a -t7z \"" + target + "\" \"" + source + "\" -mx=9") Output 7-Zip (A) 9.07 beta Copyright (c) 1999-2009 Igor Pavlov 2009-08-29 Scanning Creating archive C:\profiles\strong.7z Compressing strong.bin Everything is Ok
PAQ. This example uses subprocess to launch a more powerful compressor, PAQ. A PAQ executable is available in downloadable archives on the Internet. Here I use a PAQ8 implementation. The same command compresses, and expands, a file.
Raw: Please notice how the raw string syntax is used for paths—this should be used for Windows-style paths.
Python program that calls PAQ compressor import subprocess # This program handles compressed (fp8) files and non-compressed ones. # ... It decompresses or compresses. exe = r"C:\fp8_v2.exe" source = r"C:\profiles\file.bin" subprocess.call(exe + " " + source) Output Creating archive C:\profiles\file.bin.fp8 with 1 file(s)... File list (18 bytes) Compressed from 18 to 22 bytes. 1/1 Filename: C:/profiles/file.bin (3836648 bytes) Block segmentation: 0 | default | 28016 bytes [0 - 28015] 1 | jpeg | 8917 bytes [28016 - 36932] 2 | default | 3799715 bytes [36933 - 3836647] Compressed from 3836648 to 114930 bytes. Total 3836648 bytes compressed to 114958 bytes. Time 44.26 sec, used 180892539 bytes of memory Close this window or press ENTER to continue...
Summary. Data compression becomes increasingly important. As time passes, the quantity of data increases. And with Python and the gzip module, we have an easy way to use a powerful, compatible algorithm.
Thus: Compression collapses space. And it often reduces the time required to process data.