Python Compression Examples: GZIP, 7-Zip

These Python examples use the gzip module and 7-Zip to compress data. They compare output sizes.

Compression trades time for space.

In compression we apply algorithms that change data to require less physical memory. This slows down parts of programs. But it makes other parts faster: less data needs transferring.

 

 

Gzip. An important compression algorithm is GZIP. Many web pages are transferred with GZIP compression. This reduces the time required to load pages. In Python, we can use the gzip module.

 

And: We open the source file and then open an output file. We then apply the gzip open() method to write the compressed file.

Tip: The with statement is helpful here. This statement ensures that system resources are properly freed.

Based on:

Python 3

Python program that uses gzip

import gzip

# Open source file.
with open("C:\perls.txt", "rb") as file_in:
    # Open output file.
    with gzip.open("C:\perls.gz", "wb") as file_out:
        # Write output.
        file_out.writelines(file_in)

Result

perls.txt size: 4404 bytes
perls.gz  size: 2110 bytes

 

Decompress. The gzip module has two main functions in Python. It compresses data and decompresses data. In this next example we decompress the same file that was written in the previous program.

 

And: In the output, we find the original file length is the same. No data was lost. We also have the file, in string format, in memory.

So: Instead of written the decompressed file to disk and reading that in, we can directly use the string contents.

Python program that decompresses file

import gzip

# Use open method.
with gzip.open("C:\perls.gz", "rb") as f:
    # Read in string.
    content = f.read()

    # Print length.
    print(len(content))

Output

4404

 

7-Zip. Python comes with built-in support for compression. But we do not need to use that support. We can use external programs, like the 7-Zip compression software, to compress files. We invoke an external binary file.

 

Here, we create a Windows command line that invokes the 7za.exe program. Please download 7za.exe from the 7-Zip website. I recommend 7-Zip: it is open source and has excellent compression ratios. We specify 7za.exe on the C volume.

7-Zip: Official Website

Source: The source string is used to build up the command line. This is the uncompressed file. You will need to change this.

Target: The target is another location on the disk. The compression version is written there.

We use the subprocess module to call the executable. The subprocess.call method is an easy to way to invoke an external program. It works in all platforms that Python supports. More information on subprocess is available.

Subprocess: Python.org

Python program that uses subprocess, 7-Zip

import subprocess

exe = "C:\\7za.exe"
source = "C:\profiles\strong.bin"
target = "C:\profiles\strong.7z"

subprocess.call(exe + " a -t7z \"" + target + "\" \"" + source + "\" -mx=9")

Output

7-Zip (A) 9.07 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-08-29
Scanning

Creating archive C:\profiles\strong.7z

Compressing  strong.bin

Everything is Ok

 

PAQ. This example uses subprocess to launch a more powerful compressor, PAQ. A PAQ executable is available in downloadable archives on the Internet. Here I use a PAQ8 implementation. The same command compresses, and expands, a file.

PAQ Compression

 

Raw: Please notice how the raw string syntax is used for paths—this should be used for Windows-style paths.

Python program that calls PAQ compressor

import subprocess

# This program handles compressed (fp8) files and non-compressed ones.
# ... It decompresses or compresses.
exe = r"C:\fp8_v2.exe"
source = r"C:\profiles\file.bin"

subprocess.call(exe + " " + source)

Output

Creating archive C:\profiles\file.bin.fp8 with 1 file(s)...

File list (18 bytes)
Compressed from 18 to 22 bytes.

1/1  Filename: C:/profiles/file.bin (3836648 bytes)
Block segmentation:
 0           | default   |     28016 bytes [0 - 28015]
 1           | jpeg      |      8917 bytes [28016 - 36932]
 2           | default   |   3799715 bytes [36933 - 3836647]
Compressed from 3836648 to 114930 bytes.

Total 3836648 bytes compressed to 114958 bytes.
Time 44.26 sec, used 180892539 bytes of memory

Close this window or press ENTER to continue...

 

Summary. Data compression becomes increasingly important. As time passes, the quantity of data increases. And with Python and the gzip module, we have an easy way to use a powerful, compatible algorithm.

 

Thus: Compression collapses space. And it often reduces the time required to process data.

gzip: Python.org