TheDeveloperBlog.com


C# GZIP File Test

GZIP file test. A GZIP file begins with two fixed header bytes. We detect GZIP files by testing these bytes in the C# programming language, and we evaluate other options for detecting GZIP.


Intro. We reference the "RFC 1952 GZIP File Format Specification Version 4.3". This document contains useful information about the file structure of GZIP files. It states that the first two bytes contain fixed values.

GZIP specification

ID1 (IDentification 1)
ID2 (IDentification 2)

ID1 = 31  (0x1f, \037)
ID2 = 139 (0x8b, \213)

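Description. The quoted text from the RFC indicates that the first two bytes of a GZIP file are 31 and 139. We can therefore test those two bytes in C# to see whether a file carries the GZIP signature. A match shows that the file starts like a GZIP file; it does not prove that the rest of the data is valid.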

GZIP file format specification

Example. In this example, I use the following console program written in the C# language to test a file called "2About.gz", which contains GZIP data produced by 7-Zip. The Main method demonstrates how the IsGZipHeader method works.

Note: The first part of Main reads in a GZIP file, "2About.gz". The result of IsGZipHeader on this file is true. The second part reads a text file, "Program.cs", and the result is false.

C# program that tests gzip files

using System;
using System.IO;

class Program
{
    static void Main()
    {
        // A GZIP file: the header test succeeds.
        byte[] gzip = File.ReadAllBytes("2About.gz");
        Console.WriteLine(GZipTool.IsGZipHeader(gzip)); // True

        // A plain text file: the header test fails.
        byte[] text = File.ReadAllBytes("Program.cs");
        Console.WriteLine(GZipTool.IsGZipHeader(text)); // False
    }
}

/// <summary>
/// GZIP utility methods.
/// </summary>
public static class GZipTool
{
    /// <summary>
    /// Checks the first two bytes in a GZIP file, which must be 31 and 139.
    /// </summary>
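    public static bool IsGZipHeader(byte[] arr)
    {
        // Reject null and arrays too short to hold both ID bytes.
        return arr != null &&
            arr.Length >= 2 &&
            arr[0] == 31 &&  // ID1 (0x1f)
            arr[1] == 139;   // ID2 (0x8b)
    }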
}

Output

True
False

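GZipTool: This class is a public static class. Because it is public, you can access it from other namespaces and files. Because it is static, it cannot be instantiated, and you call its methods on the type itself. It contains IsGZipHeader.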

IsGZipHeader: This method returns true if the byte array passed to it holds at least two bytes and the first two are 31 and 139.
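Reading two bytes only. The program above loads the whole file with File.ReadAllBytes even though only two bytes are tested. As a minimal sketch, the same test can be made on a stream so that large files are not read in full. The class name GZipFileTool and the method name HasGZipHeader are hypothetical and are not part of the original program.

```csharp
using System;
using System.IO;

/// <summary>
/// Hypothetical helper: tests the GZIP signature without loading the file.
/// </summary>
public static class GZipFileTool
{
    public static bool HasGZipHeader(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            // ReadByte returns -1 at the end of the stream, so a file
            // shorter than two bytes can never match both values.
            int id1 = stream.ReadByte();
            int id2 = stream.ReadByte();
            return id1 == 31 && id2 == 139;
        }
    }
}
```

A call such as GZipFileTool.HasGZipHeader("2About.gz") should give the same answer as IsGZipHeader for that file, while touching only the start of it.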


Discussion. There is an alternative way to check an entire file to see if it is a GZIP file and its data is valid. You can try to decompress it, and catch errors with a try-catch block. By trying to decompress the data, we also detect other errors.

However: This is much slower than testing two bytes, because the entire file must be read and decompressed. Performance is an important consideration in some programs.
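Decompression test. The try-catch approach described above could look roughly like the following sketch, which uses GZipStream from System.IO.Compression and discards the output. The names GZipValidator and CanDecompress are hypothetical. Which kinds of damage are reported depends on the runtime, so a true result here means only that no InvalidDataException was raised.

```csharp
using System;
using System.IO;
using System.IO.Compression;

/// <summary>
/// Hypothetical helper: tries to decompress a whole file as GZIP.
/// </summary>
public static class GZipValidator
{
    public static bool CanDecompress(string path)
    {
        try
        {
            using (FileStream file = File.OpenRead(path))
            using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
            {
                // Discard the output; only success or failure matters.
                gzip.CopyTo(Stream.Null);
            }
            return true;
        }
        catch (InvalidDataException)
        {
            // GZipStream reports data that is not valid GZIP this way.
            return false;
        }
    }
}
```

This sketch catches only InvalidDataException. A missing or locked file still throws its usual IOException, which keeps "not GZIP" separate from "could not be read".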

Note on file extensions. In many systems, file extensions can be changed and are not reliable. I would not want a system to fail when a critical file was renamed to ".gzip" from ".gz".


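Summary. We tested the first two bytes in a file to detect the signature bytes for GZIP. The test is simple and follows the values given in RFC 1952, but it confirms only the signature, not that the compressed data is intact. It is much faster and more suitable for some situations than trying to blindly decompress the file.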