TheDeveloperBlog.com

C# GZIP File Test

This C# article shows how to test whether a file is a GZIP file by inspecting its contents rather than its extension.

GZIP file test. A GZIP file begins with fixed header bytes.

We detect GZIP files by testing the first two of these bytes in the C# programming language. We then compare this with another option for detecting GZIP data.

Intro. We reference the "RFC 1952 GZIP File Format Specification Version 4.3". This document contains useful information about the file structure of GZIP files. It states that the first two bytes contain fixed values.

GZIP specification

ID1 (IDentification 1)
ID2 (IDentification 2)

ID1 = 31  (0x1f, \037)
ID2 = 139 (0x8b, \213)

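About the quote. The RFC text above indicates that the first two bytes of a GZIP file are always 31 and 139 (0x1f and 0x8b). We can therefore test those two bytes in C# to recognize the GZIP signature. This check identifies the header only; it does not verify the compressed data that follows.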

Example. The following C# console program tests a file called "2About.gz", which contains GZIP data produced by 7-Zip. The Main method demonstrates how the IsGZipHeader method works.

Note: The first part of Main reads in the GZIP file "2About.gz", and IsGZipHeader returns true for it. The second part reads in the program's own source file, "Program.cs", which is plain text, so the result is false.

C# program that tests gzip files

using System;
using System.IO;

class Program
{
    static void Main()
    {
        byte[] gzip = File.ReadAllBytes("2About.gz");
        Console.WriteLine(GZipTool.IsGZipHeader(gzip)); // True

        byte[] text = File.ReadAllBytes("Program.cs");
        Console.WriteLine(GZipTool.IsGZipHeader(text)); // False
    }
}

/// <summary>
/// GZIP utility methods.
/// </summary>
public static class GZipTool
{
    /// <summary>
    /// Checks the first two bytes in a GZIP file, which must be 31 and 139.
    /// </summary>
    public static bool IsGZipHeader(byte[] arr)
    {
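        // A null or too-short array cannot hold the two signature bytes.
        return arr != null &&
            arr.Length >= 2 &&
            arr[0] == 31 &&  // ID1 (0x1f)
            arr[1] == 139;   // ID2 (0x8b)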
    }
}

Output

True
False

GZipTool: This is a public static class. Because it is public, it can be used from other namespaces and assemblies; because it is static, it is never instantiated. It contains IsGZipHeader.

IsGZipHeader: This method returns true if the byte array passed to it has at least two bytes and the first two are 31 and 139.
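
Reading only the header. The program above loads the entire file with File.ReadAllBytes even though only two bytes are tested. For large files, a stream can read just those two bytes. The following is a minimal sketch, not part of the original program: the class name GZipFileTool and the method name HasGZipSignature are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

/// <summary>
/// Hypothetical helper (illustration only): tests a file on disk
/// without loading all of its contents.
/// </summary>
public static class GZipFileTool
{
    /// <summary>
    /// Reads at most two bytes from the file and compares them
    /// to the GZIP signature bytes 31 and 139.
    /// </summary>
    public static bool HasGZipSignature(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            // ReadByte returns -1 at the end of the stream, so empty
            // and one-byte files fail the comparison below.
            int first = stream.ReadByte();
            int second = stream.ReadByte();
            return first == 31 && second == 139;
        }
    }
}
```

Calling GZipFileTool.HasGZipSignature("2About.gz") should give the same answer as IsGZipHeader for that file, but the cost no longer depends on the file size.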

Discussion. An alternative is to check the entire file: try to decompress it, and catch any exception with a try-catch block. Because this reads all of the data, it also detects errors that occur after the header, such as a damaged compressed stream.

However: This is much slower than testing two bytes. Performance is an important consideration in some programs.
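
Decompression test. The try-catch approach can be sketched with the GZipStream class from System.IO.Compression. This is an illustrative sketch under stated assumptions: the names GZipValidator and CanDecompress are hypothetical, and empty input is rejected up front because a decompressing stream may simply report zero bytes for it. How strictly truncated data is reported can depend on the runtime version.

```csharp
using System;
using System.IO;
using System.IO.Compression;

/// <summary>
/// Hypothetical helper (illustration only): validates GZIP data
/// by decompressing all of it.
/// </summary>
public static class GZipValidator
{
    /// <summary>
    /// Returns true if the byte array decompresses as GZIP
    /// without GZipStream reporting invalid data.
    /// </summary>
    public static bool CanDecompress(byte[] data)
    {
        // Assumption: null or empty input is treated as not GZIP.
        if (data == null || data.Length == 0)
        {
            return false;
        }
        try
        {
            using (var input = new MemoryStream(data))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            {
                // Discard the output; we only care whether reading throws.
                gzip.CopyTo(Stream.Null);
            }
            return true;
        }
        catch (InvalidDataException)
        {
            // GZipStream throws this when the data is not valid GZIP.
            return false;
        }
    }
}
```

Unlike the two-byte test, this method does work proportional to the size of the data, which is why it is the slower option.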

Note on file extensions. In many systems, file extensions can be changed and are not reliable. I would not want a system to fail when a critical file was renamed to ".gzip" from ".gz".

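Summary. We tested the first two bytes in a file to detect the signature bytes for GZIP. The check follows the specification and worked on the test files, although it identifies only the header and does not prove the rest of the file is valid. It is much faster and more suitable for some situations than trying to blindly decompress the file.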

