TheDeveloperBlog.com


C# Regex File Tutorial

Regex file. A file can be parsed with Regex. The Regex can process each line to find all matching parts. This is useful for log files or output from other programs. Here is a tutorial on processing a file with regular expressions in the C# language.


Example. To use a regular expression on a file, you must first read the file's text into strings. Here's a console program that opens a StreamReader on the file and reads it one line at a time.


Note: The ReadLine() method returns each line separately, without its line terminator, or null when there is no more data to read.

C# program that uses StreamReader

using System.IO;

class Program
{
    static void Main()
    {
	// 1.
	// Open file for reading.
	using (StreamReader r = new StreamReader("ex081016.log"))
	{
	    // 2.
	    // Read each line until EOF.
	    string line;
	    while ((line = r.ReadLine()) != null)
	    {
		// 3.
		// Do stuff with line.
	    }
	}
    }
}
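The log file name above points to a file on the author's machine. The sketch below is a hypothetical variation, not part of the original program: it runs the same ReadLine loop over a StringReader, which derives from TextReader just as StreamReader does, so it runs without the log. The sample text is made up, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sample input: three lines with mixed "\n" and "\r\n" endings.
string text = "first line\nsecond line\r\nthird line";

var lines = new List<string>();
using (TextReader r = new StringReader(text))
{
    // ReadLine strips the line terminator and returns null once the input is exhausted.
    string line;
    while ((line = r.ReadLine()) != null)
    {
        lines.Add(line);
    }
}

// Brackets make any leftover whitespace or carriage return visible.
foreach (string l in lines)
{
    Console.WriteLine("[" + l + "]");
}
```

ReadLine drops both "\n" and "\r\n" terminators, so the second line comes back as "second line" with no trailing carriage return.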


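Example 2. Here we create the regular expression object. In the author's tests, creating a single Regex and reusing it for every line was around 30% faster than calling the static Regex.Match method each time, which makes a single shared Regex worthwhile. The static methods do keep a small cache of recently used patterns, so the exact difference depends on the workload and the .NET version.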
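A minimal sketch of the two call styles, assuming the pattern from this tutorial and two made-up sample lines (C# 9 top-level statements). Both styles find the same matches; they differ only in where the pattern is parsed. The sketch shows the call shapes only and does not measure speed.

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"\s/Content/([a-zA-Z0-9\-]+?)\.aspx";

// Build the Regex once and reuse it for every line.
Regex shared = new Regex(Pattern);

// Hypothetical sample lines: the first matches, the second does not.
string[] sample =
{
    "GET /Content/Trim-String-Regex.aspx - 80",
    "GET /Images/logo.png - 80",
};

int instanceHits = 0;
int staticHits = 0;
foreach (string line in sample)
{
    // Instance method: the pattern was parsed once, above.
    if (shared.Match(line).Success) instanceHits++;

    // Static method: the pattern string is passed on every call and looked up in the cache.
    if (Regex.Match(line, Pattern).Success) staticHits++;
}

Console.WriteLine($"instance: {instanceHits}, static: {staticHits}");
```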

C# program that declares regular expression

using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	// A.
	Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
	// "\s/Content/"         : whitespace, then the Content directory
	// "([a-zA-Z0-9\-]+?)"   : group 1, one or more letters, digits, or hyphens
	// "?" after "+"         : lazy quantifier, match as few characters as possible
	// "\.aspx"              : literal ".aspx" extension required for a match

	// B.
	using (StreamReader r = new StreamReader("ex081016.log"))
	{
	    string line;
	    while ((line = r.ReadLine()) != null)
	    {
	    }
	}
    }
}

Part A creates the Regex; the comments break the pattern into its parts. Part B is the same file-reading loop as in the first example.
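To see what each part of the pattern requires, this sketch runs the Regex from part A against a few made-up inputs, each of which breaks one rule (C# 9 top-level statements). Only the first input should match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");

// Hypothetical inputs chosen to exercise each part of the pattern.
string[] inputs =
{
    "GET /Content/Trim-String-Regex.aspx",  // whitespace, directory, name, extension: matches
    "GET/Content/Trim-String-Regex.aspx",   // no whitespace before /Content/
    "GET /Content/Trim_String.aspx",        // underscore is not in the character class
    "GET /Content/Trim-String-Regex.html",  // wrong extension
};

var results = new List<string>();
foreach (string input in inputs)
{
    Match m = g.Match(input);
    results.Add(m.Success ? "match: " + m.Groups[1].Value : "no match");
}

foreach (string r in results)
{
    Console.WriteLine(r);
}
```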


Example 3. Here we move the regular expression into the StreamReader loop so that it runs on every line of the file. The Regex created earlier is matched against each line. We look for only one match per line here; to find every match in a line, call Matches instead.

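Note: Parts X and Y are new in the program below. X applies the Regex to each line. Y reads the captured value from the Groups collection.

Tip: Groups[0] always holds the entire match, and the first parenthesized group is Groups[1]. Reading Groups[0] by mistake returns the whole matched text (here including "/Content/" and ".aspx") instead of the captured page name, and your algorithm will not work.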
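A short sketch of the difference, using a made-up request fragment (C# 9 top-level statements). Groups[0] includes the leading space, because \s is part of the pattern but outside the parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
Match m = g.Match("GET /Content/Trim-String-Regex.aspx - 80");

// Groups[0] is the whole match, including the leading space matched by \s.
string whole = m.Groups[0].Value;
// Groups[1] is the text captured by the first pair of parentheses.
string name = m.Groups[1].Value;

Console.WriteLine("[" + whole + "]");
Console.WriteLine("[" + name + "]");
```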

C# program that matches lines

using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
	using (StreamReader r = new StreamReader("ex081016.log"))
	{
	    string line;
	    while ((line = r.ReadLine()) != null)
	    {
		// X.
		// Try to match each line against the Regex.
		Match m = g.Match(line);
		if (m.Success)
		{
		    // Y.
		    // Write original line and the value.
		    string v = m.Groups[1].Value;
		    Console.WriteLine(line);
		    Console.WriteLine("\t" + v);
		}
	    }
	}
    }
}
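The program above stops at the first match on each line. If a line can contain several matching paths, Regex.Matches returns all of them. A minimal sketch, assuming a made-up line with two paths (C# 9 top-level statements):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");

// Hypothetical line containing two matching paths.
string line = "GET /Content/Trim-String-Regex.aspx ref /Content/Regex-Match.aspx";

var names = new List<string>();
// Matches returns every non-overlapping match, scanning left to right.
foreach (Match m in g.Matches(line))
{
    names.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(", ", names));
```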


Discussion. Here is the output for one matching line of the log. The program prints the full log line, which is a single line of text, followed by the captured value on its own line, indented with a tab. The regular expression captured "Trim-String-Regex", the text between "/Content/" and ".aspx". This is what it was supposed to do.

Example output

2008-10-16 23:59:50 W3SVC2915713 GET /Content/Trim-String-Regex.aspx - 80 66.249.70.241 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot) - 200 3753 309
	Trim-String-Regex
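One way to make the program easier to test is to move the loop into a method that accepts a TextReader, the base class of StreamReader, so the same code reads either a file or an in-memory string. This refactoring and the method name ExtractPageNames are hypothetical, not from the original. The first input line is the sample log line shown above; the second is made up and should not match (C# 9 top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the captured page name from every matching line.
// Taking a TextReader lets the same code read a StreamReader or a StringReader.
static List<string> ExtractPageNames(TextReader reader, Regex pattern)
{
    var names = new List<string>();
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        Match m = pattern.Match(line);
        if (m.Success)
        {
            names.Add(m.Groups[1].Value);
        }
    }
    return names;
}

Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");

string log =
    "2008-10-16 23:59:50 W3SVC2915713 GET /Content/Trim-String-Regex.aspx - 80 66.249.70.241 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot) - 200 3753 309\n" +
    "2008-10-16 23:59:51 W3SVC2915713 GET /favicon.ico - 80 10.0.0.1 - 404 0 0"; // made-up line

List<string> found = ExtractPageNames(new StringReader(log), g);
Console.WriteLine(string.Join(", ", found));

// With the real log, the same method accepts a StreamReader:
// using (var r = new StreamReader("ex081016.log")) { ExtractPageNames(r, g); }
```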

Further usages. The same approach of reading a file line by line and matching each line works for many kinds of files: logs, trace files, output from scientific calculations, CSV files, or any other text file.

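Tip: Processing one line at a time uses less memory than reading the whole file into a single string, because only the current line is held in memory, and it may be faster because each Match call scans only that line's characters.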
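As a hedged illustration of the line-at-a-time approach, File.ReadLines enumerates a file lazily, one line per iteration, much like the StreamReader loop. This sketch writes a small temporary file with made-up lines so it runs on its own (C# 9 top-level statements).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");

// Write a small temporary file with made-up lines so the sketch runs without the original log.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "GET /Content/Trim-String-Regex.aspx - 80",
    "GET /Images/logo.png - 80",
    "GET /Content/Regex-Match.aspx - 80",
});

int count = 0;
// File.ReadLines yields one line at a time instead of loading the whole file,
// unlike File.ReadAllText or File.ReadAllLines.
foreach (string line in File.ReadLines(path))
{
    if (g.IsMatch(line)) count++;
}

// The foreach loop disposed the enumerator, which closed the file, so it can be deleted.
File.Delete(path);
Console.WriteLine("matching lines: " + count);
```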


Summary. We applied a regular expression to every line of a text file. The approach is simple: read each line with StreamReader, match it against a single reused Regex, and take the captured group from each successful match.

Review: We combined the StreamReader class with the Regex class in the base class library to parse large text files.