TheDeveloperBlog.com

C# Regex Performance

This C# article tests the performance of the Regex type. It uses RegexOptions.Compiled.

Regex performance matters when a pattern is applied many times.

It can be improved by storing the Regex as a field on a class, so the object is created once and reused. Another option is to use RegexOptions.Compiled. Replacing static Regex calls with a reused instance also helps.

Benchmark results

Static Regex method:     6895 ms
Instance Regex object:   6583 ms
Instance compiled Regex: 5679 ms [fastest]

Example. First we use the static Regex.Split method in System.Text.RegularExpressions. For the next three examples, we use Split, but other methods such as Matches, Match, and Replace have similar characteristics.

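Here: This code uses the static Regex.Split method. The static method receives the pattern as a string on every call, so it cannot reuse a Regex object that the caller has already built. It is slower when a stored instance would save that work.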

And: It shows a simple Regex that splits the input string into separate words. The \W+ pattern matches one or more non-word characters.

C# program that uses Regex.Split

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        // Static call: the pattern is passed as a string each time.
        string[] c = Regex.Split(s, @"\W+");
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex
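One detail of the output above is easy to miss. The input ends with a period, which \W+ matches, so Split returns an empty string as its last element. The following sketch makes that visible. The filtering loop is an illustrative assumption, not part of the original program.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        string[] parts = Regex.Split(s, @"\W+");

        // Seven separators are matched, so eight elements come back.
        Console.WriteLine(parts.Length); // 8

        // The last element is empty because the input ends with ".".
        Console.WriteLine(parts[parts.Length - 1].Length); // 0

        // Illustrative cleanup (an assumption): skip empty entries.
        foreach (string part in parts)
        {
            if (part.Length > 0)
            {
                Console.WriteLine(part);
            }
        }
    }
}
```

This is why the printed output appears to have seven words even though the array holds eight strings.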

Example 2. Here we see a faster approach than the example above. This example creates the expression with new Regex. It works the same way, but has better performance. It stores the Regex in a local variable inside the method.

C# program that uses instance Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        // Build the Regex once, then call the instance method.
        Regex r = new Regex(@"\W+");
        string[] c = r.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex
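The article notes that Matches, Match, and Replace have similar characteristics to Split. As a minimal sketch of that point, the same Regex instance can be reused across those methods. The replacement string "-" is an arbitrary choice for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";

        // One instance, reused by several instance methods.
        Regex r = new Regex(@"\W+");

        // Count the runs of non-word characters.
        Console.WriteLine(r.Matches(s).Count); // 7

        // Replace each run with a single hyphen.
        Console.WriteLine(r.Replace(s, "-")); // This-is-a-simple-string-for-Regex-

        // Fetch only the first match and report where it starts.
        Match first = r.Match(s);
        Console.WriteLine(first.Index); // 4
    }
}
```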

Example 3. Next, we use a compiled regular expression, and store it at the class level. Two things change here. The Regex is created with RegexOptions.Compiled, and it is stored as a static field, meaning it can be reused throughout the application without recreating it.

C# program that uses static compiled Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    // Compiled once when the class is first used, then shared by all callers.
    static readonly Regex _wordRegex = new Regex(@"\W+", RegexOptions.Compiled);

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        string[] c = _wordRegex.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex

Benchmark. We check the performance characteristics of the regular expressions. The three Regex.Split calls shown above are compared over one million iterations each, using the same input string and the same Regex objects as in the three examples.

Note: You can see the figures from the experiment above. The benchmark code is not available.
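Since the original benchmark code is not available, the following is only a sketch of how such a comparison might be set up. It is not the author's harness. The iteration count comes from the text, while the warm-up calls and the use of Stopwatch are assumptions. Timings depend on machine and runtime, so no output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    // Iteration count taken from the "one million iterations" in the text.
    const int Iterations = 1000000;

    static readonly Regex _compiled = new Regex(@"\W+", RegexOptions.Compiled);

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        Regex instance = new Regex(@"\W+");

        // Warm-up (an assumption): run each path once so one-time costs
        // such as JIT compilation are not counted in the loops.
        Regex.Split(s, @"\W+");
        instance.Split(s);
        _compiled.Split(s);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            Regex.Split(s, @"\W+");
        }
        sw.Stop();
        Console.WriteLine("Static Regex method:     {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < Iterations; i++)
        {
            instance.Split(s);
        }
        sw.Stop();
        Console.WriteLine("Instance Regex object:   {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < Iterations; i++)
        {
            _compiled.Split(s);
        }
        sw.Stop();
        Console.WriteLine("Instance compiled Regex: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Run it in a Release build without a debugger attached. Otherwise the numbers mostly reflect debugging overhead.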

Discussion. Let's review some of the other work done by experts in the C# language and MSDN's resources. Microsoft's David Gutierrez states that there are three major options for regular expression performance.

The first option. Interpreted regular expressions are the default. The runtime parses the pattern into opcodes and then runs them with an interpreter. Creation time is low, and runtime performance is low.

Second is compiled. Here you use RegexOptions.Compiled. It takes about 10x longer to start up, but yields about 30% better runtime performance. Don't use it for dynamically generated Regexes. Creation time is the highest, and runtime performance is high.

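Finally: We see precompiled (Regex.CompileToAssembly). This is harder to set up. Creation time is low, and runtime performance is high. Note that Regex.CompileToAssembly is a .NET Framework API and is not supported on .NET Core and later.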
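The creation-time difference between the first two options can be observed with a rough sketch like the one below. It is an illustration under stated assumptions, not a rigorous measurement. The warm-up pattern "x" is arbitrary, and a single timing per option is noisy. No output is shown because the tick counts vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";

        // Warm-up (an assumption): use an unrelated pattern first so the
        // cost of loading the Regex classes is not charged to either option.
        new Regex("x").IsMatch(s);
        new Regex("x", RegexOptions.Compiled).IsMatch(s);

        // Interpreted: construct, then use once.
        Stopwatch sw = Stopwatch.StartNew();
        Regex interpreted = new Regex(@"\W+");
        interpreted.Split(s);
        sw.Stop();
        Console.WriteLine("Interpreted, create + first use: {0} ticks", sw.ElapsedTicks);

        // Compiled: construct, then use once.
        sw.Restart();
        Regex compiled = new Regex(@"\W+", RegexOptions.Compiled);
        compiled.Split(s);
        sw.Stop();
        Console.WriteLine("Compiled, create + first use:    {0} ticks", sw.ElapsedTicks);
    }
}
```

The creation cost is paid once per Regex object. That is why the advice against RegexOptions.Compiled applies to patterns that are generated dynamically and used only a few times.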

BCL Team Blog

MSDN. We look at MSDN, which has little documentation here. It warns that RegexOptions.Compiled should not be specified when calling CompileToAssembly. In other words, compiled and precompiled are alternatives and are not combined.

RegexOptions: MSDN

Summary. We optimized calls to Regex.Split. With RegexOptions.Compiled, runtime performance can be improved by sacrificing startup time. There are several performance options for the Regex type.

Therefore: Using a Regex instance that is not compiled is best for most situations. It doesn't cost much during program startup.
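The recommended approach, a non-compiled Regex that is created once and reused, can be sketched as follows. The WordSplitter class and its Split method are hypothetical names introduced for illustration. They do not appear in the original article.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class (an assumption for illustration).
class WordSplitter
{
    // Created once per WordSplitter, without RegexOptions.Compiled.
    readonly Regex _separator = new Regex(@"\W+");

    public string[] Split(string input)
    {
        // Every call reuses the same interpreted Regex object.
        return _separator.Split(input);
    }
}

class Program
{
    static void Main()
    {
        WordSplitter splitter = new WordSplitter();
        foreach (string word in splitter.Split("This is a simple /string/ for Regex."))
        {
            Console.WriteLine(word);
        }
    }
}
```

Adding RegexOptions.Compiled to the field initializer turns this into the faster but slower-starting variant, if profiling shows it is worth it.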

