Regex performance can be improved by storing a Regex as a field on a class. Another option is to use RegexOptions.Compiled. Avoiding the static Regex methods also helps. There are several ways to optimize Regex calls.
Benchmark results

    Static Regex method:     6895 ms
    Instance Regex object:   6583 ms
    Instance compiled Regex: 5679 ms [fastest]
Example. First we use the static Regex.Split method in System.Text.RegularExpressions. For the next three examples, we use Split, but other methods such as Matches, Match, and Replace have similar characteristics.
Here: This code uses the static Regex.Split method. Each static call receives the pattern as a string and must look up or rebuild the parsed pattern, which costs CPU cycles that a stored Regex object avoids.
And: It shows a simple Regex that splits the input string into separate words. The \W+ means one or more non-word characters. Because the input ends with a period, Split also returns an empty final element.
C# program that uses Regex.Split

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string s = "This is a simple /string/ for Regex.";
            string[] c = Regex.Split(s, @"\W+");
            foreach (string m in c)
            {
                Console.WriteLine(m);
            }
        }
    }

Output

    This
    is
    a
    simple
    string
    for
    Regex
Example 2. Here we see a faster approach than the example above. This example creates an expression with new Regex. It produces the same result, but has better performance. It stores the Regex in a local variable inside the method.
C# program that uses instance Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string s = "This is a simple /string/ for Regex.";
            Regex r = new Regex(@"\W+");
            string[] c = r.Split(s);
            foreach (string m in c)
            {
                Console.WriteLine(m);
            }
        }
    }

Output

    This
    is
    a
    simple
    string
    for
    Regex
Example 3. Next, we use a compiled regular expression, and store it at the class level. We see two new approaches here. First, the Regex is created with RegexOptions.Compiled. Second, it is stored as a static field, meaning it can be reused throughout the application without recreating it.
C# program that uses static compiled Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static Regex _wordRegex = new Regex(@"\W+", RegexOptions.Compiled);

        static void Main()
        {
            string s = "This is a simple /string/ for Regex.";
            string[] c = _wordRegex.Split(s);
            foreach (string m in c)
            {
                Console.WriteLine(m);
            }
        }
    }

Output

    This
    is
    a
    simple
    string
    for
    Regex
Benchmark. We check the performance characteristics of the three approaches. Each of the three Regex.Split calls above was run for one million iterations, using the same Regex objects and input string shown in the examples.
Note: You can see the figures from the experiment above. The benchmark code is not available.
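Because the original benchmark code is not available, the following is only a minimal sketch of how such a comparison could be set up with Stopwatch. The class name RegexBenchmark and the Time helper are hypothetical names chosen for illustration, and the timings it prints will differ from the figures reported in this article depending on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexBenchmark
{
    const string Input = "This is a simple /string/ for Regex.";
    const string Pattern = @"\W+";

    // Hypothetical helper: runs the action repeatedly and returns elapsed milliseconds.
    public static long Time(int iterations, Action action)
    {
        action(); // warm-up call so one-time JIT cost is not included in the timing
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 1000000;

        // Both Regex objects are created once, outside the timed loops.
        Regex instance = new Regex(Pattern);
        Regex compiled = new Regex(Pattern, RegexOptions.Compiled);

        long staticMs = Time(iterations, () => Regex.Split(Input, Pattern));
        long instanceMs = Time(iterations, () => instance.Split(Input));
        long compiledMs = Time(iterations, () => compiled.Split(Input));

        Console.WriteLine("Static Regex method: " + staticMs + " ms");
        Console.WriteLine("Instance Regex object: " + instanceMs + " ms");
        Console.WriteLine("Instance compiled Regex: " + compiledMs + " ms");
    }
}
```

Creating the Regex objects outside the timed loops means this sketch measures only the Split calls, not construction.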
Discussion. Let's review some of the other work done by experts in the C# language and MSDN's resources. Microsoft's David Gutierrez states that there are three major options for regular expression performance.
The first option is interpreted regular expressions. The runtime parses the Regex into opcodes and then runs them in an interpreter. Creation time is low, but matching is the slowest of the three options.
Second is compiled. Here you use RegexOptions.Compiled. Construction takes about 10x longer, but matching runs about 30% faster. Avoid it for dynamically generated patterns, since every new pattern pays the compilation cost again. Creation time is highest, and runtime performance is high.
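Finally: We see precompiled (Regex.CompileToAssembly). This is harder to set up. Creation time is low, and runtime performance is high. Note that Regex.CompileToAssembly is supported on .NET Framework only; on .NET Core and later it throws PlatformNotSupportedException.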
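To get a rough feel for the creation-time trade-off between the interpreted and compiled options, a sketch like the one below could be used. The class name RegexConstructionCost and the ConstructionTicks helper are hypothetical, and the ratio it reports depends on the runtime and pattern, so it should not be read as confirming the 10x figure.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexConstructionCost
{
    // Hypothetical helper: total Stopwatch ticks spent constructing `count` Regex objects.
    public static long ConstructionTicks(string pattern, RegexOptions options, int count)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            Regex r = new Regex(pattern, options);
            GC.KeepAlive(r); // keep the new object referenced until this point
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        const int count = 100; // kept small because compiled construction is expensive

        long interpreted = ConstructionTicks(@"\W+", RegexOptions.None, count);
        long compiled = ConstructionTicks(@"\W+", RegexOptions.Compiled, count);

        Console.WriteLine("Interpreted construction: " + interpreted + " ticks");
        Console.WriteLine("Compiled construction: " + compiled + " ticks");
    }
}
```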
MSDN. We look at MSDN, which has little documentation here. It warns not to use RegexOptions.Compiled when also using CompileToAssembly. This means you can't combine compiled and precompiled code.
Summary. We optimized Regex.Split regular expressions. We encountered a situation where runtime performance can be enhanced by sacrificing startup time. There are many performance options for the Regex type.
Therefore: Using an instance Regex that is not compiled is best for most situations. It improves on the static method without costing much during program startup.
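As an illustration of that recommendation, here is a minimal sketch of a class that keeps a non-compiled Regex as an instance field. The WordSplitter class and its Split method are hypothetical names, not part of the original examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class for illustration only.
class WordSplitter
{
    // Created once per WordSplitter. RegexOptions.Compiled is not used,
    // so construction stays cheap.
    readonly Regex _nonWord = new Regex(@"\W+");

    public string[] Split(string input)
    {
        return _nonWord.Split(input);
    }
}

class Program
{
    static void Main()
    {
        WordSplitter splitter = new WordSplitter();
        foreach (string word in splitter.Split("This is a simple /string/ for Regex."))
        {
            Console.WriteLine(word);
        }
    }
}
```

Each call to Split reuses the same Regex, so the pattern is parsed only when the WordSplitter is constructed.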