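Input: The input string contains the numbers 10, 20, 40 and 1. The static Regex.Split method is called with two arguments: the input string and the pattern.
Pattern: The pattern @"\D+" is written as a verbatim string literal and matches one or more non-digit characters. In regular expressions, an uppercase escape like \D means NOT: it matches everything its lowercase form (\d, a digit) does not.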
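The complement relationship between \D and \d can be seen by splitting the same short string both ways. This is a minimal sketch; the input "a1b22" and the helper name SplitAndJoin are hypothetical. It also shows that Regex.Split returns an empty string when the input starts or ends with a delimiter match.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: joins the pieces with "|" so empty strings stay visible.
    public static string SplitAndJoin(string input, string pattern)
    {
        return string.Join("|", Regex.Split(input, pattern));
    }

    static void Main()
    {
        // \D+ treats runs of non-digits as separators, leaving the digit runs.
        // The input starts with a non-digit, so the first piece is empty.
        Console.WriteLine(SplitAndJoin("a1b22", @"\D+")); // |1|22

        // \d+ is the complement: runs of digits are separators, leaving the letters.
        // The input ends with digits, so the last piece is empty.
        Console.WriteLine(SplitAndJoin("a1b22", @"\d+")); // a|b|
    }
}
```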
C# program that uses Regex.Split
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
//
// String containing numbers.
//
string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";
//
// Get all digit sequences as strings.
//
string[] digits = Regex.Split(sentence, @"\D+");
//
// Now we have each number string.
//
foreach (string value in digits)
{
//
// Parse the value to get the number.
//
int number;
if (int.TryParse(value, out number))
{
Console.WriteLine(value);
}
}
}
}
Output
10
20
40
1
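The int.TryParse check in the loop is needed because the sentence ends with a period, so Regex.Split also returns an empty string after the last number. As an alternative sketch (not the approach used above), Regex.Matches with the lowercase \d+ pattern returns only the digit runs, so no empty strings need filtering. The helper name GetNumbers is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: collects every run of digits as an int.
    public static List<int> GetNumbers(string input)
    {
        var numbers = new List<int>();
        foreach (Match match in Regex.Matches(input, @"\d+"))
        {
            // int.Parse may throw for digit runs that overflow int, or for
            // non-ASCII Unicode digits, which \d also matches.
            numbers.Add(int.Parse(match.Value));
        }
        return numbers;
    }

    static void Main()
    {
        string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";
        foreach (int number in GetNumbers(sentence))
        {
            Console.WriteLine(number);
        }
    }
}
```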
Note: The example gets all operands and operators from an equation string. An operator is a character like * that acts on operands.
Tokens: With Regex.Split, we implement a simple tokenizer that splits on whitespace. Lexical analysis and tokenization are done in many programs.
Warning: This approach only works when every token is separated by whitespace. It may be an effective way to parse simple computer languages or program output, but it is not the fastest way.
C# program that tokenizes
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
//
// The equation.
//
string operation = "3 * 5 = 15";
//
// Split it on whitespace sequences.
//
string[] tokens = Regex.Split(operation, @"\s+");
//
// Now we have each token.
//
foreach (string token in tokens)
{
Console.WriteLine(token);
}
}
}
Output
3
*
5
=
15
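Splitting on \s+ fails when the equation has no spaces, such as "3*5=15". A minimal sketch, assuming only the operators * / + = and - need to be recognized: when the pattern contains capturing parentheses, Regex.Split includes the captured text in the result, so each operator survives as its own token. The helper name Tokenize is hypothetical, and unary minus or surrounding whitespace at the ends of the input are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper. The parentheses capture the operator, and Regex.Split
    // includes captured text in its result, so operators become separate tokens.
    // The \s* on each side consumes optional whitespace around the operator.
    public static string[] Tokenize(string expression)
    {
        return Regex.Split(expression, @"\s*([*/+=-])\s*");
    }

    static void Main()
    {
        // Works with or without spaces around the operators.
        foreach (string token in Tokenize("3*5=15"))
        {
            Console.WriteLine(token);
        }
    }
}
```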
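Tip: It is often useful to combine regular expressions with manual loops and string operations. Here Regex.Split separates the words, and a loop keeps only the words that start with an uppercase letter.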
C# program that collects uppercase words
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
//
// String containing uppercased words.
//
string sentence = "Bob and Michelle are from Indiana.";
//
// Get all words.
//
string[] words = Regex.Split(sentence, @"\W");
//
// Get all uppercased words.
//
var list = new List<string>();
foreach (string value in words)
{
//
// Check the word.
//
if (!string.IsNullOrEmpty(value) &&
char.IsUpper(value[0]))
{
list.Add(value);
}
}
//
// Write all proper nouns.
//
foreach (var value in list)
{
Console.WriteLine(value);
}
}
}
Output
Bob
Michelle
Indiana
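The same result can also be produced with a single pattern, as an alternative to the split-and-loop approach above. In this sketch, \b anchors at a word boundary and \p{Lu} matches one uppercase letter (the same Unicode category char.IsUpper tests), followed by the rest of the word. The helper name GetCapitalizedWords is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: returns each word that begins with an uppercase letter.
    // Regex.Matches yields only matches, so no empty strings need filtering.
    public static List<string> GetCapitalizedWords(string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, @"\b\p{Lu}\w*"))
        {
            words.Add(match.Value);
        }
        return words;
    }

    static void Main()
    {
        foreach (string word in GetCapitalizedWords("Bob and Michelle are from Indiana."))
        {
            Console.WriteLine(word);
        }
    }
}
```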
Version 1: The SplitWordsOptimized method is called to lowercase the string and split apart its words.
Version 2: We use ToLower and Regex.Split to lowercase and split apart the string input.
Result: In this run, SplitWordsOptimized is nearly three times faster than Regex.Split (about 804 ns versus 2283 ns per iteration), because it avoids the regular expression engine entirely.
Warning: Make sure you verify SplitWordsOptimized works correctly in your program before using it.
C# program that benchmarks SplitWordsOptimized
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
static string[] SplitWordsOptimized(string value, bool lowercase)
{
// Count words.
int count = 0;
bool onWord = false;
for (int i = 0; i < value.Length; i++)
{
// If we are on the first char of a word, increase word count.
bool wordChar = char.IsLetterOrDigit(value[i]);
if (wordChar && !onWord)
{
onWord = true;
// Add to word.
count++;
}
// If not on word char, set bool to false.
if (!wordChar)
{
onWord = false;
}
}
// Allocate array.
string[] words = new string[count];
// Add words to array.
onWord = false;
var builder = new StringBuilder();
int wordIndex = 0;
for (int i = 0; i < value.Length; i++)
{
bool wordChar = char.IsLetterOrDigit(value[i]);
// Append to current word.
if (wordChar)
{
onWord = true;
if (lowercase)
{
builder.Append(char.ToLower(value[i]));
}
else
{
builder.Append(value[i]);
}
}
// Store the word when it ends, or when the input ends mid-word.
if (onWord &&
(!wordChar || i == value.Length - 1))
{
onWord = false;
// Store the word, and clear the buffer.
words[wordIndex++] = builder.ToString();
builder.Clear();
}
}
return words;
}
const int _max = 1000000;
static void Main()
{
string data = "Hello, there, my-friend";
// Version 1: use SplitWordsOptimized.
var s1 = Stopwatch.StartNew();
for (int i = 0; i < _max; i++)
{
if (SplitWordsOptimized(data, true).Length != 4)
{
return;
}
}
s1.Stop();
// Version 2: use Regex.Split.
var s2 = Stopwatch.StartNew();
for (int i = 0; i < _max; i++)
{
if (Regex.Split(data.ToLower(), @"\W+").Length != 4)
{
return;
}
}
s2.Stop();
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) /
_max).ToString("0.00 ns") + " SplitWordsOptimized");
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) /
_max).ToString("0.00 ns") + " Regex.Split");
}
}
Output
803.95 ns SplitWordsOptimized
2282.68 ns Regex.Split
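To follow the warning about verifying SplitWordsOptimized, one option is to compare it with Regex.Split on a few inputs, including inputs that start or end with separators. This is a sketch under stated assumptions: the copy of SplitWordsOptimized below only stores a word at the end of the input if a word is in progress (so trailing separators do not add an empty word), the helper name SplitWordsRegex and the test inputs are hypothetical, and the inputs avoid underscores because \W treats "_" as a word character while char.IsLetterOrDigit does not.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    // SplitWordsOptimized, with the end-of-input store guarded by onWord.
    public static string[] SplitWordsOptimized(string value, bool lowercase)
    {
        // First pass: count the starts of words.
        int count = 0;
        bool onWord = false;
        for (int i = 0; i < value.Length; i++)
        {
            bool wordChar = char.IsLetterOrDigit(value[i]);
            if (wordChar && !onWord)
            {
                count++;
            }
            onWord = wordChar;
        }
        // Second pass: copy each word into the array.
        string[] words = new string[count];
        var builder = new StringBuilder();
        int wordIndex = 0;
        onWord = false;
        for (int i = 0; i < value.Length; i++)
        {
            bool wordChar = char.IsLetterOrDigit(value[i]);
            if (wordChar)
            {
                onWord = true;
                builder.Append(lowercase ? char.ToLower(value[i]) : value[i]);
            }
            // Store the word when it ends, or when the input ends mid-word.
            if (onWord && (!wordChar || i == value.Length - 1))
            {
                onWord = false;
                words[wordIndex++] = builder.ToString();
                builder.Clear();
            }
        }
        return words;
    }

    // Hypothetical reference helper. Regex.Split on \W+ yields empty strings when
    // the input starts or ends with a separator, so those are dropped.
    public static string[] SplitWordsRegex(string value)
    {
        return Regex.Split(value.ToLower(), @"\W+")
            .Where(word => word.Length > 0)
            .ToArray();
    }

    static void Main()
    {
        string[] inputs = { "Hello, there, my-friend", "  leading and trailing...  ", "one", "", "!!" };
        foreach (string input in inputs)
        {
            bool same = SplitWordsOptimized(input, true).SequenceEqual(SplitWordsRegex(input));
            Console.WriteLine($"\"{input}\": {(same ? "match" : "MISMATCH")}");
        }
    }
}
```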
Also: You can replace the static Regex.Split call with a Regex instance that is created once and reused. This enhances performance and reduces memory pressure.
Further: You can use the RegexOptions.Compiled enumerated constant for greater performance.
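A minimal sketch combining both suggestions: the Regex is stored in a static readonly field, built once with RegexOptions.Compiled, and its instance Split method is called, so the static method's pattern cache is not consulted on every call. Compiled construction is slower, so this mainly pays off when the same pattern is used many times. The field name NonWord and the method name SplitWords are hypothetical, and actual timings should be measured in your own program.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical field: one compiled Regex, created once and reused.
    static readonly Regex NonWord = new Regex(@"\W+", RegexOptions.Compiled);

    // Hypothetical helper: lowercases the input and splits it with the instance Regex.
    public static string[] SplitWords(string value)
    {
        return NonWord.Split(value.ToLower());
    }

    static void Main()
    {
        foreach (string word in SplitWords("Hello, there, my-friend"))
        {
            Console.WriteLine(word);
        }
    }
}
```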