Delimiters include "\r\n" newline sequences and the comma and tab characters.
Split() is a string method that separates a string at character or string delimiters. It returns a string array. Split is useful even if we want just one part from a string.
To begin, we examine the simplest Split method. We call Split on a string instance. This program splits on a single character. The array returned has four elements.
Here: The input string, which contains four words, is split on spaces. The result value from Split is a string array.
Foreach: The foreach-loop loops over this array and displays each word. The string array can be used as any other.
Based on: .NET 4.5

C# program that splits on spaces

    using System;

    class Program
    {
        static void Main()
        {
            string s = "there is a cat";
            // Split string on spaces.
            // ... This will separate all the words.
            string[] words = s.Split(' ');
            foreach (string word in words)
            {
                Console.WriteLine(word);
            }
        }
    }

Output

    there
    is
    a
    cat
Multiple characters. Next we use Regex.Split to separate based on a delimiter of multiple characters. If you need StringSplitOptions, string.Split has overloads that accept it. The RemoveEmptyEntries option removes empty strings.
C# program that splits on lines with Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "cat\r\ndog\r\nanimal\r\nperson";
            // Split the string on line breaks.
            // ... The return value from Split is a string array.
            string[] lines = Regex.Split(value, "\r\n");

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
        }
    }

Output

    cat
    dog
    animal
    person
RemoveEmptyEntries. Regex methods are used to effectively Split strings. But string Split is often faster. This example specifies an array as the first argument to Split().
StringSplitOptions: This is an enum. It does not need to be allocated with a constructor—it is more like a special int value.
C# program that splits on multiple characters

    using System;

    class Program
    {
        static void Main()
        {
            // This string is also separated by Windows line breaks.
            string value = "shirt\r\ndress\r\npants\r\njacket";

            // Use a new char array of two characters (\r and \n).
            // ... Breaks lines into separate strings.
            // ... Use RemoveEmptyEntries to make sure no empty strings are added.
            char[] delimiters = new char[] { '\r', '\n' };
            string[] parts = value.Split(delimiters,
                StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }

            // Same as the previous example, but uses a string of 2 characters.
            parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }
        }
    }

Output (Repeated two times)

    shirt
    dress
    pants
    jacket
Char arrays. The string Split method receives a character array as the first parameter. Each char in the array designates a new block in the string data.
Using string arrays. A string array can also be passed to the Split method. The new string array is created inline with the Split call.
RemoveEmptyEntries notes. For StringSplitOptions, we specify RemoveEmptyEntries. When two delimiters are adjacent, we can end up with an empty result.
So: We use RemoveEmptyEntries as the second parameter to avoid empty results.
Separate words. You can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars.
Here: This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace.
Note: Here we show how to separate parts of a string based on any character set or range with Regex.Split.
Warning: Regex provides more power and control than the string Split methods. But the code is harder to read.
C# program that separates on non-word pattern

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string[] w = SplitWords("That is a cute cat, man");
            foreach (string s in w)
            {
                Console.WriteLine(s);
            }
            Console.ReadLine();
        }

        /// <summary>
        /// Take all the words in the input string and separate them.
        /// </summary>
        static string[] SplitWords(string s)
        {
            //
            // Split on all non-word characters.
            // ... Returns an array of all the words.
            //
            return Regex.Split(s, @"\W+");
            // @      special verbatim string syntax
            // \W+    one or more non-word characters together
        }
    }

Output

    That
    is
    a
    cute
    cat
    man
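The note about splitting on any character set or range can be illustrated with a character class. The following is a minimal sketch, not part of the original examples: the SplitOnDigits helper and the input string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: splits on runs of characters in the range 0-9.
    public static string[] SplitOnDigits(string s)
    {
        // [0-9]+ matches one or more digits together.
        return Regex.Split(s, "[0-9]+");
    }

    static void Main()
    {
        // Illustrative input: words separated by runs of digits.
        foreach (string part in SplitOnDigits("cat1dog22bird333fish"))
        {
            Console.WriteLine(part);
        }
    }
}
```

This should print cat, dog, bird and fish on separate lines. If the input began or ended with a digit, the result would also contain an empty string at that end.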
Text files. Here we have a text file containing comma-delimited lines of values—a CSV file. We use File.ReadAllLines to read lines, but StreamReader can be used instead.
Then: It displays the values of each line after the line number. The output shows how the file was parsed into the strings.
Contents of input file: TextFile1.txt

    Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
    Programmer,Wizard,CEO,Rancher,Clerk,Farmer

C# program that splits lines in file

    using System;
    using System.IO;

    class Program
    {
        static void Main()
        {
            int i = 0;
            foreach (string line in File.ReadAllLines("TextFile1.txt"))
            {
                string[] parts = line.Split(',');
                foreach (string part in parts)
                {
                    Console.WriteLine("{0}:{1}", i, part);
                }
                i++; // For demonstration.
            }
        }
    }

Output

    0:Dog
    0:Cat
    0:Mouse
    0:Fish
    0:Cow
    0:Horse
    0:Hyena
    1:Programmer
    1:Wizard
    1:CEO
    1:Rancher
    1:Clerk
    1:Farmer
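The text mentions that StreamReader can be used instead of File.ReadAllLines. A possible version is sketched below. The ReadRows helper is a made-up name, and the program writes its own small sample file to the temp directory (an assumption, so that the sketch runs on its own).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    // Hypothetical helper: reads a file line by line and splits each line on commas.
    public static List<string[]> ReadRows(string path)
    {
        var rows = new List<string[]>();
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            // ReadLine returns null at the end of the file.
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split(','));
            }
        }
        return rows;
    }

    static void Main()
    {
        // Assumption: create a sample file so the example is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "TextFile1.txt");
        File.WriteAllLines(path, new[] { "Dog,Cat,Mouse", "Programmer,Wizard,CEO" });

        int i = 0;
        foreach (string[] parts in ReadRows(path))
        {
            foreach (string part in parts)
            {
                Console.WriteLine("{0}:{1}", i, part);
            }
            i++;
        }
    }
}
```

Reading one line at a time avoids holding the whole file in an array, which may matter for large files.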
Directory paths. We can split the segments in a Windows local directory into separate strings. Please note that directory paths are complex. This code may not correctly handle all cases.
Tip: We could use Path.DirectorySeparatorChar, a char property in System.IO, for more flexibility.
C# program that splits Windows directories

    using System;

    class Program
    {
        static void Main()
        {
            // The directory from Windows.
            const string dir = @"C:\Users\Sam\Documents\Perls\Main";
            // Split on directory separator.
            string[] parts = dir.Split('\\');
            foreach (string part in parts)
            {
                Console.WriteLine(part);
            }
        }
    }

Output

    C:
    Users
    Sam
    Documents
    Perls
    Main
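The tip about Path.DirectorySeparatorChar could look like the sketch below. The SplitPath helper is a hypothetical name, and splitting on Path.AltDirectorySeparatorChar as well is an extra choice made here, not something the example above does. As noted, paths are complex, so this may not handle every case.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: splits a path on the platform's separator characters.
    public static string[] SplitPath(string dir)
    {
        char[] separators =
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        // RemoveEmptyEntries drops empty parts from leading or doubled separators.
        return dir.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Path.Combine joins the segments with the current platform's separator.
        string dir = Path.Combine("Users", "Sam", "Documents");
        foreach (string part in SplitPath(dir))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the input is built with Path.Combine, the three segments should be recovered on any platform.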
Internals. What is inside Split? The logic internal to the .NET Framework for Split is implemented in managed code. Methods call into an overload with three parameters.
Next: The parameters are checked for validity. It uses unsafe code to create a separator list, and a for-loop combined with Substring.
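The general idea of a separator list followed by a loop with Substring can be sketched in a few lines. This is a simplified illustration only, not the framework's actual source: SimpleSplit is a made-up name, it handles a single char separator, and it uses no unsafe code.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Simplified illustration of the two-pass approach (not the real implementation).
    public static string[] SimpleSplit(string s, char separator)
    {
        // Pass 1: record the index of every separator.
        var positions = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == separator)
            {
                positions.Add(i);
            }
        }

        // Pass 2: cut out the text between separators with Substring.
        string[] result = new string[positions.Count + 1];
        int start = 0;
        for (int i = 0; i < positions.Count; i++)
        {
            result[i] = s.Substring(start, positions[i] - start);
            start = positions[i] + 1;
        }
        // The last part runs from the final separator to the end of the string.
        result[positions.Count] = s.Substring(start);
        return result;
    }

    static void Main()
    {
        foreach (string word in SimpleSplit("there is a cat", ' '))
        {
            Console.WriteLine(word);
        }
    }
}
```

Knowing the separator count before cutting lets the result array be allocated once at the right size.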
Benchmarks. I tested two strings (with 40 and 1200 chars). Speed varied with the contents of the strings. The length of blocks, number of delimiters, and total size all factor into performance.
Note: The Regex.Split option generally performed the worst. String.Split was consistently faster.
And: I felt that the second or third methods would be best. Regex also causes performance problems elsewhere.
Strings used in test: C#

    //
    // Build long string.
    //
    _test = string.Empty;
    for (int i = 0; i < 120; i++)
    {
        _test += "01234567\r\n";
    }

    //
    // Build short string.
    //
    _test = string.Empty;
    for (int i = 0; i < 10; i++)
    {
        _test += "ab\r\n";
    }

Methods tested: 100000 iterations

    static void Test1()
    {
        string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
    }

    static void Test2()
    {
        string[] arr = _test.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    static void Test3()
    {
        string[] arr = _test.Split(new string[] { "\r\n" },
            StringSplitOptions.None);
    }
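The timing loop itself is not shown above. One way to run the three methods is sketched below, assuming System.Diagnostics.Stopwatch for timing. The Time helper is a made-up name, and the actual benchmark may have been structured differently. Timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static string _test;

    // Hypothetical helper: runs the action repeatedly and returns elapsed milliseconds.
    public static long Time(Action action, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Build the short string (40 chars), as in the test above.
        _test = string.Empty;
        for (int i = 0; i < 10; i++)
        {
            _test += "ab\r\n";
        }

        const int n = 100000;
        Console.WriteLine("Regex.Split:    {0} ms",
            Time(() => Regex.Split(_test, "\r\n", RegexOptions.Compiled), n));
        Console.WriteLine("char[] Split:   {0} ms",
            Time(() => _test.Split(new char[] { '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries), n));
        Console.WriteLine("string[] Split: {0} ms",
            Time(() => _test.Split(new string[] { "\r\n" },
                StringSplitOptions.None), n));
    }
}
```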
Benchmark results. For 1200-char strings, the speed difference is reduced. Regex is the slowest method for both short and long strings, but on long strings the gap is smaller.
Short strings: For short, 40-char strings, the Regex method is by far the slowest. The compilation time may cause this.
And: Regex may also lack certain optimizations present with string.Split. In the results, smaller times are better.
Arrays: In programs that use shorter strings, the methods that split based on arrays are faster. This avoids Regex compilation.
But: For longer strings or files that contain more lines, the relative cost of Regex is lower (about 2.8 times the char[] version, versus about 6.9 times for short strings), so it is a more reasonable choice there.
Benchmark of Split on long strings

    [1] Regex.Split:    3470 ms
    [2] char[] Split:   1255 ms [fastest]
    [3] string[] Split: 1449 ms

Benchmark of Split on short strings

    [1] Regex.Split:     434 ms
    [2] char[] Split:     63 ms [fastest]
    [3] string[] Split:   83 ms
Delimiter arrays. Here we examine delimiter performance. My research finds it is worthwhile to declare, and allocate, the char array argument as a local variable.
Note: Storing the array of delimiters outside the loop is faster. This version, shown second, requires about 10% less time.
Slow version, before: C#

    //
    // Split on multiple characters using new char[] inline.
    //
    string t = "string to split, ok";

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(new char[] { ' ', ',' });
    }

Fast version, after: C#

    //
    // Split on multiple characters using new char[] already created.
    //
    string t = "string to split, ok";
    char[] c = new char[] { ' ', ',' }; // <-- Cache this

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(c);
    }
StringSplitOptions. This affects the behavior of Split. The two values of StringSplitOptions (None and RemoveEmptyEntries) are enum values, stored as integers, that tell Split how to work.
Note: In this example, the input string contains five commas. These commas are the delimiters.
And: Two fields between commas are 0 characters long—they are empty. They are treated differently when we use RemoveEmptyEntries.
First call: In the first call to Split, these fields are put into the result array. These elements equal string.Empty.
Second call: We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.
C# that uses StringSplitOptions

    using System;

    class Program
    {
        static void Main()
        {
            // Input string contains separators.
            string value1 = "man,woman,child,,,bird";
            char[] delimiter1 = new char[] { ',' }; // <-- Split on these

            // ... Use StringSplitOptions.None.
            string[] array1 = value1.Split(delimiter1,
                StringSplitOptions.None);

            foreach (string entry in array1)
            {
                Console.WriteLine(entry);
            }

            // ... Use StringSplitOptions.RemoveEmptyEntries.
            string[] array2 = value1.Split(delimiter1,
                StringSplitOptions.RemoveEmptyEntries);

            Console.WriteLine();
            foreach (string entry in array2)
            {
                Console.WriteLine(entry);
            }
        }
    }

Output (the first call prints two empty lines for the empty fields)

    man
    woman
    child


    bird

    man
    woman
    child
    bird
Join. With this method, we can combine separate strings with a separating delimiter. Join() can be used to round-trip data. It is the opposite of split.
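A minimal sketch of the round trip follows. The RoundTrip helper and the sample string are illustrative names, not part of the original article.

```csharp
using System;

class Program
{
    // Hypothetical helper: splits on a delimiter, then joins the parts back together.
    public static string RoundTrip(string value, char delimiter)
    {
        string[] parts = value.Split(delimiter);
        // string.Join places the separator between each pair of elements.
        return string.Join(delimiter.ToString(), parts);
    }

    static void Main()
    {
        string original = "man,woman,child,,,bird";
        string result = RoundTrip(original, ',');
        Console.WriteLine(result);
        Console.WriteLine(result == original);
    }
}
```

With StringSplitOptions.None (the default here), empty fields are kept, so the joined string equals the original. With RemoveEmptyEntries the round trip would lose the empty fields.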
Replace. Split does not handle escaped characters. We can instead use Replace on a string input to substitute special characters for any escaped characters.
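One possible way to apply this is sketched below. It assumes a convention where a backslash before a comma escapes it, and it assumes the placeholder character \u0001 never appears in the input. Both are assumptions made for illustration, and SplitEscaped is a made-up name.

```csharp
using System;

class Program
{
    // Hypothetical helper: splits on commas, treating "\," as a literal comma.
    public static string[] SplitEscaped(string s)
    {
        // Assumption: this placeholder character does not occur in the input.
        const string placeholder = "\u0001";

        // Hide escaped commas so Split does not see them.
        string temp = s.Replace("\\,", placeholder);
        string[] parts = temp.Split(',');

        // Restore the hidden commas in each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(placeholder, ",");
        }
        return parts;
    }

    static void Main()
    {
        foreach (string part in SplitEscaped(@"cat\,dog,bird"))
        {
            Console.WriteLine(part);
        }
    }
}
```

For this input the result should have two parts: "cat,dog" and "bird".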
IndexOf, Substring. Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective.
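As a small illustration, the sketch below takes only the first field of a string without building an array. The FirstField helper is a hypothetical name.

```csharp
using System;

class Program
{
    // Hypothetical helper: returns the text before the first delimiter.
    public static string FirstField(string s, char delimiter)
    {
        int index = s.IndexOf(delimiter);
        // IndexOf returns -1 when the delimiter is not found.
        return index < 0 ? s : s.Substring(0, index);
    }

    static void Main()
    {
        Console.WriteLine(FirstField("there is a cat", ' '));
    }
}
```

This should print "there". It stops scanning at the first delimiter, which may help when only one part is needed.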
StringReader. This class can separate a string into lines. It can lead to performance improvements over using Split. The code required is often more complex.
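A short sketch of the StringReader approach follows. The ReadLines helper is a made-up name, and the input is the string from the Regex line-splitting example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    // Hypothetical helper: reads each line of a string with StringReader.
    public static List<string> ReadLines(string value)
    {
        var lines = new List<string>();
        using (StringReader reader = new StringReader(value))
        {
            string line;
            // ReadLine returns null once the end of the string is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in ReadLines("cat\r\ndog\r\nanimal\r\nperson"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Each line can be processed as it is read, so the loop can also be written without collecting the lines in a list.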
A summary. With Split, we separate strings at delimiters and receive a string array. The string Split overloads handle most cases and keep code simple. Regex.Split handles more complex patterns.