TheDeveloperBlog.com

C# Split String Examples

Use the string.Split method. Call Split with arguments to separate on newlines, spaces and words.
Split. Bamboo grows in sections. Each part is connected, but also separate. In a sense the stem is an array of segments. The forest here is dense.
In a string too we often find parts. These are separated with a delimiter. We can split lines and words from a string based on chars, strings or newlines.
First example. We examine the simplest Split method. It receives a char array (one that uses the params keyword) but we can specify this with a single char argument.

Part 1: We assign the data string to a value containing 3 spaces—it has 4 words separated by spaces.

Part 2: This program splits on a single character. The result value from Split is a string array—it contains 4 elements.

Part 3: The foreach-loop iterates over the array and displays each word. The string array can be used in the same way as any other.

C# program that splits on spaces

    using System;

    class Program
    {
        static void Main()
        {
            // Part 1: the input string.
            string data = "there is a cat";

            // Part 2: split string on spaces (this will separate all the words).
            string[] words = data.Split(' ');

            // Part 3: loop over result array.
            foreach (string word in words)
            {
                Console.WriteLine("WORD: " + word);
            }
        }
    }

Output

    WORD: there
    WORD: is
    WORD: a
    WORD: cat
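Because the separator parameter uses the params keyword, more than one char can be passed directly. The following is a minimal sketch of that idea; the class name, the helper SplitOnSpaceOrComma and the sample string are made up for illustration.

```csharp
using System;

public static class ParamsSplitExample
{
    // Hypothetical helper: passes two separator chars through the params char[] parameter.
    public static string[] SplitOnSpaceOrComma(string text)
    {
        return text.Split(' ', ',');
    }

    public static void Main()
    {
        // Sample input (an assumption): words separated by spaces or a comma.
        string data = "there is,a cat";
        foreach (string word in SplitOnSpaceOrComma(data))
        {
            Console.WriteLine("WORD: " + word);
        }
    }
}
```

The call is equivalent to passing new char[] { ' ', ',' } explicitly.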
Multiple characters. Next we use Regex.Split to separate a string based on multiple characters. Regex.Split can also handle metacharacters, not just values we specify directly.

Argument 1: The first argument to Regex.Split is the string we wish to split. Regex.Split is a static method.

Argument 2: The second argument is the delimiter sequence. Here we split on a newline sequence.

C# program that splits on lines with Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "cat\r\ndog\r\nanimal\r\nperson";

            // Split the string on line breaks.
            // ... The return value from Split is a string array.
            string[] lines = Regex.Split(value, "\r\n");

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
        }
    }

Output

    cat
    dog
    animal
    person
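Input does not always use Windows line breaks. As a hedged sketch (the pattern, the helper name SplitLines and the sample string are assumptions, not part of the program above), a pattern with an optional \r matches both "\r\n" and a bare "\n".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineSplitExample
{
    // Hypothetical helper: splits on "\r\n" or on a bare "\n".
    public static string[] SplitLines(string text)
    {
        return Regex.Split(text, "\r?\n");
    }

    public static void Main()
    {
        // Sample input (an assumption) that mixes both line-break styles.
        string value = "cat\r\ndog\nanimal\r\nperson";
        foreach (string line in SplitLines(value))
        {
            Console.WriteLine(line);
        }
    }
}
```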
RemoveEmptyEntries. Sometimes Split() can return an array with empty strings in it—this can be unwanted. This can happen when 2 delimiters are adjacent.

StringSplitOptions: This is an enum. It does not need to be allocated with a constructor—it is more like a special int value.

Argument 1: Here we pass an array as the first argument to Split(). One call uses a char array, and the other uses a string array.

Argument 2: We use RemoveEmptyEntries as the second argument in the char array call to avoid empty results. They are not added to the array. The string array call matches the whole "\r\n" sequence, so it can use None.

C# program that splits on multiple characters

    using System;

    class Program
    {
        static void Main()
        {
            // ... Parts are separated by Windows line breaks.
            string value = "shirt\r\ndress\r\npants\r\njacket";

            // Use a char array of 2 characters (\r and \n).
            // ... Break lines into separate strings.
            // ... Use RemoveEmptyEntries so empty strings are not added.
            char[] delimiters = new char[] { '\r', '\n' };
            string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine(":::SPLIT, CHAR ARRAY:::");
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }

            // ... Same but uses a string of 2 characters.
            string[] partsFromString = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
            Console.WriteLine(":::SPLIT, STRING:::");
            // ... Loop over the second result array (not the first one).
            for (int i = 0; i < partsFromString.Length; i++)
            {
                Console.WriteLine(partsFromString[i]);
            }
        }
    }

Output

    :::SPLIT, CHAR ARRAY:::
    shirt
    dress
    pants
    jacket
    :::SPLIT, STRING:::
    shirt
    dress
    pants
    jacket
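To see why the empty strings appear, it can help to count the results with and without RemoveEmptyEntries. This is a small sketch under assumed names (EmptyEntriesExample, CountParts) and with a made-up input: the '\r' and '\n' chars are adjacent, so an empty string sits between them.

```csharp
using System;

public static class EmptyEntriesExample
{
    // Hypothetical helper: counts the parts produced for the given options.
    public static int CountParts(string text, StringSplitOptions options)
    {
        char[] delimiters = new char[] { '\r', '\n' };
        return text.Split(delimiters, options).Length;
    }

    public static void Main()
    {
        string value = "shirt\r\ndress";

        // With None, the empty string between '\r' and '\n' is kept.
        Console.WriteLine(CountParts(value, StringSplitOptions.None));

        // With RemoveEmptyEntries, only the two words remain.
        Console.WriteLine(CountParts(value, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

For this input the program prints 3 and then 2.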
Separate words. We can separate words with Split. Often the best way to separate words in a C# string is to use Regex.Split with a pattern that acts upon non-word chars.

Here: This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace.

Tip: Regex provides more power and control than the string Split methods. But the code is harder to read.

Argument 1: The first argument to Regex.Split is the string we are trying to split apart.

Argument 2: This is a Regex pattern. We can specify any character set (or range) with Regex.Split.

C# program that separates on non-word pattern

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            const string sentence = "Hello, my friend";

            // Split on all non-word characters.
            // ... This returns an array of all the words.
            string[] words = Regex.Split(sentence, @"\W+");
            foreach (string value in words)
            {
                Console.WriteLine("WORD: " + value);
            }
        }
    }

Output

    WORD: Hello
    WORD: my
    WORD: friend

Regex description

    @      Special verbatim string syntax.
    \W+    One or more non-word characters together.
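One caveat: when the input starts or ends with a non-word character, Regex.Split includes an empty string at that end of the result. A minimal sketch of filtering those out follows; the helper name SplitWords, the use of LINQ and the sample sentence are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitExample
{
    // Hypothetical helper: splits on non-word chars and drops empty results.
    public static string[] SplitWords(string sentence)
    {
        return Regex.Split(sentence, @"\W+")
            .Where(word => word.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // The trailing "!" would otherwise produce an empty final element.
        foreach (string word in SplitWords("Hello, my friend!"))
        {
            Console.WriteLine("WORD: " + word);
        }
    }
}
```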
Text files. Here we have a text file containing comma-delimited lines of values (a CSV file). We use File.ReadAllLines to read lines, but StreamReader can be used instead.

Then: It displays the values of each line after the line number. The output shows how the file was parsed into the strings.

C# program that splits lines in file

    using System;
    using System.IO;

    class Program
    {
        static void Main()
        {
            int i = 0;
            foreach (string line in File.ReadAllLines("TextFile1.txt"))
            {
                string[] parts = line.Split(',');
                foreach (string part in parts)
                {
                    Console.WriteLine("{0}:{1}", i, part);
                }
                i++; // For demonstration.
            }
        }
    }

Contents of input file: TextFile1.txt

    Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
    Programmer,Wizard,CEO,Rancher,Clerk,Farmer

Output

    0:Dog
    0:Cat
    0:Mouse
    0:Fish
    0:Cow
    0:Horse
    0:Hyena
    1:Programmer
    1:Wizard
    1:CEO
    1:Rancher
    1:Clerk
    1:Farmer
Directory paths. We can split the segments in a Windows local directory into separate strings. Please note that directory paths are complex. This code may not correctly handle all cases.

Tip: We could use Path.DirectorySeparatorChar, a char property in System.IO, for more flexibility.

C# program that splits Windows directories

    using System;

    class Program
    {
        static void Main()
        {
            // The directory from Windows.
            const string dir = @"C:\Users\Sam\Documents\Perls\Main";

            // Split on directory separator.
            string[] parts = dir.Split('\\');
            foreach (string part in parts)
            {
                Console.WriteLine(part);
            }
        }
    }

Output

    C:
    Users
    Sam
    Documents
    Perls
    Main
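The tip about Path.DirectorySeparatorChar could look roughly like the sketch below. The helper name SplitPath and the sample segments are assumptions; the sketch also splits on Path.AltDirectorySeparatorChar and, like the program above, does not try to handle every kind of path.

```csharp
using System;
using System.IO;

public static class PathSplitExample
{
    // Hypothetical helper: splits on the platform's separator chars.
    public static string[] SplitPath(string path)
    {
        char[] separators = new char[]
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        return path.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Build a sample path (an assumption) with the current platform's separator.
        string dir = string.Join(
            Path.DirectorySeparatorChar.ToString(),
            new string[] { "Users", "Sam", "Documents" });

        foreach (string part in SplitPath(dir))
        {
            Console.WriteLine(part);
        }
    }
}
```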
StringSplitOptions. This affects the behavior of Split. The values None and RemoveEmptyEntries are enum constants (stored as integers) that tell Split how to work. Starting with .NET 5, the enum also has a TrimEntries value.

Note: In this example, the input string contains five commas. These commas are the delimiters.

And: Two fields between commas are 0 characters long—they are empty. They are treated differently when we use RemoveEmptyEntries.

First call: In the first call to Split, these fields are put into the result array. These elements equal string.Empty.

Second call: We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.

C# program that uses StringSplitOptions

    using System;

    class Program
    {
        static void Main()
        {
            // Input string contains separators.
            string value1 = "man,woman,child,,,bird";
            char[] delimiter1 = new char[] { ',' }; // <-- Split on these

            // ... Use StringSplitOptions.None.
            string[] array1 = value1.Split(delimiter1, StringSplitOptions.None);
            foreach (string entry in array1)
            {
                Console.WriteLine(entry);
            }

            // ... Use StringSplitOptions.RemoveEmptyEntries.
            string[] array2 = value1.Split(delimiter1, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine();
            foreach (string entry in array2)
            {
                Console.WriteLine(entry);
            }
        }
    }

Output

    man
    woman
    child


    bird

    man
    woman
    child
    bird
Benchmark, Split. Here we test strings with 40 and 1200 chars. Speed varied on the contents of strings. The length of blocks, number of delimiters, and total size factor into performance.

Version 1: This code uses Regex.Split to separate the strings apart. It is tested on both a long string and a short string.

Version 2: Uses the string.Split method, but with the first argument being a char array. Two chars are in the char array.

Version 3: Uses string.Split as well, but with a string array argument. The 3 versions are compared.

Result: Splitting with a char array is the fastest for both short and long strings. Regex.Split is slowest (but has more features).

C# program that tests string.Split performance

    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    class Program
    {
        const int _max = 100000;

        static void Main()
        {
            // Get long string.
            string value1 = string.Empty;
            for (int i = 0; i < 120; i++)
            {
                value1 += "01234567\r\n";
            }

            // Get short string.
            string value2 = string.Empty;
            for (int i = 0; i < 10; i++)
            {
                value2 += "ab\r\n";
            }

            // Put strings in array.
            string[] tests = { value1, value2 };
            foreach (string test in tests)
            {
                Console.WriteLine("Testing length: " + test.Length);

                // Version 1: use Regex.Split.
                var s1 = Stopwatch.StartNew();
                for (int i = 0; i < _max; i++)
                {
                    string[] result = Regex.Split(test, "\r\n", RegexOptions.Compiled);
                    if (result.Length == 0)
                    {
                        return;
                    }
                }
                s1.Stop();

                // Version 2: use char array split.
                var s2 = Stopwatch.StartNew();
                for (int i = 0; i < _max; i++)
                {
                    string[] result = test.Split(new char[] { '\r', '\n' },
                        StringSplitOptions.RemoveEmptyEntries);
                    if (result.Length == 0)
                    {
                        return;
                    }
                }
                s2.Stop();

                // Version 3: use string array split.
                var s3 = Stopwatch.StartNew();
                for (int i = 0; i < _max; i++)
                {
                    string[] result = test.Split(new string[] { "\r\n" },
                        StringSplitOptions.None);
                    if (result.Length == 0)
                    {
                        return;
                    }
                }
                s3.Stop();

                Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
                Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
                Console.WriteLine(((double)(s3.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            }
        }
    }

Output

    Testing length: 1200
    21442.64 ns    Regex.Split
     5562.63 ns    Split char[]
     6556.60 ns    Split string[]
    Testing length: 40
     2236.22 ns    Regex.Split
      371.55 ns    Split char[]
      423.46 ns    Split string[]
Benchmark, array argument. Here we examine delimiter performance. It is worthwhile to declare, and allocate, the char array argument as a local variable.

Version 1: This code creates a new char array with 2 elements on each Split call. These must all be garbage-collected.

Version 2: This version uses a single char array, created before the loop. It reuses the cached char array each time.

Result: By caching a char array (or string array), we can improve split call performance by a small amount.

C# program that tests Split, cached char array

    using System;
    using System.Diagnostics;

    class Program
    {
        const int _max = 10000000;

        static void Main()
        {
            string value = "a b,c";
            char[] delimiterArray = new char[] { ' ', ',' };

            // Version 1: split with a new char array on each call.
            var s1 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                string[] result = value.Split(new char[] { ' ', ',' });
                if (result.Length == 0)
                {
                    return;
                }
            }
            s1.Stop();

            // Version 2: split using a cached char array on each call.
            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                string[] result = value.Split(delimiterArray);
                if (result.Length == 0)
                {
                    return;
                }
            }
            s2.Stop();

            Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        }
    }

Output

    87.61 ns    Split, new char[]
    84.34 ns    Split, existing char[]
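Outside of a benchmark loop, one way to reuse the delimiter array across calls is a static readonly field. This is only a sketch of that design choice; the class name, field name and helper SplitFields are hypothetical, and no timing claim is made for it.

```csharp
using System;

public static class CachedDelimiterExample
{
    // Allocated once and reused by every call (hypothetical field name).
    private static readonly char[] Delimiters = new char[] { ' ', ',' };

    // Hypothetical helper: splits with the cached delimiter array.
    public static string[] SplitFields(string value)
    {
        return value.Split(Delimiters);
    }

    public static void Main()
    {
        foreach (string field in SplitFields("a b,c"))
        {
            Console.WriteLine(field);
        }
    }
}
```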
Arrays. The string Split method receives a character array as the first parameter. Each char in the array designates a new block in the string data.

And: A string array can also be passed to the Split method. The new string array is created inline with the Split call.

Internals. What is inside Split? The logic internal to the .NET Framework for Split is implemented in managed code. Methods call into an overload with three parameters.

Next: The parameters are checked for validity. It uses unsafe code to create a separator list, and a for-loop combined with Substring.
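The two-step idea described here (build a list of separator positions, then cut the pieces with Substring in a for-loop) can be sketched in plain managed code. This is not the framework's actual implementation: it handles a single separator char, uses no unsafe code, and the names ManualSplitExample and SplitOnChar are made up.

```csharp
using System;
using System.Collections.Generic;

public static class ManualSplitExample
{
    // Simplified sketch of a two-pass split on one separator char.
    public static string[] SplitOnChar(string text, char separator)
    {
        // Pass 1: record the index of every separator.
        var positions = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == separator)
            {
                positions.Add(i);
            }
        }

        // Pass 2: cut out the text between separators with Substring.
        string[] result = new string[positions.Count + 1];
        int start = 0;
        for (int i = 0; i < positions.Count; i++)
        {
            result[i] = text.Substring(start, positions[i] - start);
            start = positions[i] + 1;
        }
        result[positions.Count] = text.Substring(start);
        return result;
    }

    public static void Main()
    {
        foreach (string part in SplitOnChar("man,woman,,bird", ','))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```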

Join. With this method, we can combine separate strings with a separating delimiter. Join() can be used to round-trip data. It is the opposite of Split.
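A minimal round-trip sketch, with a made-up helper name (RoundTrip) and sample data: the string is split on commas and then rebuilt with string.Join using the same delimiter.

```csharp
using System;

public static class JoinExample
{
    // Hypothetical helper: splits on commas, then joins the parts back together.
    public static string RoundTrip(string csv)
    {
        string[] parts = csv.Split(',');
        return string.Join(",", parts);
    }

    public static void Main()
    {
        string original = "cat,dog,bird";
        string rebuilt = RoundTrip(original);

        // The rebuilt string equals the original.
        Console.WriteLine(rebuilt == original);
    }
}
```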
Replace. Split does not handle escaped characters. We can instead use Replace on a string input to substitute special characters for any escaped characters.
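One possible shape of this approach is sketched below. It assumes a made-up convention in which a backslash before a comma escapes it, and it assumes the placeholder char U+0001 never occurs in the input; both are illustrative choices, as is the helper name SplitEscaped.

```csharp
using System;

public static class EscapedSplitExample
{
    // Assumption: this placeholder never appears in real input.
    private const string Placeholder = "\u0001";

    // Hypothetical helper: splits on commas, treating "\," as a literal comma.
    public static string[] SplitEscaped(string text)
    {
        // Step 1: hide escaped commas behind the placeholder.
        string hidden = text.Replace("\\,", Placeholder);

        // Step 2: split on the remaining (real) delimiters.
        string[] parts = hidden.Split(',');

        // Step 3: restore the literal commas in each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(Placeholder, ",");
        }
        return parts;
    }

    public static void Main()
    {
        foreach (string part in SplitEscaped(@"red\,green,blue"))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the sample input this yields two parts: "red,green" and "blue".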
IndexOf, Substring. Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective.
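As a small sketch of that combination, the helper below (SplitFirst, a hypothetical name) cuts a string only at the first occurrence of a separator, which suits key and value pairs where the value may contain the separator again.

```csharp
using System;

public static class IndexOfSplitExample
{
    // Hypothetical helper: splits at the first separator only.
    public static string[] SplitFirst(string text, char separator)
    {
        int index = text.IndexOf(separator);
        if (index < 0)
        {
            // No separator found: return the input as the only element.
            return new string[] { text };
        }
        return new string[]
        {
            text.Substring(0, index),
            text.Substring(index + 1)
        };
    }

    public static void Main()
    {
        string[] pair = SplitFirst("name=value=more", '=');
        Console.WriteLine(pair[0]);
        Console.WriteLine(pair[1]);
    }
}
```

For the sample input the two parts are "name" and "value=more".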
StringReader. This class can separate a string into lines. It can lead to performance improvements over using Split. The code required is often more complex.
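A minimal sketch of reading lines with StringReader follows; the helper name ReadLines and the sample text are assumptions. ReadLine returns null at the end of the string, which ends the loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StringReaderExample
{
    // Hypothetical helper: collects each line of the text into a list.
    public static List<string> ReadLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in ReadLines("cat\r\ndog\r\nbird"))
        {
            Console.WriteLine(line);
        }
    }
}
```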
A summary. With Split, we separate strings on chars, strings or newlines. For more complex delimiters, Regex.Split is an option. In most cases, Split keeps the code simple.