TheDeveloperBlog.com

Home | Contact Us

C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML

<< Back to C-SHARP

C# Remove Duplicate Chars

Remove duplicate characters in strings. Use a char array for performance optimization.
Duplicate chars. Duplicate chars occur in strings. We can remove them. The string may have 2 A chars—we want it to have only one. There are many implementations of this logic.
Some implementations (like those that allocate fewer strings) are faster. But this often depends on the data as well. Typically, using a char array is the best approach.
An example. The essential logic in removing duplicate characters is to track all the chars that have been encountered and avoid adding further characters that have been encountered.

Tip: The method shown here is not focused on performance, but rather being reliable and simple to understand and debug.

And: After this, we examine performance issues, but many programs will not need in-depth performance tuning here.

C# program that removes duplicate chars using System; class Program { static void Main() { string value1 = RemoveDuplicateChars("The Dev Codes"); string value2 = RemoveDuplicateChars("Google"); string value3 = RemoveDuplicateChars("Yahoo"); string value4 = RemoveDuplicateChars(".NET Framework 3.5 SP1"); string value5 = RemoveDuplicateChars("Line1\nLine2\nLine3"); Console.WriteLine(value1); Console.WriteLine(value2); Console.WriteLine(value3); Console.WriteLine(value4); Console.WriteLine(value5); } static string RemoveDuplicateChars(string key) { // Remove duplicate chars using string concats. // ... Store encountered letters in this string. string result = ""; foreach (char value in key) { // See if character is in the result already. if (result.IndexOf(value) == -1) { // Append to the result. result += value; } } return result; } } Output Dot NePrls Gogle Yaho .NET Framewok35SP1 Line1 23
Some notes, RemoveDuplicateChars. We append each char to the "result" string. If the char already exists in the result, it is a duplicate—we do not need to append it again.

IndexOf: The IndexOf method in the RemoveDuplicateChars method is notable here. If the specific character is not found, it returns -1.

IndexOf

Note: Thanks to Nicolas Siatras for helping improve the string-based method here. Only 1 string is needed in the method.

Fast method. Sometimes this logic must be optimized for performance. This next method avoids lots of string copying. It uses character array buffers.

And: It fills these arrays as it goes along, instead of appending to strings. It avoids many string copies this way.

Benchmark. We test performance on a string that is 16 characters long. Both methods will degrade with longer strings, particularly those with more duplicated characters.

Note: In some cases, you can use lookup tables (or even Dictionary) to optimize.

Dictionary

So: You could only scan the result string instead of using a separate table. This would be faster.

Result: The version that uses char arrays (RemoveDuplicateCharsFast) is many times faster than the string version.

C# program that benchmarks duplicate char methods using System; using System.Diagnostics; class Program { const int _max = 1000000; static void Main() { var s1 = Stopwatch.StartNew(); for (int i = 0; i < _max; i++) { string value = RemoveDuplicateChars("datagridviewtips"); if (value == null) { return; } } s1.Stop(); var s2 = Stopwatch.StartNew(); for (int i = 0; i < _max; i++) { string value = RemoveDuplicateCharsFast("datagridviewtips"); if (value == null) { return; } } s2.Stop(); Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns")); Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns")); Console.Read(); } static string RemoveDuplicateChars(string key) { // See comments in first example. string result = ""; foreach (char value in key) { if (result.IndexOf(value) == -1) { result += value; } } return result; } static string RemoveDuplicateCharsFast(string key) { // Remove duplicate chars using char arrays. int keyLength = key.Length; // Store encountered letters in this array. char[] table = new char[keyLength]; int tableLength = 0; // Store the result in this array. char[] result = new char[keyLength]; int resultLength = 0; foreach (char value in key) { // Scan the table to see if the letter is in it. bool exists = false; for (int i = 0; i < tableLength; i++) { if (value == table[i]) { exists = true; break; } } // If the letter is new, add to the table and the result. if (!exists) { table[tableLength] = value; tableLength++; result[resultLength] = value; resultLength++; } } // Return the string at this range. return new string(result, 0, resultLength); } } Output 529.37 ns RemoveDuplicateChars 143.59 ns RemoveDuplicateCharsFast (char[] arrays)
A summary. For applications where performance is not critical, testing a string as we go along is adequate—and easy to understand. But the char array approach is often faster.
© TheDeveloperBlog.com
The Dev Codes

Related Links:


Related Links

Adjectives Ado Ai Android Angular Antonyms Apache Articles Asp Autocad Automata Aws Azure Basic Binary Bitcoin Blockchain C Cassandra Change Coa Computer Control Cpp Create Creating C-Sharp Cyber Daa Data Dbms Deletion Devops Difference Discrete Es6 Ethical Examples Features Firebase Flutter Fs Git Go Hbase History Hive Hiveql How Html Idioms Insertion Installing Ios Java Joomla Js Kafka Kali Laravel Logical Machine Matlab Matrix Mongodb Mysql One Opencv Oracle Ordering Os Pandas Php Pig Pl Postgresql Powershell Prepositions Program Python React Ruby Scala Selecting Selenium Sentence Seo Sharepoint Software Spellings Spotting Spring Sql Sqlite Sqoop Svn Swift Synonyms Talend Testng Types Uml Unity Vbnet Verbal Webdriver What Wpf