TheDeveloperBlog.com

C# HashSet Programs: Constructor, Overlaps

These C# examples use HashSet, an optimized set collection. They test Overlaps, SymmetricExceptWith and benchmark HashSet.

HashSet stores each element only once. It helps eliminate duplicate strings or other elements from an array.

Its constructor provides a simple syntax for this: it takes the union of the elements in a collection, keeping one copy of each.

Constructor

Example. This program starts with a source array that contains several duplicated strings. It calls the HashSet constructor to transform the array elements into a set data structure, which drops the duplicates. The source array itself is not modified.

Note: In the .NET Framework implementation, the constructor internally calls the UnionWith method to eliminate the duplicates. ToArray, an extension method from System.Linq, copies the HashSet into a new array.

C# program that uses HashSet on duplicates

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
	// Input array in which "cat" appears three times.
	string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

	// Display the array.
	Console.WriteLine(string.Join(",", array1));

	// Use HashSet constructor to ensure unique strings.
	var hash = new HashSet<string>(array1);

	// Convert to array of strings again.
	string[] array2 = hash.ToArray();

	// Display the resulting array.
	Console.WriteLine(string.Join(",", array2));
    }
}

Output

cat,dog,cat,leopard,tiger,cat
cat,dog,leopard,tiger

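In this example, the input array contains six strings, of which four are unique. The string "cat" appears three times. Repeated elements may not be desirable for some programs and algorithms. The HashSet constructor eliminates the non-unique elements. The output here follows insertion order, but HashSet does not guarantee any particular enumeration order.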

Next, the HashSet constructor receives a single parameter, which must implement the IEnumerable<string> generic interface. The constructor takes the union of elements, which removes non-unique strings such as "cat".
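
The constructor argument does not have to be an array. The following is a minimal sketch, assuming a List<string> as the source; the BuildWithUnion helper is hypothetical and only illustrates that starting from an empty set and calling UnionWith gives the same elements as the constructor.

```csharp
using System;
using System.Collections.Generic;

static class ConstructorSketch
{
    // Hypothetical helper: start with an empty set, then take the union.
    public static HashSet<string> BuildWithUnion(IEnumerable<string> source)
    {
        var set = new HashSet<string>();
        set.UnionWith(source);
        return set;
    }

    static void Main()
    {
        var list = new List<string> { "cat", "dog", "cat", "leopard", "tiger", "cat" };

        // Any IEnumerable<string> is accepted, not only string arrays.
        var fromConstructor = new HashSet<string>(list);
        var fromUnion = BuildWithUnion(list);

        Console.WriteLine(fromConstructor.Count);                // 4
        Console.WriteLine(fromConstructor.SetEquals(fromUnion)); // True
    }
}
```

SetEquals compares the elements of the two sets and ignores their order.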

Also: The program writes each string array to the console as a single comma-separated string by using the string.Join static method.

Tip: Join receives the result of the ToArray extension method, which was invoked on the HashSet instance.
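
The ToArray step is optional when the only goal is to display the set. This sketch assumes .NET Framework 4 or later, where string.Join has an overload that accepts an IEnumerable<string>; the JoinUnique helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class JoinSketch
{
    // Hypothetical helper: joins the unique strings without an intermediate array.
    public static string JoinUnique(IEnumerable<string> source)
    {
        var hash = new HashSet<string>(source);
        // This overload of string.Join enumerates the HashSet directly.
        return string.Join(",", hash);
    }

    static void Main()
    {
        string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };
        Console.WriteLine(JoinUnique(array1));
    }
}
```

The order of the joined strings follows the enumeration order of the HashSet, which is not guaranteed.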

Overlaps. This HashSet method returns true or false. It tests whether any element of the HashSet is also present in the IEnumerable argument. A single shared element is enough for it to return true.

Next: The element 3 is in the HashSet. This means Overlaps returns true for array2, but false for array3.

C# program that uses Overlaps

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
	int[] array1 = { 1, 2, 3 };
	int[] array2 = { 3, 4, 5 };
	int[] array3 = { 9, 10, 11 };

	HashSet<int> set = new HashSet<int>(array1);
	bool a = set.Overlaps(array2);
	bool b = set.Overlaps(array3);

	// Display results.
	Console.WriteLine(a);
	Console.WriteLine(b);
    }
}

Output

True
False
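
To make the meaning of Overlaps concrete, here is a rough hand-written equivalent. The OverlapsManually helper is hypothetical and is not how the library implements the method; it only shows the same true-or-false result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OverlapsSketch
{
    // Hypothetical helper: true as soon as one element of 'other' is in the set.
    public static bool OverlapsManually(HashSet<int> set, IEnumerable<int> other)
    {
        return other.Any(x => set.Contains(x));
    }

    static void Main()
    {
        var set = new HashSet<int> { 1, 2, 3 };

        Console.WriteLine(set.Overlaps(new[] { 3, 4, 5 }));            // True
        Console.WriteLine(OverlapsManually(set, new[] { 3, 4, 5 }));   // True
        Console.WriteLine(OverlapsManually(set, new[] { 9, 10, 11 })); // False

        // An empty argument shares no elements, so the result is false.
        Console.WriteLine(set.Overlaps(new int[0]));                   // False
    }
}
```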

SymmetricExceptWith. HashSet has advanced set logic. Next we find out what SymmetricExceptWith does. This method modifies the HashSet in place so that it contains only the elements found in one collection or the other, but not in both.

Tip: This example shows the use of the var keyword. This simplifies the syntax of the HashSet declaration statement.

C# program that uses SymmetricExceptWith

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
	char[] array1 = { 'a', 'b', 'c' };
	char[] array2 = { 'b', 'c', 'd' };

	var hash = new HashSet<char>(array1);
	hash.SymmetricExceptWith(array2);

	// Write char array.
	Console.WriteLine(hash.ToArray());
    }
}

Output

ad
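
The result of SymmetricExceptWith can be cross-checked with LINQ set operators. This is a sketch, not the library's implementation; the SymmetricDifference helper is hypothetical and builds the same set as the union of the two one-sided differences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SymmetricSketch
{
    // Hypothetical helper: (a except b) union (b except a).
    public static HashSet<char> SymmetricDifference(IEnumerable<char> a, IEnumerable<char> b)
    {
        // Materialize both inputs so each can be enumerated twice safely.
        var left = a.ToList();
        var right = b.ToList();
        return new HashSet<char>(left.Except(right).Union(right.Except(left)));
    }

    static void Main()
    {
        char[] array1 = { 'a', 'b', 'c' };
        char[] array2 = { 'b', 'c', 'd' };

        var hash = new HashSet<char>(array1);
        hash.SymmetricExceptWith(array2);

        var manual = SymmetricDifference(array1, array2);
        Console.WriteLine(manual.Count);           // 2
        Console.WriteLine(hash.SetEquals(manual)); // True
    }
}
```

Unlike SymmetricExceptWith, the LINQ version leaves its inputs unchanged and returns a new set.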

Dictionary. Set logic can also be implemented by using a Dictionary type instead of the HashSet type itself. The main problem caused by using Dictionary is that you must specify a type for the values, even though a set needs only keys. This may lead to more confusing code.

Also: The Dictionary code will have more lines, but its performance should be similar, because both collections perform the same kind of hash lookup.
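
As an illustration of that trade-off, the following sketch removes duplicates with a Dictionary<string, bool> used as a set. The Distinct helper is hypothetical, and the bool value is only a placeholder that the code never reads.

```csharp
using System;
using System.Collections.Generic;

static class DictionarySetSketch
{
    // Hypothetical helper: keeps the first occurrence of each string.
    public static List<string> Distinct(IEnumerable<string> source)
    {
        var seen = new Dictionary<string, bool>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string s in source)
        {
            if (!seen.ContainsKey(s))
            {
                seen[s] = true; // Placeholder value; only the key matters.
                result.Add(s);
            }
        }
        return result;
    }

    static void Main()
    {
        string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };
        Console.WriteLine(string.Join(",", Distinct(array1))); // cat,dog,leopard,tiger
    }
}
```

The List keeps the order of first occurrence, so the output order here is deterministic.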

Allocations. Using Dictionary and HashSet results in allocations on the managed heap. This means that for small source inputs, the HashSet and Dictionary can be slower than simple nested loops.

But: When the source input becomes large with thousands of elements, hashed collections are faster.
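
The two approaches can be sketched side by side as a duplicate check. Both helper names are hypothetical. The nested-loop version allocates nothing but compares every pair of elements; the HashSet version allocates a set but needs only one pass.

```csharp
using System;
using System.Collections.Generic;

static class SmallInputSketch
{
    // Hypothetical helper: quadratic scan with no heap allocation.
    public static bool HasDuplicateLoops(string[] items)
    {
        for (int i = 0; i < items.Length; i++)
        {
            for (int j = i + 1; j < items.Length; j++)
            {
                if (items[i] == items[j]) return true;
            }
        }
        return false;
    }

    // Hypothetical helper: single pass, but allocates a HashSet.
    public static bool HasDuplicateHashSet(string[] items)
    {
        var seen = new HashSet<string>();
        foreach (string s in items)
        {
            // Add returns false when the element is already present.
            if (!seen.Add(s)) return true;
        }
        return false;
    }

    static void Main()
    {
        string[] small = { "cat", "dog", "cat" };
        Console.WriteLine(HasDuplicateLoops(small));   // True
        Console.WriteLine(HasDuplicateHashSet(small)); // True
    }
}
```

Where the crossover point lies depends on the element type, the runtime, and the hardware, so it is worth measuring with realistic input sizes.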

Performance. Is there any performance benefit to using HashSet instead of Dictionary when you just need the simplest set functionality? In the C# language, a Dictionary with bool values can work as a set. Often a Dictionary can replace the HashSet.

Here, we test a HashSet<string> against a Dictionary<string, bool>. We add several strings as keys and check whether those keys exist (with Contains or ContainsKey). After the first iteration of the outer loop, no new elements are added.

C# program that tests HashSet performance

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    const int _max = 10000000;
    static void Main()
    {
	var h = new HashSet<string>(StringComparer.Ordinal);
	var d = new Dictionary<string, bool>(StringComparer.Ordinal);
	var a = new string[] { "a", "b", "c", "d", "longer", "words", "also" };

	var s1 = Stopwatch.StartNew();
	for (int i = 0; i < _max; i++)
	{
	    foreach (string s in a)
	    {
		h.Add(s);
		h.Contains(s);
	    }
	}
	s1.Stop();
	var s2 = Stopwatch.StartNew();
	for (int i = 0; i < _max; i++)
	{
	    foreach (string s in a)
	    {
		d[s] = true;
		d.ContainsKey(s);
	    }
	}
	s2.Stop();
	Console.WriteLine(h.Count);
	Console.WriteLine(d.Count);

	Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) /
	    _max).ToString("0.00 ns"));
	Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) /
	    _max).ToString("0.00 ns"));
	Console.Read();
    }
}

Output

7
7
529.99 ns
517.05 ns

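The Dictionary had slightly better performance in this test than did the HashSet. Each reported time covers one iteration of the outer loop, which performs seven adds and seven lookups, and the difference between the two is roughly 2 percent. In fact, in most tests where the two collections offered similar functionality, the Dictionary was faster. Results like these depend on the runtime version and hardware.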

Tip: My guideline is that Dictionary should be used instead of HashSet in places where advanced HashSet functionality is not needed.

Summary. HashSet can be applied to elegantly eliminate duplicates in an array. The HashSet class also provides set operations such as Overlaps and SymmetricExceptWith. Its constructor takes the union of the elements in a collection that implements the IEnumerable generic interface.

