Java Remove Duplicates From ArrayList

Remove duplicates from an ArrayList of strings. Use a HashSet in removeDuplicates.

Remove duplicates. No one else is here. The ancient ruins are abandoned. You see many pillars standing together—they are duplicates. In the late afternoon, you consider these pillars.

Sometimes duplicates are useful. But often they are not. Duplicates in an ArrayList can be removed. A hashing mechanism (like HashSet) that records encountered elements is helpful.

A method. The removeDuplicates method accepts an ArrayList of Strings. In it, we loop through that argument list. If our HashSet contains the String, we do nothing.

Add: But if our HashSet does not contain the String, we add it to the HashSet (so it is detected next time).

Result: We also build up our result ArrayList. This collection will retain the ordering of the original, but not duplicates.

ArrayList

Info: RemoveDuplicates() could be easily adapted to work upon Integers or other types. Try changing the String type to Integer.

Java program that removes duplicates import java.util.ArrayList; import java.util.HashSet; public class Program { static ArrayList<String> removeDuplicates(ArrayList<String> list) { // Store unique items in result. ArrayList<String> result = new ArrayList<>(); // Record encountered Strings in HashSet. HashSet<String> set = new HashSet<>(); // Loop over argument list. for (String item : list) { // If String is not in set, add it to the list and the set. if (!set.contains(item)) { result.add(item); set.add(item); } } return result; } public static void main(String[] args) { ArrayList<String> list = new ArrayList<>(); list.add("dog"); list.add("cat"); list.add("dog"); list.add("dog"); list.add("cat"); list.add("bird"); // Remove duplicates from ArrayList of Strings. ArrayList<String> unique = removeDuplicates(list); for (String element : unique) { System.out.println(element); } } } Output dog cat bird

HashSet constructor. This approach removes all duplicates, but the ordering of the elements is lost. We pass an ArrayList to the HashSet constructor, and then convert back to an ArrayList.HashSet

Note: This code is simpler and shorter. It does not suffer from any huge performance issues.

Reordering: Using this direct and simple approach causes the elements to be reordered. The HashSet will not retain ordering.

Java program that uses HashSet constructor import java.util.ArrayList; import java.util.HashSet; public class Program { public static void main(String[] args) { // Create ArrayList with one duplicate element. ArrayList<String> values = new ArrayList<>(); values.add("cat"); values.add("dog"); values.add("cat"); values.add("bird"); // Create HashSet from ArrayList. HashSet<String> set = new HashSet<>(values); // Create ArrayList from the set. ArrayList<String> result = new ArrayList<>(set); // The result. System.out.println(result.toString()); } } Output [cat, bird, dog]

Has duplicates. With a nested for-loop, we can check an ArrayList (or other array) for duplicate elements. We only search following (not preceding) elements for an exact match.

Tip: If an element and a following element are equal, the ArrayList has a duplicate. Otherwise, no duplicates exist.

Tip 2: This loop-based method is sometimes faster than using a set like HashSet. Its performance is poor on large collections.

Thus: Calling hasDuplicates on small lists where no duplicates are likely is often a good optimization.

Java program that checks for duplicates import java.util.ArrayList; public class Program { public static boolean hasDuplicates(ArrayList<String> list) { for (int i = 0; i < list.size(); i++) { // Loop over all following elements. for (int x = i + 1; x < list.size(); x++) { // If two elements equal, there is a duplicate. if (list.get(i) == list.get(x)) { return true; } } } // No duplicates found. return false; } public static void main(String[] args) { // This ArrayList has a duplicate. ArrayList<String> values = new ArrayList<>(); values.add("green"); values.add("blue"); values.add("red"); values.add("pink"); values.add("pink"); // See if duplicates exist. if (hasDuplicates(values)) { System.out.println(true); } // Remove all pink strings. values.removeIf(element -> element == "pink"); // No more duplicates. System.out.println(hasDuplicates(values)); } } Output true false

Testing versus removal. Here is an optimization. If an ArrayList rarely contains duplicates, and has few elements, we can avoid modifying it. We can just test for duplicates.

Version 1: This code loops over the ArrayList to see if any duplicates exist. It does not try to remove the duplicates.

Version 2: This code removes the duplicates without first checking whether any duplicates are present.

Result: In this benchmark, we find that on a 4-element ArrayList, calling hasDuplicates is much faster than removeDuplicates.

Warning: If lists usually have duplicates, hasDuplicates() will not help performance. And it will be slow on long lists due to looping.

Java program that times duplicate test, duplicate removal import java.util.*; public class Program { static boolean hasDuplicates(ArrayList<String> list) { // See if duplicates exist. for (int i = 0; i < list.size(); i++) { for (int x = i + 1; x < list.size(); x++) { if (list.get(i) == list.get(x)) { return true; } } } return false; } static ArrayList<String> removeDuplicates(ArrayList<String> list) { // Remove Duplicates: place them in new list (see above example). ArrayList<String> result = new ArrayList<>(); HashSet<String> set = new HashSet<>(); for (String item : list) { if (!set.contains(item)) { result.add(item); set.add(item); } } return result; } public static void main(String[] args) { ArrayList<String> elements = new ArrayList<>(); elements.add("one"); elements.add("two"); elements.add("three"); elements.add("four"); long t1 = System.currentTimeMillis(); // Version 1: test to see if duplicates exist. for (int i = 0; i < 10000000; i++) { if (hasDuplicates(elements)) { System.out.println(false); } } long t2 = System.currentTimeMillis(); // Version 2: process list even if no duplicates. for (int i = 0; i < 10000000; i++) { ArrayList<String> copy = removeDuplicates(elements); if (copy == null) { System.out.println(false); } } long t3 = System.currentTimeMillis(); // ... Benchmark results. System.out.println(t2 - t1); System.out.println(t3 - t2); } } Output 128 ms: hasDuplicates() 1409 ms: removeDuplicates()

Performance notes. For small lists, simply looping over elements already added, and not adding duplicate ones, is faster. But this approach would lead to problems with large collections.

Also: Another strategy is to not add duplicates in the first place. A custom method could prevent duplicates from being added.

Sets versus loops. With set-based methods, we can remove duplicate Strings from an ArrayList. Other approaches, which involve nested loops to test elements, are superior in some cases.

For duplicate removal ("dedupe") there are many possible options. This is a good thing. It means we can select the one that is best for our specific use case.

The Dev Codes

.Net

.NET Array Dictionary List String 2D Async DataTable Dates DateTime Enum File For Foreach Format IEnumerable If IndexOf Lambda LINQ Parse Path Process Property Regex Replace Sort Split Static StringBuilder Substring Switch Tuple

Java

Core Array ArrayList HashMap String 2D Cast Character Console Deque Duplicates File For Format HashSet If IndexOf Lambda Math ParseInt Process Random Regex Replace Sort Split StringBuilder Substring Switch Vector While

TheDeveloperBlog.com

Java Remove Duplicates From ArrayList

Related Links:

.Net

Java

Related Links