
Python Remove Duplicates From List

Use a set to remove duplicate elements from a list without changing the order of elements.
Remove duplicates. A list contains duplicate elements. How can we remove them? Some approaches reorder the elements, but reordering can be avoided.
Methods. In the first example, we use a for-loop, check a set, and append to a new list. In the second example, we use the set built-in: this code is simpler, but it changes the order of elements.
First, we introduce the remove_duplicates method. It receives a list and loops over its values. It maintains two collections: an output list and a set.

Set: The set, seen, tracks which elements have already been encountered. Sets have only unique elements.


So: We append elements to our list that have not been seen yet. This means all duplicates are removed, but ordering is not affected.

Python program that removes duplicates from list

def remove_duplicates(values):
    output = []
    seen = set()
    for value in values:
        # If value has not been encountered yet,
        # ... add it to both list and set.
        if value not in seen:
            output.append(value)
            seen.add(value)
    return output

# Remove duplicates from this list.
values = [5, 5, 1, 1, 2, 3, 4, 4, 5]
result = remove_duplicates(values)
print(result)

Output

[5, 1, 2, 3, 4]
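
Aside: a dict can achieve the same order-preserving result, since dicts keep insertion order in Python 3.7 and later and dict.fromkeys keeps only the first occurrence of each key. This is a minimal sketch of that alternative, not part of the original program.

Python program that removes duplicates with dict.fromkeys (sketch)

# dict.fromkeys drops repeated keys and preserves insertion order (Python 3.7+).
values = [5, 5, 1, 1, 2, 3, 4, 4, 5]
result = list(dict.fromkeys(values))
print(result)

Output

[5, 1, 2, 3, 4]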
Set built-in. This code is simpler, but it reorders elements in the resulting list. We pass the list to the set built-in.

And: It converts the list into a set—and sets must have only unique elements. So this removes duplicates.

Then: We convert the set back into a list. In these two conversions the original ordering is lost; for small integers the result often appears sorted, but set order is not guaranteed.

Python program that removes duplicates, reorders

# Our input list.
values = [5, 5, 1, 1, 2, 3, 4, 4, 5]

# Convert to a set and back into a list.
# (Use a new name here; assigning to "set" would shadow the built-in.)
unique_set = set(values)
result = list(unique_set)
print(result)

Output

[1, 2, 3, 4, 5]
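
If a deterministic order is wanted after a set conversion, sorted can be applied to the set. This is a minimal sketch of that idea with string elements (where set order is clearly arbitrary), not part of the original article.

Python program that removes duplicates, sorted order (sketch)

# sorted() returns an ascending list, giving a predictable order after deduplication.
values = ["dog", "cat", "dog", "bird", "cat"]
result = sorted(set(values))
print(result)

Output

['bird', 'cat', 'dog']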
Has duplicates. This code checks a list for duplicates, but does not remove anything. It uses a nested loop. The inner for-loop only checks the following elements, not the preceding ones.

Note: The nested loop is quadratic, so this could be a performance disaster on extremely large collections. But for small lists it is effective.

Note 2: For large collections, using a set or dictionary to check for duplicates is a better option; a set-based sketch follows the program below.

Python program that implements has_duplicates

def has_duplicates(values):
    # For each element, check all following elements for a duplicate.
    for i in range(0, len(values)):
        for x in range(i + 1, len(values)):
            if values[i] == values[x]:
                return True
    return False

# Test the has_duplicates method.
print(has_duplicates([10, 20, 30, 40]))
print(has_duplicates([1, 2, 3, 1, 2]))
print(has_duplicates([40, 30, 20, 40]))
print(has_duplicates(["cat", "dog", "bird", "dog"]))
print(has_duplicates([None, 0, 1, 2]))

Output

False
True
True
True
False
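
As Note 2 suggests, a set can detect a duplicate in a single pass over the list (provided the elements are hashable). This is a minimal sketch of that idea; the helper name has_duplicates_fast is our own, not from the original program.

Python program that checks for duplicates with a set (sketch)

def has_duplicates_fast(values):
    # A set stores each element once; seeing an element that is
    # already in the set means we have found a duplicate.
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False

print(has_duplicates_fast([10, 20, 30, 40]))
print(has_duplicates_fast([1, 2, 3, 1, 2]))

Output

False
True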
Benchmark, has duplicates. Should we ever check before removing duplicates? On a six-element list, we test the performance of has_duplicates.

Version 1: This version of the code checks to see if duplicates exist in the list before removing them.

Version 2: Here we just remove duplicates immediately, without checking to see if any duplicates exist.

Result: For a six-element list with no duplicates, using nested for-loops to check was faster than using the set built-in.

However: The test list contains no duplicates, so the set conversion is never needed. Checking first is a worthwhile optimization only if duplicates are rare and lists are small.

Python program that times has_duplicates method, set

import time

def has_duplicates(values):
    # Same as above example.
    for i in range(0, len(values)):
        for x in range(i + 1, len(values)):
            if values[i] == values[x]:
                return True
    return False

# Contains no duplicates.
elements = [100, 200, 300, 400, 500, 600]

print(time.time())

# Version 1: test before using set.
for i in range(0, 10000000):
    if has_duplicates(elements):
        unique = list(set(elements))
    else:
        unique = elements

print(time.time())

# Version 2: always use set.
for i in range(0, 10000000):
    unique = list(set(elements))

print(time.time())

Output

1420055657.667
1420055657.838    has_duplicates: 0.171 s [PyPy]
1420055658.061    set, list:      0.223 s
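
For reference, the standard timeit module can run a similar comparison without manual time.time() arithmetic. This is a minimal sketch under the same assumptions (a six-element list with no duplicates); it is not part of the original benchmark and the iteration count is reduced.

Python program that times both versions with timeit (sketch)

import timeit

setup = """
def has_duplicates(values):
    for i in range(0, len(values)):
        for x in range(i + 1, len(values)):
            if values[i] == values[x]:
                return True
    return False

elements = [100, 200, 300, 400, 500, 600]
"""

# Version 1: check before converting.
v1 = "unique = list(set(elements)) if has_duplicates(elements) else elements"
# Version 2: always convert.
v2 = "unique = list(set(elements))"

print(timeit.timeit(v1, setup=setup, number=1000000))
print(timeit.timeit(v2, setup=setup, number=1000000))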
Some notes. Programs can often do the same thing in different ways. Some approaches are more efficient; others are more concise or easier to read.
Duplicates are often unwanted. We removed duplicates from lists; to retain the ordering of elements, a custom method such as remove_duplicates above may be needed.