A Python dictionary associates keys with values. Each key maps to exactly one value. Dictionaries are used in many programs.
With square brackets, we assign and access a value at a key. With get() we can specify a default result. Dictionaries are fast. We create, mutate and test them.
Get example. There are many ways to get values. We can use the "[" and "]" characters. We access a value directly this way. But this syntax causes a KeyError if the key is not found.
Instead: We can use the get() method with one or two arguments. This does not raise a KeyError. With one argument, it returns None when the key is missing.
Argument 1: The first argument to get() is the key you are testing. This argument is required.
Argument 2: The second, optional argument to get() is the default value. This is returned if the key is not found.
Based on: Python 3

Python program that gets values

    plants = {}

    # Add three key-value tuples to the dictionary.
    plants["radish"] = 2
    plants["squash"] = 4
    plants["carrot"] = 7

    # Get syntax 1.
    print(plants["radish"])

    # Get syntax 2.
    print(plants.get("tuna"))
    print(plants.get("tuna", "no tuna found"))

Output

    2
    None
    no tuna found
Get, none. In Python "None" is a special value like null or nil. I like None. It is my friend. It means no value. Get() returns None if no value is found in a dictionary.
Note: It is valid to assign None as the value at a key. So get() can return None even when the key exists, because the dictionary actually stores None for it.
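Sketch: The ambiguity can be seen in a small program. The settings dictionary here is hypothetical and not part of the original examples. Get() returns None for both a stored None and a missing key, while the in-keyword tells the two cases apart.

```python
# Hypothetical dictionary: "theme" is present, but its value is None.
settings = {"theme": None, "size": 12}

# Both calls return None, so get() alone cannot tell the cases apart.
stored_none = settings.get("theme")
missing = settings.get("color")
print(stored_none, missing)

# The in-keyword distinguishes a stored None from a missing key.
has_theme = "theme" in settings
has_color = "color" in settings
print(has_theme, has_color)
```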
Key error. Errors in programs are not there just to torment you. They indicate problems with a program and help it work better. A KeyError occurs on an invalid access.
Python program that causes KeyError

    lookup = {"cat": 1, "dog": 2}

    # The dictionary has no fish key!
    print(lookup["fish"])

Output

    Traceback (most recent call last):
      File "C:\programs\file.py", line 5, in <module>
        print(lookup["fish"])
    KeyError: 'fish'
In-keyword. A dictionary may (or may not) contain a specific key. Often we need to test for existence. One way to do so is with the in-keyword.
True: The in-keyword evaluates to True if the key exists as part of a key-value pair in the dictionary.
False: If the key does not exist, the in-keyword evaluates to False. This is helpful in if-statements.
Python program that uses in

    animals = {}
    animals["monkey"] = 1
    animals["tuna"] = 2
    animals["giraffe"] = 4

    # Use in.
    if "tuna" in animals:
        print("Has tuna")
    else:
        print("No tuna")

    # Use in on nonexistent key.
    if "elephant" in animals:
        print("Has elephant")
    else:
        print("No elephant")

Output

    Has tuna
    No elephant
Len built-in. This returns the number of key-value tuples in a dictionary. The data types of the keys and values do not matter. Len also works on lists and strings.
Caution: The length returned for a dictionary does not separately consider keys and values. Each pair adds one to the length.
Python program that uses len on dictionary

    animals = {"parrot": 2, "fish": 6}

    # Use len built-in on animals.
    print("Length:", len(animals))

Output

    Length: 2
Len notes. Let us review. Len() can be used on other data types, not just dictionaries. It acts upon a list, returning the number of elements within. It also handles tuples.
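Sketch: A short program, with made-up sample values, that calls len() on a dictionary, a list, a tuple and a string. The results are collected in a dictionary named sizes so they can be printed together.

```python
# len() counts elements in several built-in collection types.
sizes = {
    "dict": len({"parrot": 2, "fish": 6}),  # two key-value pairs
    "list": len([10, 20, 30]),              # three elements
    "tuple": len(("a", "b")),               # two elements
    "str": len("bird"),                     # four characters
}
print(sizes)
```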
Keys, values. A dictionary contains keys. It contains values. And with the keys() and values() methods, we can access these elements as view objects.
Next: A dictionary of three key-value pairs is created. This dictionary could be used to store hit counts on a website's pages.
Views: We introduce two variables, named keys and values. These are not lists—but we can convert them to lists.
Python program that uses keys

    hits = {"home": 125, "sitemap": 27, "about": 43}
    keys = hits.keys()
    values = hits.values()

    print("Keys:")
    print(keys)
    print(len(keys))

    print("Values:")
    print(values)
    print(len(values))

Output

    Keys:
    dict_keys(['home', 'sitemap', 'about'])
    3
    Values:
    dict_values([125, 27, 43])
    3

Keys, values ordering. Elements returned by keys() and values() are not sorted. Since Python 3.7 they follow insertion order, as in the output above. Older versions gave no ordering guarantee. The keys-view is not alphabetically sorted. Consider a sorted list (keep reading).
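Sketch: One way to convert the views to lists is the list() built-in. This example reuses the hits dictionary and adds a hypothetical "contact" key to show that a view follows later changes while a list copy does not. The printed order assumes Python 3.7 or later.

```python
hits = {"home": 125, "sitemap": 27, "about": 43}

# list() copies the current contents of a view into a real list.
key_list = list(hits.keys())
value_list = list(hits.values())
print(key_list)
print(value_list)

# A view tracks later changes to the dictionary; the list copy does not.
keys_view = hits.keys()
hits["contact"] = 8
print(len(keys_view), len(key_list))
```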
Sorted keys. In a dictionary, keys are not sorted in any way. Since Python 3.7 they keep insertion order. In older versions their order reflected the internals of the hash table, so output order there may differ.
But: Sometimes we need to sort keys. We pass the keys to the built-in sorted() function. This returns a new sorted list and leaves the dictionary unchanged.
Python program that sorts keys in dictionary

    # Same keys as the previous program.
    hits = {"home": 124, "sitemap": 26, "about": 32}

    # Sort the keys from the dictionary.
    keys = sorted(hits.keys())
    print(keys)

Output

    ['about', 'home', 'sitemap']
Items. With this method we receive a view of two-element tuples. Each tuple contains, as its first element, the key. Its second element is the value.
Tip: With tuples, we can address the first element with an index of 0. The second element has an index of 1.
Program: The code uses a for-loop on the items() view. It calls the print() function with two arguments.
Python that uses items method

    rents = {"apartment": 1000, "house": 1300}

    # Get a view of key-value tuples.
    rentItems = rents.items()

    # Loop and display tuple items.
    for rentItem in rentItems:
        print("Place:", rentItem[0])
        print("Cost:", rentItem[1])
        print("")

Output (Python 3.7 or later)

    Place: apartment
    Cost: 1000

    Place: house
    Cost: 1300
Items, assign. We cannot assign elements in the tuples. If you try to assign rentItem[0] or rentItem[1], you will get an error. This is the error message.
Python error: TypeError: 'tuple' object does not support item assignment
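Sketch: This program triggers the error on purpose and catches it. The try-except block and the error_message variable are additions for illustration. To change a value, we assign through the dictionary itself.

```python
rents = {"apartment": 1000, "house": 1300}

error_message = None
for rentItem in rents.items():
    try:
        # Tuples are immutable, so this assignment raises TypeError.
        rentItem[1] = 0
    except TypeError as error:
        error_message = str(error)

print(error_message)

# To change a value, assign through the dictionary instead.
rents["house"] = 1400
print(rents["house"])
```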
Items, unpack. The items() view can be used in another for-loop syntax. We can unpack the two parts of each tuple in items() directly in the for-loop.
Here: In this example, we use the identifier "k" for the key, and "v" for the value.
Python that unpacks items

    # Create a dictionary.
    data = {"a": 1, "b": 2, "c": 3}

    # Loop over items and unpack each item.
    for k, v in data.items():
        # Display key and value.
        print(k, v)

Output (Python 3.7 or later)

    a 1
    b 2
    c 3
For-loop. A dictionary can be directly enumerated with a for-loop. This accesses only the keys in the dictionary. To get a value, we will need to look up the value.
Items: We can call the items() method to get a view of key-value tuples. No extra hash lookups will be needed to access values.
Here: The plant variable, in the for-loop, is the key. The value is not available—we would need plants.get(plant) to access it.
Python that loops over dictionary

    plants = {"radish": 2, "squash": 4, "carrot": 7}

    # Loop over dictionary directly.
    # ... This only accesses keys.
    for plant in plants:
        print(plant)

Output (Python 3.7 or later)

    radish
    squash
    carrot
Del built-in. How can we remove data? We apply the del statement to a dictionary entry. In this program, we initialize a dictionary with three key-value pairs.
Then: We remove the tuple with key "windows". When we display the dictionary, it now contains only two key-value pairs.
Python that uses del

    systems = {"mac": 1, "windows": 5, "linux": 1}

    # Remove key-value at "windows" key.
    del systems["windows"]

    # Display dictionary.
    print(systems)

Output

    {'mac': 1, 'linux': 1}
Del, alternative. An alternative to using del on a dictionary is to change the key's value to a special value. This is a null object refactoring strategy.
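Sketch: A minimal version of this alternative. Using None as the special value is an assumption here, and any agreed marker would work. The key stays in the dictionary, so code that reads it must skip marked entries.

```python
systems = {"mac": 1, "windows": 5, "linux": 1}

# Instead of del, mark the entry as empty with a special value.
# None is an assumed choice of marker; any sentinel would work.
systems["windows"] = None

# The key still exists, so len() is unchanged.
print(len(systems))

# Code that reads the dictionary must skip the marked entries.
active = [name for name, count in systems.items() if count is not None]
print(active)
```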
Update. With this method we change one dictionary to have new values from a second dictionary. Update() also modifies existing values. Here we create two dictionaries.
Pets1, pets2: The pets2 dictionary has a different value for the dog key—it has the value "animal", not "canine".
Also: The pets2 dictionary contains a new key-value pair. In this pair the key is "parakeet" and the value is "bird".
Result: For keys that already exist in the first dictionary, the values are replaced with those from the second. Keys that do not exist yet are added with their values.
Python that uses update

    # First dictionary.
    pets1 = {"cat": "feline", "dog": "canine"}

    # Second dictionary.
    pets2 = {"dog": "animal", "parakeet": "bird"}

    # Update first dictionary with second.
    pets1.update(pets2)

    # Display both dictionaries.
    print(pets1)
    print(pets2)

Output (Python 3.7 or later)

    {'cat': 'feline', 'dog': 'animal', 'parakeet': 'bird'}
    {'dog': 'animal', 'parakeet': 'bird'}
Copy. This method performs a shallow copy of an entire dictionary. Every key-value pair is copied into a new dictionary, but the value objects themselves are shared. This is not just a new variable reference.
Here: We create a copy of the original dictionary. We then modify values within the copy. The original is not affected.
Python that uses copy

    original = {"box": 1, "cat": 2, "apple": 5}

    # Create copy of dictionary.
    modified = original.copy()

    # Change copy only.
    modified["cat"] = 200
    modified["apple"] = 9

    # Original is still the same.
    print(original)
    print(modified)

Output (Python 3.7 or later)

    {'box': 1, 'cat': 2, 'apple': 5}
    {'box': 1, 'cat': 200, 'apple': 9}
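Sketch: What "shallow" means in practice. This example uses a hypothetical dictionary whose values are lists. Rebinding a key in the copy leaves the original alone, but changing a shared list in place shows up in both dictionaries.

```python
# Hypothetical dictionary whose values are lists (mutable objects).
original = {"box": [1, 2], "cat": [3]}
shallow = original.copy()

# Rebinding a key in the copy does not affect the original.
shallow["cat"] = [99]

# Mutating a shared list in place is visible through both dictionaries.
shallow["box"].append(3)

print(original)
print(shallow)
```

When nested values must be independent too, the standard library's copy.deepcopy() is one option.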
Fromkeys. This method receives a sequence of keys, such as a list. It creates a dictionary with each of those keys. We can specify a value as the second argument.
Values: If you specify the second argument to fromkeys(), each key has that value in the newly-created dictionary.
Python that uses fromkeys

    # A list of keys.
    keys = ["bird", "plant", "fish"]

    # Create dictionary from keys.
    d = dict.fromkeys(keys, 5)

    # Display.
    print(d)

Output (Python 3.7 or later)

    {'bird': 5, 'plant': 5, 'fish': 5}
Dict. With this built-in function, we can construct a dictionary from a list of tuples. The tuples are pairs. They each have two elements, a key and a value.
Tip: This is a possible way to load a dictionary from disk. We can store (serialize) it as a list of pairs.
Python that uses dict built-in

    # Create list of tuple pairs.
    # ... These are key-value pairs.
    pairs = [("cat", "meow"), ("dog", "bark"), ("bird", "chirp")]

    # Convert list to dictionary.
    lookup = dict(pairs)

    # Test the dictionary.
    print(lookup.get("dog"))
    print(len(lookup))

Output

    bark
    3
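Sketch: One possible way to store a dictionary as a list of pairs, assuming JSON text as the format (the original does not name one). The pairs are kept in a string here. A real program would write that string to a file and read it back.

```python
import json

lookup = {"cat": "meow", "dog": "bark", "bird": "chirp"}

# Serialize the dictionary as a list of [key, value] pairs.
text = json.dumps(list(lookup.items()))

# Later (for example after reading the text back from a file),
# rebuild the dictionary from the decoded pairs.
pairs = json.loads(text)
restored = dict(pairs)

print(restored == lookup)
```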
Memoize. One classic optimization is called memoization. And this can be implemented easily with a dictionary. In memoization, a function (def) computes its result.
And: Once the computation is done, it stores its result in a cache. In the cache, the argument is the key. And the result is the value.
Memoization, continued. When a memoized function is called, it first checks this cache to see if it has been, with this argument, run before.
And: If it has, it returns its cached—memoized—return value. No further computations need be done.
Note: If a function is only called once with the argument, memoization has no benefit. And with many arguments, it usually works poorly.
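Sketch: A minimal memoized function built on a dictionary. The names slow_square and memoized_square are hypothetical, and the squaring stands in for an expensive computation. A counter records how often the real computation runs.

```python
# Cache: the argument is the key, the computed result is the value.
cache = {}

# Counts how many times the real computation runs.
calls = {"count": 0}

def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    calls["count"] += 1
    return n * n

def memoized_square(n):
    # Check the cache first; compute and store only on a miss.
    if n not in cache:
        cache[n] = slow_square(n)
    return cache[n]

results = [memoized_square(4), memoized_square(4), memoized_square(5)]
print(results)
print(calls["count"])
```

The second call with 4 is served from the cache, so the computation runs only twice. The standard library's functools.lru_cache decorator offers a ready-made alternative.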
Get performance. I compared a loop that uses get() with one that uses both the in-keyword and a second lookup. Version 2, with the "in" operator, was faster.
Version 1: This version passes a default value as the second argument to get(). It compares the result against that default and proceeds only if the value was found.
Version 2: This version uses "in" and then a lookup. Twice as many lookups occur. But fewer statements are executed.
Python that benchmarks get

    import time

    # Input dictionary.
    systems = {"mac": 1, "windows": 5, "linux": 1}

    # Time 1.
    print(time.time())

    # Get version.
    i = 0
    v = 0
    x = 0
    while i < 10000000:
        x = systems.get("windows", -1)
        if x != -1:
            v = x
        i += 1

    # Time 2.
    print(time.time())

    # In version.
    i = 0
    v = 0
    while i < 10000000:
        if "windows" in systems:
            v = systems["windows"]
        i += 1

    # Time 3.
    print(time.time())

Output

    1345819697.257
    1345819701.155 (get = 3.90 s)
    1345819703.453 (in  = 2.30 s)
String key performance. In another test, I compared string keys. I found that long string keys take longer to look up than short ones. Shorter keys are faster.
Performance, loop. A dictionary can be looped over in different ways. In this benchmark we test two approaches. We access the key and value in each iteration.
Version 1: This version loops over the keys of the dictionary with a while-loop. It then does an extra lookup to get the value.
Version 2: This version instead loops over the items() view, which yields tuples containing the keys and values. It does no per-key lookup on the dictionary.
But: Version 2 has the same effect—we access the keys and values. The cost of calling items() initially is not counted here.
Python that benchmarks loops

    import time

    data = {"michael": 1, "james": 1, "mary": 2, "dale": 5}
    items = data.items()

    print(time.time())

    # Version 1: get.
    i = 0
    while i < 10000000:
        v = 0
        for key in data:
            v = data[key]
        i += 1

    print(time.time())

    # Version 2: items.
    # ... The loop variable is named pair to avoid shadowing the tuple built-in.
    i = 0
    while i < 10000000:
        v = 0
        for pair in items:
            v = pair[1]
        i += 1

    print(time.time())

Output

    1345602749.41
    1345602764.29 (version 1 = 14.88 s)
    1345602777.68 (version 2 = 13.39 s)

Benchmark, loop results. We see above that looping over items() is faster than looping over the keys and looking up each value. This makes sense. With items(), no extra lookups are done.
Frequencies. A dictionary can be used to count frequencies. Here we introduce a string that has some repeated letters. We use get() on a dictionary to start at 0 for nonexistent values.
So: The first time a letter is found, get() returns the default 0 and its frequency is set to 0 + 1. The second time, the frequency becomes 1 + 1.
Python that counts letter frequencies

    # The first three letters are repeated.
    letters = "abcabcdefghi"

    frequencies = {}
    for c in letters:
        # If no key exists, get returns the value 0.
        # ... We then add one to increase the frequency.
        # ... So we start at 1 and progress to 2 and then 3.
        frequencies[c] = frequencies.get(c, 0) + 1

    for f in frequencies.items():
        # Print the tuple pair.
        print(f)

Output (Python 3.7 or later)

    ('a', 2)
    ('b', 2)
    ('c', 2)
    ('d', 1)
    ('e', 1)
    ('f', 1)
    ('g', 1)
    ('h', 1)
    ('i', 1)
A summary. A dictionary is usually implemented as a hash table. Here a special hashing algorithm translates a key (often a string) into an integer.
For a speedup, this integer is used to locate the data. This reduces search time. For programs with performance trouble, using a dictionary is often the initial path to optimization.
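Sketch: A simplified picture of how a hash locates data. The bucket_count value and the modulo step are assumptions for illustration. Python's actual dictionary layout is more involved. No output is shown, because string hashes can differ between runs.

```python
# Hypothetical bucket count; real tables resize as they grow.
bucket_count = 8

indexes = {}
for key in ["home", "sitemap", "about"]:
    # hash() maps the key to an integer. For strings, the value can
    # differ between runs because Python randomizes string hashes.
    h = hash(key)
    # A simple table could reduce the hash to a bucket index like this.
    indexes[key] = h % bucket_count

print(indexes)
```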