Here we’re going to see collections module. This module has a great assortment of data types. We’ll break it down into three main sections:
namedtuple,deque,CounterChainMap,defaultdict,OrderedDictUserDict,UserList,UserStr
1. namedtuple, deque, Counter
1.1. namedtuple
Tuples are common in Python and offer great functionality, but can be a little confusing when you’re picking specific data out of them using indices. Let’s look at this example that describes a book with title, author, and publication date and we want to pull data out of it:
>>> bnw = ("Brave New World", "Aldous Huxley", 1931)
>>> title = bnw[0]
>>> author = bnw[1]
>>> publication_date = bnw[2]Sometimes it’s hard to understand what each position corresponds to. There are a few alternatives to this: construct a dictionary, create a new class, or use namedtuple. Let’s re-create the same previous example:
from collections import namedtuple
Book = namedtuple("Book", ["title", "author", "year"])
bnw = Book("Brave New World", "Aldous Huxley", 1931)where in namedtuple constructor we listed the positional entries for each field. Now, we’re able to pull out data from this data structure with dot notation:
>>> bnw.author
'Aldous Huxley'
>>> bnw.title
'Brave New World'
>>> bnw.year
1931Note that you can convert a namedtuple into a dictionary with _asdict() method:
>>> bnw._asdict()
{'title': 'Brave New World', 'author': 'Aldous Huxley', 'year': 1931}1.2. deque
This data structure is basically a list-like container with fast appends and pops on either end of it, which is most commonly used for flexible stacks and queues. Let’s look at an example (deque accepts an iterable as argument):
from collections import deque
dq = deque((3,4,5))deque allows us to operate on both ends of it. For instance we can use append() and appendleft() methods:
>>> dq.append(6)
>>> dq.appendleft(2)
>>> dq
deque([2, 3, 4, 5, 6])And still use pop() and popleft():
>>> dq.pop()
>>> dq.popleft()
>>> dq
deque([3, 4, 5])Now we exploit these features to create a new function checking if a word is a palindrome:
def is_palindrome(word):
dq = deque(word)
while len(dq) > 1:
if dq.popleft() != dq.pop():
return False
return TrueThen:
>>> is_palindrome("racecar")
True
>>> is_palindrome("alphabet")
False1.3. Counter
The core functionality of Counter is pretty simple: just pass some iterable in and we get a counter object back:
>>> from collections import Counter
>>> Counter("Hello world!")
Counter({'l': 3,
'o': 2,
'H': 1,
'e': 1,
' ': 1,
'w': 1,
'r': 1,
'd': 1,
'!': 1})What is telling us is how many times it encountered a single element within that iterable and, since strings are broken down by default on a per character basis, it’s going to give us the amount of times that encountered each of these characters.
We also could have done the same thing by splitting it into individual words (splitting a string returns a list of words, which is still an iterable):
>>> Counter("Hello world!".split())
Counter({'Hello': 1, 'world!': 1}) Counter objects have a number of helper functions available, like elements() that repeats everything in the Counter by the how many times it has been counted:
>>> count = Counter("Hello world, my name is Simone")
>>> list(count.elements()) # .elements() return an iterable, so we'll expand it with list() constructor
['H',
'e',
'e',
'e',
'l',
'l',
'l',
'o',
'o',
'o',
' ',
' ',
' ',
' ',
' ',
'w',
'r',
'd',
',',
'm',
'm',
'm',
'y',
'n',
'n',
'a',
'i',
'i',
's',
'S']An example usage of elements() is when you need an alternative way to populate a list with a default value some amount of times:
>>> list(Counter(x=10).elements())
['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']Other methods:
>>> count.most_common(3)
[(' ', 5), ('e', 3), ('l', 3)]
>>> count.total() # get a total for all elements encountered
302. ChainMap, defaultdict, OrderedDict
2.1. defaultdict
Dictionaries in Python are likely the best choice to use in a lot of situation, but they can still be a bit awkward to work with. Say, for example, we wanted to replicate the counter behaviour on our own:
word_list = ["orange", "apple", "watermelon", "apple", "watermelon", "grape", "apple"]
counter = {}
for word in word_list:
if word in counter:
counter[word] +=1
else:
counter[word] = 1Let’s see the result:
>>> counter
{'orange': 1, 'apple': 3, 'watermelon': 2, 'grape': 1}This is a situation where defaultdict can help because we can specify what the default values should be:
>>> from collections import defaultdict
>>> ct = defaultdict(int)
>>> ct
defaultdict(int, {})Now, if you try to access a non-existing key of this data structure, we’ll get the default value without raising a KeyError exception:
>>> ct["apple"]
0
>>> ct
defaultdict(int, {'apple': 0})Notice that apple now is a key of this dictionary-like data structure. This implies that we can set a value for these keys without checking and see if the value already exists:
>>> ct["grape"] += 1
>>> ct
defaultdict(int, {'apple': 0, 'grape': 1})Due of this behaviour, we can change the code above in a more concise one because we can remove the if statement that checks if the key exists in the dictionary:
word_list = ["orange", "apple", "watermelon", "apple", "watermelon", "grape", "apple"]
counter = defaultdict(int)
for word in word_list:
counter[word] +=12.2. OrderedDict
Prior to Python 3.7, dictionary order wasn’t something that was guaranteed by the language itself. So, OrderedDict were used to guarantee a specific order with dictionaries. Since Python 3.7 dictionary ordering is now guaranteed, so OrderedDict aren’t nearly as commonly used anymore. However, it does still have some interesting functionalities to know. Let’s create a simple one:
from collections import OrderedDict
tasks = [
("Task 1", "To do"),
("Task 2", "To do"),
("Task 3", "To do")
]
task_dict = OrderedDict(tasks)
task_dict["Task 2"] = "Complete"One great functionality is that you can move around the orders of the keys. For instance, with move_to_end() method you can push down a key-value pair till the last key available:
>>> task_dict.move_to_end("Task 2")
>>> task_dict
OrderedDict([('Task 1', 'To do'), ('Task 3', 'To do'), ('Task 2', 'Complete')])We can also to the inverse. If you create a new key it goes, by default, to the end of the dictionaries but, if you set last=False in the move_to_end() method, it moves that key at the beginning of the dictionary:
>>> task_dict["Task 4"] = "To do"
>>> task_dict.move_to_end("Task 4", last=False)
OrderedDict([('Task 4', 'To do'),
('Task 1', 'To do'),
('Task 3', 'To do'),
('Task 2', 'Complete')])2.3. ChainMap
This dictionary helper is definitely under-utilized, but it’s very powerful. ChainMap are a great way to combine multiple dictionaries together without modifying the original dictionaries. Let’s give an example:
>>> from collections import ChainMap
>>> d1 = {"orange": 1, "apple": 3, "watermelon": 2, "grape": 1}
>>> d2 = {"banana": 1, "apple": 2, "grape": 1}
>>> cm = ChainMap(d1, d2)
>>> cm
ChainMap({'orange': 1, 'apple': 3, 'watermelon': 2, 'grape': 1}, {'banana': 1, 'apple': 2, 'grape': 1})The resulting cm combines the contents of d1 and d2, allowing you to access keys from both dictionaries as if they were one.
There are different rules you have to keep in mind when working with ChainMap:
- there is a precedence order for the dictionaries within the map, meaning the earlier on the dictionaries added to the chain map, the higher priority it has. So, although
d1andd2both haveapple, if you access the value ofapple, according to the chain map, we see we get the value fromd1:
>>> cm["apple"]
3- if we were to update this value, even though
appleexists in bothd1andd2, onlyd1was modified due to its higher priority:
>>> cm["apple"] = 5
>>> cm
ChainMap({'orange': 1, 'apple': 5, 'watermelon': 2, 'grape': 1}, {'banana': 1, 'apple': 2, 'grape': 1})
>>> d1
{'orange': 1, 'apple': 5, 'watermelon': 2, 'grape': 1}
>>> d2
{'banana': 1, 'apple': 2, 'grape': 1}- if a key doesn’t exist in the first dictionary, updating the
ChainMapadds it to the first dictionary by default:
>>> cm["banana"] = 4
>>> cm
ChainMap({'orange': 1, 'apple': 5, 'watermelon': 2, 'grape': 1, 'banana': 4}, {'banana': 1, 'apple': 2, 'grape': 1})In this example, 'banana' was added to D1 even though it already existed in D2. The ChainMap never updates secondary dictionaries; it only works with the first dictionary in the list.
Using ChainMap for CLI Application Settings Management
When designing a command-line application, it’s common to pull configuration values from three main sources:
- Static configuration files (lowest priority)
- Environment variables (medium priority)
- Command-line arguments (highest priority)
These sources have an order of precedence, from least mutable to most easily mutable. This ensures that command-line arguments override environment variables, and environment variables override static files.
To demonstrate this, the following Python script sets up a ChainMap to combine these settings efficiently.
1. Static Configuration File Simulation
The first dictionary static_file_settings contains default settings:
static_file_settings = {"host": "localhost", "port": 8080, "debug": False}2. Environment Variables
The script then attempts to load environment variables using os.environ.get(). Missing variables are purged to keep the dictionary clean:
# Environment variable settings
env_var_settings = {
"host": os.environ.get("APP_HOST"),
"port": os.environ.get("APP_PORT"),
"debug": os.environ.get("APP_DEBUG"),
}
# Remove any environment variables that are missing
env_var_settings = {k: v for k, v in env_var_settings.items() if v is not None}3. Command-Line Arguments
The script parses command-line arguments using argparse and removes missing arguments:
# Command-line argument settings
parser = argparse.ArgumentParser(description="Command-line application with settings.")
parser.add_argument("--host", help="Host address")
parser.add_argument("--port", type=int, help="Port number")
parser.add_argument("--debug", action="store_true", help="Enable debug mode")
cmd_line_args = vars(parser.parse_args())
# Remove any command-line arguments that are missing
cmd_line_args = {k: v for k, v in cmd_line_args.items() if v is not None}Combining Settings with ChainMap
The order of precedence is maintained by combining the dictionaries using ChainMap in the following order:
# Combine dictionaries using ChainMap
combined_settings = ChainMap(cmd_line_args, env_var_settings, static_file_settings)This ensures command-line arguments override environment variables, which override static files.
Testing the Application
# Accessing command-line application settings
print("Settings:")
print(f"Host: {combined_settings['host']}")
print(f"Port: {combined_settings['port']}")
print(f"Debug Mode: {combined_settings['debug']}")Test Results
- Running the script without any environment variables or command-line arguments gives default settings with
python collections_script.py:
Host: localhost
Port: 8080
Debug Mode: False - Let’s set
APP_PORT=123withpython APP_PORT=123 collections_script.pyand we’ll see that running the application now shows that port is overridden:
Host: localhost
Port: 123
Debug Mode: False - Passing command-line arguments with
python APP_PORT=123 collections_script.py --port 59:
Host: localhost
Port: 59
Debug Mode: False Conclusion
By using ChainMap, settings remain independent while allowing a priority-based override system. The rest of the application can still access individual settings dictionaries (env_var_settings, static_file_settings) if needed. This keeps the application modular, clean, and easy to extend.
User* base classes
The Python collections module provides a set of base classes like UserDict, UserList, and UserString, which are designed to create customized versions of built-in data structures. While it’s common practice to directly inherit from dict, list, or str, using these User* classes can still be beneficial.
Why Use UserDict?
The key difference between UserDict and directly inheriting from dict is the presence of the internal .data attribute. This attribute serves as a reference to the underlying dictionary, making it easier to manage and extend dictionary-like behavior without directly interacting with the built-in dict implementation. This can simplify custom logic while ensuring compatibility with dictionary-like methods.
Example: Custom Dictionary Access with Box
In this example, we implement a Box class that inherits from UserDict. The goal is to allow dot notation access for dictionary keys, similar to libraries like python-box or AttrDict.
Implementing Box Class
- Attribute Access (
__getattr__): Enables dot notation to access dictionary keys. - Attribute Assignment (
__setattr__): Allows setting dictionary keys via dot notation. - Attribute Deletion (
__delattr__): Supports deleting dictionary keys with dot notation.
from collections import UserDict
class Box(UserDict):
def __getattr__(self, key):
# Access dictionary keys using attributes
if key in self.data:
return self.data[key]
raise AttributeError(f"'Box' object has no attribute '{key}'")
def __setattr__(self, key, value):
# Set dictionary keys using attributes
if key == "data": # Prevent infinite recursion
super().__setattr__(key, value)
else:
self.data[key] = value
def __delattr__(self, key):
# Delete dictionary keys using attributes
if key in self.data:
del self.data[key]
else:
raise AttributeError(f"'Box' object has no attribute '{key}'")
We create an instance of Box representing a Star Trek character:
# Create a dictionary for a character
character = Box({"name": "Spock", "species": "Vulcan", "rank": "Commander"})
# Access attributes using dot notation
print(character.name) # Spock
print(character.species) # Vulcan
print(character.rank) # Commander
# Modify an attribute
character.rank = "Captain"
print(character.rank) # Captain
# Delete an attribute
del character.rank
print(character) # {'name': 'Spock', 'species': 'Vulcan'}
Key Takeaways
- Using
UserDicthelps avoid complexities from directly inheritingdict, thanks to its internaldataattribute. - The
Boxclass demonstrates how to extend dictionary behavior with dot notation access, making dictionaries easier to use in more intuitive ways. - This pattern can simplify custom dictionary implementations while maintaining compatibility with standard dictionary operations.