The Python collections module is a built-in module that provides specialized container data types. These offer alternatives to Python's general-purpose built-in containers (list, dict, tuple, and set) and add functionality for efficient and convenient data manipulation.
Here are some of the commonly used data structures in the collections module:
- Counter: A dictionary subclass that helps count hashable objects. It returns a dictionary with keys being the unique elements in the input sequence and values being the count of their occurrences.
- defaultdict: A dictionary subclass that provides a default value for a nonexistent key. It takes a function that returns the default value for any missing key.
- deque: A double-ended queue that supports adding and removing elements from either end of the queue in O(1) time.
- namedtuple: A subclass of tuple that has named fields, allowing you to access elements by name instead of position.
- ChainMap: A dictionary-like class that groups multiple dictionaries together into a single unit, allowing them to be treated as one.
- OrderedDict: A dictionary subclass that remembers the order in which keys were inserted. It returns the keys in the order they were added.
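- heapq: Not part of collections, but a separate standard-library module that is often used alongside it. It implements the heap queue algorithm, also known as the priority queue algorithm, on top of a regular list. The heap it maintains is a min-heap, which gives efficient access to the smallest element; the helper functions nsmallest() and nlargest() retrieve the smallest or largest items.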
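As a brief illustration of the heap queue functions mentioned in the last bullet, here is a minimal sketch. The task tuples are made-up sample data; only functions from the standard heapq module are used.

```python
import heapq

# Hypothetical (priority, task) pairs; a lower number means higher priority.
tasks = [(3, 'write docs'), (1, 'fix bug'), (2, 'review PR')]

heapq.heapify(tasks)                         # rearrange the list into a min-heap in place
heapq.heappush(tasks, (0, 'deploy hotfix'))  # insert while keeping the heap property

first = heapq.heappop(tasks)                 # always removes the smallest item
print(first)                                 # (0, 'deploy hotfix')

print(heapq.nsmallest(2, tasks))             # [(1, 'fix bug'), (2, 'review PR')]
print(heapq.nlargest(1, tasks))              # [(3, 'write docs')]
```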
These data structures can make your code more efficient, readable, and maintainable. The collections module is worth exploring if you work with Python on a regular basis.
namedtuple():
namedtuple() is a factory function provided by the Python collections module. It creates a subclass of the built-in tuple data type with named fields. A namedtuple is essentially a lightweight, immutable object with named attributes, like a struct in C or a record in Pascal.
Here is an example of how to create a namedtuple:
    from collections import namedtuple

    Point = namedtuple('Point', ['x', 'y'])
    p = Point(1, 2)
    print(p.x, p.y)
In this example, we define a Point class using namedtuple(), with x and y as the field names. We then create an instance of this class and print its attributes.
The output of this code will be:
1 2
In addition to the tuple methods, namedtuple provides some useful attributes and methods:
- _fields: A tuple of field names for the namedtuple.
- _make(iterable): Create a new instance of the namedtuple from an iterable.
- _asdict(): Return a dictionary that maps field names to their values (a regular dict since Python 3.8; an OrderedDict in earlier versions).
- _replace(**kwargs): Create a new instance of the namedtuple with some fields replaced by new values. The original instance is left unchanged.
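The attributes and methods listed above can be sketched as follows. The Point class mirrors the earlier example; the coordinate values are arbitrary.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

print(Point._fields)               # ('x', 'y')

# _make() builds an instance from any iterable of the right length.
p = Point._make([3, 4])
print(p)                           # Point(x=3, y=4)

# _asdict() maps field names to values.
as_dict = p._asdict()
print(as_dict['x'], as_dict['y'])  # 3 4

# _replace() returns a new instance; the original is unchanged.
moved = p._replace(x=10)
print(moved, p)                    # Point(x=10, y=4) Point(x=3, y=4)
```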
Using namedtuple can make your code more readable and self-documenting by giving meaningful names to the attributes of your data structures. It is particularly useful when you have a small, fixed number of fields that you need to access by name.
OrderedDict():
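OrderedDict() is a subclass of the built-in dict class provided by the Python collections module. It maintains the order of key-value pairs as they are added to the dictionary. Since Python 3.7 a regular dict also preserves insertion order, but OrderedDict still differs in two ways: it offers extra methods for reordering, and equality comparisons between two OrderedDict objects take order into account.
Here is an example of how to use OrderedDict():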
    from collections import OrderedDict

    d = OrderedDict()
    d['one'] = 1
    d['two'] = 2
    d['three'] = 3
    print(d)
The output of this code will be:
OrderedDict([('one', 1), ('two', 2), ('three', 3)])
In this example, we create an empty OrderedDict and add three key-value pairs to it. When we print the dictionary, the key-value pairs appear in the same order in which we added them.
In addition to the methods provided by a regular dict, OrderedDict provides the following order-aware methods:
- move_to_end(key, last=True): Move an existing key to either end of the dictionary. If last is True, move the key to the end; otherwise, move it to the beginning.
- popitem(last=True): Remove and return a key-value pair. If last is True, remove and return the last item; otherwise, remove and return the first item. (A regular dict also has popitem(), but it always removes the last item.)
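The two methods above can be tried out with a short sketch that reuses the same three keys as the earlier example:

```python
from collections import OrderedDict

d = OrderedDict([('one', 1), ('two', 2), ('three', 3)])

d.move_to_end('one')                # 'one' becomes the last key
print(list(d))                      # ['two', 'three', 'one']

d.move_to_end('three', last=False)  # 'three' becomes the first key
print(list(d))                      # ['three', 'two', 'one']

last_item = d.popitem()             # removes from the end
first_item = d.popitem(last=False)  # removes from the beginning
print(last_item, first_item)        # ('one', 1) ('three', 3)
print(list(d))                      # ['two']
```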
OrderedDict() can be useful in situations where you need to maintain the order of elements in a dictionary, such as when you are processing data that has a specific order or when you need to produce output that must be in a specific order.
defaultdict():
defaultdict() is a subclass of the built-in dict class provided by the Python collections module. It is a dictionary that automatically creates a new value for a nonexistent key by calling a given factory function.
Here is an example of how to use defaultdict():
    from collections import defaultdict

    d = defaultdict(int)
    d['a'] += 1
    d['b'] += 2
    d['c'] += 3
    print(d)
The output of this code will be:
defaultdict(<class 'int'>, {'a': 1, 'b': 2, 'c': 3})
In this example, we create a defaultdict whose factory function is int, so every missing key starts with the value 0. We then add three keys to the dictionary, incrementing the value for each key by a different amount. When we print the dictionary, we see that the values for the keys are as expected.
In addition to the methods provided by a regular dict, defaultdict provides the following additional attribute:
- default_factory: The function called to create values for nonexistent keys. If it is None, looking up a missing key raises KeyError as in a regular dict.
defaultdict() is useful when you need to create a dictionary with default values for all nonexistent keys. This can simplify code by avoiding the need to explicitly initialize every key with a default value. Common use cases for defaultdict() include counting occurrences of elements in a list or processing data that may have missing values.
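Another common pattern is grouping items with list as the factory function. The sketch below uses made-up (category, item) pairs as sample data; it also shows that merely reading a missing key inserts it.

```python
from collections import defaultdict

# Hypothetical sample data: (category, item) pairs.
pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

groups = defaultdict(list)       # missing keys start as an empty list
for category, item in pairs:
    groups[category].append(item)

print(dict(groups))              # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
print(groups.default_factory)    # <class 'list'>

# Reading a missing key calls the factory and stores the result.
_ = groups['dairy']
print('dairy' in groups)         # True
```

If that insert-on-read behaviour is unwanted, groups.get('dairy') looks up the key without creating it.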
Counter():
Counter() is a subclass of the built-in dict class provided by the Python collections module. It is a dictionary that counts the occurrences of hashable elements in a list or any other iterable. The elements are stored as keys and their counts as values.
Here is an example of how to use Counter():
    from collections import Counter

    lst = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1]
    c = Counter(lst)
    print(c)
The output of this code will be:
Counter({1: 4, 2: 3, 3: 2, 4: 1})
In this example, we create a list of integers and pass it to Counter() to count the occurrences of each element. When we print the resulting dictionary, we see that it contains each element as a key and the number of times it occurs as the value.
In addition to the methods provided by a regular dict, Counter() provides the following additional methods:
- most_common([n]): Returns a list of the n most common elements and their counts, from the most common to the least. If n is not specified, returns all elements.
- subtract([iterable-or-mapping]): Subtracts counts from another iterable or mapping. Counts can drop to zero or below.
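The two methods above can be sketched on a short, made-up sentence:

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
c = Counter(words)

print(c.most_common(2))     # [('the', 3), ('and', 2)]

# subtract() lowers counts in place; a count may reach zero or go negative.
c.subtract(['the', 'the', 'cat'])
print(c['the'], c['cat'])   # 1 0

# A missing element reports a count of 0 instead of raising KeyError.
print(c['dog'])             # 0
```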
Counter() is useful for a variety of tasks where you need to count the occurrences of elements, such as analyzing data, counting words in text, or calculating frequencies of events.
deque():
deque() is a class provided by the Python collections module that represents a double-ended queue, a data structure that allows insertion and deletion of elements at both ends with O(1) time complexity.
Here is an example of how to use deque():
    from collections import deque

    d = deque([1, 2, 3])
    d.appendleft(0)
    d.append(4)
    print(d)
The output of this code will be:
deque([0, 1, 2, 3, 4])
In this example, we create a deque with three elements and then insert an element at the beginning of the deque using appendleft() and an element at the end of the deque using append(). When we print the deque, we see that it contains all five elements, with 0 at the left end and 4 at the right end.
deque() is not a subclass of list, but it supports many of the same operations. The methods most relevant to working with both ends are:
- append(x): Add an element to the right end of the deque.
- appendleft(x): Add an element to the left end of the deque.
- pop(): Remove and return the rightmost element from the deque. Raises IndexError if the deque is empty.
- popleft(): Remove and return the leftmost element from the deque. Raises IndexError if the deque is empty.
- rotate(n): Rotate the deque n steps to the right. If n is negative, rotate to the left.
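A short sketch of rotate(), pop(), and popleft() on an arbitrary five-element deque, including what happens when the deque is empty:

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)             # the two rightmost items wrap around to the left
print(d)                # deque([4, 5, 1, 2, 3])
d.rotate(-2)            # a negative value rotates back to the left
print(d)                # deque([1, 2, 3, 4, 5])

right = d.pop()         # removes from the right end
left = d.popleft()      # removes from the left end
print(left, right, d)   # 1 5 deque([2, 3, 4])

# Popping from an empty deque raises IndexError.
empty = deque()
try:
    empty.pop()
except IndexError as exc:
    print("IndexError:", exc)
```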
deque() is useful in situations where you need to efficiently add or remove elements from both ends of a collection, such as implementing a queue or a stack. It can also be used for implementing algorithms that require access to both ends of a collection, such as breadth-first search.
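As one illustration of the breadth-first search use case, here is a minimal sketch. The function name bfs_order and the small adjacency-list graph are made up for this example; the deque serves as a FIFO queue, with append() on the right and popleft() on the left.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()           # O(1) removal from the front
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)  # O(1) insertion at the back
    return order


# Hypothetical graph given as an adjacency list.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs_order(graph, 'A'))             # ['A', 'B', 'C', 'D']
```

A plain list would also work here, but list.pop(0) shifts every remaining element, so each dequeue costs O(n) rather than O(1).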
ChainMap:
ChainMap is a class provided by the Python collections module that chains multiple dictionaries or other mappings into a single mapping. It provides a convenient way to treat several dictionaries as one: a key lookup searches each dictionary in the chain in the order in which they were passed to ChainMap.
Here is an example of how to use ChainMap:
    from collections import ChainMap

    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 3, 'c': 4}
    c = ChainMap(d1, d2)
    print(c['a'])
    print(c['b'])
    print(c['c'])
The output of this code will be:
1
2
4
In this example, we create two dictionaries d1 and d2 and then create a ChainMap with both dictionaries. When we access keys in the ChainMap, the value for the key is searched first in d1 and then in d2. If a key is present in both dictionaries, the value from d1 is used.
In addition to the methods provided by a regular dict, ChainMap provides the following additional attribute:
- maps: A list of the underlying mappings in the chain, in search order. The list can be modified to change which mappings are searched.
ChainMap is useful in situations where you need to combine multiple dictionaries into one and prefer to keep the individual dictionaries separate rather than merging them into a single dictionary. This can be useful for passing a configuration to a function or when dealing with a hierarchy of settings, such as those in a web application framework.
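The settings hierarchy described above might look like the following sketch. The dictionary names and keys (defaults, user_settings, port, debug, host) are made up for illustration.

```python
from collections import ChainMap

# Hypothetical settings layers: user settings take priority over defaults.
defaults = {'debug': False, 'port': 8000}
user_settings = {'port': 9000}

config = ChainMap(user_settings, defaults)
print(config['port'], config['debug'])   # 9000 False

# Writes go to the first mapping only; the defaults stay untouched.
config['debug'] = True
print(user_settings)                     # {'port': 9000, 'debug': True}
print(defaults)                          # {'debug': False, 'port': 8000}

# The chain is a live view: later changes to a layer are visible through it.
defaults['host'] = 'localhost'
print(config['host'])                    # localhost
print(len(config.maps))                  # 2
```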