What we’ll see

  • By implementing Special Methods, your objects can behave like the built-in types, enabling the expressive coding style the community considers Pythonic.

Python Data Model

Data Model

Data Model: description of Python as a framework. It formalizes the interfaces of the building blocks of the language itself, such as sequences, functions, iterators, classes, and so on.

Python Data Model allows to write len(collection) instead of collection.len(). This apparent oddity is the tip of the entire Python Data Model and it’s the key to everything we call Pythonic.

To create new classes, we leverage the Python Data Model and the Python Interpreter invokes Special Methods to perform basic object operations, often triggered by special syntax. The special method names are always written with leading and trailing double underscores. For example, in order to evaluate my_collection[index], the interpreter calls my_collection.__getitem__(index). We implement special methods when we want our objects to support and interact with fundamental language constructs.

Special methods are also called Magic Methods or Dunder Methods (“dunder” is a shortcut for double underscore before and after).

Example about what “Pythonic” means

Le’t show the power of implementing __getitem__ and __len__ inside a class:

import collections
 
Card = collections.namedtuple("Card", ["rank", "suit"])
 
class FrenchDeck:
	ranks = [str(n) for n in range(2, 11)] + list("JQKA")
	suits = "spades diamonds clubs hearts".split()
 
	def __init__(self):
		self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]
 
	def __len__(self):
		return len(self._cards)
 
	def __getitem__(self, position):
		return self._cards[position]

Note that the reason of using collections.namedtuple is to provide a nice a representation for the cards in the deck:

>>> beer_card = Card('7', 'diamonds')
>>> beer_card
Card(rank='7', suit='diamonds')

Let’s instantiate a deck and see how many items it contains:

>>> deck = FrenchDeck()
>>> len(deck)
52

Let’s read a specific card from the deck:

>>> deck[0]
Card(rank='A', suit='hearts')
 
>>> deck[-1]
Card(rank='2', suit='spades')

Furthermore, since __getitem__ delegates to the [] operator, the deck automatically supports slicing, and it makes it iterable:

>>> deck[:3]
[Card(rank='2', suit='spades'), Card(rank='3', suit='spades'), Card(rank='4', suit='spades')]
 
>>> for card in deck:
...     print(card)
Card(rank='2', suit='spades')
Card(rank='3', suit='spades')
Card(rank='4', suit='spades')
Card(rank='5', suit='spades')

To pick a random card from the deck we can use the Python built-in function random.choice to get a random number from a sequence:

from random import choice
 
>>> choice(deck)
Card(rank='10', suit='clubs')

Note that if the FrenchDeck class does not define both the __len__ and __getitem__ methods, the choice function will not work.

If a collection has no __contains__ methof, the in operator does a sequential scan. Case in point: in works with our FrenchDeck class because it is iterable:

>>> Card('Q', 'hearts') in deck
True
 
>>> Card('7', 'beast') in deck
False

You can also use the build-in sorted() function:

suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)
 
def spades_high(card):
	rank_value = FrenchDeck.ranks.index(card.rank)
	return rank_value * len(suit_values) + suit_values[card.suit]
 
for card in sorted(deck, key=spades_high):
	print(card)

Output:

Card(rank='2', suit='clubs')
Card(rank='2', suit='diamonds')
Card(rank='2', suit='hearts')
Card(rank='2', suit='spades')
Card(rank='3', suit='clubs')
...

Recap

Advantages of using Dunder Methods:

  • users of your classes don’t have to memorize arbitrary method names for standard operations (.size() or .lenght() or what to get the number of items?)
  • it’s easier to benefit from the rich Python Standard Library and avoid reinventing the wheel, like random.choice.
  • by implementing the special methods __len__ and __getitme__, the FrenchDeck behaves like a standard Python Sequence with obvious advantages:
    • iteration, slicing from core language features
    • usage of random.choice, reversed, sorted

How Special Methods are used

Dunder methods are meant to be called the the Python Interpreter, not by any users. You don’t write an_object.__len__, but you write len(an_object) and, if an_object is an instance of a user-defined class (so not a built-in class!!), then Python calls the __len__method you implemented.

But the interpreter takes a shortcut when dealing for built-in types like list, str, bytearray, or extension like the NumPy array. Python variable-sized collections written in C include a struct called PyVarObject, which is an ob_size field holding the number of items in the collection. So, if an_object is an instance of one of those built-ins, then len(an_object) retrieves the value of the ob_size field, and this is much faster than calling a method.

Special method call is implicit. For example the statement for i in x: causes the invocation of iter(x), which in turn calls x.__iter__() if it’s available, or x.__getitem__() as it did in our example.

The only special method that is frequently called by user code directly is __init__.

It’s always better to call built-in functions (e.g. len, iter, str, etc) instead of their corresponding special methods because they often provide other services and are faster than those method calls.

Now we’ll see some of the most important uses of special methods:

  • Emulating Numeric Types
  • String representation of objects
  • Boolean value of an object
  • Implementing collections

Emulating Numeric Types

Let’ represent two-dimentional vectors and how to implement math operators:

import math
 
class Vector:
	def __init__(self, x=0, y=0) -> None:
		self.x = x
		self.y = y
 
	def __repr__(self) -> str:
		return f"Vector({self.x!r}, {self.y!r})"
 
	def __abs__(self):
		return math.hypot(self.x, self.y)
 
	def __bool__(self):
		return bool(abs(self))
 
	def __add__(self, other):
		x = self.x + other.x
		y = self.y + other.y
		return Vector(x, y)
 
	def __mul__(self, scalar):
		return Vector(self.x*scalar, self.y*scalar)
 
 
v1 = Vector(2,4)
v2 = Vector(2,1)
 
print("__abs__ usage: ", abs(v1))
print("__add__ usage: ", v1+v2)
print("__mul__ usage: ", v1*3)
  • __abs__ special method is called by abs built-in;
  • __add__ special method is called by + built-in;
  • __mul__ special method is called by * built-in. Output:
__abs__ usage: 4.47213595499958
__add__ usage: Vector(4, 5)
__mul__ usage: Vector(6, 12)

String Representation

The__repr__ special method is called by repr built-in to get the string representation of the object for inspection:

  • print(v1) without __repr__:
    <__main__.Vector object at 0x10881a770>
    
  • print(v1) with __repr__ defined as in the code above:
    Vector(2, 4)
    

The string returned by __repr__ should be unambiguous and, if possible, match the source code necessary to recreate the represented object. In contrast __str__ is called by the str() built-in and implicitly used by the print function. Sometimes string returned by __repr__ is user friendly, so you don’t need to code __str__. If you only implement one of the two special methods, choose __repr__.

Boolean Value of a Custom Type

To determine whether a value x is truthy or falsy, Python applies bool(x), which returns either ´True´ or False.

bool(x) calls x.__bool__ special method. By default, instances of user-defined classes called in bool() function are returned as True unless __bool__ or __len__ methods are implemented. If __bool__ is not implemented, Python tries to invoke x.__len__, and, if that returns zero, it bool(x) returns False; otherwise it returns True.

In our implementation, __bool__returns False if the magnitude of the vector is zero:

v1 = Vector(2,4)
v2 = Vector(0,0)
aList = [1,2,3]
 
>>> bool(v1)
True
 
>>> bool(v2)
False
 
>>> bool(aList)
True

Collection API

The following are the interfaces of the essential collection types in Python. All these classes are ABCs - abstract base classes.

Method names in italic are abstract; the remaining have concrete implementations: The Collection abstract class in the figure unifies the three essential interfaces that every collection should implement:

  • Iterable to support for, unpacking, and other forms of iteration;
  • Sized to support the len()built-in function;
  • Container to support the in operator.

Three very important specializations of Collections are:

  • Sequence, formalizing the interface of built-ins like list and str;
  • Mapping, implemented by dict, collections.defaultdict, etc.;
  • Set, the interface of the set and frozenset built-in types.

Only Sequence is Reversible, because sequences support arbitrary ordering of their contents.

Special Method

Here’s the The Python Language Reference official Python Documentation where more than 80 special method names are listed: https://docs.python.org/3/reference/datamodel.html

Why len is not a Method

In How Special Methods are used previous paragraph, we have already anticipated how len(x) runs differently depending on the type it is applying on. Let’s make a recap of this:

  • it runs very fast when x is an instance of a built-in objects. No method (specifically __len__()) is called for the built-in objects of CPython: the length is simply read from a field in a C struct. Since len() is a common operation applied on built-in classes, it must work efficiently.
  • it calls __len__() for our own custom objects.

Appendix: Further Reading

  • The Data Model chapter of The Python Language Reference.
  • Python in a Nutshell, 3rd ed. by Alex Martelli et al.
  • Python Essential Reference, 4th ed. and Python Cookbook, 3rd ed. by David Beazley.
  • Here’s an interesting Stack Overflow thread that gathers many questions about “functions vs methods” in Python.