Iteration is a core concept for processing data in Python: programs apply computations to data series. If the data doesn’t fit into memory, we need to fetch the items lazily, one at a time and on demand. That’s what an iterator does. This chapter shows how the Iterator design pattern is built into the Python language so you never need to code it by hand.
Every standard collection in Python is iterable. An iterable is an object that provides an iterator, which Python uses to support operations like:
- for loops
- List, dict, and set comprehensions
- Unpacking assignments
- Construction of collection instances
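As a quick illustration, the sketch below runs each of these operations over the same list. The variable names are made up for the example.

```python
words = ['spam', 'eggs', 'ham']

# for loop
lengths = []
for word in words:
    lengths.append(len(word))

# list, dict, and set comprehensions
upper = [w.upper() for w in words]
by_length = {w: len(w) for w in words}
initials = {w[0] for w in words}

# unpacking assignment
first, second, third = words

# construction of a collection instance from another iterable
as_tuple = tuple(words)
```

In every case Python obtains an iterator from the list behind the scenes.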
Iterables vs Iterators
An iterable is an object that can be “iterated over.” This means you can get its elements one by one. Common examples of built-in iterables in Python include:
- Lists: [1, 2, 3]
- Tuples: ('a', 'b', 'c')
- Strings: "hello"
- Dictionaries: {'key': 'value'}
- Sets: {10, 20, 30}
- File objects
The defining characteristic of an iterable is that it has a special method called __iter__(). When you call iter() on an iterable (which in turn calls the object’s __iter__() method), it returns an iterator object.
An iterator is an object that represents a stream of data. It is the object that actually performs the iteration. It has state and remembers its position in the iteration process. The key methods of an iterator are:
- __iter__(): This method returns the iterator object itself. This is what makes an iterator also an iterable.
- __next__(): This method returns the next item from the stream. When there are no more items to return, it raises a StopIteration exception. This exception signals to the for loop (or other iteration constructs) that the iteration is complete.
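A minimal sketch of these two methods in action, using a plain list as the iterable (the names numbers and it are only for illustration):

```python
numbers = [10, 20, 30]          # a list is an iterable
it = iter(numbers)              # iter() asks the list for an iterator

same_object = iter(it) is it    # an iterator's __iter__ returns itself

first = next(it)                # 10
second = next(it)               # 20
third = next(it)                # 30

try:
    next(it)                    # nothing left: StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True
```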
A Sequence of Words
Let’s introduce a Sentence class that implements the sequence protocol (to adhere to the sequence protocol, a class must implement the __getitem__() and __len__() methods). This class is iterable because all sequences in Python are iterable, as we’ve seen in Collection API.
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

- The __init__ method takes a string, uses a regular expression (RE_WORD.findall) to find all the words, and stores them in a list called self.words.
- The __getitem__ method allows you to access words by their index, just like with a list (e.g., s[0]). This method is what makes the class a sequence.
- The __len__ method returns the number of words, which completes the sequence protocol.
- The __repr__ method creates a clean string representation of the object, using reprlib.repr to shorten the text if it’s too long.
Let’s see how it is used:
>>> s = Sentence('"The time has come," the Walrus said,')
>>> s
Sentence('"The time ha... Walrus said,')
>>> for word in s:
...     print(word)
The
time
has
come
the
Walrus
said

Since Sentence is also a sequence, it’s possible to extract words by index:
>>> s[0]
'The'

Sequence instances are also iterable. Let’s see why.
Why Sequences are Iterable: the iter() function
Whenever Python needs to loop over an object, it calls the built-in function iter().
The iter() function’s behavior depends on the object it’s given:
- If the object has an __iter__ method, iter() calls it to get an iterator.
- If __iter__ is not found, Python checks for a __getitem__ method. If present, it creates an iterator that fetches items by index, starting from 0. An IndexError signals the end of the iteration.
- If that fails, Python raises TypeError, usually saying 'C' object is not iterable, where C is the class of the object.
This is why all sequences are iterable: they all implement __getitem__().
While __getitem__ makes an object iterable for backward compatibility, it’s recommended to implement __iter__ because that’s the standard in modern Python. The isinstance(obj, abc.Iterable) check returns True only if the object has an __iter__ method; having just __getitem__ is not enough:
class Spam:
    def __getitem__(self, i):
        print('->', i)
        raise IndexError()

Then:
>>> spam_can = Spam()
>>> iter(spam_can)
<iterator object at 0x10a878f70>
>>> list(spam_can)
-> 0
[]
>>> from collections import abc
>>> isinstance(Spam(), abc.Iterable)
False

If we defined the special method __iter__() in the class Spam, the check isinstance(Spam(), abc.Iterable) would return True.
This flexible approach is an example of duck typing: an object is iterable if it behaves like one (by having either __iter__ or __getitem__). Spam() can be iterated, yet the isinstance check does not recognize it as iterable.
This contrasts with a more formal approach called goose typing, in which an object is considered iterable only if it implements the __iter__ method. No subclassing or registration is required, because abc.Iterable implements __subclasshook__. This can be demonstrated with issubclass, isinstance, and abc.Iterable:
class GooseSpam:
    def __iter__(self):
        pass

Then:
>>> from collections import abc
>>> issubclass(GooseSpam, abc.Iterable)
True
>>> isinstance(GooseSpam(), abc.Iterable)
True

The most accurate way to check if an object is iterable is to use a try...except TypeError block around iter(obj).
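A minimal sketch of that check. Here is_iterable is a hypothetical helper name, not a standard library function, and OnlyGetitem is a made-up class. Because the helper relies on iter() itself, it also accepts objects that only implement __getitem__, which the abc.Iterable check misses:

```python
from collections import abc


def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetitem:
    """Illustrative class: no __iter__, just __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


seq = OnlyGetitem()
goose_check = isinstance(seq, abc.Iterable)   # False: no __iter__
duck_check = is_iterable(seq)                 # True: iter() falls back to __getitem__
```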
Using iter() with a Callable
The iter() function has a second, less common form that takes two arguments: a callable and a sentinel value.
- The callable (e.g., a function) is repeatedly called with no arguments, and it produces values.
- The iteration stops when the callable returns the sentinel value. The sentinel itself is not included in the iteration.
This is a powerful way to create an iterator for tasks like reading data in chunks until a specific termination value is returned. The example of rolling a die until a 1 is rolled shows this clearly:
from random import randint

def d6():
    return randint(1, 6)

d6_iter = iter(d6, 1)

Then:
>>> for roll in d6_iter:
...     print(roll)
2
2
5

This loop prints random numbers from 2 to 6 until a 1 is rolled; the output above is just one possible run.
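The chunk-reading use case mentioned earlier can be sketched in the same way. This is only an illustration: an in-memory io.BytesIO stands in for a real file, and the 4-byte chunk size is arbitrary. functools.partial builds the zero-argument callable, and the empty bytes object b'' that read() returns at the end of the stream serves as the sentinel.

```python
import io
from functools import partial

stream = io.BytesIO(b'abcdefghij')   # stand-in for a real binary file

# stream.read(4) is called repeatedly until it returns b'' (end of stream)
chunks = list(iter(partial(stream.read, 4), b''))
```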
Iterable vs Iterators
An iterable is any object that can produce an iterator when passed to the built-in iter() function. In practice, an object is considered iterable if:
- It implements an __iter__() method that returns an iterator.
- Or, it behaves like a sequence by implementing __getitem__() with support for zero-based indexes. In that case, __iter__() isn’t strictly required for the object to be iterable (though the goose-typing approach doesn’t recognize this shortcut).
Let the relationship between iterables and iterators be clear:
Python obtains iterators from iterables.
Here’s a for loop iterating over a sequence, specifically a str. The str 'ABC' is the iterable, and the iterator works behind the scenes (we don’t see it explicitly):
>>> s = 'ABC'
>>> for char in s:
...     print(char)
A
B
C

A for loop automatically handles the iteration process. The code for char in s: on an iterable like the string 'ABC' is equivalent to this manual while loop:
s = 'ABC'
string_iterator = iter(s)  # Get the iterator from the iterable
while True:
    try:
        print(next(string_iterator))  # Get the next item
    except StopIteration:  # This exception signals the end
        del string_iterator
        break

- iter(s) creates the iterator.
- next(string_iterator) retrieves the next item.
- StopIteration is raised by the iterator when there are no more items.
- The only purpose of del is to release the reference to string_iterator.
The Iterator Interface. The standard interface for an iterator requires two methods:
- __next__: Returns the next item or raises StopIteration when finished.
- __iter__: Returns the iterator itself (self), which makes an iterator also iterable. This is why you can pass an iterator directly to a for loop.
This interface is formalized by the collections.abc.Iterator abstract base class (ABC), which declares the __next__ abstract method and subclasses Iterable, which in turn declares the __iter__ abstract method.
Thanks to a __subclasshook__ method, isinstance(obj, abc.Iterator) can determine if an object is an iterator just by checking if it has both __iter__ and __next__ methods. It does not need to be a true subclass of Iterator. Here’s the source code for collections.abc.Iterator:
class Iterator(Iterable):
    __slots__ = ()

    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            return _check_methods(C, '__iter__', '__next__')
        return NotImplemented

__subclasshook__ supports structural type checks (ch. 13) with isinstance and issubclass.
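To see the structural check at work, here is a small sketch with two made-up classes. Countdown never inherits from Iterator but provides both methods. NoNext provides only __iter__.

```python
from collections import abc


class Countdown:
    """Illustrative iterator: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class NoNext:
    """Has __iter__ only, so it is an Iterable but not an Iterator."""

    def __iter__(self):
        return iter(())


countdown_is_iterator = issubclass(Countdown, abc.Iterator)   # True
nonext_is_iterator = issubclass(NoNext, abc.Iterator)         # False
nonext_is_iterable = issubclass(NoNext, abc.Iterable)         # True
```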
Going back to the Sentence class, let’s see how the iterator is built by iter() and consumed by next():
>>> s3 = Sentence('Life of Brian')
>>> it = iter(s3)
>>> it
<iterator at 0x1051065f0>
>>> next(it)
'Life'
>>> next(it)
'of'
>>> next(it)
'Brian'
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration
>>> list(it)
[]
>>> list(iter(s3))
['Life', 'of', 'Brian']

Remember that Sentence is iterable thanks to the special treatment the iter() built-in gives to sequences (thanks to the __getitem__ method).
A key characteristic of iterators is that they are stateful and cannot be reset. Once an iterator is exhausted (it has raised StopIteration), it stays that way. If you need to iterate over the same data again, you must create a new iterator from the original iterable. Calling iter() on an iterator will just return the same (potentially exhausted) iterator.
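A short sketch of this behavior, using a list of made-up values:

```python
data = ['a', 'b', 'c']

it = iter(data)
first_pass = list(it)       # consumes the iterator
second_pass = list(it)      # already exhausted: nothing left

same = iter(it) is it       # iter() on an iterator returns the same object

fresh = iter(data)          # a new iterator from the original iterable
third_pass = list(fresh)
```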
Sentence Classes with __iter__
Let’s look at two ways to implement the standard iterator protocol in the Sentence class:
- by implementing the Iterator Design Pattern
- by using generator functions.
Sentence Take #2: A Classic Iterator
This version of Sentence uses the classic Iterator design pattern. It separates the iterable from the iterator.
The Sentence class itself is an iterable because it implements an __iter__ method that returns a new SentenceIterator instance each time it’s called. That’s how an iterable and an iterator are related:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

- __init__: The iterator is initialized with a list of words and an index to track its position.
- __next__: This method fetches the next word from the list. If the index goes out of bounds, it raises a StopIteration exception to signal the end of the iteration.
- __iter__: This method returns self, meaning an iterator is also an iterable of itself. Implementing this dunder method is not actually needed for this example to work, but it’s the right thing to do because iterators are supposed to implement both the __next__ and __iter__ methods, so that our iterator passes the issubclass(SentenceIterator, abc.Iterator) test.
This approach involves manual state management (e.g., tracking the index), which is a lot of work.
Don’t Make the Iterable an Iterator for Itself
A common mistake is to combine the iterable and the iterator into a single class. This is considered an antipattern because it prevents multiple independent traversals. For example, if you try to iterate over the same object in two different loops, both loops share the same iterator state, leading to unexpected behavior.
Think of a playlist as an iterable collection of songs. You might want to create two different iterators from it: one that plays the songs in order, and another that shuffles them. If the playlist itself is also the iterator, you can’t have both traversals at the same time, since they would interfere with each other’s position in the sequence.
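The sketch below contrasts the two designs with made-up classes (SelfIterating, Separate, and the pairs helper are not part of the Sentence example). Nested loops make the shared state visible: the inner loop consumes the only iterator, so the outer loop ends after its first item.

```python
class SelfIterating:
    """Antipattern: the iterable is its own iterator."""

    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self          # every loop shares self.index

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


class Separate:
    """Each call to __iter__ builds a fresh, independent iterator."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)


def pairs(iterable):
    """Collect (outer, inner) pairs from two nested loops."""
    return [(a, b) for a in iterable for b in iterable]


broken = pairs(SelfIterating('AB'))   # only one pair survives
correct = pairs(Separate('AB'))       # all four combinations
```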
Sentence Take #3: A Generator Function
A Generator Function is the most “Pythonic” way to make a class iterable. This approach is more concise than creating a separate iterator class. Any function that contains the yield keyword is a generator function. When called, it returns a generator object that automatically implements the iterator protocol.
Here is the code for the Sentence class that uses a generator function for its __iter__ method:
class Sentence:
    #...
    def __iter__(self):
        for word in self.words:
            yield word

This is a much simpler way to implement iteration. The for loop and the yield keyword handle the iteration state and the StopIteration exception automatically, removing the need for a separate iterator class.
Alex Martelli pointed out that a simpler alternative for the body of the __iter__() method would be return iter(self.words), since self.words is already an iterable, and he’s right. However, the for loop with yield is used here to introduce the generator function syntax.
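For reference, here is a self-contained sketch of the whole class with the generator-based __iter__. It assumes that __init__ and __repr__ are carried over unchanged from the earlier versions of Sentence. Because each call to __iter__ creates a new generator, independent traversals work as expected.

```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        # generator function: calling it returns a new generator object
        for word in self.words:
            yield word


s = Sentence('Life of Brian')
it1 = iter(s)
it2 = iter(s)
first_from_it1 = next(it1)
first_from_it2 = next(it2)   # it2 keeps its own position
```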
How a Generator Works
A generator function acts as a generator factory, returning a generator object when called. The key feature of a generator is that its execution can be paused and resumed.
>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> gen_123
<function gen_123 at 0x...>
>>> gen_123()
<generator object gen_123 at 0x...>

A generator works as follows:
- When you call next() on a generator object, the function’s code runs until it reaches a yield statement. The value after yield is then returned, and the function’s execution is paused.
- The next time next() is called, the function resumes from where it left off.
- When the function’s code finishes and there are no more yield statements, the generator object raises a StopIteration exception, which signals the end of the sequence.
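These steps can be traced by driving a generator by hand with next(). The sketch reuses the gen_123 function defined earlier:

```python
def gen_123():
    yield 1
    yield 2
    yield 3


g = gen_123()
values = [next(g), next(g), next(g)]   # each call resumes after the last yield

try:
    next(g)                            # the function body has finished
    finished = False
except StopIteration:
    finished = True
```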
Here is a simple example that illustrates this behavior:
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')

Then:
>>> for c in gen_AB():
...     print('-->', c)
start
--> A
continue
--> B
end.

This example demonstrates that the print statements within the generator function are executed at different points during the loop. The for loop implicitly calls g = iter(gen_AB()) and then calls next(g) at each iteration, which resumes the function’s execution. This confirms that a generator function’s state is saved between yield statements. When the generator function runs to the end, the generator object raises StopIteration. The for loop machinery catches that exception, and the loop terminates cleanly.
Now it should be clear how Sentence.__iter__ in the previous example works: __iter__ is a generator function which, when called, builds a generator object that implements the Iterator interface, so the SentenceIterator class is no longer needed.
A key distinction is that a generator yields values; it does not return them in the usual sense. A return statement in a generator function’s body causes the generator to raise StopIteration.
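A small sketch of that last point, with a made-up generator function. The value given to return is not yielded; it ends up in the value attribute of the StopIteration exception:

```python
def gen_with_return():
    yield 1
    return 'done'      # ends the generator; 'done' is not yielded
    yield 2            # never reached


g = gen_with_return()
first = next(g)

try:
    next(g)
    returned = None
except StopIteration as exc:
    returned = exc.value

collected = list(gen_with_return())   # list() and for loops see only yielded values
```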