The Code Bits: Introduction to Generators in Python

In this post, we will learn what generators are, how to create them, how they work and how to use them in Python.

Generator function

Generators are functions that allow us to create iterators in Python. They provide a convenient, simple and memory-efficient approach to creating iterators. These are useful when dealing with large amounts of data.

Before starting with generators, it would be good to understand how a for-loop works in Python. It will also be useful to know what an iterable, an iterator and the iterator protocol are.

An example: Generate even numbers using a generator function

Let us start with a simple example. We will be creating a generator function which generates a specific count of even numbers starting from a given value. We will be using this same example throughout this post.

    def generate_even_numbers(start, count):
        # Make sure that the first number is even.
        start = start if start % 2 == 0 else start + 1

        while count > 0:
            yield start
            start += 2
            count -= 1

Note that we used a yield statement within the function body to return our data. If you don’t understand it right away, no need to worry, we will get to its roots soon enough!

Let us see how we would use this generator function in a for-loop.

    >>> generator_iterator = generate_even_numbers(0, 3)
    >>> for num in generator_iterator:
    ...     print(num)
    ...
    0
    2
    4

As you can see, we were able to use the value returned by the generator function in a for-loop, so it must have been an iterable.
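Since the returned value is an iterable, it should also work with anything else that consumes iterables, not only a for-loop. Here is a minimal sketch of that idea. It repeats the generator function from above so that it runs on its own, and the argument values are chosen only for illustration.

```python
def generate_even_numbers(start, count):
    # Make sure that the first number is even.
    start = start if start % 2 == 0 else start + 1

    while count > 0:
        yield start
        start += 2
        count -= 1


# list() pulls every value out of the generator iterator.
evens = list(generate_even_numbers(0, 3))
print(evens)  # [0, 2, 4]

# sum() and tuple unpacking consume the values in the same way.
total = sum(generate_even_numbers(1, 4))  # 2 + 4 + 6 + 8
first, second = generate_even_numbers(10, 2)
print(total, first, second)  # 20 10 12
```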

Generator function returns a generator iterator

Let us check the type of the value returned by the generator function.

    >>> generator_iterator = generate_even_numbers(0, 3)
    >>> type(generator_iterator)
    <class 'generator'>

Okay, so the value returned is of type ‘generator’. This value is usually referred to as the generator iterator, even though the term generator is sometimes used interchangeably to refer to both the generator function as well as the generator iterator.
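If the two terms are easy to mix up, the standard library's inspect module can tell them apart. The sketch below is only an illustration of the distinction. It repeats the generator function so that it is self-contained.

```python
import inspect


def generate_even_numbers(start, count):
    # Make sure that the first number is even.
    start = start if start % 2 == 0 else start + 1

    while count > 0:
        yield start
        start += 2
        count -= 1


generator_iterator = generate_even_numbers(0, 3)

# The function we defined is a generator function...
print(inspect.isgeneratorfunction(generate_even_numbers))  # True

# ...while the object it returns is a generator iterator.
print(inspect.isgenerator(generator_iterator))  # True

# Neither check is true the other way around.
print(inspect.isgenerator(generate_even_numbers))  # False
print(inspect.isgeneratorfunction(generator_iterator))  # False
```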

Now let us confirm that the generator_iterator is indeed an iterator. As per the iterator protocol, an iterator must:

  1. return its elements one by one when next() is called on it. When all the elements are exhausted, it must raise StopIteration.

         >>> generator_iterator = generate_even_numbers(0, 3)
         >>> next(generator_iterator)
         0
         >>> next(generator_iterator)
         2
         >>> next(generator_iterator)
         4
         >>> next(generator_iterator)
         Traceback (most recent call last):
           File "<stdin>", line 1, in <module>
         StopIteration

  2. return itself when iter() is called on it.

         >>> generator_iterator = generate_even_numbers(0, 3)
         >>> generator_iterator
         <generator object generate_even_numbers at 0x10cb431b0>
         >>> iter(generator_iterator)
         <generator object generate_even_numbers at 0x10cb431b0>
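To appreciate what the generator function saves us, here is a rough sketch of the same iterator written by hand as a class. The class name EvenNumbers and its attributes are made up for this illustration. It is one possible way to satisfy the iterator protocol, not the only one.

```python
class EvenNumbers:
    """Hand-written iterator (hypothetical name, for comparison only)."""

    def __init__(self, start, count):
        # Make sure that the first number is even.
        self.current = start if start % 2 == 0 else start + 1
        self.remaining = count

    def __iter__(self):
        # An iterator returns itself when iter() is called on it.
        return self

    def __next__(self):
        # Hand out one element per call; raise StopIteration when exhausted.
        if self.remaining <= 0:
            raise StopIteration
        value = self.current
        self.current += 2
        self.remaining -= 1
        return value


numbers = EvenNumbers(0, 3)
print(iter(numbers) is numbers)  # True
print(list(numbers))  # [0, 2, 4]
```

The class has to keep its own state in attributes and raise StopIteration explicitly, whereas the generator function gets both for free.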

So now we know that the generator function is a convenient way to create an iterator. But what makes this function different from a normal function in Python? How does it return an iterator? The answer lies in the yield statement.

How does the generator function work?

Let us revisit our generator example, now with some prints so that we can clearly understand how it works.

    def generate_even_numbers(start, count):
        print("In the generator function")

        # Make sure that the first number is even.
        start = start if start % 2 == 0 else start + 1

        while count > 0:
            print("[count:{}] Hello! Before I yield....".format(count))
            yield start
            print("[count:{}] Hey! I am back!!".format(count))
            start += 2
            count -= 1
        print("[count:{}] That's all I have got...".format(count))

Now let us see its usage.

    >>> generator_iterator = generate_even_numbers(3, 2)
    >>> for num in generator_iterator:
    ...     print("Processing even number: {}".format(num))
    ...
    In the generator function
    [count:2] Hello! Before I yield....
    Processing even number: 4
    [count:2] Hey! I am back!!
    [count:1] Hello! Before I yield....
    Processing even number: 6
    [count:1] Hey! I am back!!
    [count:0] That's all I have got...

There are a couple of things you should notice:

  1. Lazy evaluation
  2. How the yield statement works

Let us discuss these.

Lazy evaluation

Calling the function generate_even_numbers(3, 2) just returns a generator iterator. It does not start executing the function body: notice that "In the generator function" was printed only after the for-loop started asking for values. This is called lazy evaluation. The generator starts executing and yielding values only when they are needed, that is, when next() is called. As a result, each value is produced on demand instead of all the values being built and stored up front. This makes generators memory-efficient and hence useful when dealing with large amounts of data.
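Both points can be checked with a small experiment. The sketch below uses a made-up generator function, generate_squares, and a list named calls that records when the body actually runs. The sizes are arbitrary. Note that sys.getsizeof reports only the size of the container object itself, so treat the comparison as a rough indication.

```python
import sys

calls = []  # records when the function body runs (illustration only)


def generate_squares(limit):
    calls.append("started")
    for n in range(limit):
        yield n * n


squares = generate_squares(1_000_000)
print(calls)  # [] because the body has not run yet

print(next(squares))  # 0
print(calls)  # ['started']

# The generator object stays small however large the limit is,
# whereas a list holding the values grows with their number.
as_list = [n * n for n in range(100_000)]
print(sys.getsizeof(squares) < sys.getsizeof(as_list))  # True
```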

How does the yield statement work?

By now, you may have gathered that the only special thing about a generator function, compared to a normal function, is that it uses yield to return its values. However, the yield statement is very different from a normal return statement.

The yield statement makes a function a generator.

When next() is called on the generator iterator, the generator function executes until a yield statement is encountered. When the yield statement is reached, the execution state of the function is remembered (including the local variables and any try statements) and the function’s execution is temporarily suspended. The value associated with the yield statement is returned by next().

When next() is called again, the generator function resumes execution. The saved execution state is restored and the statement following the yield is executed first. Execution then continues until the next yield statement is encountered, and the process repeats.

Finally, when next() is called and the function body finishes without reaching another yield, StopIteration is raised. At this point, a for-loop would exit.
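The suspend-and-resume cycle can also be observed from the outside with two helpers from the standard library's inspect module, getgeneratorstate() and getgeneratorlocals(). The following is a small sketch for illustration only. It repeats the plain generator function without prints so that it is self-contained.

```python
import inspect


def generate_even_numbers(start, count):
    # Make sure that the first number is even.
    start = start if start % 2 == 0 else start + 1

    while count > 0:
        yield start
        start += 2
        count -= 1


gen = generate_even_numbers(3, 1)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: body not started yet

print(next(gen))  # 4
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at the yield

# The local variables are remembered while the function is suspended.
print(dict(inspect.getgeneratorlocals(gen)))  # {'start': 4, 'count': 1}

try:
    next(gen)  # resumes after the yield, the loop ends, the function returns
except StopIteration:
    print("No more values")
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
```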

A simpler example: Generator function to yield some strings

Let us make sure that all of that is clear with a simpler example.

    def generate_hello_world():
        print("....Started executing the generator function")
        yield "Hello"
        print("....Between yields!")
        yield "World"
        print("....Done with yields!")

Let us see how to use the iterator returned by the generator function with next().

    >>> # We get the generator iterator
    >>> generator_iterator = generate_hello_world()

    >>> # When next() is called, the function executes till the first yield statement
    >>> next(generator_iterator)
    ....Started executing the generator function
    'Hello'

    >>> # When next() is called again, it picks up where it left off and executes till the next yield statement
    >>> next(generator_iterator)
    ....Between yields!
    'World'

    >>> # When there are no more yields, calling next() raises StopIteration
    >>> next(generator_iterator)
    ....Done with yields!
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Now let us see how to use the generator function in a for-loop.

    >>> for word in generate_hello_world():
    ...     print(word)
    ...
    ....Started executing the generator function
    Hello
    ....Between yields!
    World
    ....Done with yields!
    >>>

On a side note, pay attention to how we did not use a separate variable to hold the generator iterator as in our previous examples. We called the generator function directly in the for-loop. This is doable because of how a for-loop works in Python. The expression following “in” is evaluated only once. This expression is expected to result in an iterable. In this case, it will result in the generator iterator. Then iter() is called on the iterable to get the iterator associated with it. Then next() is called repeatedly on the iterator until the iterator is exhausted.
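Those steps can be written out by hand. The sketch below is only an approximation of what the for-loop does for us, using a version of generate_hello_world() without the prints. The names iterable, iterator and words are chosen just for this illustration.

```python
def generate_hello_world():
    yield "Hello"
    yield "World"


# Roughly what `for word in generate_hello_world(): print(word)` does.
iterable = generate_hello_world()  # the expression after "in", evaluated once
iterator = iter(iterable)          # for a generator iterator, the same object

words = []
while True:
    try:
        word = next(iterator)      # ask for the next value
    except StopIteration:
        break                      # exhausted: the loop ends quietly
    words.append(word)
    print(word)
```

Running it prints Hello and World, just like the for-loop version.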

Conclusion

In this post, we learned how to create generator functions in Python, how they work and how to use them.
