“A generator in Python is a way to create a sequence of values that are produced one at a time instead of all at once.”
This can be useful when working with large amounts of data or when you don’t want to store all values in memory at the same time.
Let us work through an example that makes the concept of generators clear. Say you have a large file containing a million lines of text, and you want to print every line that contains a specific word given by the user. One approach is the following.
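filename = 'test.txt'
word = 'apple'
with open(filename) as file:
    # readlines() loads every line of the file into one list
    contents = file.readlines()
for line in contents:
    if word in line:
        # each line already ends with a newline, so suppress print's own
        print(line, end='')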
Here the entire file is read into contents, and word is searched for in each line of contents. The problem with this solution arises when the file is too large. readlines() stores all the contents of the file in a single list, so where memory management is important, this code is very inefficient.
The following code uses a generator to solve the problem.
def find_lines(filename, word):
    with open(filename) as file:
        for line in file:
            if word in line:
                yield line

lines_with_word = find_lines('test.txt', 'apple')
for line in lines_with_word:
    print(line)
In this code, each line that contains the word is yielded by the generator using the yield keyword. The file is read line by line, so only one line is held in memory at a time, and only the lines that contain apple are yielded.
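To try the generator without a million-line file, here is a minimal sketch that writes a small temporary file and runs the same find_lines logic over it. The sample text and the temporary file are assumptions made purely for illustration.

```python
import os
import tempfile


def find_lines(filename, word):
    """Yield each line of the file that contains `word`, one line at a time."""
    with open(filename) as file:
        for line in file:
            if word in line:
                yield line


# Hypothetical sample data, written to a temporary file for the demo.
sample = "apple pie\nbanana bread\ngreen apple\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Consuming the generator drives the file read, one line per request.
    matches = [line.rstrip("\n") for line in find_lines(path, "apple")]
finally:
    os.remove(path)  # clean up the temporary file

print(matches)
```

Only the matching lines end up in matches; the non-matching line is read and discarded without ever being stored.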
What exactly is yield?
The yield keyword allows a function to generate a sequence of values, one at a time, without losing its internal state. Any function that contains the yield keyword is a generator function.
When a function encounters a yield statement, it temporarily suspends its execution and returns the yielded value as a result. Unlike a regular return statement, the function’s state is saved, allowing it to resume execution from where it left off when the next value is requested.
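The saved state is easiest to see with a local variable that has to survive between requests. The sketch below uses a hypothetical running_total generator (not part of the original example) and the built-in next() function to request one value at a time.

```python
def running_total(values):
    """Yield the cumulative sum after each value is added."""
    total = 0  # local state that survives between yields
    for value in values:
        total += value
        yield total  # pause here; `total` is kept for the next request


totals = running_total([1, 2, 3])
first = next(totals)   # runs the body up to the first yield
second = next(totals)  # resumes with total still equal to 1
third = next(totals)   # resumes with total equal to 3
print(first, second, third)
```

A regular function using return would lose total after the first call; here each request picks up with the previous sum intact.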
def generate_numbers():
    yield 1
    yield 2
    yield 3

numbers = generate_numbers()
In this example, the generate_numbers() function is a generator function. It uses yield to define three points in the code where the function will yield a value. When generate_numbers() is called, it doesn’t execute the function body immediately. Instead, it returns a generator object.
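One way to confirm that calling a generator function does not run its body is to record when each part executes. This is a rough sketch using a hypothetical traced_numbers variant and an events list, both introduced here only for illustration.

```python
events = []  # records when each part of the body actually runs


def traced_numbers():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2


numbers = traced_numbers()
events_after_call = list(events)    # still empty: the body has not run yet

first = next(numbers)
events_after_first = list(events)   # the body ran up to the first yield

second = next(numbers)
events_after_second = list(events)  # the body resumed and ran to the second yield

print(events_after_call, events_after_first, events_after_second)
```

The first snapshot is empty even though traced_numbers() has already been called, which shows that the body only starts running when a value is requested.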
To retrieve the values generated by the generator, we can use a for loop or the next() function:
print(next(numbers)) # Output: 1
print(next(numbers)) # Output: 2
print(next(numbers)) # Output: 3
Here, every time a yield statement is reached, whether during iteration or through a call to next(), the generator function’s execution pauses, the specified value is yielded, and the generator’s state is saved. That state includes the current position in the function body and the values of its local variables. When the next value is requested, the generator resumes execution from where it left off, continuing with the loop or the subsequent statements.
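It may also help to see what happens once there are no more yield statements to resume to. The sketch below reuses generate_numbers() and relies on standard Python behaviour: an exhausted generator raises StopIteration, a for loop handles that exception for you, and next() accepts an optional default value.

```python
def generate_numbers():
    yield 1
    yield 2
    yield 3


numbers = generate_numbers()
collected = [next(numbers), next(numbers), next(numbers)]

# A fourth request finds no further yield, so StopIteration is raised.
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True

# A for loop catches StopIteration internally and simply ends.
looped = []
for number in generate_numbers():
    looped.append(number)

# next() can take a default to return instead of raising.
fallback = next(numbers, None)

print(collected, exhausted, looped, fallback)
```

Note that the for loop calls generate_numbers() again to get a fresh generator; the exhausted one cannot be restarted.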
Summary:
Generators are a powerful feature in Python for working with large datasets or files. They produce values on demand and keep memory usage low, which makes them suitable for tasks such as data processing and efficient iteration.