Crash Prevention using the 'yield' Keyword in Python

Introduction


We all know that a function hands a value back to its caller, and ends, with the return keyword.

def add(a, b):
	return a + b

But Python has another keyword for handing back values, one that lets us produce them lazily, one at a time, rather than all at once (as return does).


Today, we'll explore the yield keyword and why it can prove a worthy asset in programming.


Reading Files


Suppose we were reading a file called "file.txt."

def main():
	# Read the whole file into a list before doing any work
	myFile = fileToList("file.txt")

	# modifying data
	for data in myFile:
		pass  # placeholder: some modification is performed


def fileToList(file):
	# Collect every stripped line of the file in one list
	fileList = []
	with open(file, 'r') as f:
		for line in f:
			fileList.append(line.rstrip())
	return fileList


if __name__ == "__main__":
	main()

Pretty basic. fileToList iterates over a file and appends each line to a list, stripping trailing whitespace (including the newline character) from each line.


Finally, in main, the list that was returned is looped over and modified.


But what if the size of "file.txt" (or, more realistically, some database) grows to perhaps 100,000,000,000 lines in length?

Crash.


The list has to hold every line in memory at once, and your computer may struggle to store that many items, making the program, at the very least, slow and inefficient.


Using Yield


As mentioned before, yield returns items lazily: when the function yields an item, it pauses rather than ending, and it keeps its local state so it can pick up where it left off.


Instead of running once from start to finish, the function is resumed each time the caller asks for the next item (usually through some kind of loop / iterator). Once no more yield statements remain to run, the function finishes, or "returns" in the traditional sense.
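
To make the pause-and-resume behaviour concrete, here is a minimal sketch that drives a generator by hand with the built-in next(). The function name countdown is a made-up example and is not part of the file-reading program in this post.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing after each value."""
    current = start
    while current > 0:
        yield current      # execution pauses here until the next value is requested
        current -= 1       # runs only once the generator is resumed


gen = countdown(3)         # nothing inside countdown has run yet
first = next(gen)          # runs up to the first yield
second = next(gen)         # resumes after the yield, loops, and yields again
remaining = list(gen)      # drains whatever is left

print(first, second, remaining)  # 3 2 [1]
```

A for loop does the same thing behind the scenes: it keeps asking for the next value and stops quietly when the generator has nothing left to yield.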


Let's look at a revised version of our program using yield:

def main():
	# modifying data
	for data in fileToList("file.txt"):
		pass  # placeholder: some modification is performed


def fileToList(file):
	# Yield one stripped line at a time instead of building a list
	with open(file, "r") as f:
		for line in f:
			yield line.rstrip()


if __name__ == "__main__":
	main()

Now, we're producing one line at a time using yield. In main, while it may seem like fileToList("file.txt") is returning a list, in reality, calling a function that contains yield returns an iterable generator object. The for loop consumes it for us, so we don't really have to worry about it.
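
One practical difference from a list is worth seeing in code: a generator can only be walked through once. The sketch below assumes a small temporary file created just for the demonstration, and the name file_lines is a hypothetical stand-in for the function above.

```python
import os
import tempfile
import types


def file_lines(path):
    """Yield each line of the file at path with trailing whitespace removed."""
    with open(path, "r") as f:
        for line in f:
            yield line.rstrip()


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

lines = file_lines(sample_path)
is_generator = isinstance(lines, types.GeneratorType)  # a generator, not a list

first_pass = list(lines)    # consumes the generator
second_pass = list(lines)   # already exhausted, so this is empty

os.remove(sample_path)
print(is_generator, first_pass, second_pass)
```

If the data is needed a second time, call the function again to get a fresh generator rather than reusing the old one.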


What's important is, not only is our code shorter, but we also no longer need to hold the whole dataset in memory: yield hands over one item at a time, so large files are far less likely to cause crashes or slowdowns.
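
As a rough illustration of the memory difference, the sketch below compares a function that builds a full list with one that yields the same values. The names squares_list and squares_lazy are hypothetical, and sys.getsizeof measures only the container object, so treat the comparison as indicative rather than a precise benchmark.

```python
import sys


def squares_list(n):
    """Build and return all n squares at once."""
    return [i * i for i in range(n)]


def squares_lazy(n):
    """Yield the n squares one at a time."""
    for i in range(n):
        yield i * i


n = 100_000
eager = squares_list(n)
lazy = squares_lazy(n)

# getsizeof reports only the container itself, not the items it refers to,
# so the real gap is larger than these numbers suggest.
eager_size = sys.getsizeof(eager)
lazy_size = sys.getsizeof(lazy)
print(eager_size > lazy_size)

# Both approaches produce the same values when consumed.
total_eager = sum(eager)
total_lazy = sum(lazy)
print(total_eager == total_lazy)
```

The list's size grows with n, while the generator object stays the same small size no matter how many values it will eventually produce.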


Final Thoughts


Hopefully you can now find some optimal use for yield in your own programming endeavors. Thanks for reading!
