Yield Keyword in C#

Prasad Raveendran

Introduction

In C#, the yield keyword is used inside a method, property getter, or operator to turn it into an iterator block. An iterator produces the elements of a sequence one at a time, on demand, instead of building the whole collection first. With yield, the compiler generates the state-tracking code that a hand-written IEnumerable<T>/IEnumerator<T> implementation would otherwise need, so custom iterators become shorter and easier to read.

The entire source code can be downloaded from GitHub

The general syntax of an iterator method using the yield keyword is as follows.

public IEnumerable<ElementType> CustomIterator()
{
    // Initialization (optional)

    foreach (var item in collection)
    {
        // Pre-yield logic (optional)

        yield return item;

        // Post-yield logic (optional)
    }

    // Cleanup (optional)
}

Let's break down the parts of the iterator method.

The return type is typically IEnumerable<T> or IEnumerator<T>, where T is the type of the elements that will be yielded.
The method body contains one or more yield return statements, usually inside a loop.
Each yield return statement produces the next element of the sequence and pauses the method until the caller asks for another element.
The method can also include a yield break statement, which ends the iteration immediately; no further values are yielded.
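To make the last two points concrete, here is a minimal sketch that combines yield return and yield break. The class YieldBreakDemo and the helper TakeUntilNegative are hypothetical names chosen for illustration; they are not part of the original sample project.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Hypothetical helper: yields values from the input until the first negative number.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> source)
    {
        foreach (var n in source)
        {
            if (n < 0)
                yield break;   // ends the sequence; nothing after this point is produced

            yield return n;    // hands n to the caller and pauses here until the next request
        }
    }

    public static void Main()
    {
        var result = TakeUntilNegative(new[] { 3, 7, -1, 9 });
        Console.WriteLine(string.Join(", ", result)); // prints "3, 7"
    }
}
```

The value 9 is never returned, because yield break ends the iteration as soon as -1 is seen.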

Example

Before delving into the concept of "yield" and its applications, let's first explore the workings of regular iteration.

using System;
using System.Collections.Generic;

namespace YieldTutorial
{
    internal class ProcessLoop
    {

        public static void DisplayRecords()
        {
            // All records are loaded into studList first; only then is each record processed.
            var studList = GetStudents(1_000_000);
            foreach (var record in studList)
            {
                if (record.Id < 1000)
                    Console.WriteLine($"Id :{record.Id}, FirstName : {record.FirstName}, LastName : {record.LastName} ");
                else
                    break;
            }
        }        

        private static IEnumerable<Student> GetStudents(int upperLimit)
        {
            var studList = new List<Student>();
            for (int i = 0; i < upperLimit; i++)
            {
                studList.Add(new Student
                {
                    Id = i,
                    FirstName = $"FName{i}",
                    LastName = $"LName{i}"
                });
            }
            return studList;
        }
        
    }
}

The DisplayRecords method does the following.

Calls the GetStudents method, which builds a list of 1 million student records.
Iterates over each record in studList using a foreach loop.
Within the loop, it checks if the Id of the current student is less than 1000.
If the condition is true, it prints the student's Id, FirstName, and LastName.
If the condition is false (meaning the Id is 1000 or greater), it breaks out of the loop.

The GetStudents method is a private method responsible for generating student records up to a specified upper limit. Based on the loop index, it creates a list of students and assigns each student a unique Id, FirstName, and LastName.
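The samples rely on a Student type that is not shown in this post. A minimal sketch, assuming it is a plain class with the three properties used above (the real class in the downloadable source may differ), could look like this. The StudentDemo class is only there to make the snippet runnable.

```csharp
using System;

namespace YieldTutorial
{
    // Assumed shape of the Student type; the post itself does not show it.
    public class Student
    {
        public int Id { get; set; }
        public string FirstName { get; set; } = string.Empty;
        public string LastName { get; set; } = string.Empty;
    }

    // Hypothetical demo class, not part of the original project.
    public static class StudentDemo
    {
        public static void Main()
        {
            var s = new Student { Id = 1, FirstName = "FName1", LastName = "LName1" };
            Console.WriteLine($"Id :{s.Id}, FirstName : {s.FirstName}, LastName : {s.LastName}");
        }
    }
}
```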

Overall, the DisplayRecords method fetches student records and prints the details of students whose Id is less than 1000.

Note that this implementation eagerly generates all 1 million student records into studList before the iteration starts, even though the loop stops after the first 1,000. With a large number of records this wastes memory and time, and an infinite sequence cannot be materialized this way at all. The yield keyword can be used to generate the records lazily instead, which improves memory efficiency and avoids work for records that are never read.

Now, let us look at the example below, which uses the yield keyword.

using System;
using System.Collections.Generic;

namespace YieldTutorial
{
    internal class ProcessLoopWithYield
    {

        public static void DisplayRecordsWithYield()
        {
            // Nothing is loaded up front: GetStudentByYield returns a lazy sequence,
            // and each record is created only when the foreach loop asks for it.
            var studList = GetStudentByYield(1_000_000);
            foreach (var record in studList)
            {
                if (record.Id < 1000)
                    Console.WriteLine($"Id :{record.Id}, FirstName : {record.FirstName}, LastName : {record.LastName} ");
                else
                    break;
            }
        }

        private static IEnumerable<Student> GetStudentByYield(int upperLimit)
        {

            for (int i = 0; i < upperLimit; i++)
            {
                yield return new Student
                {
                    Id = i,
                    FirstName = $"FName{i}",
                    LastName = $"LName{i}"
                };
            }
        }

    }
}

Here's how the code works:

The DisplayRecordsWithYield method looks like the previous version. It calls the GetStudentByYield method to obtain a sequence of student records (up to 1 million) and then iterates over it using a foreach loop. No records exist yet when the call returns.
The GetStudentByYield method is where the yield keyword comes into play. Instead of eagerly generating all student records into a list as before, it now uses the yield return statement to lazily produce each student record on-the-fly.
Inside the for loop in GetStudentByYield, for each iteration, a new Student object is created with a unique Id, FirstName, and LastName. The yield return statement returns this student record, but it doesn't terminate the method. Instead, it will pause the execution of the method and yield the current student record to the caller of the iterator (in this case, the foreach loop in DisplayRecordsWithYield).
As the foreach loop in DisplayRecordsWithYield iterates over the returned student records, it prints the details of each student whose Id is less than 1000, just like in the previous version.

The key difference is that, with the yield keyword, the GetStudentByYield method generates and returns student records only when the foreach loop requests them. This significantly reduces memory consumption, because the records are not pre-loaded into memory; they are generated on-the-fly, one at a time. In this example the loop breaks at Id 1000, so only 1,001 Student objects are ever created instead of 1 million.
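The interleaving between producer and consumer can be made visible with a small trace. The sketch below is illustrative only: LazyTraceDemo, Produce, and the Log list are hypothetical names, and integers stand in for the Student records.

```csharp
using System;
using System.Collections.Generic;

public static class LazyTraceDemo
{
    // Records the order of events so the producer/consumer interleaving is visible.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical producer: logs each value just before yielding it.
    public static IEnumerable<int> Produce(int count)
    {
        Log.Add("producer started");
        for (int i = 0; i < count; i++)
        {
            Log.Add($"producing {i}");
            yield return i;
        }
    }

    public static void Run()
    {
        Log.Clear();
        var sequence = Produce(1_000_000);   // no producer code has run yet
        Log.Add("sequence created");

        foreach (var n in sequence)
        {
            if (n >= 2)
                break;                       // the producer never goes past 2
            Log.Add($"consumed {n}");
        }
    }

    public static void Main()
    {
        Run();
        foreach (var line in Log)
            Console.WriteLine(line);
    }
}
```

The log starts with "sequence created", followed by "producer started", which shows that the body of an iterator method does not run until the first element is requested. After that, "producing" and "consumed" entries alternate until the loop breaks.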

This lazy and memory-efficient iteration can be beneficial when working with large datasets or infinite sequences, where loading all data into memory at once might not be feasible.
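As a sketch of the infinite-sequence case, the iterator below never terminates on its own; the caller decides how many values to pull. EvenNumbers and InfiniteSequenceDemo are hypothetical names, and Take comes from System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InfiniteSequenceDemo
{
    // Hypothetical generator: an unbounded stream of even numbers.
    public static IEnumerable<long> EvenNumbers()
    {
        long n = 0;
        while (true)
        {
            yield return n;
            n += 2;
        }
    }

    public static void Main()
    {
        // Take(5) stops requesting values after five items, so the loop above ends there.
        var firstFive = EvenNumbers().Take(5).ToList();
        Console.WriteLine(string.Join(", ", firstFive)); // prints "0, 2, 4, 6, 8"
    }
}
```

Building the same sequence eagerly in a List<long> would never finish, which is why lazy generation is the only option here.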

The benchmark compares two methods.

DisplayRecords: This method demonstrates the previous approach, where all student records are eagerly generated and loaded into a list before the iteration. The foreach loop then iterates over the list to display student details. This method serves as the baseline for comparison.
DisplayRecordsWithYield: This method showcases the new approach that utilizes the yield keyword to lazily generate and return student records one at a time during iteration. This approach should have better memory efficiency.

To compare the two methods, add the BenchmarkDotNet library to the project and run the benchmark. The library executes both methods multiple times and reports performance statistics; with the [MemoryDiagnoser] attribute applied, it also reports memory allocation.

Benchmark code

using BenchmarkDotNet.Attributes;

namespace YieldTutorial
{
    [MemoryDiagnoser]
    public class YieldBenchMark
    {
        [Benchmark]
        public void DisplayRecords()
        {
            //all records will be loaded into studList, then process each record in this collection.
            var studList = GetStudents(1_000_000);
            foreach (var record in studList)
            {
                if (record.Id < 1000)
                    Console.WriteLine($"Id :{record.Id}, FirstName : {record.FirstName}, LastName : {record.LastName} ");
                else
                    break;
            }
        }

        [Benchmark]
        public void DisplayRecordsWithYield()
        {
           
            var studList = GetStudentByYield(1_000_000);
            foreach (var record in studList)
            {
                if (record.Id < 1000)
                    Console.WriteLine($"Id :{record.Id}, FirstName : {record.FirstName}, LastName : {record.LastName} ");
                else
                    break;
            }
        }


        private IEnumerable<Student> GetStudents(int upperLimit)
        {
            var studList = new List<Student>();
            for (int i = 0; i < upperLimit; i++)
            {
                studList.Add(new Student
                {
                    Id = i,
                    FirstName = $"FName{i}",
                    LastName = $"LName{i}"
                });
            }
            return studList;
        }

        private IEnumerable<Student> GetStudentByYield(int upperLimit)
        {
            
            for (int i = 0; i < upperLimit; i++)
            {
                yield return new Student
                {
                    Id = i,
                    FirstName = $"FName{i}",
                    LastName = $"LName{i}"
                };
            }           
        }
    }
}

When benchmarking, compile and run your code in Release mode rather than Debug mode. Debug builds disable compiler and JIT optimizations and carry extra debugging information, which can significantly distort the performance metrics and make the benchmark results inaccurate and unreliable.
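The post does not show the entry point that starts the benchmark. A minimal sketch, assuming a console project that references the BenchmarkDotNet NuGet package and contains the YieldBenchMark class above, might look like the following. The Program class here is an assumption, not code from the original repository.

```csharp
using BenchmarkDotNet.Running;

namespace YieldTutorial
{
    // Assumed entry point; the original project may organize this differently.
    public class Program
    {
        public static void Main(string[] args)
        {
            // Runs every [Benchmark] method in YieldBenchMark and prints a summary table.
            BenchmarkRunner.Run<YieldBenchMark>();
        }
    }
}
```

With this in place, running dotnet run -c Release from the project folder builds in Release mode and starts the benchmark.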

[Screenshot: benchmark results for the yield example]

The screenshot above shows a massive difference in memory allocation between the two methods.

It's always good to benchmark and compare different implementations in scenarios where performance and memory efficiency are critical considerations.