A Diagnosis Of Parallel.Foreach

In today’s article, we will look at some interesting facts about the Parallel class in C# and .NET.

Why do we need this?

In today’s ever-changing world of technology, where users want access to their data as fast as possible, we always need to know which techniques make our code run faster. In C#, one of the most popular ways to process a lot of data concurrently is the Parallel class, and after this article you will have a better understanding of the Parallel class and its best use cases.

Let’s get started.

Let’s create a .NET Core console application.

After that, add the following package to your project using the NuGet Package Manager.

“BenchmarkDotNet”

Now for our first performance test. We will create a class, ParallelOperations, and add the following two methods to it.

[Benchmark]
public int[] NormalFor() {
    var arr = new int[1_000_000];
    for (int i = 0; i < 1_000_000; i++) {
        arr[i] = i;
    }
    return arr;
}

In the above method, notice the very first line: the [Benchmark] attribute tells BenchmarkDotNet that this is a method we want to run performance benchmarks on.

As for the code, we simply iterate from 0 to 999,999, store each index in the array, and then return the array.

Now let’s take a look at the second method,

[Benchmark]
public int[] ParallelForEach() {
    var arr = new int[1_000_000];
    Parallel.For(0, 1_000_000, i => {
        arr[i] = i;
    });
    return arr;
}

Again, this method has the [Benchmark] attribute, so we can measure its performance as well.

As for the code, we do the same thing as the previous method, but instead of a traditional for loop we use Parallel.For. (Despite its name, the ParallelForEach method calls Parallel.For, not Parallel.ForEach.)

So, our entire ParallelOperations class contains just these two benchmark methods.
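For reference, here is a sketch of how the complete ParallelOperations class might look with the two methods above. It assumes the BenchmarkDotNet package is already referenced; the using directives are my assumption of what the file needs, and the method bodies are unchanged apart from comments and brace layout.

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes; // [Benchmark] comes from the BenchmarkDotNet package

public class ParallelOperations {
    // Baseline: fill one million slots sequentially on a single thread.
    [Benchmark]
    public int[] NormalFor() {
        var arr = new int[1_000_000];
        for (int i = 0; i < 1_000_000; i++) {
            arr[i] = i;
        }
        return arr;
    }

    // Same work, but Parallel.For distributes the indices across thread-pool threads.
    // Each index is written by exactly one iteration, so no locking is needed.
    [Benchmark]
    public int[] ParallelForEach() {
        var arr = new int[1_000_000];
        Parallel.For(0, 1_000_000, i => {
            arr[i] = i;
        });
        return arr;
    }
}
```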

Now to run this class as a benchmark, we need to go to the Program.cs file and add the following line.

var summary = BenchmarkRunner.Run<ParallelOperations>();

That line, together with using BenchmarkDotNet.Running; at the top of the file, is all our Program.cs needs.

Now make sure to run the project in Release mode (for example, with dotnet run -c Release). BenchmarkDotNet refuses to run benchmarks from a non-optimized Debug build and reports an error instead.

**Beware: the results may surprise you**

After running the above code, BenchmarkDotNet prints a summary table with one row per benchmark method.

As you can see in the Mean column, the normal for loop executes much faster than the Parallel.For version.

This shows that performance does not automatically improve just because you parallelize. How Parallel.For performs depends on several factors, such as your CPU, how many cores and hardware threads it has, and how much work each iteration does. In my case, the benchmark ran on a fairly decent CPU with a good number of cores and threads.

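When executing the code inside Parallel.For, the index range is split into chunks that are scheduled on multiple threads, and the loop then has to wait for all of them to finish. That partitioning, scheduling, and per-iteration delegate call adds overhead, which here outweighs the tiny amount of work (a single array assignment) done in each iteration.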
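One common way to reduce that per-iteration overhead is to hand each worker a contiguous range of indices rather than one index at a time. The sketch below uses Partitioner.Create(fromInclusive, toExclusive) from System.Collections.Concurrent together with Parallel.ForEach; RangePartitioning and FillWithRanges are hypothetical names. The body is still trivially small, so whether this beats the plain for loop on your machine is something you would need to benchmark.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangePartitioning {
    // Hypothetical helper: fills arr[i] = i using range partitioning.
    public static int[] FillWithRanges(int length) {
        var arr = new int[length];
        // Partitioner.Create throws if toExclusive <= fromInclusive, so handle empty input first.
        if (length == 0) return arr;

        // Partitioner.Create(0, length) yields Tuple<int, int> ranges [Item1, Item2).
        // The delegate runs once per range instead of once per index.
        Parallel.ForEach(Partitioner.Create(0, length), range => {
            for (int i = range.Item1; i < range.Item2; i++) {
                arr[i] = i; // ranges never overlap, so each slot has a single writer
            }
        });
        return arr;
    }
}
```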

So don’t assume that something is faster or runs better just because it is parallel.

Now let’s test another scenario: we will call an API, deserialize its data using Newtonsoft.Json (available as a NuGet package), and make the calls with both a regular loop and Parallel.For to compare their performance. Calling an API and deserializing the response is a realistic use case in real-world projects.

So, let’s see our code,

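We will create a new class, ApiParallelOperations, and add a private helper method that calls the API. This helper is not a benchmark method itself. The class also needs an HttpClient instance, which the benchmark methods below reference as httpClient and pass to this helper.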

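// Requires: using System.Net.Http; using Newtonsoft.Json;
private async Task<int> GetData(HttpClient http) {
    var users = new List<UserModel>();
    string url = "https://localhost:44363/api/users";
    // Dispose the response once its body has been read.
    using var res = await http.GetAsync(url);
    if (res.IsSuccessStatusCode) {
        var stringResponse = await res.Content.ReadAsStringAsync();
        // DeserializeObject returns null for a "null" body, so fall back to an empty list.
        users = JsonConvert.DeserializeObject<List<UserModel>>(stringResponse) ?? new List<UserModel>();
    }
    // Return the number of users so each call produces a value to collect.
    return users.Count;
}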

Note
The API called in the above code was running locally on my PC.

Now let’s see our benchmark methods.

[Benchmark]
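public async Task<List<int>> NormalForApi() {
    var list = new List<int>();
    // Wrap each call in a Func so no request is sent until the delegate is invoked.
    var tasks = Enumerable.Range(0, 1000).Select(_ => new Func<Task<int>>(() => GetData(httpClient))).ToList();
    foreach (var item in tasks) {
        // Start one call and wait for it to finish before starting the next.
        list.Add(await item());
    }
    return list;
}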

In the above code, we create a list of 1,000 deferred calls. Wrapping GetData in a new Func<Task<int>> means it is not called while the list is being built; each call only starts inside the foreach loop, when we invoke the delegate and await it before adding the result to the list. As a result, the calls run strictly one after another.
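To make the deferred-execution point concrete, here is a small self-contained sketch. FakeGetData is a hypothetical stand-in for GetData that makes no HTTP call; it only counts how many calls have started and waits briefly with Task.Delay. The counters show that building the list of Func<Task<int>> delegates starts no calls, and that the foreach loop then runs them one at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredCalls {
    // Counters for the sketch: calls started, calls currently running, and the peak of the latter.
    public static int CallsStarted, InFlight, MaxInFlight;

    // Hypothetical stand-in for GetData: no network, just a short delay.
    private static async Task<int> FakeGetData() {
        CallsStarted++;
        InFlight++;
        MaxInFlight = Math.Max(MaxInFlight, InFlight);
        await Task.Delay(1);
        InFlight--;
        return 42;
    }

    public static async Task<(int StartedBeforeLoop, int StartedAfterLoop, int PeakInFlight, List<int> Results)> RunAsync(int count) {
        CallsStarted = InFlight = MaxInFlight = 0;

        // Building the list only creates delegates; FakeGetData is not invoked here.
        var calls = Enumerable.Range(0, count)
            .Select(_ => new Func<Task<int>>(() => FakeGetData()))
            .ToList();
        int startedBeforeLoop = CallsStarted;

        var results = new List<int>();
        foreach (var call in calls) {
            // Each call starts here and is awaited before the next one begins.
            results.Add(await call());
        }
        return (startedBeforeLoop, CallsStarted, MaxInFlight, results);
    }
}
```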

Now our parallel method will be,

[Benchmark]
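public List<int> ParallelApi() {
    var tasks = Enumerable.Range(0, 1000).Select(_ => new Func<int>(() => GetData(httpClient).GetAwaiter().GetResult())).ToList();
    // List<T>.Add is not thread-safe, so each iteration writes to its own array slot instead.
    var results = new int[tasks.Count];
    Parallel.For(0, tasks.Count, i => {
        // Blocks this thread-pool thread until the HTTP call completes.
        results[i] = tasks[i]();
    });
    return results.ToList();
}

Here we are doing pretty much the same thing. The differences are that each deferred call uses GetAwaiter().GetResult(), and that the results go into an array rather than straight into a List<int>. We block with GetAwaiter().GetResult() because Parallel.For does not support async delegates: an async lambda passed to it becomes an async void method, and the loop would return before the calls had finished. The array is needed because List<T> is not thread-safe, so calling Add from several threads at once can lose items or throw.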
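If you are on .NET 6 or later, Parallel.ForEachAsync accepts an async delegate, so the calls can be awaited instead of blocking a thread with GetAwaiter().GetResult(). This is a different API from the Parallel.For used in the benchmarks, and it has not been benchmarked here. The sketch below uses a hypothetical FakeGetData in which Task.Delay stands in for the HTTP call, and it stores each result in its own array slot.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncParallelCalls {
    // Hypothetical stand-in for GetData: simulates network latency without blocking a thread.
    private static async Task<int> FakeGetData(int id, CancellationToken token) {
        await Task.Delay(10, token);
        return id;
    }

    // Runs `count` calls with at most `maxDegree` in flight at once (requires .NET 6+).
    public static async Task<int[]> RunAsync(int count, int maxDegree) {
        var results = new int[count]; // one slot per call, so no shared List<T>
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        await Parallel.ForEachAsync(Enumerable.Range(0, count), options, async (i, token) => {
            // The thread is released while awaiting, unlike GetAwaiter().GetResult().
            results[i] = await FakeGetData(i, token);
        });
        return results;
    }
}
```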

So, our ApiParallelOperations class now contains the GetData helper and the two benchmark methods above.

Now let’s run the code and see the results.

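Here we can see that Parallel.For is clearly faster in this case. Most of each iteration’s time is spent waiting for the HTTP response rather than using the CPU, so running several calls at once keeps more requests in flight.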

Note
Timings will differ on every system, because the API was running on my local machine and competing with the benchmark for the same resources.

In a real-world scenario, especially in a microservices environment, your service will often be running in a Kubernetes cluster, sharing resources with other services. For this scenario, I recommend the change below,

[Benchmark]
public List<int> ParallelApiWithMaxDegreeOfParallelism() {
    var tasks = Enumerable.Range(0, 1000).Select(_ => new Func<int>(() => GetData(httpClient).GetAwaiter().GetResult())).ToList();
    // List<T> is not thread-safe, so each iteration writes to its own array slot.
    var results = new int[tasks.Count];
    Parallel.For(0, tasks.Count, new ParallelOptions() {
        // At most 4 iterations (and therefore 4 blocking HTTP calls) run at the same time.
        MaxDegreeOfParallelism = 4
    }, i => {
        results[i] = tasks[i]();
    });
    return results.ToList();
}

In the above method, we limit the maximum degree of parallelism, that is, how many iterations of the loop can run concurrently, to 4.

This makes sure that your code doesn’t hog the resources of the cluster or machine it is running on.
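To see what the cap does, the sketch below runs simulated blocking calls (Thread.Sleep stands in for the HTTP request) through Parallel.For with MaxDegreeOfParallelism and records how many loop bodies were running at the same time. MaxDegreeDemo and its parameters are hypothetical names for illustration. The recorded peak should never exceed the configured limit, although it can be lower if the thread pool has not supplied enough threads yet.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class MaxDegreeDemo {
    // Hypothetical helper: runs `count` simulated blocking calls with a concurrency cap
    // and reports the highest number of loop bodies that were executing at once.
    public static (int[] Results, int Peak) Run(int count, int maxDegree) {
        var results = new int[count]; // one slot per index, so no shared List<T>
        int active = 0, peak = 0;

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.For(0, count, options, i => {
            int now = Interlocked.Increment(ref active);

            // Raise `peak` to `now` if higher, retrying if another thread updated it first.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) {
            }

            Thread.Sleep(5); // stand-in for a blocking HTTP call
            results[i] = i;

            Interlocked.Decrement(ref active);
        });

        return (results, peak);
    }
}
```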

Now let’s run the code and see the difference.

As you can see, the timings match what we would expect from the code.

NormalForApi uses a traditional loop (a sequential foreach that awaits each call) to call the HTTP service, and it takes the longest.

ParallelApi uses Parallel.For with no explicit limit on the degree of parallelism and completes the same number of HTTP calls in much less time.

ParallelApiWithMaxDegreeOfParallelism uses Parallel.For with MaxDegreeOfParallelism set to 4. It takes slightly longer than ParallelApi to make the same number of calls, but it still performs better than the traditional loop.

Summary

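In today’s article, we looked at Parallel.For and compared its performance with a traditional loop in two scenarios: a simple array fill, where the parallelization overhead made it slower, and a batch of API calls, where it was clearly faster.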
