SIMD, A Parallel Processing At Hardware Level In C#

SIMD is an acronym for Single Instruction Multiple Data.

It is a CPU feature that lets a single instruction operate on several pieces of data at once, within a single core.

For example, suppose you want to add the elements of two arrays pairwise and store the sums in a new array. Instead of a traditional approach such as LINQ or a for loop, you can use SIMD. Because SIMD is built into the CPU and works at the hardware level, it can outperform those approaches without resorting to unnecessary multi-threading. That does not mean SIMD is always preferable to multi-threading: each technique has its own uses, with its own pros and cons.

SIMD relies on wide registers inside the CPU that hold several values and process them with a single instruction; for example, a 256-bit register can hold eight 32-bit values. CPUs operate on these registers through **instruction set extensions (e.g. SSE and AVX)**, and as far as I know both Intel and AMD CPUs have supported them for many years.
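The register arithmetic above can be sketched in a few lines. This is only an illustration: `SimdLanes`, `LanesFor`, and `CurrentIntLanes` are hypothetical names introduced here, and the value returned by `Vector<int>.Count` depends on the machine and runtime you run it on.

```csharp
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class SimdLanes
{
    // How many elements of elementBits each fit into a register of registerBits.
    public static int LanesFor(int registerBits, int elementBits) => registerBits / elementBits;

    // How many 32-bit ints the runtime packs into one Vector<int> on this machine.
    public static int CurrentIntLanes() => Vector<int>.Count;
}
```

For instance, `SimdLanes.LanesFor(256, 32)` gives 8 and `SimdLanes.LanesFor(128, 32)` gives 4, matching the "8x 32 bits in 256 bits" example.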

C# provides several SIMD-accelerated types in the System.Numerics namespace, such as Vector2, Vector3, Vector4, Matrix3x2, Plane, and the generic Vector<T>, and each of them supports its own set of operations.
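As a small illustration of the fixed-size types, here is a sketch using `Vector4` from `System.Numerics`. The wrapper class `FixedVectorDemo` and its method names are made up for this example; the `+` operator and `Vector4.Dot` are part of the framework.

```csharp
using System.Numerics;

// Hypothetical wrapper, for illustration only.
public static class FixedVectorDemo
{
    // Adds the four float components of a and b in one operation.
    public static Vector4 Add(Vector4 a, Vector4 b) => a + b;

    // Dot product: multiplies matching components and sums the products.
    public static float Dot(Vector4 a, Vector4 b) => Vector4.Dot(a, b);
}
```

With `a = new Vector4(1, 2, 3, 4)` and `b = new Vector4(10, 20, 30, 40)`, `Add` returns a vector with components 11, 22, 33, 44 and `Dot` returns 300.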

Note that SIMD acceleration requires the **RyuJIT** compiler, which is included in .NET Core and in .NET Framework 4.6 and later.
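One way to check at run time whether the JIT is actually accelerating vector operations is the `Vector.IsHardwareAccelerated` property. The sketch below wraps it in a hypothetical `SimdSupport.Describe` helper; the message texts are my own.

```csharp
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class SimdSupport
{
    // Reports whether Vector<T> operations are backed by SIMD instructions on this machine.
    public static string Describe()
    {
        return Vector.IsHardwareAccelerated
            ? $"SIMD is hardware accelerated; Vector<int> holds {Vector<int>.Count} ints."
            : "SIMD is not hardware accelerated; Vector<T> operations run in software.";
    }
}
```

Printing `SimdSupport.Describe()` before benchmarking is a cheap sanity check, because without acceleration the vectorized code still runs but is unlikely to be faster.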

Other languages offer this feature too. Let me wrap up with an example in C#.

Test case: we want to add the 10000 elements of two arrays pairwise and put the result in a new array. We do this in 3 different ways and compare them with BenchmarkDotNet.

using System;
using System.Linq;
using System.Numerics;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class Counter
{
    private readonly int[] _left;
    private readonly int[] _right;

    public Counter()
    {
        _left = Faker.BuildArray(10000);
        _right = Faker.BuildArray(10000);
    }

    [Benchmark]
    public int[] VectorSum()
    {
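        var vectorSize = Vector<int>.Count;
        var result = new int[_left.Length];
        int i = 0;
        // Process full vectors only; a partial vector at the end would read past the array.
        for (; i <= _left.Length - vectorSize; i += vectorSize)
        {
            var v1 = new Vector<int>(_left, i);
            var v2 = new Vector<int>(_right, i);
            (v1 + v2).CopyTo(result, i);
        }
        // Add any leftover elements one by one.
        for (; i < _left.Length; i++)
        {
            result[i] = _left[i] + _right[i];
        }
        return result;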
    }

    [Benchmark]
    public int[] LinQSum()
    {
        var result = _left.Zip(_right, (l, r) => l + r).ToArray();
        return result;
    }

    [Benchmark]
    public int[] ForSum()
    {
        var result = new int[_left.Length];
        for (int i = 0; i < _left.Length; i++)
        {
            result[i] = _left[i] + _right[i];
        }
        return result;
    }
}

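public static class Faker
{
    // One shared generator, so consecutive calls do not restart from the same time-based seed.
    private static readonly Random Rnd = new Random();

    public static int[] BuildArray(int length)
    {
        var array = new int[length];
        for (int i = 0; i < length; i++)
        {
            // The upper bound of Next is exclusive, so values range from 1 to 98.
            array[i] = Rnd.Next(1, 99);
        }
        return array;
    }
}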
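Before trusting a benchmark, it is worth confirming that the vectorized version returns the same values as the plain loop, including for lengths that are not a multiple of `Vector<int>.Count`. The standalone sketch below is separate from the benchmark class; `PairwiseSum` and its method names are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class PairwiseSum
{
    // Vectorized pairwise sum with a scalar loop for the leftover elements.
    public static int[] VectorSum(int[] left, int[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new int[left.Length];
        int width = Vector<int>.Count;
        int i = 0;

        // Full vectors: continue while at least 'width' elements remain.
        for (; i <= left.Length - width; i += width)
        {
            var sum = new Vector<int>(left, i) + new Vector<int>(right, i);
            sum.CopyTo(result, i);
        }

        // Scalar tail for the remaining elements.
        for (; i < left.Length; i++)
        {
            result[i] = left[i] + right[i];
        }
        return result;
    }

    // Reference implementation: a plain loop.
    public static int[] ScalarSum(int[] left, int[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = left[i] + right[i];
        }
        return result;
    }
}
```

Comparing the two outputs element by element for a few lengths (for example 0, 1, 10 and 10000) exercises both the vector loop and the scalar tail.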

In the benchmark, the SIMD version beat the for loop by about 2x and the LINQ version by about 27x. I wrote these methods to be simple and readable, and they are all managed code; you can get even better performance with unmanaged code. In the end, I think it is best to use this feature only once you have measured and confirmed the performance gain.

