Token Caching in .NET 8 with Microsoft Entra ID

Introduction

The main benefit of using a token cache is improved application performance: it reduces the need for repeated network calls to acquire access tokens. With token caching enabled, subsequent requests can reuse a token from the cache, resulting in faster responses and reduced latency. This blog shows how to configure the token cache in a .NET 8 web application that uses Microsoft Entra ID authentication.

Configure Token Cache

Configuring a token cache for Microsoft Entra ID authentication is straightforward in .NET 8 with the Microsoft.Identity.Web library.

Two types of cache implementation can be applied to the token cache:

  1. In-Memory
  2. Distributed

In-Memory

Tokens are cached in the memory of the server where the application is deployed. An in-memory cache is recommended for development and testing environments because it does not persist: the cache is lost whenever the application restarts or scales out. By caching tokens locally, the application reduces the frequency of token requests sent to the Microsoft Entra ID endpoint, decreasing the load on the identity provider.

Sample code for adding an in-memory token cache to an ASP.NET Core web application:

// Configure Microsoft Entra ID authentication, enable token acquisition for
// downstream APIs (Microsoft Graph), and cache the acquired tokens in memory.
builder.Services.AddAuthentication(OpenIdConnectDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApp(builder.Configuration.GetSection("AzureAd"))
    .EnableTokenAcquisitionToCallDownstreamApi(initialScopes)
    .AddMicrosoftGraph(builder.Configuration.GetSection("MicrosoftGraph"))
    .AddInMemoryTokenCaches();

The AddInMemoryTokenCaches() method attaches the in-memory token cache to the Microsoft identity authentication pipeline.

In production, an in-memory cache is still a reasonable choice for daemon applications and other apps that use the client credentials grant (app-only tokens), since they only cache a small number of application tokens.
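For example, a daemon that uses the client credentials flow can rely on MSAL.NET's built-in in-memory application token cache. The following is a minimal sketch, assuming the Microsoft.Identity.Client NuGet package; the client ID, client secret, and tenant ID values are placeholders.

using Microsoft.Identity.Client;

// Build a confidential client; the IDs and secret below are placeholders.
IConfidentialClientApplication app = ConfidentialClientApplicationBuilder
    .Create("<client-id>")
    .WithClientSecret("<client-secret>")
    .WithAuthority("https://login.microsoftonline.com/<tenant-id>")
    .Build();

string[] scopes = { "https://graph.microsoft.com/.default" };

// The first call acquires a token over the network; subsequent calls are
// served from MSAL's in-memory app token cache until the token nears expiry.
AuthenticationResult result = await app.AcquireTokenForClient(scopes).ExecuteAsync();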

Distributed

A distributed cache is a caching mechanism shared across multiple application servers, typically managed as an external service separate from the app servers that access it. Unlike an in-memory cache, a distributed token cache is shared across instances, which makes it more reliable and persistent: the cached data is not lost when the application restarts or when the app service scales.

Distributed caching is highly recommended for production environments.

Several providers are available for implementing a distributed cache:

  1. Redis
  2. SQL Server
  3. NCache
  4. Azure Cosmos DB

Sample code for implementing a distributed token cache in an ASP.NET Core web application, using Azure Cosmos DB as the backing store:

// Configure authentication as before, but back the token cache with the
// ASP.NET Core IDistributedCache abstraction instead of process memory.
builder.Services.AddAuthentication(OpenIdConnectDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApp(builder.Configuration.GetSection("AzureAd"))
    .EnableTokenAcquisitionToCallDownstreamApi(initialScopes)
    .AddMicrosoftGraph(builder.Configuration.GetSection("MicrosoftGraph"))
    .AddDistributedTokenCaches();

// Requires the Microsoft.Extensions.Caching.Cosmos NuGet package.
// Registers Azure Cosmos DB as the IDistributedCache implementation.
builder.Services.AddCosmosCache((CosmosCacheOptions cacheOptions) =>
{
    cacheOptions.ContainerName = builder.Configuration["CosmosCacheContainer"];
    cacheOptions.DatabaseName = builder.Configuration["CosmosCacheDatabase"];
    cacheOptions.ClientBuilder = new CosmosClientBuilder(builder.Configuration["CosmosConnectionString"]);
    cacheOptions.CreateIfNotExists = true;
});

The AddDistributedTokenCaches() method enables a distributed token cache. It is an adapter over the ASP.NET Core IDistributedCache abstraction, so any IDistributedCache implementation can serve as the backing store. The AddCosmosCache() method registers Azure Cosmos DB as that implementation.
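To swap in a different provider, register another IDistributedCache implementation instead. The following is a minimal sketch using Redis, assuming the Microsoft.Extensions.Caching.StackExchangeRedis NuGet package; the RedisConnectionString configuration key and instance name are placeholders.

// A minimal sketch, assuming the Microsoft.Extensions.Caching.StackExchangeRedis
// package; the configuration key and instance name are placeholders.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration["RedisConnectionString"];
    options.InstanceName = "TokenCache"; // prefix applied to cache keys
});

Because Microsoft.Identity.Web only depends on the IDistributedCache abstraction, this registration is the only line that changes between providers.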

Summary

We have seen how to implement token caching in a .NET 8 web application that uses Microsoft Entra ID authentication. We discussed two primary strategies: in-memory caching and distributed caching. In-memory caching stores tokens in the server's memory, offering quick access and reduced latency, but it has limitations in scalability, memory constraints, and data persistence. Distributed caching, on the other hand, stores tokens in an external cache such as Redis or SQL Server. It enhances scalability and reliability by allowing multiple servers to share the same cache, though it introduces slightly higher latency and added complexity.