The Distributed Cache (DC) is a new component in SharePoint 2013. It provides in-memory caching for authentication tokens and for social computing features such as My Sites, microblogs, activity feeds, and news feeds. That makes it one of the most critical parts of SharePoint 2013 in terms of social computing.
The Distributed Cache service uses Windows AppFabric caching technology behind the scenes.
The cache can consume a lot of memory on the application and web servers. The DC service can be implemented in one of two modes:
- Collocated mode – in this mode, the Distributed Cache service runs together with other services on the application server.
- Dedicated mode – in this mode, all services other than the Distributed Cache service are stopped on the application server that runs the Distributed Cache service.
Microsoft recommends using dedicated mode in the SharePoint Farm.
Capacity planning is an important step when you implement DC in the SharePoint farm.
These are the Microsoft-recommended Distributed Cache capacities:
Deployment size | Small farm | Medium farm | Large farm |
Total number of users | < 10,000 | < 100,000 | < 500,000 |
Recommended cache size for the Distributed Cache service | 1 GB | 2.5 GB | 12 GB |
Total memory allocation for the Distributed Cache service (double the recommended cache size above, plus reserve 2 GB for the OS) | 2 GB | 5 GB | 34 GB |
Recommended architectural configuration | Dedicated server or co-located on a front-end server | Dedicated server | Dedicated server |
Minimum cache hosts per farm | 1 | 1 | 2 |
Note: The cache size for the Distributed Cache service should not exceed 16 GB per server, so Microsoft recommends that you use two servers in a large farm environment.
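The sizing rule in the table can be turned around to get a cache size from a server's RAM: subtract 2 GB for the OS, halve what is left, and cap the result at 16 GB. Here is a minimal sketch of that arithmetic. Get-DcCacheSizeMB is a hypothetical helper I made up for illustration, not a SharePoint cmdlet, and the 16 GB cap simply follows the note above. Update-SPDistributedCacheSize is the SharePoint cmdlet that applies the value; I left it commented out because the Distributed Cache service should be stopped on all cache hosts before the size is changed.

```powershell
# Hypothetical helper (not a SharePoint cmdlet): inverts the table's rule
# "total memory = 2 x cache size + 2 GB for the OS".
function Get-DcCacheSizeMB {
    param(
        [Parameter(Mandatory = $true)]
        [int]$TotalMemoryGB
    )

    if ($TotalMemoryGB -le 2) {
        throw "Need more than 2 GB of RAM to leave room for the OS."
    }

    # Reserve 2 GB for the OS, then give half of the remainder to the cache.
    $cacheGB = ($TotalMemoryGB - 2) / 2

    # Cap at 16 GB, following the note above.
    if ($cacheGB -gt 16) { $cacheGB = 16 }

    return [int][math]::Floor([double]$cacheGB * 1024)
}

# Example: a server with 8 GB of RAM.
$sizeMB = Get-DcCacheSizeMB -TotalMemoryGB 8
"Cache size: $sizeMB MB"

# Apply it from the SharePoint Management Shell
# (with the service stopped on all cache hosts first):
# Update-SPDistributedCacheSize -CacheSizeInMB $sizeMB
```

For the 8 GB server this gives 3072 MB. A 3 GB server would be left with only 512 MB of cache, which is one more reason to prefer a dedicated server.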
When implementing DC, it is better to use a dedicated server even for a small farm.
I found that TechNet does not document DC troubleshooting very well, which hurts when you run into issues. Fortunately, there are blogs that help with troubleshooting DC.
My SharePoint Server 2013 farm is as follows:
OS: Windows Server 2012
SharePoint Version:
SharePoint Server 2013 Standard, Build number: 15.0.4420.1017 (RTM)
SQL Server:
SQL Server 2012
A) App Server, 8 GB RAM
B) Web Front End 01, 3 GB RAM
C) Web Front End 02, 3 GB RAM
First things first. I will list all the prerequisites for Distributed Cache to function properly, so that you do not pull out your hair and become frustrated like me! :)
- Warning while setting up the DC service.
Do not restart the AppFabric Caching Service from the Services console. Microsoft strongly recommends against this, and if you do it, you might need to rebuild your farm.
- Always use the Distributed Cache PowerShell cmdlets.
- Firewall ports
- Distributed Cache requires the following high ports: 22233, 22234, 22235, 22236.
Note: If you configure the service with the Distributed Cache PowerShell cmdlets, these DC ports will be opened automatically on the server.
- ICMPv4 and ICMPv6 have to be allowed for DC to function properly.
Besides these, the following ports have to be opened as well: 8, 138, 139, 445.
- Firewalls in the organization
If the network topology has 2 to 3 firewalls around the SharePoint farm, the ports have to be opened on all of them as well.
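To confirm that the DC ports really are reachable through every firewall in the path, you can try a TCP connection to each port from each server. This is only a sketch: the server names are placeholders (the application server name in particular is made up), and the 2 second timeout is an arbitrary choice. It uses the .NET TcpClient class, so it does not depend on any SharePoint cmdlet.

```powershell
# Server names are placeholders; replace them with the cache hosts in your farm.
$servers = 'SP13APP01.contoso.com', 'SP13WFE01.contoso.com', 'SP13WFE02.contoso.com'
$ports   = 22233, 22234, 22235, 22236

foreach ($server in $servers) {
    foreach ($port in $ports) {
        $client = New-Object System.Net.Sockets.TcpClient
        $open = $false
        try {
            # Start the connection and wait at most 2 seconds for it to complete.
            $pending = $client.BeginConnect($server, $port, $null, $null)
            if ($pending.AsyncWaitHandle.WaitOne(2000)) {
                $client.EndConnect($pending)   # throws if the connection was refused
                $open = $true
            }
        }
        catch {
            $open = $false
        }
        finally {
            $client.Close()
        }
        "{0}:{1} reachable = {2}" -f $server, $port, $open
    }
}
```

Keep in mind that a port only answers when the AppFabric Caching Service is actually running on that host, so a "False" here can mean a stopped service rather than a blocked firewall.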
Search and User Profile requirements
- Search: Continuous crawl has to be enabled.
- User Profile: The service account of the application pool of the web application for My Site should have Full Control.
- Use Stop-SPDistributedCacheServiceInstance -Graceful to stop a Distributed Cache instance on any SharePoint server.
- Assign the Distributed Cache memory when you set up the Distributed Cache instance on each SharePoint server. DC eats memory like crazy and users will complain later on.
- Remote services must be enabled.
I will cover both collocated and dedicated modes for DC configuration.
- In the collocated configuration, each server in the farm will have a DC instance with the STARTED status.
- In the dedicated configuration, you choose a server to be the dedicated Distributed Cache server, and the DC instance on the other web servers MUST have the STOPPED status. The Distributed Cache instance MUST still be available on all SharePoint servers.
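A quick way to see which mode a farm is effectively in is to list the DC service instance on every server together with its status. The sketch below reuses the instance name filter from the fixes later in this article and should be run in the SharePoint 2013 Management Shell. Note that the object model reports Online and Disabled, which as far as I can tell correspond to Started and Stopped in Central Administration.

```powershell
# Run in the SharePoint 2013 Management Shell.
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"

# One row per server that has a Distributed Cache service instance.
Get-SPServiceInstance |
    Where-Object { $_.Service.ToString() -eq $instanceName } |
    Select-Object @{ Name = 'Server'; Expression = { $_.Server.Name } }, Status
```

If a server is missing from the list entirely, the instance is not available there, which the dedicated configuration does not allow.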
Issue #1 Error: cacheHostInfo is null when removing the existing DC instance with Remove-SPDistributedCacheServiceInstance
Fix:
Forcefully delete the Distributed Cache Instance as follows:
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq "SP2013App"}
$serviceInstance.Delete()
Add-SPDistributedCacheServiceInstance
Issue #2 Error starting the Distributed Cache instance
While you provision the DC instance, you may receive the above error.
Fix:
Remove and add the DC instance.
#Removing the service from SharePoint on local host.
Stop-SPDistributedCacheServiceInstance -Graceful
Remove-SPDistributedCacheServiceInstance
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
$serviceInstance.Delete()
#Add DC Instance
$SPFarm = Get-SPFarm
$cacheClusterName = "SPDistributedCacheCluster_" + $SPFarm.Id.ToString()
$cacheClusterManager = [Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheClusterInfoManager]::Local
$cacheClusterInfo = $cacheClusterManager.GetSPDistributedCacheClusterInfo($cacheClusterName);
$instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.Tostring()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
$serviceInstance.Delete()
Add-SPDistributedCacheServiceInstance
Issue #3 ErrorCode<ERRPS002>:SubStatus<ES0001>:Invalid provider and connection string read. Please provide the values manually.
Fix:
Somehow, the connection string had gone missing, so we need to manually add the database entry for AppFabric as follows:
- Open Run (Windows + R) and enter Regedit
- Go to HKEY_LOCAL_MACHINE >> SOFTWARE >> MICROSOFT >> AppFabric >> V1.0 >> CONFIGURATION
- Enter the Connection String and Provider as follows:
Connection String:
Data Source=spsql;Initial Catalog=SPFarm_SharePoint_Config;Integrated Security=True;Enlist=False
Provider:
SPDistributedCacheClusterProvider
Then use PowerShell to verify the Distributed Cache:
Use-CacheCluster
Get-CacheHost
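If you prefer not to click through Regedit, the same key can be inspected from PowerShell. This is a sketch under one assumption: that the two registry values are named ConnectionString and Provider, which is how I read the entries above. Check the actual value names on your server before relying on it.

```powershell
# Registry path from the steps above, in PowerShell provider syntax.
$key = 'HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration'

# Assumption: the values are named ConnectionString and Provider.
Get-ItemProperty -Path $key | Select-Object ConnectionString, Provider

# To write the values instead of typing them in Regedit (same assumption),
# uncomment and adjust the connection string for your farm:
# Set-ItemProperty -Path $key -Name ConnectionString -Value 'Data Source=spsql;Initial Catalog=SPFarm_SharePoint_Config;Integrated Security=True;Enlist=False'
# Set-ItemProperty -Path $key -Name Provider -Value 'SPDistributedCacheClusterProvider'
```

Empty output for both columns would match the symptom in this issue.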
Issue #4 Page load takes 6 seconds.
Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage 'DistributedViewStateCache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.) ---> System.ServiceModel.ProtocolException
The Developer Dashboard showed the page load taking more than 6 seconds, and the delay itself was almost exactly 6 seconds.
In my SharePoint environment, I was getting the error above with all servers in collocated mode for DC.
Fix:
It took me more than 4 weeks to find the actual issue. To troubleshoot the Distributed Cache, we first need to know which settings were incorrect in my environment:
As mentioned, I have 3 SharePoint servers: 1 application server and 2 web front ends.
a) On App Server
Use-CacheCluster
Get-CacheHost
Only the APP server status is UP.
Apps02: UP
Wfe01: Unknown
Wfe02: Unknown
The WFE servers were showing the errors below:
Error: SubStatus(ES0001): Cache host SP13WFE01.contoso.com is not reachable.
Error: SubStatus(ES0001): Cache host SP13WFE02.contoso.com is not reachable.
b) First Front End Server
Apps02: Unknown
Wfe01: Down
Wfe02: Unknown
c) Second Front End Server
Apps02: Unknown
Wfe01: Unknown
Wfe02: Down
App02 | Wfe01 | Wfe02 |
Apps02: UP, Wfe01: Unknown, Wfe02: Unknown | Apps02: Unknown, Wfe01: Down, Wfe02: Unknown | Apps02: Unknown, Wfe01: Unknown, Wfe02: Down |
Clearly, the errors above show that the cache hosts are not able to connect to each other. On the application server, the current server (Apps02) shows the UP status, whereas the WFEs show the UNKNOWN status. The same pattern applies on WFE01 and WFE02, which report the other servers as UNKNOWN. During my troubleshooting, I found that if any server has the UNKNOWN status, some configuration has to be fixed.
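Rather than reading the full list on every server, you can filter it down to the hosts that are not healthy. A small sketch, assuming the objects returned by Get-CacheHost expose HostName, PortNo, and Status properties (that is how I remember the AppFabric output, so verify it with Get-Member on your farm):

```powershell
# Run in the SharePoint 2013 Management Shell on each server.
Use-CacheCluster

# Show only cache hosts whose status is anything other than UP.
# Assumption: the property names are HostName, PortNo, and Status.
Get-CacheHost |
    Where-Object { $_.Status -ne 'Up' } |
    Select-Object HostName, PortNo, Status
```

An empty result on all three servers is the goal of the steps that follow.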
Collocated mode
Step 1: Create an inbound firewall rule for the Distributed Cache ports (22233 - 22236) on each server.
Perform this for each server.
In my SharePoint farm, WFE02 now shows these settings;
we have to open the firewall on WFE01 as well.
Step 2: Start the Remote services on each server.
Step 3: Turn on Ping for all SharePoint servers.
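Steps 1 and 3 can also be scripted instead of clicked through the firewall console. The sketch below uses the built-in New-NetFirewallRule cmdlet on Windows Server 2012. The rule names are arbitrary labels I picked, and the server name in the ping test is a placeholder. Run it in an elevated PowerShell session on each SharePoint server.

```powershell
# Step 1: inbound rule for the Distributed Cache ports.
New-NetFirewallRule -DisplayName 'SharePoint Distributed Cache (TCP 22233-22236)' `
    -Direction Inbound -Protocol TCP -LocalPort '22233-22236' -Action Allow

# Step 3: allow ping. ICMPv4 type 8 and ICMPv6 type 128 are the echo requests.
New-NetFirewallRule -DisplayName 'Allow ping (ICMPv4 echo request)' `
    -Direction Inbound -Protocol ICMPv4 -IcmpType 8 -Action Allow
New-NetFirewallRule -DisplayName 'Allow ping (ICMPv6 echo request)' `
    -Direction Inbound -Protocol ICMPv6 -IcmpType 128 -Action Allow

# From another server, confirm that ping now gets through (placeholder name).
Test-Connection -ComputerName 'SP13WFE01.contoso.com' -Count 2
```

These rules only cover the local Windows Firewall; any network firewalls between the servers still have to be opened separately.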
Now, running the following on each SharePoint server reports every cache host with the status UP:
Use-CacheCluster
Get-CacheHost
This works perfectly in collocated mode for Distributed Cache.
Verify the page load time; in my environment, the page load took 288.69 milliseconds with Distributed Cache started.
To simulate a dedicated Distributed Cache server, I stopped the DC instance on both WFEs, leaving only the application server to run the Distributed Cache instance.
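For reference, the dedicated simulation comes down to the commands already used in this article. Roughly, assuming the WFE01 and WFE02 names from my farm:

```powershell
# Run in the SharePoint 2013 Management Shell on each web front end
# (WFE01 and WFE02 in my farm). -Graceful lets the instance shut down cleanly.
Stop-SPDistributedCacheServiceInstance -Graceful

# Then, on the application server, check which cache hosts are still UP.
Use-CacheCluster
Get-CacheHost
```

After this, only the application server should report UP, which matches the dedicated configuration described earlier.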
I hope this article helps someone.