Issue
Another problematic day with SharePoint administration. I asked our DBA to send us the database size report about our services databases (which we asked once a while), it was shocking for us. Our configuration Database size is 509 GB (only data file mdf), Yes It is 509 GB. Time for investigation.
Troubleshooting
We asked the DBA, give us another size report with table along with sizes. What we noticed in that report is TimerJobHistory table is using 506 GB (949 Million records) out of 509 GB. Now we moved to SharePoint servers and trying to understand what’s going on, why timer job history is growing that big.
Every time a timer job runs in SharePoint, it create an entry (row) in the Timer Job History table of the Configuration database. There is one timer job called "Delete Job History", which is responsible to purge / delete the old record accordingly (depending upon the retention period & Timer Job schedule).
In ideal environment, it should auto truncate the old record (as per farm settings, OOTB 7 days retention policy). But that is not happening in our case. So we move toe timer job in central admin and check the “Delete Job History” status as well as history.
- It schedule to run once a week (weekly)
- On history page, that job never completed successfully, always failed after 5 min run.
So clearly, this job is doing what it’s meant to, I did further investigation and run the job manually. After 5 minutes, job’s status changed to failed from running and only purge the 1 Million record as per the database report. Following error it throws with failed:
The Execute method of job definition Microsoft.SharePoint.Administration.SPDeleteJobHistoryJobDefinition (ID f9at30d8-34ed-44b6-2348-82356b115erf) threw an exception. More information is included below. Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding... (Correlation=34b04a9e-9def-803b-qea1-aa67aa60a3cc)
I again start the timer job manually, it stopped again after 5 min with above error and delete only 1 million records.
Root cause
From our investigation looks like it is bug in the SharePoint "Delete job History", it has timer in it which force it to shut down after 5 minutes of run. This job is schedule to run once a day and there is no auto re run after failure before next schedule time. SharePoint adding 8 Million record in the table while delete history job purge only 1 million so we are adding roughly 63 million every week, which cause this size issue with configuration database. I report the same bug to MSFT, let’s see if they accept it as defect or give us any workaround.
Here is the great reading which explain the problem with more information, it applies to SharePoint 2013. For Information only - Changes in the TimerJobHistory table maintenance introduced in November 2017 CU for SharePoint 2013
Note: To prevent this from happening you should ensure that the Delete Job History job has been running successfully and should be not disabled or long running schedule (i.e. run once a month)
Resolution
Now I made the following changes to our environment to fix the issue.
Default value
- Timer Job history retention is 3 days
- Timer job run once a week ( weekly )
Changed Value
- Change the retention period from 3 days to 2 days.
- Change the "Delete Job History" schedule from weekly to hourly.
After above changes, in couple of days size of “TimerJobHistory“ come down to 3 GB, Now the issue with free space inside the Configuration database. Now, we have to deal with shrinking of config database and get the space back. For shrinking Microsoft recommends an article for it:
Note: For Shrinking of configuration database, it is recommended contact the Microsoft support to get the guideline otherwise the shrinking operation may put the environment out of support.
For the Shrinking a database file, here is the article which mostly recommended but need approval from MSFT.Shrink a File
Following steps,
- To shrink the files right click on the database > Tasks > Shrink > Files:
- Select the appropriate File Type based on the file that you wish to shrink (Data or Log).
- Click Ok and it will shrink the file to the desired size.
- Once completed for the necessary files you can re-run the disk usage report and should see a reduction in the overall size and amount of free space.
- Now check the Disk Usage Report of the content database and make sure free space released to the OS.
Hope this will help and save your time.