On occasion, you have to find a way to work around an issue. Recently, with a client, I discovered an unknown SQL Server that wasn't being properly cared for. This particular instance wasn't taking backups, and it was deemed a critical production server. You can begin to see the problems that were arising.
However, there is a twist to the story. Due to changes in Active Directory, I couldn't get the appropriate permissions configured for the SQL Server service to allow it to back up to the network destination. I also could not take an outage anytime soon to change the service account to something better suited. It's a long and convoluted story for another post. The immediate need was to 1) get backups of all of the databases locally and 2) find a way to move them off of the local server. I also needed to do this on a schedule so that I could meet the RPO/RTO guidelines the client had set forth.
My solution? RoboCopy.
Robocopy has been around for years within the Microsoft ecosystem, and it is highly versatile. However, until recently, it wasn't a tool that I was versed in and, frankly, I hardly used it. Over the past year or so, however, I have found myself using it more and more as a solution for file movement when needed.
Essentially, robocopy will copy files, or even entire directories, from one location to another. It will copy permissions without any additional coding, and it will skip files that already exist at the destination. This is useful in that you do not have to explicitly account for files that might already be present at the destination; it handles that for you. Some other benefits of robocopy,
- Restart in the event it fails, for both reads and writes
- Handle subdirectories
- Copy file attributes
- Mirror directories
- Move files, deleting them from the source once they are copied
- many others…
You can see all of the options in the documentation from Microsoft.
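As a quick illustration of the mirroring and move behaviors from the list above, here is a hedged sketch; the paths are hypothetical:
rem mirror a directory tree to the destination (/mir also removes destination files that no longer exist at the source)
robocopy c:\temp\source \\servername\sharename /mir
rem move files rather than copy them; each file is deleted from the source after it is copied
robocopy c:\temp\source \\servername\sharename /mov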
In this case, I’m interested in using the following switches,
- /s – process all of the subdirectories from the specified root folder
- /z – copy files in restartable mode. If it dies in the middle, it can pick up where it left off
- /eta – shows the estimated time of arrival of the files
- /r:3 – retry a failed copy 3 times before giving up. The default is 1 million retries.
- /w:3 – number of seconds to wait before retrying. The default is 30 seconds.
RoboCopy In Action
In this case, using RoboCopy is pretty straightforward. I have Ola Hallengren's Maintenance Solution running on the SQL Server instance, so I configured it to back up to a specific directory on the local server. I also made note of the available drive space. On this particular instance, the databases are relatively small; 3 days of backups stored locally comes to less than 5GB, so disk usage is light. If you have larger databases, you might need to adjust how many backups you keep locally so that you do not consume the entire drive. Adjust accordingly.
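For reference, the local backup itself can be invoked through Ola's DatabaseBackup procedure. This is a minimal sketch, assuming the procedures were installed in master; the server name, directory, and the 72-hour (3-day) cleanup window are placeholders matching the setup described above:
sqlcmd -S servername -E -d master -Q "EXECUTE dbo.DatabaseBackup @Databases = 'ALL_DATABASES', @Directory = 'C:\Temp\servername', @BackupType = 'FULL', @CleanupTime = 72"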
The script itself is pretty simple and follows this format with the previously listed switches,
robocopy [source] [destination] /s /z /eta /r:3 /w:3
or
robocopy c:\temp\servername \\naservername\sharename$ /s /z /eta /r:3 /w:3
SQL Agent Job
We currently take transaction log backups of production databases every 15 minutes. Since I am backing up the databases as well as the transaction logs to the local disk, I needed to be able to push them up to the network storage on a regular basis. You could use Windows Task Scheduler to facilitate this; however, since I'm more versed in the SQL Server side of things, I decided to put this into an agent job. This gave me a lot of flexibility in scheduling, and given that the files should be small, I can schedule the job to run more frequently. Running the job more frequently allows me to exceed the RPO/RTO requirements without a lot of extra overhead.
To set this up, first, we have to put the command into a batch file which we can call from the agent job. In the batch file, we have to adjust the error levels so that robocopy will suppress a successful exit status and only report on actual errors. We do this because robocopy reports even a successful run as a non-zero ERRORLEVEL, and SQL Server Agent views any non-zero exit code as a failure, so it would fail the step. I borrowed the below code from Rob Volk (T) and one of his blog posts.
rem suppress successful robocopy exit statuses, only report genuine errors (bitmask 16 and 8 settings)
set/A errlev="%ERRORLEVEL% & 24"
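To see why the mask is 24, it helps to know that robocopy's exit status is a bitmask. Per Microsoft's documentation, the bits are:
rem  1 - one or more files were copied successfully
rem  2 - extra files or directories were detected at the destination
rem  4 - mismatched files or directories were detected
rem  8 - some files or directories could not be copied (copy errors)
rem 16 - fatal error (usage error, insufficient access, or similar)
rem 24 = 16 + 8, so ANDing with 24 keeps only the two failure bits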
Next, we have to adjust for an appropriate exit so that the agent job can succeed or fail correctly.
rem exit batch file with errorlevel so SQL job can succeed or fail appropriately
exit/B %errlev%
The entire batch file now looks like this, with the robocopy command placed first so that %ERRORLEVEL% reflects its exit status,
robocopy c:\temp\servername \\naservername\sharename$ /s /z /eta /r:3 /w:3
rem suppress successful robocopy exit statuses, only report genuine errors (bitmask 16 and 8 settings)
set/A errlev="%ERRORLEVEL% & 24"
rem exit batch file with errorlevel so SQL job can succeed or fail appropriately
exit/B %errlev%
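Before wiring it into a job, you can sanity-check the batch file from a command prompt; the file name here is hypothetical, and an echoed 0 means no failure bits were set:
C:\Scripts> CopyBackups.bat
C:\Scripts> echo %ERRORLEVEL%
0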
Once the batch file was created, I could create a standard agent job with a step that executes it. The step type is set to "Operating system (CmdExec)", and the "/c" switch tells cmd.exe to execute the string supplied, which in this case is the batch file that I had created.
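The command for the job step would look something like this; the path to the batch file is a placeholder:
cmd.exe /c "C:\Scripts\CopyBackups.bat"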
Since the client has an RPO/RTO of 15 minutes, I scheduled the job to run every 5 minutes. This helps ensure that any recent backups are pushed to secondary off-site storage within the 15-minute limitation in the event of a hardware failure.
Summary
The real solution to this issue would be to adjust permissions properly for the SQL Server service or change the service account altogether. This would allow SQL Server to back up directly to the network destination without needing a mitigating step in between. However, there are times when that might not be possible, and you have to find another solution. In this case, RoboCopy was a great middle-man solution that fit all of my requirements. Next time you need to move files around, remember that RoboCopy might just fit the bill.
Enjoy!