In the short time I have been programming with Amazon Web Services (AWS), I have learned three things:
- AWS is NOT .NET Framework friendly! They are more .NET Core friendly but don't keep up with the latest version.
- The AWS .NET SDK needs a lot of work. I previously wrote about this in my "Is Quality Part Of The Open-Source Projects Your App Is Using?" article.
- AWS documentation for .NET is REALLY bad and some things are completely missing!
So in this post, I will discuss what I have learned, the hard way, using the Simple Queue Service (SQS) and LAMBDAs. For some of the info, we had to get clarification from somebody on the AWS Premium Support team, which took over a week, and it still wasn't detailed enough to answer some of my questions.
This LAMBDA is triggered whenever a message hits SQS. The function handler in .NET Core looks like this:
public void FunctionHandler(SQSEvent sqsEvent, ILambdaContext context)
There isn't much of interest in the ILambdaContext except for a reference to the LAMBDA logger. I store it in a class-level field as soon as the function is invoked, like this:
_logger = context.Logger;
For this project I'm only logging exceptions to this log; everything else is sent to logentries.com, since combing through the AWS CloudWatch logs is very painful. The main thing I want to discuss is...
Message Deletion
When a message is sent to a LAMBDA via an SQSEvent and the LAMBDA processes it normally, SQS automatically deletes it as soon as the LAMBDA completes. Pretty simple, right? But what if there is an issue processing the message and you don't want SQS to delete it, so that it will be automatically retried at a later time?
There are a few steps you need to keep in mind, with the last one being the most important.
Default Visibility Timeout
First, the Default Visibility Timeout (the time the message will be invisible to other processes) for the queue needs to be set to a realistic time (in seconds). For example, for this LAMBDA, a 60-second timeout was set, so we set the Default Visibility Timeout to 2 minutes. After those 2 minutes, SQS will call the LAMBDA again to process the message. Think of this as an easy automatic retry.
Maximum Receives
Second, the Maximum Receives on the queue needs to be set, which is basically the number of times the message will be retried (yes I know, it's a dumb name). For this LAMBDA, we set it to 720, which at one attempt every 2 minutes means SQS will keep trying to process the message for about 1,440 minutes, or 24 hours. In this instance, our authorization tokens coming from mobile clients are invalid after 24 hours, so this number is perfect. If we can't process it after that amount of time, we have much bigger issues. Once the Maximum Receives number is hit, the messages can and should be moved to a dead letter queue.
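To make the two queue settings concrete, here is a minimal sketch of how they could be expressed as SQS queue attributes and how the retry window can be estimated from them. The class name, method names and the dead letter queue ARN are my own placeholders, not anything from our project. The keys "VisibilityTimeout" and "RedrivePolicy" (with "deadLetterTargetArn" and "maxReceiveCount" inside it) are the SQS attribute names, and the resulting dictionary could be passed as the Attributes of a SetQueueAttributesRequest in the AWS SDK for .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class QueueRetrySettings
{
    // Builds the attribute map for the source queue.
    // deadLetterQueueArn is a placeholder for the ARN of your own dead letter queue.
    public static Dictionary<string, string> Build(
        TimeSpan visibilityTimeout, int maxReceives, string deadLetterQueueArn)
    {
        // The redrive policy is stored as a JSON string inside the attribute map.
        var redrivePolicy = JsonSerializer.Serialize(new Dictionary<string, object>
        {
            ["deadLetterTargetArn"] = deadLetterQueueArn,
            ["maxReceiveCount"] = maxReceives
        });

        return new Dictionary<string, string>
        {
            // SQS expects the visibility timeout in whole seconds.
            ["VisibilityTimeout"] =
                ((int)visibilityTimeout.TotalSeconds).ToString(CultureInfo.InvariantCulture),
            ["RedrivePolicy"] = redrivePolicy
        };
    }

    // Rough estimate of how long a failing message keeps being redelivered:
    // one attempt per visibility timeout, up to maxReceives attempts.
    public static TimeSpan ApproximateRetryWindow(TimeSpan visibilityTimeout, int maxReceives)
        => TimeSpan.FromTicks(visibilityTimeout.Ticks * maxReceives);
}
```

As a small usage example, QueueRetrySettings.ApproximateRetryWindow(TimeSpan.FromMinutes(2), 5) gives a window of 10 minutes. Treat the result as an approximation, since the actual time also depends on how long each invocation runs.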
Reprocessing of the Message
Lastly, this is how to tell SQS to reprocess the message... well, this is what isn't documented by AWS. At first we were told to send back an "error" and SQS will not delete the message, so it will be reprocessed later. My first question was "what error"? Do I send back an HTTP Status code like in other areas of AWS? Do I send back a .NET Exception? Do I send back text? Something else?
If API Gateway calls a LAMBDA, an APIGatewayProxyResponse type can be sent back so it will know if the LAMBDA processed properly via the StatusCode (an HTTP Status Code number) property. There isn't an equivalent response for SQS.
The AWS Premium Support person told us "failure means there were some errors when processing the message". Again, this tells me NOTHING about what SQS wants back so the message won't be deleted. (Can you feel my frustration?) So what I tried was a try-catch with a rethrow in the FunctionHandler method, like this...
try
{
    // Code removed for brevity
}
catch (Exception ex)
{
    _logger?.LogLine($"Error running LAMBDA: {ex.Message}");
    _logentriesLogger?.Error(ex, "Error running LAMBDA.");
    throw;
}
That did the trick! This LAMBDA calls an endpoint in our back-end that will start throwing timeouts if it can't handle the load. When that happens, SQS kicks in to do the retries and eventually all the messages are processed.
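Putting the pieces together, a minimal sketch of a complete handler might look something like the following. It assumes the Amazon.Lambda.Core and Amazon.Lambda.SQSEvents NuGet packages. ProcessMessage is a hypothetical stand-in for the call to our back-end, and the Logentries logger, the Lambda serializer attribute and the project setup are left out to keep it short.

```csharp
using System;
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;

public class Function
{
    private ILambdaLogger _logger;

    public void FunctionHandler(SQSEvent sqsEvent, ILambdaContext context)
    {
        _logger = context.Logger;

        // An SQSEvent can carry more than one message, depending on the batch size.
        foreach (var message in sqsEvent.Records)
        {
            try
            {
                ProcessMessage(message.Body);
            }
            catch (Exception ex)
            {
                _logger?.LogLine($"Error processing message {message.MessageId}: {ex.Message}");

                // Rethrowing makes the invocation fail, so SQS does not delete
                // the message and redelivers it after the visibility timeout.
                throw;
            }
        }
    }

    // Hypothetical placeholder for the real work (in our case, calling the back-end).
    private void ProcessMessage(string body)
    {
        if (string.IsNullOrWhiteSpace(body))
        {
            throw new InvalidOperationException("Empty message body.");
        }
    }
}
```

One thing to keep in mind with this approach: as far as I can tell, an exception fails the whole invocation, so if the batch size is greater than 1, every message in that batch comes back for a retry, including the ones that were already processed. Either keep the batch size at 1 or make the processing safe to repeat.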
It would be REALLY nice if AWS documented this and made their SDK more consistent!! How does this compare to Azure? I don't know, but I intend to find out for a future post, so stay tuned!
Do you have a better way? Leave a comment below.