Things I Didn't Know About SQS


October 19th, 2021


When it comes to queues, the AWS SQS service is a great option. It's super cheap, super reliable, and can scale higher than most of us will ever need.

In general, I'm a fan of any service I don't have to manage myself. Most managed services in AWS are rather expensive. SQS is one of the few very useful services that is extremely affordable.

However, there are a few key differences you need to know about! Each of the following details has bitten me before. Here are the little details about SQS that you should know!

Visibility Timeout

One unique part of SQS queues is the Visibility Timeout.

Most of us are probably used to not having to think about this - when we get a job in our Laravel queue worker, no other queue worker will pick up that job.

SQS works a bit differently. Let's say our Visibility Timeout is set for 10 seconds. If our queue worker takes more than 10 seconds to process the job, SQS will make the job visible again. This means another queue worker may pick that job up!

So we need to be careful that our Visibility Timeout is set HIGHER than the time it takes to complete a job.
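To make the timing concrete, here is a toy in-memory model of the behavior described above. It is not the SQS API: the ToyQueue class, its method names, and the integer "clock" are all made up for illustration, and a real queue does far more than this.

```php
<?php

/**
 * Toy model of an SQS-style Visibility Timeout (hypothetical class,
 * not part of Laravel or the AWS SDK).
 */
final class ToyQueue
{
    private int $visibilityTimeout;

    /** @var array<string, int> message id => time at which it is visible again */
    private array $visibleAt = [];

    public function __construct(int $visibilityTimeout)
    {
        $this->visibilityTimeout = $visibilityTimeout;
    }

    public function send(string $id, int $now): void
    {
        $this->visibleAt[$id] = $now;
    }

    /** Return a visible message id and hide it for the timeout, or null. */
    public function receive(int $now): ?string
    {
        foreach ($this->visibleAt as $id => $visibleAt) {
            if ($visibleAt <= $now) {
                $this->visibleAt[$id] = $now + $this->visibilityTimeout;

                return (string) $id;
            }
        }

        return null;
    }

    /** Deleting is what a worker does once the job has finished. */
    public function delete(string $id): void
    {
        unset($this->visibleAt[$id]);
    }
}

$queue = new ToyQueue(10);
$queue->send('job-1', 0);

$first  = $queue->receive(0);  // worker A picks up job-1
$hidden = $queue->receive(5);  // nothing visible for worker B yet
$second = $queue->receive(11); // A never deleted it, so B gets job-1 too
```

If worker A had called delete() before the 10 seconds were up, the last receive() would have found nothing.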

Visibility Timeout can be set to a default value for each SQS queue, or it can be set for each individual job. You can even extend a visibility timeout on a job that already exists.

Laravel won't set a Visibility Timeout for you, so it's best to set a default value that's higher than it takes to process your jobs.
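As a rough sketch, the queue-level default could be set with the AWS SDK for PHP's setQueueAttributes call. The helper name below is hypothetical, and it only builds the request array, so you can inspect it before handing it to a configured Aws\Sqs\SqsClient.

```php
<?php

/**
 * Build the arguments for SqsClient::setQueueAttributes() that set a
 * queue's default Visibility Timeout. The helper name is hypothetical.
 */
function defaultVisibilityTimeoutRequest(string $queueUrl, int $seconds): array
{
    // SQS accepts 0 seconds up to 12 hours (43200 seconds).
    if ($seconds < 0 || $seconds > 43200) {
        throw new InvalidArgumentException(
            'Visibility Timeout must be between 0 and 43200 seconds.'
        );
    }

    return [
        'QueueUrl'   => $queueUrl,
        // Queue attribute values are sent as strings.
        'Attributes' => ['VisibilityTimeout' => (string) $seconds],
    ];
}

// Usage, assuming $sqs is a configured Aws\Sqs\SqsClient:
// $sqs->setQueueAttributes(defaultVisibilityTimeoutRequest($queueUrl, 120));
```

The same attribute can also be changed in the AWS console; the point is only that it lives on the queue, not in Laravel's config.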

There is a "hack" you can use to extend a job's Visibility Timeout, however! You can call the release() method, making sure to set a delay.

For the SQS queue driver, this sets the visibility timeout. If you have a specific job that likely needs more time than the default Visibility Timeout, you can call this method early in your job (with a delay!!) to increase the Visibility Timeout. This will give your job more time to process before being made visible for another queue worker to pick up.

public function handle() {
    // Increase visibility timeout for this job
    // to one minute
    $this->release(60);

    // Continue on with the job...
}

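If you don't set a delay, the Visibility Timeout will be zero, which makes the job visible in the SQS queue immediately - likely not what you want! Keep in mind that release() also marks the job as released, so Laravel should not delete it automatically when handle() finishes. Delete it yourself at the end of the job (for example with $this->delete()), or it will be picked up again once the timeout expires.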
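An alternative to release() is to ask SQS directly for more time with the ChangeMessageVisibility API call. Here's a minimal sketch, assuming $sqs is an Aws\Sqs\SqsClient (or anything exposing the same changeMessageVisibility() method); the extendVisibility helper name is made up for this example.

```php
<?php

/**
 * Extend a received message's Visibility Timeout.
 * $sqs is assumed to be an Aws\Sqs\SqsClient; the helper is hypothetical.
 */
function extendVisibility(
    object $sqs,
    string $queueUrl,
    string $receiptHandle,
    int $seconds
): void {
    $sqs->changeMessageVisibility([
        'QueueUrl'          => $queueUrl,
        // The receipt handle comes from the ReceiveMessage response.
        'ReceiptHandle'     => $receiptHandle,
        // Counted from the moment of this call, not from the original receive.
        'VisibilityTimeout' => $seconds,
    ]);
}
```

Inside a Laravel job on the SQS driver, the client and receipt handle are probably reachable through the underlying queue job object, but that leans on framework internals, so check it against your Laravel version before relying on it.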

Long and Short Polling

SQS's API is all HTTP based. This means that when our queue workers poll SQS for a new job, they're making an HTTP request to the SQS API.

By default, this does "short polling" - if no job is available when the HTTP request is made, SQS immediately returns an empty response.

Long polling allows you to keep an HTTP request open for a certain amount of time. While the HTTP request is open, SQS may send a job to the queue worker at any time.

Laravel doesn't do any long polling, but there is something important to know here.

If you use the SQS queue driver, you may see that some jobs take a while to get processed - as if the queue worker can't find a new job. This is related to how SQS is scaled within AWS.

Here's the relevant thing to know, from the SQS docs:

With short polling, the ReceiveMessage request queries only a subset of the servers (based on a weighted random distribution) to find messages that are available to include in the response. Amazon SQS sends the response right away, even if the query found no messages.

With long polling, the ReceiveMessage request queries all of the servers for messages. Amazon SQS sends a response after it collects at least one available message, up to the maximum number of messages specified in the request. Amazon SQS sends an empty response only if the polling wait time expires.

It turns out that with long polling, we're likely to get jobs more quickly as it polls all of the SQS servers that may contain our jobs!

Laravel doesn't support long polling out of the box, but luckily we can do something about that. There's a little note at the bottom of that same docs page:

Short polling occurs when the WaitTimeSeconds parameter of a ReceiveMessage request is set to 0 in one of two ways:

  • The ReceiveMessage call sets WaitTimeSeconds to 0.
  • The ReceiveMessage call doesn’t set WaitTimeSeconds, but the queue attribute ReceiveMessageWaitTimeSeconds is set to 0.

Laravel won't do the first bullet point there - it doesn't explicitly enable long polling by setting the WaitTimeSeconds when polling for new jobs.

However, if we set the SQS queue's default ReceiveMessageWaitTimeSeconds attribute to be greater than 0 (the maximum is 20 seconds), long polling is enabled on SQS's end! The Laravel queue worker may not wait for the full value of ReceiveMessageWaitTimeSeconds (it waits for whatever timeout the AWS PHP SDK sets by default for HTTP requests). Even so, this still triggers SQS to check all servers as if it's long polling, which means we're likely to get jobs from our SQS queue more quickly.
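Both ways of switching on long polling can be sketched with the AWS SDK for PHP. The two helper names are hypothetical and only build request arrays; the first is meant for SqsClient::setQueueAttributes() (the queue-level default discussed above), the second for a hand-rolled SqsClient::receiveMessage() call outside of Laravel's worker.

```php
<?php

/**
 * Arguments for setQueueAttributes(): long polling for every
 * ReceiveMessage call that does not set WaitTimeSeconds itself.
 */
function enableLongPollingRequest(string $queueUrl, int $waitSeconds = 20): array
{
    // SQS allows 0 (short polling) through 20 seconds.
    if ($waitSeconds < 0 || $waitSeconds > 20) {
        throw new InvalidArgumentException(
            'Wait time must be between 0 and 20 seconds.'
        );
    }

    return [
        'QueueUrl'   => $queueUrl,
        // Queue attribute values are sent as strings.
        'Attributes' => ['ReceiveMessageWaitTimeSeconds' => (string) $waitSeconds],
    ];
}

/**
 * Arguments for one receiveMessage() call that long polls explicitly.
 */
function longPollReceiveRequest(
    string $queueUrl,
    int $waitSeconds = 20,
    int $maxMessages = 1
): array {
    return [
        'QueueUrl'            => $queueUrl,
        'WaitTimeSeconds'     => $waitSeconds,
        'MaxNumberOfMessages' => $maxMessages, // SQS accepts 1 to 10
    ];
}

// Usage, assuming $sqs is a configured Aws\Sqs\SqsClient:
// $sqs->setQueueAttributes(enableLongPollingRequest($queueUrl, 10));
// $result = $sqs->receiveMessage(longPollReceiveRequest($queueUrl));
```

For a Laravel worker, only the first one matters, since the worker makes the ReceiveMessage call for you.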

It's a small thing, but certainly has helped me resolve my occasional annoyance with SQS queues!

Guarantees and Job Order

There are 2 flavors of SQS:

  1. Standard
  2. FIFO (first in, first out)

Standard SQS

Standard SQS is the one most of us use. It guarantees "at least once delivery", and nothing else. This means 2 things:

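There's no de-duplication - if you send the exact same job into SQS more than once, it will get processed more than once. This likely isn't a surprise. If you send the exact same job into any Laravel queue, you'll end up processing that job more than once! On top of that, "at least once" means SQS itself may occasionally deliver a single message more than once, so it pays to make jobs safe to run twice.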

There's no guaranteed message order - you won't necessarily get the jobs processed in the same order that you send them in, even if you only use a single queue worker.

This is different from other Laravel queue drivers, such as Database and Redis drivers.

Note that Standard SQS attempts to deliver jobs in the order they are received, but it's not always possible due to the scale and architecture of SQS. That's noted in the SQS FAQ.
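Since a Standard queue can hand you the same job twice, one common defense is to make the job idempotent. Below is a minimal sketch of that idea. The handleOnce helper and the job ID format are hypothetical, and the $processed array stands in for a durable store such as a database table or cache.

```php
<?php

/**
 * Run $work at most once per job ID (hypothetical helper).
 * $processed is a stand-in for a durable store; an array is used here
 * only to keep the sketch runnable.
 */
function handleOnce(string $jobId, array &$processed, callable $work): bool
{
    if (isset($processed[$jobId])) {
        return false; // duplicate delivery: skip the side effects
    }

    $work();
    $processed[$jobId] = true;

    return true;
}

$processed = [];
$sent = 0;
$sendEmail = function () use (&$sent): void {
    $sent++; // pretend this sends a receipt email
};

$firstRun  = handleOnce('order-42-receipt', $processed, $sendEmail); // runs
$secondRun = handleOnce('order-42-receipt', $processed, $sendEmail); // skipped
```

With several workers, the check and the write would need to be atomic (a unique database constraint or an atomic cache "add", for instance), which this sketch doesn't attempt.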

FIFO SQS

FIFO queues have 2 guarantees:

  1. Exactly-Once Processing
  2. Message Order

FIFO queues will remove duplicate jobs. You can allow SQS to determine if a job is a duplicate based on the job data, or you can set a Deduplication ID to a value of your choosing. SQS uses that value to compare against other jobs. If it finds a job with the same content or Deduplication ID sent within the deduplication interval (5 minutes), it will drop the duplicate job.

So, this helps you ensure that you don't process the same job more than once.
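Here's a small sketch of what sending to a FIFO queue might look like with the AWS SDK for PHP. The fifoSendRequest helper is hypothetical and only builds the array you'd pass to SqsClient::sendMessage(). When no Deduplication ID is given, it falls back to a SHA-256 hash of the body, which mirrors what SQS's content-based deduplication does on its side.

```php
<?php

/**
 * Build SqsClient::sendMessage() arguments for a FIFO queue.
 * The helper name is hypothetical.
 */
function fifoSendRequest(
    string $queueUrl,
    string $body,
    string $groupId,
    ?string $dedupId = null
): array {
    return [
        'QueueUrl'               => $queueUrl,
        'MessageBody'            => $body,
        // FIFO queues require a Message Group ID on every message.
        'MessageGroupId'         => $groupId,
        // Explicit ID if provided, otherwise a hash of the body.
        'MessageDeduplicationId' => $dedupId ?? hash('sha256', $body),
    ];
}

// Usage, assuming $sqs is a configured Aws\Sqs\SqsClient:
// $sqs->sendMessage(fifoSendRequest($queueUrl, $payload, 'user-1'));
```

An explicit ID is handy when two jobs with different payloads should still count as "the same" job, such as retries that carry a fresh timestamp.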

FIFO also guarantees message order. FIFO queues deliver jobs in the order they are sent in (first in, first out - the oldest job/message gets processed first).

No job is sent to be processed until the one before it is completed (deleted). This, however, means that having more than one queue worker is useless when every job shares the same ordering. How, then, do you scale your FIFO queues to handle more than one job at a time?

You can use Message Groups. Order is only guaranteed within a message group, so you can get concurrency in FIFO queues by giving the same Message Group ID to jobs that must be processed in order relative to each other, and different IDs to jobs that are unrelated.

For example, you may want to assign a message group per user in your application, in which case your Message Group ID might be set to something like "user-x" where x is the user ID.
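To illustrate the per-user grouping, this sketch partitions a handful of made-up jobs by Message Group ID. Both helper names and the job shape (user_id, name) are assumptions for the example. Order only matters inside each partition, so separate partitions could be worked on at the same time.

```php
<?php

/** Message Group ID for a user's jobs, following the "user-x" pattern. */
function messageGroupForUser(int $userId): string
{
    return 'user-' . $userId;
}

/**
 * Partition jobs by group, preserving send order inside each group.
 * The job array shape is hypothetical.
 */
function partitionByGroup(array $jobs): array
{
    $groups = [];

    foreach ($jobs as $job) {
        $groups[messageGroupForUser($job['user_id'])][] = $job['name'];
    }

    return $groups;
}

$groups = partitionByGroup([
    ['user_id' => 1, 'name' => 'charge-card'],
    ['user_id' => 2, 'name' => 'charge-card'],
    ['user_id' => 1, 'name' => 'send-receipt'],
]);

// user-1's two jobs stay in order; user-2's job is independent of them.
```

The trade-off is granularity: one group for everything means no concurrency, while a group per job means no ordering at all.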

There's a bit more to FIFO queues. To read more about them, see our article on Using FIFO Queues!


Chris Fidao

Teaching coding and servers at CloudCasts and Servers for Hackers. Co-founder of Chipper CI.