[LRUG] Queue-related war stories

Najaf Ali ali at happybearsoftware.com
Mon Mar 16 02:12:04 PDT 2015


Hi all,

I'm trying to identify some general good practices (based on real-life 
problems) when it comes to working with async job queues (think DJ, Resque 
and Sidekiq).

So far I've been doing this by collecting stories of how they've failed 
catastrophically (e.g. sending thousands of spurious SMSes to your 
customers) and seeing if I can identify any common themes based on those.

Here are some examples of what I mean (anonymised to protect the innocent):

* Having a cron job (hourly, say) that checks whether a job has been done 
and enqueues it if it hasn't. It knows this because a successfully 
completed job leaves some sort of evidence of completion in e.g. the 
database. If your workers go down for a day, nothing completes, so the same 
job gets enqueued over and over again superfluously.

* Sending multiple emails (hundreds) in a single job led to a problem 
where if just one of those emails (say the 24th) fails to be delivered, the 
entire job fails and emails 1-23 get sent again every time the worker 
retries it.

* With the workers/app running the same codebase but on different virtual 
servers, deploying only to the application server (and not the server 
running the workers) resulted in the app servers queueing jobs that the 
workers didn't know how to process.  
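
To make the first two examples concrete, here is a rough sketch of two 
possible mitigations: refusing to enqueue a job whose key is already 
pending, and enqueueing one job per email rather than one job per batch. 
TinyQueue, enqueue_once, work_off and the job keys are all made up for 
illustration. It is an in-memory stand-in, not the API of DJ, Resque or 
Sidekiq, and I'm not claiming this is how the people involved fixed it.

```ruby
require "set"

# Hypothetical in-memory stand-in for a real job queue.
class TinyQueue
  attr_reader :pending

  def initialize
    @pending = []     # jobs waiting to run, as [key, payload] pairs
    @keys = Set.new   # keys of the jobs that are currently pending
  end

  # Enqueue only if no job with the same key is already waiting.
  # Returns true if the job was added, false if it was a duplicate.
  def enqueue_once(key, payload)
    return false unless @keys.add?(key)
    @pending << [key, payload]
    true
  end

  # Run every pending job once. A job that raises stays queued for a later
  # retry; a job that succeeds is removed and is not run again.
  def work_off
    failed = []
    @pending.each do |key, payload|
      begin
        yield payload
        @keys.delete(key)
      rescue StandardError
        failed << [key, payload]
      end
    end
    @pending = failed
  end
end

# Example 1: the hourly cron fires three times while the workers are down.
cron_queue = TinyQueue.new
results = 3.times.map { cron_queue.enqueue_once("report:2015-03-16", :report) }
# results => [true, false, false], so only one job is waiting

# Example 2: one job per recipient instead of one job for the whole batch.
queue = TinyQueue.new
(1..30).each do |n|
  addr = "user#{n}@example.com"
  queue.enqueue_once("welcome:#{addr}", addr)
end

failed_once = false
deliveries = Hash.new(0)   # address => number of times it was "sent"
deliver = lambda do |addr|
  # Simulate the 24th email failing on its first attempt only.
  if addr == "user24@example.com" && !failed_once
    failed_once = true
    raise "simulated SMTP failure"
  end
  deliveries[addr] += 1
end

queue.work_off(&deliver)   # first pass: email 24 fails, the other 29 are sent
queue.work_off(&deliver)   # retry: only email 24 is attempted again
```

After both passes every address has been delivered to exactly once. With a 
real queue the "already pending" check would need to live somewhere shared 
and atomic (the database or Redis, say) rather than in process memory.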

It would be great to hear what sort of issues/incidents you've come across 
while using async job queues like the above. I don't think I have enough 
examples to make any generalisations about the "right way" to use them yet, 
so for the moment I'm more interested in things that went wrong and how you 
fixed them.

Feel free to reply off-list if you'd rather not share with everyone. I 
intend to put the findings together in a blog post with a few guesses as to 
how to avoid these sorts of problems.

All the best,

-Ali