[LRUG] Queue-related war stories

Mon Mar 16 03:07:34 PDT 2015

At-least-once semantics is one of the reasons to look beyond Resque/Sidekiq
once you've moved the overhead of message management off your database
(DelayedJob).
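
The flip side of at-least-once delivery is that the same message can turn up
twice, so handlers need to tolerate redelivery. A minimal sketch of that idea,
assuming each message carries a unique ID; the class name, the in-memory set
and the `deliveries` array are all made up for illustration, and a real system
would record handled IDs somewhere durable:

```ruby
require "set"

# Hypothetical handler: remembers which message IDs it has already handled
# so that a redelivered message does not repeat the side effect.
class IdempotentHandler
  attr_reader :deliveries

  def initialize
    @processed_ids = Set.new # assumption: in-memory only, not durable
    @deliveries = []         # stand-in for the real side effect (SMS, email)
  end

  def handle(message_id, payload)
    return :duplicate if @processed_ids.include?(message_id)

    @deliveries << payload
    @processed_ids << message_id # recorded only after the side effect succeeds
    :processed
  end
end

handler = IdempotentHandler.new
handler.handle("msg-1", "Your order has shipped") # first delivery
handler.handle("msg-1", "Your order has shipped") # redelivery is ignored
```

There is still a small window between the side effect and recording the ID, so
this narrows the duplicate problem rather than removing it entirely.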

On a related note, Salvatore Sanfilippo (antirez) has been working on Disque,
which sounds interesting. He published a blog post overnight on how that's
going: http://antirez.com/news/88

*Garry Shutler*
@gshutler <http://twitter.com/gshutler>
gshutler.com

On 16 March 2015 at 10:01, James McCarthy <james at lety.co> wrote:

>  I've used and swear by RabbitMQ, which avoids all the DB-as-a-queue
> locking/transaction/commit issues.
>
> A handy thing about RabbitMQ is that its acknowledgement mode can be
> automatic or manual.
>
> With manual acknowledgement, a message is only removed from the queue once
> you explicitly acknowledge it.
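
To make the manual case concrete, here is a toy in-memory queue. It is not the
RabbitMQ client API; `AckQueue` and its methods are hypothetical names that
only mimic the behaviour described above: a delivered message stays with the
queue until it is acknowledged, and comes back if the consumer dies first.

```ruby
# Toy model of manual acknowledgement (hypothetical, not a RabbitMQ client).
class AckQueue
  def initialize
    @ready = []    # messages waiting to be delivered
    @unacked = {}  # delivery tag => message handed out but not yet acked
    @next_tag = 0
  end

  def publish(message)
    @ready << message
  end

  # Hand out the next message, but keep hold of it until it is acknowledged.
  def pop
    return nil if @ready.empty?

    tag = (@next_tag += 1)
    @unacked[tag] = @ready.shift
    [tag, @unacked[tag]]
  end

  # Only at this point is the message really gone.
  def ack(tag)
    @unacked.delete(tag)
  end

  # Consumer crashed or disconnected: requeue everything unacknowledged.
  def recover
    @ready.unshift(*@unacked.values)
    @unacked.clear
  end

  def size
    @ready.size + @unacked.size
  end
end

queue = AckQueue.new
queue.publish("send welcome email")
tag, message = queue.pop
queue.recover            # worker died before acking, message is back
tag, message = queue.pop
queue.ack(tag)           # work finished, message is removed
```

With automatic acknowledgement the message would be dropped at `pop`, so a
crash mid-job loses it.
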
>
> James.
>
>
> On 16/03/15 09:12, Najaf Ali wrote:
>
> Hi all,
>
>  I'm trying to identify some general good practices (based on real-life
> problems) when it comes to working with async job queues (think DJ, Resque
> and Sidekiq).
>
>  So far I've been doing this by collecting stories of how they've failed
> catastrophically (e.g. sending thousands of spurious SMSes to your
> customers) and seeing if I can identify any common themes based on those.
>
>  Here are some examples of what I mean (anonymised to protect the
> innocent):
>
>  * Having a (e.g. hourly) cron job that checks if a job has been done and
> then enqueues the job if it hasn't. It knows this because the successfully
> completed job would leave some sort of evidence of completion in e.g. the
> database. If your workers go down for a day, this means the same job would
> be enqueued over and over again superfluously.
>
>  * Sending multiple emails (hundreds) in a single job led to a problem
> where if just one of those emails (say the 24th) fails to be delivered, the
> entire job fails and emails 1-23 get sent again every time your worker
> retries it.
>
>  * With the workers/app running the same codebase but on different
> virtual servers, deploying only to the application server (and not the
> server running the workers) resulted in the app servers queueing jobs that
> the workers didn't know how to process.
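
For the cron story, one possible guard is to track jobs that are already
waiting as well as jobs that are done, so a backlog does not grow while the
workers are down. A rough sketch, with invented names throughout
(`GuardedEnqueuer`, `enqueue_if_needed`, `mark_done`); the pending set lives in
memory here, whereas cron runs in separate processes and would need it in
shared storage:

```ruby
require "set"

# Hypothetical guard around enqueueing. `queue` is any array-like job queue
# and `completed` is the "evidence of completion" the cron job checks.
class GuardedEnqueuer
  def initialize(queue, completed)
    @queue = queue
    @completed = completed
    @pending = Set.new # assumption: would be shared storage in real life
  end

  # Called by cron. Returns true only when a job was actually enqueued.
  def enqueue_if_needed(job_key)
    return false if @completed.include?(job_key)
    return false unless @pending.add?(job_key) # already waiting in the queue

    @queue << job_key
    true
  end

  # Called by the worker once the job has succeeded.
  def mark_done(job_key)
    @completed << job_key
    @pending.delete(job_key)
  end
end

queue = []
enqueuer = GuardedEnqueuer.new(queue, Set.new)
24.times { enqueuer.enqueue_if_needed("daily-report") } # workers down for a day
queue.size # one job waiting rather than 24
```
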
>
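
For the bulk email story, a common way out is one job per recipient, so a
failure only retries that one email. A small simulation of the idea, with no
real queue or mailer involved; `run_with_retries`, the `deliver` lambda and the
example.com addresses are all hypothetical:

```ruby
# Hypothetical retry loop: `deliver` is any callable that raises on failure.
# Returns the recipients that still failed after max_attempts tries.
def run_with_retries(recipients, deliver, max_attempts: 3)
  jobs = recipients.map { |r| { recipient: r, attempts: 0 } } # one job per email
  failed = []

  until jobs.empty?
    job = jobs.shift
    begin
      job[:attempts] += 1
      deliver.call(job[:recipient])
    rescue StandardError
      # Only this recipient is retried; nobody else is emailed again.
      if job[:attempts] < max_attempts
        jobs.push(job)
      else
        failed.push(job[:recipient])
      end
    end
  end

  failed
end

sent = Hash.new(0)
flaky_once = true
deliver = lambda do |recipient|
  if recipient == "user24@example.com" && flaky_once
    flaky_once = false
    raise "SMTP timeout"
  end
  sent[recipient] += 1
end

recipients = (1..30).map { |i| "user#{i}@example.com" }
run_with_retries(recipients, deliver)
sent.values.max # no address was emailed more than once
```
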
>  It would be great to hear what sort of issues/incidents you've come
> across while using async job queues like the above. I don't think I have
> enough examples to make any generalisations about the "right way" to use
> them yet, so at the moment I'm more interested in just the things that went
> wrong and how you fixed them.
>
>  Feel free to reply off-list if you'd rather not share with everyone. I
> intend to put the findings together in a blog post with a few guesses as to
> how to avoid these sorts of problems.
>
>  All the best,
>
>  -Ali
>
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> Archives: http://lists.lrug.org/pipermail/chat-lrug.org
> Manage your subscription: http://lists.lrug.org/options.cgi/chat-lrug.org
> List info: http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>
>
> --
> James McCarthy
>
> Software Consultant
>
> LetyCo
>
> Mob:  07577006897
>
> Email:  james at lety.co
> lety.co
>
>