[LRUG] Queuing systems

Glenn Gillen glenn at rubypond.com
Wed Sep 7 13:18:36 PDT 2011


My biggest issue in the past has been getting something like RabbitMQ set up and debugging the often cryptic errors. The RabbitMQ guys have just pushed an add-on into private beta on Heroku, and getting it up and running has been really simple (http://blog.heroku.com/archives/2011/8/31/rabbitmq_add_on_now_available_on_heroku/), but I've only used it with Heroku. I assume they have a fully hosted product you could use outside of Heroku too if you wanted.
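For illustration, here's a rough, untested sketch of talking to it from Ruby with the Bunny gem. I'm assuming the add-on exposes its AMQP connection string in an environment variable; the variable name, queue name and payload below are just placeholders, so check your app's config for the real values:

    require "bunny"
    require "json"

    # Assumed: the add-on provides an AMQP URL via an environment variable.
    amqp_url = ENV.fetch("RABBITMQ_URL", "amqp://guest:guest@localhost:5672")

    connection = Bunny.new(amqp_url)
    connection.start

    channel = connection.create_channel
    queue   = channel.queue("jobs", durable: true)  # survives broker restarts

    # Persistent messages are written to disk by the broker, so queued jobs
    # aren't lost if it bounces.
    queue.publish({ event: "signup", user_id: 42 }.to_json, persistent: true)

    connection.close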

Then the problem comes back to scaling your workers easily enough to keep up with the load.

In my limited experience, I naively thought the most important thing was to save the data so I could process it later. When you're dealing with something with consistently high throughput (like the Twitter stream), if you can't process at roughly the rate data comes in, you just end up with an ever-growing backlog you'll never get ahead of. The only option is to process in near real time, so you need to parallelise enough to handle that, or throttle the input to a level you can manage.
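Concretely, the consuming side might look something like the sketch below: run several copies of this worker, cap in-flight jobs with prefetch, and only ack once the work is done. It also touches on Chris's double-processing caution further down the thread — AMQP gives you at-least-once delivery, so the handler has to be idempotent. The queue and method names here are mine, not anything from this thread:

    require "bunny"
    require "json"

    connection = Bunny.new(ENV.fetch("RABBITMQ_URL", "amqp://localhost"))
    connection.start

    channel = connection.create_channel
    channel.prefetch(10)  # at most 10 unacknowledged jobs in flight per worker
    queue = channel.queue("jobs", durable: true)

    # Placeholder for the real work. Because delivery is at-least-once, a job
    # can be redelivered after a crash, so this needs to be safe to run twice
    # (e.g. upsert rather than insert).
    def process(job)
      puts "processed #{job.inspect}"
    end

    queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, payload|
      process(JSON.parse(payload))
      channel.ack(delivery_info.delivery_tag)
    end

Scaling out is then just a case of running more copies of this process; throttling the producer is the fallback if even that can't keep up.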

On 7 Sep 2011, at 10:55, Neil Middleton wrote:

> Good point on the parallel processing - that will need some thought.
> 
> xMQ does seem to be the ideal solution - it's just a case of getting it hosted reliably somehow. We don't really want to get into messing about with hosting too much.
> 
> Neil
> On Wednesday, 7 September 2011 at 10:48, Chris Rode wrote:
> 
>> Processing every job is accommodated within the AMQP specification (be careful about double processing though). In-order processing, however, removes the ability to process in parallel.
>> 
>> In such systems it is much easier to create a job on the queue than to process it. If you don't process in parallel when consuming, you may create a bottlenecked backlog which will get worse over time.
>> 
>> Use cases off the golden path will not guarantee in-order processing, even in the AMQP-compliant technologies.
>> 
>> On 07/09/2011, at 11:31, Neil Middleton <neil.middleton at gmail.com> wrote:
>> 
>>> Everything processed in order.  Generally jobs are similar to analytics data, and are processed to provide a similar sort of stats set.
>>> 
>>> Every single job needs to be processed.
>>> 
>>> Neil
>>> On Wednesday, 7 September 2011 at 10:29, Graham Ashton wrote:
>>> 
>>>> On 7 Sep 2011, at 10:18, Neil Middleton wrote:
>>>> 
>>>>> Hundreds, possibly thousands of new jobs per second.
>>>> 
>>>> I was actually wondering in terms of bytes per second, but that's still a useful answer.
>>>> 
>>>> I know I'm still avoiding the question here, but I'm now wondering how you're thinking of going about processing them. That would probably have an impact on how I'd approach queueing.
>>>> 
>>>> i.e. if your job processing component went offline for six hours for some reason, would it be more important (when it came back up) to process the earliest queued data or the most recent data?
>>>> 
>>>> If your job processing got a long way behind, would it make sense to drop really old jobs and just start on recent stuff? I'm wondering if this data needs persisting to disk, or whether you can get away with stashing it in RAM (possibly in a persistence-backed key-value store like Redis).
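On Graham's Redis idea above, here's a minimal sketch of using a Redis list as a persistence-backed queue with the redis gem. LTRIM is one way to get the "drop really old jobs" behaviour by capping the backlog. The key name and cap are arbitrary, and you'd still need RDB/AOF persistence configured on the Redis side:

    require "redis"
    require "json"

    redis = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))

    # Producer: push onto the head of the list and cap the backlog so that
    # really old jobs get dropped instead of piling up forever.
    MAX_BACKLOG = 100_000
    redis.lpush("jobs", { event: "signup", user_id: 42 }.to_json)
    redis.ltrim("jobs", 0, MAX_BACKLOG - 1)

    # Consumer: block until a job arrives, popping from the tail so jobs come
    # out roughly in arrival order.
    _list, payload = redis.brpop("jobs")
    job = JSON.parse(payload)

The trade-off versus RabbitMQ is that the whole backlog has to fit in RAM, which is exactly Graham's "can you get away with stashing it in RAM" question.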


