[LRUG] Multi-threading, Ruby & Rails

Mon Sep 17 13:55:21 PDT 2012

Hey Paul, 

At my last job I dealt with a somewhat similar problem: email generation, rendering and sending. The numbers were in the millions per day, and it had to happen really quickly. The solution I worked on wasn't perfect, and it was a learning experience for everyone involved, but here are some thoughts based on that experience.

Correct me if I'm wrong, but it sounds like your current setup processes the task from beginning to end in one fell swoop, taking hours of CPU processing time. Based on that assumption, here's roughly what I would try if I were in your shoes:

1. Break apart the task into individual steps.
2. Use a message queue of some sort (RabbitMQ, Resque, etc.), and publish a message to the queue containing metadata for the very first step of the task.
3. Have X number of single-threaded Ruby workers all listening for messages on the queue. When they receive a message, they determine what code to run and with what arguments based on metadata about the step, overall task etc. that's in the message.
4. When a worker is done performing the step in question, it publishes a new message for the next step of the task, which any other worker can receive and perform, in turn publishing another message for the step thereafter.
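
To make the four steps above a bit more concrete, here is a minimal sketch of the message flow. It is not the code we ran: an in-memory Queue stands in for the real broker (RabbitMQ, Resque, etc.), and the step names ("render", "deliver"), the STEPS table and the publish/handle/drain helpers are all made up for illustration.

```ruby
require "json"

# Hypothetical step table: each step has the code to run and the name of the step after it.
STEPS = {
  "render"  => { run: ->(args) { args.merge("body" => "Hello #{args["name"]}") }, next: "deliver" },
  "deliver" => { run: ->(args) { args.merge("sent" => true) }, next: nil }
}.freeze

# Stand-in for the message broker: a thread-safe in-memory queue of JSON strings.
BROKER = Queue.new

# Publish a message carrying the metadata a worker needs: which step, with what arguments.
def publish(step, args)
  BROKER << JSON.generate({ "step" => step, "args" => args })
end

# Handle one message: look up the step, run it, then publish the next step if there is one.
# Returns the step's result.
def handle(raw)
  message = JSON.parse(raw)
  step = STEPS.fetch(message["step"])
  result = step[:run].call(message["args"])
  publish(step[:next], result) if step[:next]
  result
end

# Process messages until the queue is empty and return the last result.
# A real worker would instead block on the broker and loop forever.
def drain
  last = nil
  last = handle(BROKER.pop) until BROKER.empty?
  last
end

publish("render", { "name" => "Paul" })
p drain
```

Because every message names its own step, any worker can pick up any message, which is what lets you add workers later without changing the task itself.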

I'd recommend running the workers on a dedicated worker box, but it's not a requirement if their CPU/Memory usage is predictable and you manage it well.

At this point, the simplest way to scale would be to start up more single-threaded Ruby workers to give you increased parallelisation. That gets you parallel processing without multi-threading in Ruby, but at the expense of system RAM, as each running Ruby process typically has a 30-100MB memory footprint before it even does anything.
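
As a rough illustration of "more single-threaded workers", the sketch below forks a handful of worker processes from one parent and waits for them. The start_workers and wait_for helpers are hypothetical names, the block is a placeholder for a worker's message loop, and in practice you would more likely start each worker from a process supervisor. Process.fork is not available on Windows.

```ruby
# Fork `count` single-threaded worker processes and return their PIDs.
# The block stands in for a worker's message loop and receives the worker's index.
def start_workers(count)
  Array.new(count) do |index|
    Process.fork do
      yield index
      exit!(0) # leave the child without running at_exit handlers inherited from the parent
    end
  end
end

# Wait for every worker and return their Process::Status objects, in PID order.
def wait_for(pids)
  pids.map { |pid| Process.wait2(pid).last }
end

pids = start_workers(4) { |_index| sleep 0.1 } # placeholder for real work
statuses = wait_for(pids)
puts "#{statuses.count(&:success?)} of #{pids.size} workers exited cleanly"
```

Each forked worker is a full Ruby process, so the memory cost mentioned above applies once per worker.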

Hence the next step is generally multi-threading, as more jobs can be processed in parallel per Ruby process, and so for the same memory footprint. As we all know, multi-threading can be rather painful, but done right it's very effective. Unfortunately we only ended up using multi-threading for a couple of worker types at my last place, so I didn't have a chance to mess with it as much as I would have liked before I left. But from what I know, if performance, and specifically multi-threaded performance, is key, JRuby surely seems to be the way to go.
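
For what it's worth, a multi-threaded worker can be sketched as a fixed pool of threads pulling jobs off a shared queue inside one process. This is only an illustration with a hypothetical process_in_threads helper, not what we ran in production. Bear in mind that on MRI the global interpreter lock lets only one thread run Ruby code at a time, so threads mostly help when jobs wait on I/O, whereas JRuby threads can run Ruby code on several cores at once.

```ruby
# Run every job through the given block using `thread_count` threads in one Ruby process.
# Returns the block's results in completion order, which is not necessarily job order.
# Note: a nil or false job would end a thread's loop early, so jobs must be truthy.
def process_in_threads(jobs, thread_count)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once closed and empty, pop returns nil and each thread's loop ends

  results = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      while (job = queue.pop)
        results << yield(job)
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

squares = process_in_threads(1..10, 4) { |n| n * n }
p squares.sort
```

The two queues are the only state the threads share, which keeps the painful parts of multi-threading (locking, shared mutable objects) out of the job code itself.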

Hopefully this will be of some interest and use to you Paul :)

P.S. At my old job, we ended up using RabbitMQ as our message broker, with custom-built Ruby workers consuming messages and performing the work. We had lots of different worker types doing different jobs, and some workers doing a whole range of jobs. The decision of which workers do what, how messages flow through different queues and such can have a great impact on performance if done correctly. However, that is a massive topic all on its own :)

-jimeh

On Monday, 17 September 2012 at 18:57, Paul Robinson wrote:

> Hi all,
> 
> Now the recruiter rant post is on HN, let's move that discussion over there and talk about some proper Ruby stuff, eh? Please?
> 
> Right, multi-threading, Ruby and Rails.
> 
> This is causing me some pain, and I suspect it's because my mid-/low-level coding voodoo left my soul sometime around 2004. The beauty of a high-level language such as Ruby mixed with the fact I have not had to spend a moment thinking about memory management in 6 years has left my deeper coding brain soft, flabby and over-obsessed with meta-programming. A little like the fattened goose before Christmas (who are *so* into meta-programming, btw).
> 
> On our current project we have a linear process that takes some time to process. It can easily be parallelised, because it's a discrete set of 20-30 steps that need to be done in order for each of the 'x' number of instances we're dealing with. Right now it can take hours, and for various reasons we need it to take seconds.
> 
> My first stab at this was to look at benchmarking profiles and to look for single methods that were taking up a lot of wallclock time. There aren't any. We're not locking on I/O, we're not sitting in a single method for 30% of the time or anything, it's just a long drawn-out set of processes. Interestingly, the only headliner (at 8% of wall clock) is Kernel#Integer and we can't eliminate that. 
> 
> So we're moving straight to parallelisation.
> 
> My first thought was to either:
> 
> a) Split things up into separate fork'ed processes, but I don't like the bootstrap/tidy-up overhead that fork provides
> 
> b) Throw it out to cloud-like infrastructures like Hadoop/MapReduce, but the problem needs direct SQL access and that can get messy
> 
> c) Multi-thread it, and at least on a single server be able to get 8x-16x performance increase over multiple cores and maybe re-visit b) but with something a bit more pure Ruby-esque like delayed job, resque, etc.
> 
> The problem is, multi-threading in Ruby - particularly in Rails with ActiveRecord model actions - kinda sucks. I can get it working, but it's painful. It doesn't look or feel graceful, and frankly I'm not sure if the internal methods for doing it are all that careful.
> 
> Anybody here with experience in this little niche want to open up the discussion, provide some pointers and context, before I start poking around the internals of MRI? I've discovered that JRuby has a potentially better internal implementation of Thread, but I've not had a chance to play with it in anger yet - is it worth it?
> 
> Thanks in advance,
> 
> Paul
> 
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org (mailto:Chat at lists.lrug.org)
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
> 
> 
