[LRUG] Multi-threading, Ruby & Rails

Tue Sep 18 02:07:02 PDT 2012

I'll try and deal with all of the responses in one email, thanks for the input so far.

On 18 Sep 2012, at 09:09, Pratik <pratiknaik at gmail.com> wrote:

> It really depends on why exactly those steps are taking hours to run.
> Are there too many database/network calls that are slowing things
> down? Or is most of the time spent in actual processing? If the former,
> you could start with MRI 1.9 + Threads. Most of the ruby libraries now
> play very well with network calls and threading, allowing other
> threads to run when one is waiting for results. This is true for
> mysql2 and net/http.

There are a lot of DB calls in there, but nothing in the benchmark profiling suggests they are the slow-down. We just have a lot of things to do, in a certain order, per piece of data.

If you were to plot %age wall clock on the y-axis and have a bar for each method along the x-axis, we might expect a bar graph to look something like (ASCII-art warning: please read in fixed-width font):

	|*
	|*
	|**
%age	|**
	|***
	|*********************
	+---------------------

In other words, a couple of methods are taking up a lot of time, and we'd bash those down. The long-tail would not concern us. But what we're actually seeing is something more like:

	|**
	|***
	|**************
%age	|********************* ->
	|********************* ->  continues this way for 100+ methods
	|********************* ->
	+---------------------

So that means one of two things. Either only parallel processing is going to give us a win, because there is no handful of slow methods to optimise, or we are actually I/O bound and the standard profile report is hiding that from us.
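One rough way to test the second possibility is to compare CPU time with wall-clock time for a chunk of the job. This is only a sketch and not taken from our code: cpu_vs_wall is a hypothetical helper, and the sleep stands in for a slow query or network call.

```ruby
# Hypothetical helper: runs a block and reports how much of its wall-clock
# time was spent on the CPU by this process.
def cpu_vs_wall
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  { cpu: cpu, wall: wall, ratio: wall > 0 ? cpu / wall : 0.0 }
end

# Pure computation: most of the elapsed time should be CPU time.
busy = cpu_vs_wall { 200_000.times { |i| Math.sqrt(i) } }

# Stand-in for waiting on a database or the network: very little CPU time.
waiting = cpu_vs_wall { sleep 0.2 }

puts format('busy:    cpu=%.3fs wall=%.3fs ratio=%.2f', busy[:cpu], busy[:wall], busy[:ratio])
puts format('waiting: cpu=%.3fs wall=%.3fs ratio=%.2f', waiting[:cpu], waiting[:wall], waiting[:ratio])
```

A ratio close to 1.0 for the whole job would point at CPU-bound work; a ratio well below 1.0 would suggest the process is mostly waiting, whatever the per-method profile says. Note the CPU clock covers the whole process, so the figure is only meaningful for single-threaded runs.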

One suggestion we've come up with internally is to try and eliminate the AR/SQL calls, or at least move the SQL DB into RAM to see if that shortens up the tail somewhat. We've also considered dropping functionality in the process to shorten the long tail, but we're reluctant to do that, because what's the point of being fast if we're not shipping the value we started out wanting to deliver?

On 18 Sep 2012, at 08:51, Sidu Ponnappa <ckponnappa at gmail.com> wrote:

> If you're comfortable with threading and it feels like a good fit,
> switch to JRuby without a second thought. Otherwise, Hadoop.

Switching to JRuby is something several people have suggested, and it's looking more and more attractive.

> Remember that Ruby is moving into the threaded web app space after
> many many years with Rails 4, Rubinius, JRuby and Phusion Passenger
> all supporting threads. You will no longer be fighting an uphill
> battle.

Rails 4 looks like it will help, JRuby sounds like it already will, and for processing controller methods, we're already up to speed on Passenger. In fact, one possibility is that we use Passenger as a sort of thread proxy: a controller handles the processing for one data point based on params, and the front-end app calls a pool of Passenger workers, letting Passenger do the dirty work. At least that way we'd get to stay with MRI, but goodness, that sounds hacky and dirty.

On 17 Sep 2012, at 23:30, Tim Cowlishaw <tim at timcowlishaw.co.uk> wrote:

> However, there's a DB input format [1] for hadoop that allows you to use the rows returned by a DB query as the input to a mapreduce job which might be helpful in this case. It depends a little on the complexity of the query

Alas, the complexity of the queries involved is relatively high, but I'll take a look at Nathan Marz's book this week, thanks for the heads-up. It might be the way to go.

> If you go down this route but are keen to use some sort of higher-level concurrency primitive than threads, locks, mutexes etc. then you might want to take a look at Akka [4], an Erlang-ish library for actor-based concurrency on the JVM. It's written and maintained by Typesafe, the Scala guys, but is usable from any other JVM language too (and it looks like people have had some success using it with JRuby [5]), so it might prove fruitful if you decide that JRuby's the way you want to go.

That looks very promising. I suppose one of the nice things about the JRuby environment is that once you get over the ickiness of it being Java, you get the benefits of a mature and stable ecosystem of libraries to hook into, and that's a pretty new area for us to explore.

On 17 Sep 2012, at 22:38, Roland Swingler <roland.swingler at gmail.com> wrote:

> I've not tried it and I don't know whether you need SQL or "SQL-like"
> but there are things like hive http://hive.apache.org/ built on top of
> hadoop that may be of some use?

We'd ruled out Hive initially because it would mean porting some legacy complex queries, but several people have suggested it off-list, and it might be worth a spike to see if it's worth the pain. Thanks for the pointer.

On 17 Sep 2012, at 21:55, Jim Myhrberg <contact at jimeh.me> wrote:

> Correct me if I'm wrong, but it sounds like currently your setup processes the task from beginning to end in one big swoop taking hours of CPU processing time.

Absolutely correct, but with quite a few AR-induced queries going on in the middle, too.

> 1. Break apart the task into individual steps.

We almost have this part done.

> 2. Use a message queue of some sort (RabbitMQ, Resque, etc.), and publish a message to the queue containing metadata for the very first step of the task.

You're the third person to suggest RabbitMQ to me, and we already have Resque running as part of another process, so this starts to sound promising...

> 3. Have X number of single-threaded Ruby workers all listening for messages on the queue. When they receive a message, they determine what code to run and with what arguments based on metadata about the step, overall task etc. that's in the message.

Yup, sounds like where we were heading.

> 4. When a worker is done performing the step in question, it publishes a new message for the next step of the task, which any other worker can receive and perform, in turn publishing another message for the step thereafter.

Now that's an idea I hadn't considered. My only concern is that we end up with a lot of overhead from workers discovering work, messaging each other and so on. However, the horizontal scalability of this feels like it could work quite nicely: if we find we're slowing down, we can just add more workers. And with the pieces segregated like this, we can easily start to optimise individual components.
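To make sure I've understood the shape of it, here is a minimal in-process sketch of steps 1 to 4. Everything in it is an assumption for illustration: Ruby's built-in Queue stands in for RabbitMQ or Resque, threads stand in for separate worker processes, and STEPS and run_pipeline are hypothetical names, not our real code.

```ruby
require 'thread' # needed for Queue on 1.9; a harmless no-op on newer Rubies

# Hypothetical three-step task; each lambda stands in for one real step.
STEPS = [
  ->(n) { n + 1 },
  ->(n) { n * 2 },
  ->(n) { n - 3 }
].freeze

def run_pipeline(inputs, worker_count = 4)
  queue   = Queue.new # stand-in for the message broker
  results = Queue.new

  # Publish a message for the first step of every task.
  inputs.each_with_index do |value, id|
    queue << { id: id, step: 0, value: value }
  end

  workers = Array.new(worker_count) do
    Thread.new do
      # A nil message is the shutdown signal.
      while (msg = queue.pop)
        value     = STEPS[msg[:step]].call(msg[:value])
        next_step = msg[:step] + 1
        if next_step < STEPS.size
          # Publish the next step; any worker may pick it up.
          queue << { id: msg[:id], step: next_step, value: value }
        else
          results << [msg[:id], value]
        end
      end
    end
  end

  # Collect one result per input, then tell every worker to stop.
  finished = Array.new(inputs.size) { results.pop }
  worker_count.times { queue << nil }
  workers.each(&:join)
  finished.sort_by(&:first).map(&:last)
end

p run_pipeline([1, 2, 3]) # => [1, 3, 5]
```

Each message carries the step index, so a worker needs no knowledge of which worker ran the previous step. With a real broker the messages would have to be serialised, and error handling and retries would need thought; none of that is shown here.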

It feels more "scientific" than a data_sources.each{|source| Thread.new ... } type approach. :-)
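For reference, the thread-per-source approach I'm alluding to would look something like the sketch below. The names process and process_all are hypothetical, and the sleep is only a stand-in for work that mostly waits on I/O.

```ruby
# Hypothetical stand-in for a per-source job that mostly waits on I/O.
def process(source)
  sleep 0.1
  source.upcase
end

# Starts one thread per source and gathers the results in input order.
def process_all(data_sources)
  threads = data_sources.map { |source| Thread.new { process(source) } }
  threads.map(&:value) # value joins the thread and returns its result
end

p process_all(%w[alpha beta gamma]) # => ["ALPHA", "BETA", "GAMMA"]
```

On MRI this only overlaps the waiting, not the computation, and the thread count grows with the number of sources, which is part of why the queue-and-workers approach feels more controlled.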

> At this point, the simplest way to scale would be to simply start up more single-threaded Ruby workers to give you increased parallelisation. It's parallel processing without multi-threading in Ruby at the expense of system RAM though, as each running Ruby process typically has a 30-100MB memory footprint before it even does anything.

RAM is relatively cheap and inexhaustible compared to wallclock time. :-)

> Hopefully this will be of some interest and use to you Paul :)

Yes, all interesting and useful, thanks.

> P.S. At my old job, we ended up using RabbitMQ as our message broker, and custom-built Ruby workers consuming messages and performing the work. We had lots of different worker types doing different jobs, and some workers doing a whole range of jobs. The decision of which workers do what, how messages flow through different queues and such can have a great impact on performance if done correctly. However, that is a massive topic all on its own :)

One for an evening over beers, I expect. I think we'll look into this step first and provide feedback to the list in due course.

Thanks to everybody who replied, sorry if I missed anybody out. All very useful insights and feedback.

Paul