[LRUG] Multi-threading, Ruby & Rails

Tue Sep 18 01:09:37 PDT 2012

It really depends on why exactly those steps are taking hours to run.
Are there too many database/network calls that are slowing things
down? Or is most of the time spent in actual processing? If former,
you could start with MRI 1.9 + Threads. Most of the ruby libraries now
play very well with network calls and threading, allowing other
threads to run when one is waiting for results. This is true for
mysql2, net:http.

On 17 September 2012 18:57, Paul Robinson <paul at 32moves.com> wrote:
> Hi all,
>
> Now the recruiter rant post is on HN, let's move that discussion over there and talk about some proper Ruby stuff, eh? Please?
>
> Right, multi-threading, Ruby and Rails.
>
> This is causing me some pain, and I suspect it's because my mid-/low-level coding voodoo left my soul sometime around 2004. The beauty of a high-level language such as Ruby mixed with the fact I have not had to spend a moment thinking about memory management in 6 years has left my deeper coding brain soft, flabby and over-obsessed with meta-programming. A little like the fattened goose before Christmas (who are *so* into meta-programming, btw).
>
> On our current project we have a linear process that takes some time to process. It can easily be parallelised, because it's a discrete set of 20-30 steps that need to be done in order for each of the 'x' number of instances we're dealing with. Right now it can take hours, and for various reasons we need it to take seconds.
>
> My first stab at this was to look at benchmarking profiles and to look for single methods that were taking up a lot of wallclock time. There aren't any. We're not locking on I/O, we're not sitting in a single method for 30% of the time or anything, it's just a long drawn-out set of processes. Interestingly, the only headliner (at 8% of wall clock) is Kernel#Integer and we can't eliminate that.
>
> So we're moving straight to parallelisation.
>
> My first thought was to either:
>
> a) Split things up into separate fork'ed processes, but I don't like the bootstrap/tidy-up overhead that fork provides
>
> b) Throw it out to cloud-like infrastructures like Hadoop/MapReduce, but the problems needs direct SQL access and that can get messy
>
> c) Multi-thread it, and at least on a single server be able to get 8x-16x performance increase over multiple cores and maybe re-visit b) but with something a bit more pure Ruby-esque like delayed job, resque, etc.
>
> The problem is, multi-threading in Ruby - particularly in Rails with ActiveRecord model actions - kinda sucks. I can get it working, but it's painful. It doesn't look or feel graceful, and frankly I'm not sure if the internal methods for doing it are all that careful.
>
> Anybody here with experience in this little niche want to open up the discussion, provide some pointers and context, before I start poking around the internals of MRI? I've discovered that JRuby has a potentially better internal implementation of Thread, but I've not had a chance to play with it in anger yet - is it worth it?
>
> Thanks in advance,
>
> Paul
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org

-- 
Programmer @ 37signals | http://twitter.com/lifo | http://m.onkey.org