[LRUG] Multi-threading, Ruby & Rails

Mon Sep 17 14:38:46 PDT 2012

Hi

> b) Throw it out to cloud-like infrastructures like Hadoop/MapReduce, but the problem needs direct SQL access and that can get messy

I've not tried it, and I don't know whether you need SQL or "SQL-like",
but there are things like Hive (http://hive.apache.org/), built on top of
Hadoop, that may be of some use.

> JRuby has a potentially better internal implementation of Thread

JRuby threads are Java threads, so you get their benefits, i.e.
proper use of all cores and no global interpreter lock.
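Here is a minimal sketch of what that means in practice. It splits a CPU-bound job (the `busy_sum` arithmetic is a made-up stand-in) across plain Ruby `Thread`s and joins the partial results. The same code runs on both interpreters. On JRuby the threads can occupy separate cores. On MRI the global interpreter lock lets only one thread execute Ruby code at a time, so you should expect little or no speed-up there.

```ruby
# Hypothetical CPU-bound unit of work over a range of integers.
def busy_sum(range)
  range.reduce(0) { |acc, n| acc + (n * n) % 7 }
end

# Split 0...limit into roughly equal slices, one Thread per slice.
# On JRuby each Thread is a Java thread; on MRI the GIL serialises them.
def parallel_sum(limit, thread_count)
  slice = (limit / thread_count.to_f).ceil
  threads = (0...thread_count).map do |i|
    lo = i * slice
    hi = [lo + slice, limit].min
    Thread.new { busy_sum(lo...hi) } # empty range when lo >= hi
  end
  # Thread#value joins the thread and returns its block's result.
  threads.map(&:value).reduce(0, :+)
end

if __FILE__ == $PROGRAM_NAME
  puts parallel_sum(1_000_000, 4) == busy_sum(0...1_000_000)
end
```

Timing this under both MRI and JRuby on the same box is a cheap way to check whether JRuby is worth it for your workload.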

In addition to parallelisation, have you thought about dropping down to
another language and using Ruby to tie the bits together? Getting from
hours to seconds sounds expensive otherwise.

Hope that is of some limited help

Roland

On Mon, Sep 17, 2012 at 6:57 PM, Paul Robinson <paul at 32moves.com> wrote:
> Hi all,
>
> Now the recruiter rant post is on HN, let's move that discussion over there and talk about some proper Ruby stuff, eh? Please?
>
> Right, multi-threading, Ruby and Rails.
>
> This is causing me some pain, and I suspect it's because my mid-/low-level coding voodoo left my soul sometime around 2004. The beauty of a high-level language such as Ruby mixed with the fact I have not had to spend a moment thinking about memory management in 6 years has left my deeper coding brain soft, flabby and over-obsessed with meta-programming. A little like the fattened goose before Christmas (who are *so* into meta-programming, btw).
>
> On our current project we have a linear process that takes a long time to run. It can easily be parallelised, because it's a discrete set of 20-30 steps that need to be done in order for each of the 'x' number of instances we're dealing with. Right now it can take hours, and for various reasons we need it to take seconds.
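The shape described here, with ordered steps inside each instance and independent instances, maps onto a worker pool. The sketch below assumes each step can be written as a lambda that takes and returns the instance's state. The three `STEPS` are hypothetical placeholders for the real 20-30. Workers pull instances off a `Queue`, run the whole pipeline in order, and write results back by index so output order matches input order.

```ruby
require 'thread'

# Hypothetical steps; each takes the instance's state and returns the new state.
STEPS = [
  ->(x) { x + 1 },
  ->(x) { x * 2 },
  ->(x) { x - 3 }
].freeze

# Steps must run in order for a single instance.
def run_pipeline(instance)
  STEPS.reduce(instance) { |state, step| step.call(state) }
end

# Instances are independent, so spread them over a fixed pool of threads.
def process_all(instances, workers: 4)
  queue = Queue.new
  instances.each_with_index { |inst, i| queue << [i, inst] }
  results = Array.new(instances.size)
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          i, inst = queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        out = run_pipeline(inst)
        lock.synchronize { results[i] = out }
      end
    end
  end
  threads.each(&:join)
  results
end
```

For example, `process_all([1, 2, 3])` runs `+1`, `*2`, `-3` on each value and returns them in input order. Because the queue is filled up front, the pool size is the only knob, and it can be matched to the core count on JRuby.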
>
> My first stab at this was to look at benchmarking profiles and to look for single methods that were taking up a lot of wall-clock time. There aren't any. We're not locking on I/O, we're not sitting in a single method for 30% of the time or anything, it's just a long drawn-out set of processes. Interestingly, the only headliner (at 8% of wall-clock time) is Kernel#Integer and we can't eliminate that.
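When no single method stands out, timing each pipeline step rather than each method can still show where the hours go. Here is a minimal sketch, assuming steps are named lambdas. The `:parse` and `:double` steps are hypothetical, though `:parse` uses `Kernel#Integer` to echo the profile above. It accumulates wall-clock time per step using a monotonic clock, which is unaffected by system clock adjustments.

```ruby
# Run named steps in order, accumulating wall-clock seconds per step name.
# Returns [final_state, timings_hash].
def timed_steps(steps, input)
  timings = Hash.new(0.0)
  result = steps.reduce(input) do |state, (name, step)|
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    out = step.call(state)
    timings[name] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    out
  end
  [result, timings]
end

# Hypothetical two-step pipeline.
EXAMPLE_STEPS = {
  parse:  ->(s) { Integer(s) }, # Kernel#Integer raises on malformed input
  double: ->(n) { n * 2 }
}.freeze

if __FILE__ == $PROGRAM_NAME
  value, timings = timed_steps(EXAMPLE_STEPS, "21")
  puts value
  timings.each { |name, secs| puts format("%-8s %.6fs", name, secs) }
end
```

Summing these per-step totals over a realistic batch of instances shows whether the cost is spread evenly, which supports going straight to parallelisation, or sits in a few steps worth optimising first.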
>
> So we're moving straight to parallelisation.
>
> My first thought was to either:
>
> a) Split things up into separate fork'ed processes, but I don't like the bootstrap/tidy-up overhead that fork incurs
>
> b) Throw it out to cloud-like infrastructures like Hadoop/MapReduce, but the problem needs direct SQL access and that can get messy
>
> c) Multi-thread it, and at least on a single server be able to get 8x-16x performance increase over multiple cores and maybe re-visit b) but with something a bit more pure Ruby-esque like delayed_job, Resque, etc.
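For comparison with option a), here is a minimal sketch of a fork-based map on a Unix-like system. The `fork_map` helper is hypothetical. It forks one child per chunk of items. Each child computes its chunk and sends the results back to the parent over a pipe using `Marshal`. That serialisation, plus the per-child copy of the process, is the bootstrap/tidy-up overhead mentioned above. In a Rails app each child would also need its own database connection (e.g. via `ActiveRecord::Base.establish_connection`), which this sketch does not cover.

```ruby
# Fork one child per chunk; each child returns its results through a pipe.
def fork_map(items, processes: 4, &block)
  chunk_size = (items.size / processes.to_f).ceil
  chunk_size = 1 if chunk_size < 1 # each_slice(0) would raise
  children = items.each_slice(chunk_size).map do |chunk|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(chunk.map(&block)))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent only reads
    [pid, reader]
  end
  # Read children in order so results keep the input order.
  children.flat_map do |pid, reader|
    data = reader.read
    reader.close
    Process.wait(pid)
    Marshal.load(data)
  end
end
```

For example, `fork_map((1..10).to_a, processes: 3) { |x| x * x }` returns the ten squares in order. Unlike threads, this gets real multi-core parallelism even on MRI, at the cost of process start-up and result serialisation.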
>
> The problem is, multi-threading in Ruby - particularly in Rails with ActiveRecord model actions - kinda sucks. I can get it working, but it's painful. It doesn't look or feel graceful, and frankly I'm not sure if the internal methods for doing it are all that careful.
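Much of the awkwardness with ActiveRecord in threads comes down to connection handling: each thread must check out its own database connection and reliably hand it back. In Rails the pool exposes `ActiveRecord::Base.connection_pool.with_connection { |conn| ... }`, and the `pool` setting in `database.yml` caps how many threads can hold a connection at once. The sketch below uses a hypothetical `ToyConnectionPool`, not ActiveRecord, to show that checkout/return discipline in plain Ruby.

```ruby
require 'thread'

# Hypothetical stand-in for a DB connection pool; connections are just strings.
class ToyConnectionPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Block until a connection is free, lend it to the block, always give it back.
  def with_connection
    conn = @available.pop
    yield conn
  ensure
    @available << conn if conn
  end
end

# Run each job in its own thread; at most `size` jobs hold a connection at once.
def run_jobs(jobs, pool)
  threads = jobs.map do |job|
    Thread.new { pool.with_connection { |conn| job.call(conn) } }
  end
  threads.map(&:value)
end

if __FILE__ == $PROGRAM_NAME
  pool = ToyConnectionPool.new(2)
  jobs = (1..5).map { |n| ->(conn) { [n, conn] } }
  p run_jobs(jobs, pool)
end
```

The design choice worth copying is the `ensure`: if a job raises, its connection still goes back to the pool. Otherwise other threads would eventually block waiting for a connection that never returns.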
>
> Anybody here with experience in this little niche want to open up the discussion, provide some pointers and context, before I start poking around the internals of MRI? I've discovered that JRuby has a potentially better internal implementation of Thread, but I've not had a chance to play with it in anger yet - is it worth it?
>
> Thanks in advance,
>
> Paul
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org