[LRUG] Using statistical tables with Rails

Harry Marr harry.marr at gmail.com
Fri Sep 23 04:44:12 PDT 2011


Another +1 for MongoDB - a common pattern for storing this kind of data in Mongo is using multiple collections that aggregate the data over different time spans (similar to RRDTool). So you might have a collection that aggregates hourly data, another one that aggregates daily data, and another for weekly data. This allows you to query different time spans much more efficiently and also allows you to get rid of unnecessary data much more easily.
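
As a rough sketch of the bucketing idea above: each event's timestamp is truncated to the start of its hour, day, and week, and that truncated time identifies the aggregate document in the matching collection. The helper name `bucket_start`, the use of UTC, and the choice of Monday as the start of the week are all assumptions made for illustration.

```ruby
# Hypothetical helper: truncate a timestamp to the start of its bucket (in UTC).
def bucket_start(time, granularity)
  t = time.getutc # getutc returns a new Time and leaves the argument untouched
  case granularity
  when :hourly
    Time.utc(t.year, t.month, t.day, t.hour)
  when :daily
    Time.utc(t.year, t.month, t.day)
  when :weekly
    midnight = Time.utc(t.year, t.month, t.day)
    # wday is 0 for Sunday; shift so that Monday starts the week (an assumption).
    midnight - ((t.wday + 6) % 7) * 86_400
  else
    raise ArgumentError, "unknown granularity: #{granularity.inspect}"
  end
end

event_time = Time.utc(2011, 9, 23, 11, 44, 12)
[:hourly, :daily, :weekly].each do |granularity|
  puts "#{granularity}: #{bucket_start(event_time, granularity)}"
end
```

One event therefore maps to three bucket keys, one per collection.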

Hourly data may be useful for the last few days, but it probably won't be interesting to you months down the line, so you can clear out old hourly data regularly (perhaps weekly), and daily data slightly less regularly (maybe monthly), leaving you with high-level long-term information and fine-grained short-term data. If you don't care too much about the exact time frames you keep, you could use capped collections, but be aware that because a capped collection has a fixed size, a smaller time period will be retained as more events happen per month.
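
A minimal sketch of that clean-up, assuming each aggregate document carries a `bucket` timestamp field. The field name, the retention periods (7 days of hourly data, 90 days of daily data), and both helper names are made up for illustration. `prune_selector` builds the kind of `$lt` selector you would hand to a remove call, and `prune` applies the same rule to an in-memory array so the sketch runs without a database.

```ruby
# Assumed retention periods, in seconds; tune these to taste.
RETENTION_SECONDS = {
  'hourly' => 7 * 86_400,
  'daily'  => 90 * 86_400
}

# Build a MongoDB-style selector matching documents older than the cutoff.
def prune_selector(granularity, now)
  cutoff = now - RETENTION_SECONDS.fetch(granularity)
  { 'bucket' => { '$lt' => cutoff } }
end

# In-memory stand-in for a remove: drop every document the selector matches.
def prune(docs, selector)
  cutoff = selector.fetch('bucket').fetch('$lt')
  docs.reject { |doc| doc['bucket'] < cutoff }
end

now = Time.utc(2011, 9, 23)
hourly_docs = [
  { 'bucket' => Time.utc(2011, 9, 10, 8), 'count' => 12 },
  { 'bucket' => Time.utc(2011, 9, 20, 8), 'count' => 30 }
]
puts prune(hourly_docs, prune_selector('hourly', now)).inspect
```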

Inserting the data should be really quick - you can just use an upsert with an $inc. Inserts and updates that don't change a document's size (such as $inc) are very fast in MongoDB, so you shouldn't really notice the cost of writing the event multiple times (once per collection). Also, as it's non-crucial data you needn't bother with 'safe' mode, which improves performance further.
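
To make the upsert-with-$inc idea concrete, here is a sketch that runs without a MongoDB server. `FakeCollection` is a made-up in-memory stand-in that only mimics "create the document if it is missing, then increment". The selector and update hashes have the shape you would pass to the real driver, which I believe means the collection's update call with the upsert option set, but check the driver documentation for the exact signature in your version. The field names `type`, `bucket` and `count` are assumptions, and only hourly and daily buckets are shown to keep it short.

```ruby
# In-memory stand-in for a collection (hypothetical, for illustration only).
class FakeCollection
  attr_reader :docs

  def initialize
    @docs = {}
  end

  # Mimics an upsert with $inc: create the document from the selector if it
  # does not exist yet, then add each increment to the named field.
  def upsert_inc(selector, update)
    doc = (@docs[selector] ||= selector.dup)
    update.fetch('$inc').each do |field, amount|
      doc[field] = doc.fetch(field, 0) + amount
    end
    doc
  end
end

BUCKET_SECONDS = { 'hourly' => 3600, 'daily' => 86_400 }

# Write one event to every aggregate collection, one upsert per time span.
def record_event(collections, type, time)
  collections.each do |name, collection|
    seconds = BUCKET_SECONDS.fetch(name)
    bucket = Time.at(time.to_i - time.to_i % seconds).utc
    selector = { 'type' => type, 'bucket' => bucket }
    collection.upsert_inc(selector, { '$inc' => { 'count' => 1 } })
  end
end

collections = { 'hourly' => FakeCollection.new, 'daily' => FakeCollection.new }
record_event(collections, 'signup', Time.utc(2011, 9, 23, 11, 44, 12))
record_event(collections, 'signup', Time.utc(2011, 9, 23, 12, 1, 0))
collections.each { |name, c| puts "#{name}: #{c.docs.values.inspect}" }
```

Because the increment is applied by the update itself, there is no read-modify-write cycle in application code.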

To inspect the data, regular queries will take you most of the way, but if you need to do anything more specialised then map reduce is your friend.
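
For example, a question like "total of type x in the last month" only needs a range query over the daily collection plus a sum. The sketch below assumes the same hypothetical `type`, `bucket` and `count` fields and uses a plain array in place of the collection. `count_selector` shows the selector shape, and `total_for` applies it in memory.

```ruby
# Build a MongoDB-style selector: events of one type since a given time.
def count_selector(type, since)
  { 'type' => type, 'bucket' => { '$gte' => since } }
end

# In-memory stand-in for running the query and summing the counters.
def total_for(docs, selector)
  since = selector.fetch('bucket').fetch('$gte')
  matching = docs.select do |doc|
    doc['type'] == selector['type'] && doc['bucket'] >= since
  end
  matching.inject(0) { |sum, doc| sum + doc['count'] }
end

daily_docs = [
  { 'type' => 'signup', 'bucket' => Time.utc(2011, 8, 20), 'count' => 7 },
  { 'type' => 'signup', 'bucket' => Time.utc(2011, 9, 1),  'count' => 5 },
  { 'type' => 'login',  'bucket' => Time.utc(2011, 9, 2),  'count' => 3 }
]
puts total_for(daily_docs, count_selector('signup', Time.utc(2011, 8, 23)))
```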

--
Harry Marr

UK: (+44) 7904-331-207
US: (+1) 415-860-1754

http://twitter.com/harrymarr
http://github.com/hmarr



On Friday, 23 September 2011 at 12:22, Richard Livsey wrote:

> Completely agree, if you've got any relationships or embedded documents then it's much easier to use an ORM, I use MongoMapper most of the time and have been very happy.
> 
> If all you're doing is throwing event/log data into a single collection, then it's worth considering whether you really need an ORM, especially when you're dealing with vast amounts of information and speed is paramount.
> 
> Cheers.
> 
> -- 
> Richard Livsey
> Co-Founder, MinuteBase 
> Meeting collaboration made easy
> http://minutebase.com
> +44 (0) 7841 260 797
> 
> On Friday, 23 September 2011 at 12:07, Gerhard Lazu wrote:
> 
> > Richard, I've tried the pure ruby driver approach for mongo, it requires a lot of work if you want to keep it pretty. Depends what you're doing, but even with 3-4 collections and simple relationships, an ORM makes life easy with little effort.
> > 
> > Mongoid with some sensible decisions worked best for us, only about 20% slower than using the driver directly (I went crazy on the benchmarks). If you write the app using EM (em pool the connections), you'll hardly notice the 20% difference. 
> > 
> > This is my 2nd large-ish production mongo & redis pure MRI app, I've been both places : ). 
> > 
> > Gerhard
> > 
> > 
> > On Fri, Sep 23, 2011 at 11:55 AM, Richard Livsey <richard at livsey.org> wrote:
> > > I'd recommend taking a look at mongodb [1], your use-case sounds like what it was made for and mongo is really easy to get up and running to play with.
> > > 
> > >  Because it's document oriented ('schema-less') you've got a lot of flexibility on how you structure the data you want to store. You're not stuck with a table with tonnes of NULL fields because not every event contains the same information for example.
> > > 
> > >  It's very good at throwing large amounts of event/log style information into, whilst at the same time has atomic modifiers so you can build up aggregates as you go. You still get indexes and the ability to run ad-hoc queries, so you can just start by dumping data in there and work on the aggregate counters over time.
> > > 
> > >  The raw Ruby driver is really nice and you don't need to use an ORM for most tasks, if you have a more complex model then it's worth looking at MongoMapper [2] or Mongoid [3].
> > > 
> > >  Hope that helps!
> > > 
> > >  [1] - http://mongodb.org
> > >  [2] - http://mongomapper.com
> > >  [3] - http://mongoid.org
> > > 
> > >  --
> > >  Richard Livsey
> > >  Co-Founder, MinuteBase
> > >  Meeting collaboration made easy
> > > http://minutebase.com
> > > +44 (0) 7841 260 797
> > > 
> > >  On Friday, 23 September 2011 at 11:37, Neil Middleton wrote:
> > > 
> > > > I'm building an app that needs to store a fair amount of events that the users carry out. (Think LOTS as in millions per month).
> > > > 
> > > > I need to report on these events (total of type x in the last month, etc) and need something resilient and fast.
> > > > 
> > > > I've toyed with Redis etc to store aggregates of the data, but this could just mean that I'm building up a massive store of single figure aggregates that aren't rebuildable.
> > > > 
> > > > Whilst this isn't a bad solution, I'm looking at storing the raw event data in tables that I can then query on a needs basis, and potentially generate aggregate counters on a periodic basis. This would thus give me the ability to add counters over time, and also carry out ad-hoc inspections on what is going on, something which aggregates don't allow.
> > > > 
> > > > Question is, how is best to do this? I obviously don't want to have to create a model for each table (which is what Rails would prefer), so do I just create the tables and interact with raw SQL on a needs basis, or is there some other choice for dealing with this sort of data?
> > > > 
> > > > It would be interesting to know what thoughts you guys have.
> > > > 
> > > > Cheers
> > > > 
> > > > Neil
> > > > 
> > > > _______________________________________________
> > > > Chat mailing list
> > > > Chat at lists.lrug.org
> > > > http://lists.lrug.org/listinfo.cgi/chat-lrug.org
