[LRUG] Using Ruby for semantic analysis and categorising documents - first steps?

Chris Adams mail at chrisadams.me.uk
Mon Aug 20 07:05:30 PDT 2012


Hi all,  

I'm trying to spec out a feature at work: sifting through a load of text in case studies and similar articles, categorising them according to some pre-determined criteria, and presenting them later to users of an app we're building, to help them discover useful steps their business can take to reduce emissions.  

So far, we've been looking through case studies manually to get an idea of the shape of the data and work out how we might retrieve it later on. Right now we're relying on people to understand the content and categorise it, and this feels like something screaming out to be automated, if we can get half-decent results from semantic analysis tools.

On the surface this sounds like a job for an OpenCalais-based service, or a gem like SemExtractor[1], as a first pass, with manual clean-up afterwards, but I'm not really familiar enough with semantic analysis tools to know yet whether what I'm doing is a fool's errand.
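To make the "first pass" idea concrete, here's a minimal sketch of the kind of rule-based categoriser we might start with before reaching for OpenCalais or a gem. The category names and keywords are made-up examples, not our real criteria:

```ruby
# Hypothetical categories and keyword lists -- placeholders for whatever
# pre-determined criteria we settle on.
CATEGORIES = {
  "energy"    => %w[electricity lighting heating kwh],
  "transport" => %w[fleet vehicle mileage commuting],
  "waste"     => %w[recycling landfill packaging]
}

# Return the names of categories whose keywords appear in the text,
# ordered by number of distinct keyword hits (most relevant first).
def categorise(text)
  words = text.downcase.scan(/[a-z]+/)
  CATEGORIES.map { |name, keywords| [name, (words & keywords).size] }
            .select { |_, hits| hits > 0 }
            .sort_by { |_, hits| -hits }
            .map(&:first)
end

categorise("Switching the warehouse lighting cut electricity use by 20%")
# => ["energy"]
```

Crude, obviously, but something like this would give us a baseline to compare any term-extraction API against, and a fallback for the manual clean-up step.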

Before I lose a few days learning about the quirks of various term-extraction APIs, I wanted to ask – if you've done this already, what tips do you wish you'd been given before you spent a couple of days on this problem?

We're working with Rails 3, and we're totally open to using something like Solr or Elasticsearch for parts of this, if it helps.

Thanks, and apologies in advance for the somewhat open ended question.

C




[1]: https://github.com/apneadiving/SemExtractor


--  
Chris Adams
mobile: 07974 368 229
twitter: @mrchrisadams
www: chrisadams.me.uk


