[LRUG] Using Ruby for semantic analysis and categorising documents - first steps?

Matt Haynes Matt.Haynes at bbc.co.uk
Mon Aug 20 09:20:05 PDT 2012


This is similar to Wikipedia Miner, I guess, but have you also considered
looking into a DBpedia-style triple store? Depending on your requirements,
it could prove a useful setup.

In BBC News we recently prototyped a system that first extracts named
entities from text, then cross-references them with Wikipedia IDs. From
those IDs we could pull DBpedia data into a triple store and link it to
the article.
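
As a rough illustration of that pipeline (this is not the prototype's
code), the Ruby sketch below extracts candidate entities, maps them to
Wikipedia page titles and emits triples that link an article to DBpedia
resource URIs. The capitalised-word heuristic, the WIKIPEDIA_TITLES lookup
table, the "ex:mentions" predicate and the article URI are all assumptions
made for the example; a real system would use a proper entity extractor
and linker.

```ruby
# Hypothetical lookup table: surface form -> Wikipedia page title.
# A real system would call an entity-linking service instead.
WIKIPEDIA_TITLES = {
  "David Bowie"       => "David_Bowie",
  "Royal Albert Hall" => "Royal_Albert_Hall"
}.freeze

# DBpedia resource URIs are built from the Wikipedia page title.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

# Naive named-entity extraction: runs of two or more capitalised words.
def extract_entities(text)
  text.scan(/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b/).uniq
end

# Cross-reference entities against the lookup table and build
# [subject, predicate, object] triples; unknown entities are skipped.
def link_article(article_uri, text)
  extract_entities(text).map do |name|
    title = WIKIPEDIA_TITLES[name]
    [article_uri, "ex:mentions", DBPEDIA_RESOURCE + title] if title
  end.compact
end

text = "David Bowie played the Royal Albert Hall last night."
triples = link_article("ex:article/1", text)
triples.each { |triple| puts triple.join(" ") }
```

The triples that come out are what would be loaded into the triple store
alongside the DBpedia data for each linked resource.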

Once the data was in a triple store, the queries we could write were pretty
powerful. For example, a query for news articles about "Music" might search
for entities of type musician, band, orchestra, concert venue, etc.
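
To make the idea concrete, here is a toy in-memory version of that kind
of query in Ruby. It is only a sketch: a real triple store would be
queried with SPARQL, and the predicate names ("mentions", "type"), the
type names and the sample data are invented for the example rather than
taken from DBpedia's vocabulary.

```ruby
# Toy in-memory triple store: an array of [subject, predicate, object].
# All identifiers below are illustrative, not real DBpedia terms.
TRIPLES = [
  ["article:1", "mentions", "entity:David_Bowie"],
  ["article:2", "mentions", "entity:Royal_Albert_Hall"],
  ["article:3", "mentions", "entity:Bank_of_England"],
  ["entity:David_Bowie",       "type", "Musician"],
  ["entity:Royal_Albert_Hall", "type", "ConcertVenue"],
  ["entity:Bank_of_England",   "type", "Bank"]
].freeze

# Entity types we treat as belonging to the "Music" topic.
MUSIC_TYPES = %w[Musician Band Orchestra ConcertVenue].freeze

# Subjects of every triple with the given predicate and one of the objects.
def subjects_where(triples, predicate, objects)
  triples.select { |_, pred, obj| pred == predicate && objects.include?(obj) }
         .map(&:first)
end

# Articles about a topic: those mentioning any entity of a matching type.
def articles_about(triples, types)
  entities = subjects_where(triples, "type", types)
  subjects_where(triples, "mentions", entities).uniq.sort
end

puts articles_about(TRIPLES, MUSIC_TYPES)
# prints article:1 and article:2, one per line
```

Note that neither matching article needs to contain the word "music";
the match comes entirely from the types of the entities it mentions.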

The prototype was by no means perfect - we never really solved disambiguation
or relevancy - but the results were certainly impressive, especially for
me, coming in with a more traditional "search"-based mindset.

Cheers,

Matt



-----Original Message-----
From: chat-bounces at lists.lrug.org on behalf of Chris Lowis
Sent: Mon 8/20/2012 3:49 PM
To: London Ruby Users Group
Subject: Re: [LRUG] Using Ruby for semantic analysis and categorising documents - first steps?
 
>> I'm trying to spec out a feature at work, to sift through a load of text in
>> case studies or similar articles, and categorise them according to some
>> pre-determined criteria, and present them later to users of an app we're
>> building, to help them discover useful steps their business can take to reduce
>> emissions.

We've (BBC R&D) been doing something similar to this and have had
quite a bit of success using Wikipedia Miner[1], and have also
implemented our own term extraction code to work with noisy
speech-to-text transcripts. Happy to talk more about what we've been
doing with you if it looks like something that might help.

Cheers,

Chris

[1] http://wikipedia-miner.cms.waikato.ac.nz/

