ohdang888 Posted May 29, 2010
I'm about to start working on a content-based recommendation engine to find similar entries in my large database of news articles. Do you think the best way of going about this is a simple MySQL MATCH ... AGAINST query, or are there better ways to go about it? By the way, my database has about 100,000 entries at the moment.
Link: https://forums.phpfreaks.com/topic/203304-anyone-built-a-content-based-recommendation-engine/
JonnoTheDev Posted June 1, 2010
Quote: "Do you think the best way of going about this is a simple mysql MATCH AGAINST query"
Absolutely not. If you want to use open source technology, have a look at Sphinx or Lucene. Both are full-text search engines. Just Google them or search the forum; I've posted the links many times.
ohdang888 (Author) Posted June 1, 2010
Thanks man, I'll look into it. Just to make sure before I start: the best way (that's feasible without a huge R&D budget) to find relevant articles would be a full-text search engine, right? Thanks.
JonnoTheDev Posted June 1, 2010
Yes. Related articles are best done with a token-based query. When you submit an article, run a script that counts word occurrences and stores the most common words against that article. For example, if I write an article on golf, then the words 'golf', 'tee' and 'fairway' are likely to have a high number of occurrences. These are your tokens. If I submit another article on golf, that article will relate to the same tokens if it has matching high word occurrences. So your database:

articles
=====
articleId title body
1, Golf Article 1, This is the body containing the word golf. Golf is great. I love playing golf.......
2, Golf Article 2, Golf courses around the world. The best fairways. The fairways are very good on this golf course......

tokens
=====
tokenId token
1, golf
2, fairways

tokenToArticle
==========
id tokenId articleId
1,1,1
2,1,2
3,2,2

So you can see that tokenId 1 (golf) is related to both articles. Use a full-text engine such as Sphinx to create an index from your database. This will perform very fast searches to pull out your related articles, and it can also return results for a text search box on the website. Using MySQL to search text is very slow and returns poor results.
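For anyone following along, the token idea above can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual script: the function names (top_tokens, build_index, related), the tiny stop-word list, and the third sample article are all assumptions made up for the example. The in-memory dictionary stands in for the tokens and tokenToArticle tables.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "this", "are", "very", "around"}


def top_tokens(text, n=3):
    """Return up to n of the most frequent non-trivial words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def build_index(articles, n=3):
    """Map each token to the set of article ids it was stored against
    (the in-memory equivalent of the tokenToArticle table)."""
    index = defaultdict(set)
    for article_id, body in articles.items():
        for token in top_tokens(body, n):
            index[token].add(article_id)
    return index


def related(article_id, articles, index, n=3):
    """Rank other articles by how many top tokens they share with this one."""
    scores = Counter()
    for token in top_tokens(articles[article_id], n):
        for other_id in index.get(token, ()):
            if other_id != article_id:
                scores[other_id] += 1
    return [other_id for other_id, _ in scores.most_common()]


# Sample data: articles 1 and 2 come from the post above, article 3 is invented.
articles = {
    1: "This is the body containing the word golf. Golf is great. I love playing golf.",
    2: "Golf courses around the world. The best fairways. "
       "The fairways are very good on this golf course.",
    3: "The election results are in. Voters chose a new mayor in the election.",
}

index = build_index(articles)
print(related(1, articles, index))
print(related(3, articles, index))
```

In a real setup the token counts would be written to the database when an article is submitted, and the lookup would be handled by the full-text engine's index instead of a Python dictionary.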
ohdang888 (Author) Posted June 1, 2010
Thank you very much! I'm trying it now, except I'm more of a coder than a computer whiz, so I'm confused about its installation and posted about it earlier here: http://www.phpfreaks.com/forums/index.php/topic,299871.0.html
Thanks again!