per1os Posted April 13, 2007
Link to comment: https://forums.phpfreaks.com/topic/46928-related-topics-quest/

Here is the story: I want to create an automated "Related Articles" feature, so to speak. A user creates an article, and the feature scans what that author currently has in the database and returns the top 5 most relevant articles. Pretty straightforward, no fancy work really.

I would prefer to do this with MySQL FullText search. But let's say the author is new and only has 3 articles. Then no results will be returned, due to this:

"... you should add at least 3 rows to the table before you try to match anything, and what you're searching for should only be contained in one of the three rows. This is because of the 50% threshold. If you insert only one row, then no matter what you search for, it is in 50% or more of the rows in the table, and therefore disregarded."

Going about it this way would really only prove effective if the user has at least 10 articles, especially if most of the articles are on similar topics. Is there a workaround so that MySQL will return the rows even if the 50% threshold is breached?

I have created a secondary option that would serve as a workaround if this is not possible. The code is shown below. The reason I do not want to use it is that I want consistency in my code; I do not want to say "if the author has fewer than 10 articles, then grab all of them and process them through this code".

Any insight on the problem is appreciated. Let me reiterate: the code works fine, but it would not do the trick against, let's say, 50 articles, because every word of the main article is compared against every word of every other article. That is why MySQL, with its Full-Text search capabilities, should be the preferred solution.

Here is my related topic code (please excuse the sloppiness, I just threw it together):

<?php
$mainArticle = 'Computer programming is one of the best for certain php languages that are not of this world!
that';

$articles[0] = 'This is a topic that is totally not related to the first article at all!';
$articles[1] = 'Programming in PHP has done miraculous wonders to this time and many other exciting events!';
$articles[2] = 'Programming This is a topic that is totally not related to the first article at all';
$articles[3] = 'Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all';
$articles[4] = 'Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages';
$articles[5] = 'Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!';

$related = relatedTest($mainArticle, $articles);

print "The following articles are related to : " . $mainArticle . " (ordered by most relevant)<br /><br />";
foreach ($related as $key => $matches) {
    print "Article: " . $articles[$key] . "<br />";
}

print "<br /><br /><br />These were all the articles used.<br /><br />";
foreach ($articles as $article) {
    print $article . "<br />";
}

// Score every article by how many words it shares with the main article.
function relatedTest($mainArticle, $articles) {
    $words = explode(" ", stripCommons($mainArticle));
    $match = array(); // stays empty if nothing matches, so arsort() is safe

    foreach ($articles as $key => $article) {
        $matches = compareWords($words, explode(" ", stripCommons($article)));
        if ($matches > 0) {
            $match[$key] = $matches;
        }
    }

    arsort($match); // highest score first, article keys preserved
    return $match;
}

// Count case-insensitive word matches between the two word lists.
function compareWords($words, $compwords) {
    $match = 0;
    if (is_array($words)) {
        foreach ($words as $word) {
            foreach ($compwords as $compword) {
                if (strtolower($compword) == strtolower($word)) {
                    $match++;
                }
            }
        }
    }
    return $match;
}

// Strip punctuation and common words from the article text.
function stripCommons($article) {
    // preg_replace() stands in for ereg_replace(), which was removed in PHP 7.
    $article = preg_replace('/[\'.?!,"&:\-\[\]()+=~|*^%$@#<>`;_{}]/', '', $article);
    // Collapse runs of whitespace (including newlines) into single spaces.
    $article = " " . preg_replace('/\s+/', ' ', $article) . " ";

    $commonWords = array("if", "u", "so", "it", "its", "is", "of", "or", "by", "on", "but", "a", "was", "for", "this", "to", "are", "can", "you", "your", "any", "the", "with", "not", "at", "and", "that");
    $commonWords = strlenSort($commonWords);

    foreach ($commonWords as $word) {
        // Case-insensitive, so "This" is removed as well as "this".
        $article = str_ireplace(" " . $word . " ", " ", $article);
    }

    return trim($article);
}

// Sort an array of strings by length, longest first.
function strlenSort($array) {
    $newArray = array();
    foreach ($array as $key => $size) {
        $newArray[$key] = strlen($size);
    }
    arsort($newArray, SORT_NUMERIC);

    $returnArr = array();
    foreach ($newArray as $key => $size) {
        $returnArr[] = $array[$key];
    }
    return $returnArr;
}
?>

Here is what the code above will output =)

The following articles are related to : Computer programming is one of the best for certain php languages that are not of this world! that (ordered by most relevant)

Article: Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages
Article: Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all
Article: Programming in PHP has done miraculous wonders to this time and many other exciting events!
Article: Programming This is a topic that is totally not related to the first article at all

These were all the articles used.

This is a topic that is totally not related to the first article at all!
Programming in PHP has done miraculous wonders to this time and many other exciting events!
Programming This is a topic that is totally not related to the first article at all
Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all
Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages
Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!

Thanks!
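On the 50% threshold itself: according to the MySQL manual, full-text searches run IN BOOLEAN MODE do not apply that threshold, so boolean mode may be one way around it. What follows is only a minimal sketch under stated assumptions, not a tested solution. The table `articles`, its columns (`id`, `author_id`, `title`, `body`), the FULLTEXT index on (`title`, `body`), and the functions `booleanSearchTerms()` and `findRelated()` are all hypothetical names chosen for illustration; none of them appear in the post.

```php
<?php
// Hypothetical schema (an assumption, not from the post):
//   articles(id, author_id, title, body), FULLTEXT index on (title, body).

/**
 * Turn article text into a boolean-mode search string: lowercase it,
 * replace punctuation with spaces, drop words shorter than $minLen,
 * and remove duplicates. Words without operators are optional in
 * boolean mode, and rows that contain more of them score higher.
 */
function booleanSearchTerms(string $text, int $minLen = 4): string
{
    $clean = preg_replace('/[^a-z0-9\s]+/', ' ', strtolower($text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    $keep  = array_filter($words, function ($w) use ($minLen) {
        return strlen($w) >= $minLen;
    });
    return implode(' ', array_unique($keep));
}

/**
 * Fetch up to 5 articles by the same author, best score first.
 * The same search string is bound twice because a named placeholder
 * cannot always be reused within one prepared statement.
 */
function findRelated(PDO $pdo, int $authorId, int $articleId, string $text): array
{
    $sql = 'SELECT id, title,
                   MATCH(title, body) AGAINST (:t1 IN BOOLEAN MODE) AS score
            FROM articles
            WHERE author_id = :author
              AND id <> :self
              AND MATCH(title, body) AGAINST (:t2 IN BOOLEAN MODE)
            ORDER BY score DESC
            LIMIT 5';

    $terms = booleanSearchTerms($text);
    $stmt  = $pdo->prepare($sql);
    $stmt->execute(array(
        ':t1'     => $terms,
        ':t2'     => $terms,
        ':author' => $authorId,
        ':self'   => $articleId,
    ));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Two caveats worth checking before relying on this. Boolean mode does not sort by relevance on its own, hence the explicit ORDER BY, and its scoring may be coarser than natural-language mode. Also, with the default `ft_min_word_len` of 4 on MyISAM tables, a three-letter word such as "php" is not indexed at all, which is why the sketch drops short words by default.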