
Related Topics Quest


per1os


Here is the story: I want to create an automated "Related Articles" feature. When a user creates an article, it scans what that author currently has in the database and returns the top 5 most relevant articles. Pretty straightforward, nothing fancy really.

 

I would prefer to do this with MySQL full-text search. But if the author is new and only has 3 articles, no results will be returned because of this:

 

"... you should add at least 3 rows to the table before you try to match anything, and what you're searching for should only be contained in one of the three rows. This is because of the 50% threshold. If you insert only one row, then no matter what you search for, it is in 50% or more of the rows in the table, and therefore disregarded."

 

Going about it this way would really only be effective if the user has at least 10 articles, especially if most of the articles are on similar topics. Is there a workaround so that MySQL will return the rows even if the 50% threshold is breached?
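
For what it's worth, the MySQL manual says boolean-mode searches do not apply the 50% threshold, so that may be the workaround. A minimal sketch, assuming a hypothetical `articles` table with an `author_id` column and a FULLTEXT index on (`title`, `body`); `buildBooleanSearch()` is a made-up helper, and none of this has been run against a real database:

```php
<?php
// Hypothetical helper: reduce article text to a plain list of search terms
// that can be passed to AGAINST(... IN BOOLEAN MODE).
function buildBooleanSearch($text, $maxTerms = 10) {
    // Lowercase and split on anything that is not a letter or digit,
    // which also drops boolean operators such as + - * " ( )
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // MyISAM skips words shorter than ft_min_word_len (4 by default)
    $words = array_filter($words, function ($w) { return strlen($w) >= 4; });
    return implode(' ', array_slice(array_unique($words), 0, $maxTerms));
}

// Assumed schema: articles(id, author_id, title, body), FULLTEXT(title, body).
// Boolean mode does not sort by relevance on its own, hence the ORDER BY.
$sql = "SELECT id, title,
               MATCH(title, body) AGAINST(? IN BOOLEAN MODE) AS score
        FROM articles
        WHERE author_id = ? AND id <> ?
          AND MATCH(title, body) AGAINST(? IN BOOLEAN MODE)
        ORDER BY score DESC
        LIMIT 5";

$search = buildBooleanSearch('Computer programming is one of the best for certain php languages');
// $search and the ids would be bound to the placeholders with PDO or mysqli.
```

Note that a short word like "php" falls under the default minimum word length, so it would be ignored unless that server setting is lowered and the index rebuilt.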

 

I have created a secondary option as a workaround in case this is not possible. The code is shown below. The reason I do not want to use it is that I want consistency in my code; I do not want to say "if the author has fewer than 10 articles, then grab all of them and process them through this code".

 

Any insight on the problem is appreciated. Let me reiterate: the code works fine, but it would not do the trick against, say, 50 articles for efficiency reasons, which is why MySQL with its full-text search capabilities should be the preferred solution.

 

Here is my related topic code (please excuse the sloppiness, I just threw it together):

 

<?php
$mainArticle = 'Computer programming is one of the best for certain php languages that are not of this world! that';

$articles[0] = 'This is a topic that is totally not related to the first article at all!';
$articles[1] = 'Programming in PHP has done miraculous wonders to this time and many other exciting events!';
$articles[2] = 'Programming This is a topic that is totally not related to the first article at all';
$articles[3] = 'Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all';
$articles[4] = 'Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages';
$articles[5] = 'Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!';

$related = relatedTest($mainArticle, $articles);
print "The following articles are related to : " . $mainArticle . " (ordered by most relevant)<br /><br />";
foreach ($related as $key => $matches) {
print "Article: " . $articles[$key] . "<br />";
}

print "<br /><br /><br />These were all the articles used.<br /><br />";

foreach ($articles as $article) {
print $article . "<br />";
}

function relatedTest($mainArticle, $articles) {
	$words = explode(" ", stripCommons($mainArticle));
	$match = array(); // stays empty when nothing matches, so arsort() always gets an array

	foreach ($articles as $key => $article) {
		$artWords = explode(" ", stripCommons($article));
		$matches = compareWords($words, $artWords);

		if ($matches > 0) {
			$match[$key] = $matches;
		}
	}
	arsort($match); // highest match count first, keys preserved
	return $match;
}

function compareWords($words, $compwords) {
$match = 0;
if (is_array($words)) {
	foreach ($words as $word) {
		foreach ($compwords as $compword) {
			if (strtolower($compword) == strtolower($word)) {
				$match++;
			}
		}
	}
}

return $match;
}

function stripCommons($article) {
// strip punctuation; ereg_replace() was removed in PHP 7, so use preg_replace()
$article = preg_replace('/[\'.?!,"&:\-\[\]()+=~|*^%$@#<>`;_{}]/', '', $article);
$article = " " . $article . " ";
$commonWords = array("if", "u", "so", "it", "its", "is", "of", 
					"or", "by", "on", "but", "a", "was", "for", "it", 
						"this", "was", "to", "are", "can", "you", "your", 
						"any", "or", "the", "with", "this", "not", "at", "and", "that");
$commonWords = strlenSort($commonWords);	

foreach ($commonWords as $word) {
	// eregi() was removed in PHP 7; stripos() does the same case-insensitive check
	if (stripos($article, " ".$word." ") !== false) {
		$article = str_replace(" ".$word." ", " ", $article);
	}
}

return trim($article);
}

function strlenSort($array) {
// sort array by string length
foreach ($array as $key => $size) {
	$newArray[$key] = strlen($size);
}
arsort($newArray, SORT_NUMERIC);

$i=0;
foreach ($newArray as $key => $size) {
	$returnArr[$i++] = $array[$key];
}

return $returnArr;
}
?>
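
If the PHP fallback stays, the nested loops in compareWords() are probably the first thing to speed up. A rough sketch (countSharedWords() is a hypothetical name, not part of the code above) that counts the comparison article's words once and then looks each main-article word up in that table, which should give the same totals as the nested loops:

```php
<?php
// Hypothetical replacement for compareWords(): same case-insensitive match
// count, but one pass over each word list instead of a loop inside a loop.
function countSharedWords(array $words, array $compWords) {
    // word => number of times it appears in the comparison article
    $counts = array_count_values(array_map('strtolower', $compWords));

    $matches = 0;
    foreach ($words as $word) {
        $key = strtolower($word);
        if (isset($counts[$key])) {
            // every occurrence in the other article counts as one match
            $matches += $counts[$key];
        }
    }
    return $matches;
}
```

Calling countSharedWords($words, $artWords) in place of compareWords($words, $artWords) should leave the ranking unchanged.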

 

Here is what the code above will output =)

 

The following articles are related to : Computer programming is one of the best for certain php languages that are not of this world! that (ordered by most relevant)

Article: Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages
Article: Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all
Article: Programming in PHP has done miraculous wonders to this time and many other exciting events!
Article: Programming This is a topic that is totally not related to the first article at all



These were all the articles used.

This is a topic that is totally not related to the first article at all!
Programming in PHP has done miraculous wonders to this time and many other exciting events!
Programming This is a topic that is totally not related to the first article at all
Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all
Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages
Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!

 

Thanks!
