[SOLVED] Ordering Search Results by most occurrences


calabiyau

$query = "SELECT DISTINCT COUNT(*) as occurrences, id, title, body FROM blog WHERE (";

while(list($key,$val)=each($split_search)){
    if($val<>" " and strlen($val) > 0){
        $query .= "(title LIKE '%".$val."%' OR body LIKE '%".$val."%') OR";
    }
}
$query=substr($query,0,(strLen($query)-3));//this will eat the last OR

$query .= ") GROUP BY id ORDER BY occurrences DESC";
       

 

I've been following a tutorial on creating a search engine that uses the code above to generate the query from an array built from the search terms. I've purposely created blog entries with very different numbers of occurrences of the search terms I'm testing with, but although the query is valid, it does not order the results by number of occurrences. The query-generation code matches the tutorial exactly, and the query output looks correct. Can anybody see what might be wrong with this?

SELECT DISTINCT COUNT(*) as occurrences, id, title, body FROM blog WHERE ((title LIKE '%dead%' OR body LIKE '%dead%') OR(title LIKE '%sea%' OR body LIKE '%sea%') OR(title LIKE '%scrolls%' OR body LIKE '%scrolls%')) GROUP BY id ORDER BY occurrences DESC

 

This is the query output
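
One possible explanation, assuming id is the table's unique key: GROUP BY id puts every row in its own group, so COUNT(*) comes back as 1 for each row and there is nothing meaningful to sort on. Below is a rough sketch of one way to get a usable score, assuming MySQL (where a true comparison evaluates to 1 and a false one to 0, and HAVING may refer to a column alias). The function name buildSearchQuery is made up for illustration, and the score is the number of search terms a row matches, not the number of times each term appears.

```php
<?php
// Hypothetical helper, not from the tutorial: builds a query that scores each
// row by how many of the search terms it matches. Assumes MySQL, where a
// boolean expression evaluates to 1 or 0 and can therefore be summed.
function buildSearchQuery(array $terms) {
    $scoreParts = array();
    $params = array();
    foreach ($terms as $term) {
        $term = trim($term);
        if ($term === '') {
            continue; // skip empty fragments left over from splitting on spaces
        }
        // Placeholders keep user input out of the SQL string itself.
        $scoreParts[] = "(title LIKE ? OR body LIKE ?)";
        $params[] = '%' . $term . '%';
        $params[] = '%' . $term . '%';
    }
    if (count($scoreParts) === 0) {
        return null; // nothing to search for
    }
    $score = implode(' + ', $scoreParts);
    $sql = "SELECT id, title, body, ($score) AS occurrences"
         . " FROM blog HAVING occurrences > 0 ORDER BY occurrences DESC";
    return array($sql, $params);
}
```

The returned SQL and parameter list are meant to be handed to a prepared statement (PDO or mysqli), which also avoids pasting the raw search terms into the query.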

I was having the same issue, so I decided to create my own little test. Basically, it takes a main article, grabs its keywords, and finds related articles. However, the mainArticle parameter can instead be the keywords you used to pull up your results, such as $keywords = "scrolls dead sea";. It then searches the results you retrieved from the database (which must be in an array holding all the text you want to search) and returns a new array of the matching articles, ordered by most keywords found.

 

Fun stuff.

 

<?php
$mainArticle = 'Computer programming is one of the best for certain php languages that are not of this world! that';

$articles[0] = 'This is a topic that is totally not related to the first article at all!';
$articles[1] = 'Programming in PHP has done miraculous wonders to this time and many other exciting events!';
$articles[2] = 'Programming This is a topic that is totally not related to the first article at all';
$articles[3] = 'Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all';
$articles[4] = 'Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages';
$articles[5] = 'Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!';

$related = relatedTest($mainArticle, $articles);
print "The following articles are related to : " . $mainArticle . " (ordered by most relevant)<br /><br />";
foreach ($related as $key => $matches) {
print "Article: " . $articles[$key] . "<br />";
}

print "<br /><br /><br />These were all the articles used.<br /><br />";

foreach ($articles as $article) {
print $article . "<br />";
}

function relatedTest($mainArticle, $articles) {
$match = array(); // start empty so arsort() below still works when nothing matches
$mainArticle = stripCommons($mainArticle);
$words = explode(" ", $mainArticle);

foreach ($articles as $key => $article) {
	$artWords[$key] = explode(" ", stripCommons($article));

	$matches = compareWords($words, $artWords[$key]);

	if ($matches > 0) {
		$match[$key] = $matches;
	}else {
		unset($artWords[$key]);
	}
}
arsort($match);
return $match;
}

function compareWords($words, $compwords) {
$match = 0;
if (is_array($words)) {
	foreach ($words as $word) {
		foreach ($compwords as $compword) {
			if (strtolower($compword) == strtolower($word)) {
				$match++;
			}
		}
	}
}

return $match;
}

function stripCommons($article) {
// ereg_replace() was removed in PHP 7; preg_replace() with a character class strips the same punctuation
$article = preg_replace('/[\'.?!,"&:\-\[\]()+=~|*^%$@#<>`;_{}]/', '', $article);
$article = " " . $article . " ";
$commonWords = array("if", "you", "so", "it", "its", "is", "of", 
					"or", "by", "on", "but", "a", "was", "for", "it", 
						"this", "was", "to", "are", "can", "you", "your", 
						"any", "or", "the", "with", "this", "not", "at", "and", "that");
$commonWords = strlenSort($commonWords);	

foreach ($commonWords as $word) {
	// eregi() was removed in PHP 7; str_ireplace() is case-insensitive, so "This" is stripped as well as "this"
	$article = str_ireplace(" ".$word." ", " ", $article);
}

return trim($article);
}

function strlenSort($array) {
// sort array by string length
foreach ($array as $key => $size) {
	$newArray[$key] = strlen($size);
}
arsort($newArray, SORT_NUMERIC);

$i=0;
foreach ($newArray as $key => $size) {
	$returnArr[$i++] = $array[$key];
}

return $returnArr;
}
?>
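
If the goal is to order actual database rows by how often the keywords occur, the same idea can be sketched more compactly. This is only an illustration: the function name rankRowsByOccurrences and the row shape (arrays with id, title and body keys) are assumptions, and substr_count() matches substrings, so "sea" would also be counted inside "search".

```php
<?php
// Hypothetical helper: ranks rows by the total number of times the keywords
// occur in title and body. The row shape (id, title, body) is assumed.
function rankRowsByOccurrences(array $rows, array $keywords) {
    $scored = array();
    foreach ($rows as $row) {
        $text = strtolower($row['title'] . ' ' . $row['body']);
        $count = 0;
        foreach ($keywords as $keyword) {
            $keyword = strtolower(trim($keyword));
            if ($keyword !== '') {
                $count += substr_count($text, $keyword);
            }
        }
        if ($count > 0) {
            $row['occurrences'] = $count; // keep the score alongside the row
            $scored[] = $row;
        }
    }
    // Highest count first.
    usort($scored, function ($a, $b) {
        return $b['occurrences'] - $a['occurrences'];
    });
    return $scored;
}
```

Rows with no keyword hits are dropped, which mirrors what relatedTest() does with articles that have no matches.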
