$query = "SELECT DISTINCT COUNT(*) as occurrences, id, title, body FROM blog WHERE (";

// each() was removed in PHP 8, so loop over the search terms with foreach
foreach ($split_search as $val) {
    // skip empty terms and single spaces
    if ($val != " " && strlen($val) > 0) {
        // note: $val goes into the SQL unescaped here
        $query .= "(title LIKE '%".$val."%' OR body LIKE '%".$val."%') OR";
    }
}
$query = substr($query, 0, strlen($query) - 3); // this will eat the last " OR"

$query .= ") GROUP BY id ORDER BY occurrences DESC";
       

 

I've been following a tutorial on creating a search engine that uses the above code to generate the query from an array built from the search terms. I've purposely created blog entries with very different numbers of occurrences of the search terms I'm testing with, but although the query is valid, it is not ordering them by number of occurrences. The query-generation code is exactly as in the tutorial, and the query output looks correct. Can anybody see what might be wrong with this?

SELECT DISTINCT COUNT(*) as occurrences, id, title, body FROM blog WHERE ((title LIKE '%dead%' OR body LIKE '%dead%') OR(title LIKE '%sea%' OR body LIKE '%sea%') OR(title LIKE '%scrolls%' OR body LIKE '%scrolls%')) GROUP BY id ORDER BY occurrences DESC

 

This is the query output
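
One possible explanation, assuming id is unique for each blog row (it looks like a primary key): GROUP BY id puts every row in its own group, so COUNT(*) is 1 for every result and ORDER BY occurrences has nothing to sort on. Below is a minimal sketch of one alternative, not the tutorial's method. It assumes MySQL, where a comparison such as (title LIKE ?) evaluates to 1 or 0 and can be summed. The helper name buildSearchQuery is made up for illustration, and the score counts how many term/column pairs match, not how many times each word appears in the text.

```php
<?php
// Hypothetical helper (not from the tutorial): builds a query that ranks rows
// by how many term/column pairs match. Assumes MySQL, where (col LIKE ?)
// evaluates to 1 or 0 and can therefore be added up.
function buildSearchQuery(array $terms): array
{
    $scoreParts = [];
    $params = [];
    foreach ($terms as $term) {
        $term = trim($term);
        if ($term === '') {
            continue; // skip empty terms
        }
        // One point if the title matches, one more if the body matches
        $scoreParts[] = "(title LIKE ?) + (body LIKE ?)";
        $params[] = "%$term%";
        $params[] = "%$term%";
    }
    if (!$scoreParts) {
        return [null, []]; // nothing to search for
    }
    $score = implode(' + ', $scoreParts);
    $sql = "SELECT id, title, body, ($score) AS occurrences "
         . "FROM blog HAVING occurrences > 0 ORDER BY occurrences DESC";
    return [$sql, $params];
}

list($sql, $params) = buildSearchQuery(['dead', 'sea', 'scrolls']);
echo $sql, "\n";
```

The returned SQL uses ? placeholders, so it is meant to be run as a prepared statement with $params bound in order, which also keeps the search terms out of the SQL string itself.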

I was having the same issue, so I decided to create my own little test. Basically, it takes a main article, grabs its keywords, and finds related articles. However, the mainArticle parameter can instead be the keywords you used to pull up your results, such as $keywords = "scrolls dead sea"; It will then search the results you retrieved from the database (these must be in an array with all the terms you want to search) and return a new array of the matching articles, ordered by most keywords found.

 

Fun stuff.

 

<?php
$mainArticle = 'Computer programming is one of the best for certain php languages that are not of this world! that';

$articles[0] = 'This is a topic that is totally not related to the first article at all!';
$articles[1] = 'Programming in PHP has done miraculous wonders to this time and many other exciting events!';
$articles[2] = 'Programming This is a topic that is totally not related to the first article at all';
$articles[3] = 'Programming in Computer language PHP has done This is a topic that is totally not related to the first article at all';
$articles[4] = 'Certainly this is not a computer topic This is a topic that is totally not related to the first article at all or is it world of languages';
$articles[5] = 'Jack and jill went up the hill to fetch a pail of water, jack fell down and broke his crown and jill came tumbling after!';

$related = relatedTest($mainArticle, $articles);
print "The following articles are related to: " . $mainArticle . " (ordered by most relevant)<br /><br />";
foreach ($related as $key => $matches) {
    print "Article: " . $articles[$key] . "<br />";
}

print "<br /><br /><br />These were all the articles used.<br /><br />";

foreach ($articles as $article) {
    print $article . "<br />";
}

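function relatedTest($mainArticle, $articles) {
    // Split the cleaned-up main article into words, ignoring repeated spaces
    $words = preg_split('/\s+/', stripCommons($mainArticle), -1, PREG_SPLIT_NO_EMPTY);

    $match = array(); // stays an empty array when nothing matches, so arsort() is safe
    foreach ($articles as $key => $article) {
        $artWords = preg_split('/\s+/', stripCommons($article), -1, PREG_SPLIT_NO_EMPTY);

        // Keep only the articles that share at least one word with the main article
        $matches = compareWords($words, $artWords);
        if ($matches > 0) {
            $match[$key] = $matches;
        }
    }

    // Highest match count first, preserving the article keys
    arsort($match);
    return $match;
}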

function compareWords($words, $compwords) {
    // Count every case-insensitive word pair that matches between the two lists
    $match = 0;
    if (is_array($words)) {
        foreach ($words as $word) {
            foreach ($compwords as $compword) {
                if (strtolower($compword) == strtolower($word)) {
                    $match++;
                }
            }
        }
    }

    return $match;
}

function stripCommons($article) {
    // ereg_replace() and eregi() were removed in PHP 7; preg_replace() strips the punctuation instead
    $article = preg_replace('/[\'.?!,"&:\-\[\]()+=~|*^%$@#<>`;_{}]/', '', $article);
    $article = " " . $article . " ";
    $commonWords = array("if", "you", "so", "it", "its", "is", "of",
                        "or", "by", "on", "but", "a", "was", "for",
                        "this", "to", "are", "can", "your",
                        "any", "the", "with", "not", "at", "and", "that");
    $commonWords = strlenSort($commonWords);

    foreach ($commonWords as $word) {
        // Case-insensitive, so "This" is removed as well as "this"
        $article = str_ireplace(" ".$word." ", " ", $article);
    }

    return trim($article);
}

function strlenSort($array) {
    // Sort the array by string length, longest first
    $newArray = array();
    foreach ($array as $key => $size) {
        $newArray[$key] = strlen($size);
    }
    arsort($newArray, SORT_NUMERIC);

    $returnArr = array();
    foreach ($newArray as $key => $size) {
        $returnArr[] = $array[$key];
    }

    return $returnArr;
}
?>
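
For the search-keyword case mentioned above, here is a shorter sketch of the same idea: count keyword hits per row in PHP, then sort by the count. The function name rankByKeywords and the sample $rows are made up for illustration; in practice $rows would hold the text you fetched from the database, keyed by row id. No common-word stripping is done here, on the assumption that search keywords are already the words you care about.

```php
<?php
// Hypothetical helper: $rows stands in for text fetched from the database,
// keyed by row id. Returns id => hit count, highest count first.
function rankByKeywords(string $keywords, array $rows): array
{
    // Lowercase the keywords and split on whitespace, dropping empty pieces
    $terms = preg_split('/\s+/', strtolower($keywords), -1, PREG_SPLIT_NO_EMPTY);
    $scores = [];
    foreach ($rows as $key => $text) {
        // Break the row into lowercase words and tally each distinct word
        $counts = array_count_values(str_word_count(strtolower($text), 1));
        $score = 0;
        foreach ($terms as $term) {
            $score += $counts[$term] ?? 0;
        }
        if ($score > 0) {
            $scores[$key] = $score; // rows with no hits are left out
        }
    }
    arsort($scores); // sort by hit count, keeping the row ids as keys
    return $scores;
}

$rows = [
    10 => 'The Dead Sea is a salt lake.',
    11 => 'The Dead Sea Scrolls were found near the Dead Sea.',
    12 => 'Jack and Jill went up the hill.',
];
print_r(rankByKeywords('scrolls dead sea', $rows));
```

With the sample rows, row 11 scores 5 and row 10 scores 2, so row 11 comes first and row 12 is dropped.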
