
[SOLVED] Coded search


sphinx9999


I am writing a search mechanism that pulls only loosely relevant results from the database and then uses PHP code to rank and display the most relevant ones. I have a custom algorithm for the ranking process but am struggling to find the values to feed it. For each result, I need the maximum number of query words that appear in the result's phrase consecutively and in the same order as in the query.

 

For example, the query could be 'one two three four'. My results from the database could be 'two three four five', 'one two three four' and 'one two four three'. I need to store that, in order, these results have: 3 ('two three four'), 4 ('one two three four') and 2 ('one two') words sequentially matching in the correct order. Hope that makes some sense...

 

Any ideas as to how I can do this count?

 

To make it slightly more interesting/complex, consider the result 'one two five three four', which has two occurrences of 2 sequentially matched words ('one two' and 'three four').
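
A minimal sketch of the count being asked for, assuming words are separated by single spaces and compared case-sensitively. The function name `run_lengths` is made up for illustration. It splits both strings into words and, wherever a query word lines up with a result word, extends the match as far as it goes. Only maximal runs are reported, so a run of 3 is not also counted as a run of 2.

```php
<?php
// Hypothetical helper: lengths of every maximal run of query words that
// appears, adjacent and in query order, inside $result.
// Assumes single-space separated words and case-sensitive comparison.
function run_lengths($query, $result) {
    $q = explode(' ', $query);
    $r = explode(' ', $result);
    $runs = array();
    foreach ($q as $i => $qword) {
        foreach ($r as $j => $rword) {
            if ($qword !== $rword) {
                continue;
            }
            // Skip positions that merely continue a run started earlier.
            if ($i > 0 && $j > 0 && $q[$i - 1] === $r[$j - 1]) {
                continue;
            }
            // Extend the run while the following words keep matching.
            $len = 1;
            while (isset($q[$i + $len], $r[$j + $len])
                && $q[$i + $len] === $r[$j + $len]) {
                $len++;
            }
            $runs[] = $len;
        }
    }
    rsort($runs); // longest run first
    return $runs;
}
```

For the query 'one two three four', this returns a single run of 3 for 'two three four five', a single run of 4 for 'one two three four', and two runs of 2 for 'one two five three four'. Isolated single-word matches show up as runs of 1, so the first element (when there is one) is the longest run.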

 

Any help would be appreciated (pseudocode or whatever).


What you want is a word-by-word search rather than a full string search.  Consider using the FULLTEXT features in MySQL (if available on your server configuration).  You can avoid writing many of the search algorithms yourself, as MySQL has built-in support for quoted phrases, word inclusion (+word), word exclusion (-word), etc. in FULLTEXT boolean mode.
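
As a rough illustration of the FULLTEXT route, here is a sketch that builds a boolean-mode search string in PHP. The helper name `build_boolean_query`, the table `listings` and the column `title` are all assumptions for the example, and the column is assumed to have a FULLTEXT index. The helper does not strip operator characters from user input, which real code would need to do.

```php
<?php
// Hypothetical helper: build a boolean-mode search string for MySQL FULLTEXT.
// Required words get "+", excluded words get "-", and an exact phrase is
// wrapped in double quotes.
function build_boolean_query(array $required, array $excluded = array(), $phrase = '') {
    $parts = array();
    foreach ($required as $word) {
        $parts[] = '+' . $word;
    }
    foreach ($excluded as $word) {
        $parts[] = '-' . $word;
    }
    if ($phrase !== '') {
        $parts[] = '"' . $phrase . '"';
    }
    return implode(' ', $parts);
}

// Assumed schema: table `listings` with a FULLTEXT index on `title`.
$sql = 'SELECT id, title FROM listings
        WHERE MATCH(title) AGAINST (? IN BOOLEAN MODE)';

$against = build_boolean_query(array('search', 'ranking'), array('draft'), 'exact match');
// $against is now: +search +ranking -draft "exact match"
// Bind $against to the placeholder using a PDO or mysqli prepared statement.
```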


Consider using the FULLTEXT features in MySQL

 

Yep, that does sound sensible. However, the business model is based around the devised algorithm so that we can guarantee certain customers will appear in the search for certain criteria. The company is insisting that we retain control of this so that it can be changed at will (and will still function exactly the same way if changed to a different DB).

Thanx


Well, if you HAVE to use this sort of algorithm, you're going to have to do this within PHP; you wouldn't be able to get MySQL to do much of the work for you.

 

What I propose would only be a viable solution if there aren't thousands of results to search through, since we must work out the number of consecutive words in each result, and that involves cycling through all our search terms. The more search terms and the more results there are, the slower this would be. However, this should work, including when we have a result with two consecutive words twice, which would be more relevant than two consecutive words once, but less relevant than 3 consecutive words once (I assume, anyway).

 

<?php
$search = 'one two three four';
$results = array();
$results[0] = 'two three four five';
$results[1] = 'one two three four';
$results[2] = 'one two four three';
$results[3] = 'one two five three four';
$search_terms = explode(' ', $search);
$matches = array();
foreach ($results as $result_key => $result) { // cycle through our results
    $padded = ' ' . $result . ' '; // pad with spaces so we only match whole words ('one' must not match 'someone')
    foreach ($search_terms as $key => $term) { // use each word in the search terms to find how many consecutive matches we have
        $temp = $term; // temporary string we're searching for
        $found = 0; // how many words we've found
        $iterations = 0; // how many times the while loop has run
        while (strpos($padded, ' ' . $temp . ' ') !== false) { // run this while the temporary string is within the result
            $found++;
            if (!isset($search_terms[$key + 1 + $iterations])) { // if the next word doesn't exist, break - we've used all our search terms
                break;
            }
            $temp .= ' ' . $search_terms[$key + 1 + $iterations]; // if it DOES exist, change the temp string to include the next word
            $iterations++;
        }
        $matches[$result_key][] = $found; // we store EVERY consecutive number of matches we find, to allow for sorting where a run of words occurs twice
    }
}
//echo '<pre>'.print_r($matches,1).'</pre>';
function sort_array($a, $b) { // user defined comparison function to order our results
    rsort($a); // sort the two arrays of matches we are comparing
    rsort($b);
    if ($a[0] > $b[0]) { // if the first term in one is greater than, or less than, the other, we can easily sort
        return -1;
    } elseif ($a[0] < $b[0]) {
        return 1;
    } else { // otherwise, they're equal, and we have to continue looking through the array until we find an unequal pair
        foreach ($a as $k => $v) {
            if ($a[$k] == $b[$k]) {
                continue;
            }
            return ($a[$k] > $b[$k]) ? -1 : 1;
        }
        return 0; // if the arrays are identical, the results tie - we cannot order them in any meaningful way
    }
}
uasort($matches, 'sort_array');
//echo '<pre>'.print_r($matches,1).'</pre>';
echo 'Search term relevance:<br />';
foreach ($matches as $k => $v) {
    echo $results[$k] . '<br />';
}
?>

 

Edit: I've commented that quite heavily, but if you don't understand any of it, then just ask.
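
The ordering rule described above (two runs of 2 beat one run of 2, but lose to one run of 3) comes down to sorting each result's match counts in descending order and comparing them position by position. Here is that rule on its own as a small sketch; the function name `compare_match_counts` is made up, and treating a missing entry as 0 is an assumption for lists of unequal length. The sample counts are the ones the code above stores in `$matches` for those three results.

```php
<?php
// Hypothetical comparison: $a and $b are lists of match counts for two
// results. Returns -1 if $a should rank first, 1 if $b should, 0 on a tie.
function compare_match_counts(array $a, array $b) {
    rsort($a); // largest count first
    rsort($b);
    $n = max(count($a), count($b));
    for ($i = 0; $i < $n; $i++) {
        $x = isset($a[$i]) ? $a[$i] : 0; // assumption: a missing entry counts as 0
        $y = isset($b[$i]) ? $b[$i] : 0;
        if ($x !== $y) {
            return ($x > $y) ? -1 : 1;
        }
    }
    return 0; // identical counts: a tie
}

$counts = array(
    'one two four three'      => array(2, 1, 1, 1),
    'one two five three four' => array(2, 1, 2, 1),
    'two three four five'     => array(0, 3, 2, 1),
);
uasort($counts, 'compare_match_counts');
$ranked = array_keys($counts);
// $ranked: 'two three four five', 'one two five three four', 'one two four three'
```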
