
arfa

search script pos muddle


I am building a wee search script but am stuck trying to cover multiple occurrences, and I keep getting tangled in a myriad of if() statements.
I am searching standard HTML & text files, so no SQL.

Scenario:
$simple_string = 'I am stuck trying to cover multiple occurences of keywords in proximity';
$querry = 'stuck multiple';

open file(s)
strip tags
foreach (explode(' ', $querry) as $word) {   // test each keyword separately
if (strpos($simple_string,$word) !== false) {
$pos[]=strpos($simple_string,$word); }}

In the example above two values are returned for $pos
I take a window around each $pos (from $pos - $start to $pos + $end) to return a snippet, so we get two strings, e.g.:
> am stuck trying
> cover multiple occurences

My ideal in the above example would be to return just one 'merged' string.
> am stuck trying to cover multiple occurences

######
How can one $pos determine its position relative to another $pos?
######
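
One possible way to relate the positions to each other, offered as a rough sketch rather than a definitive answer: turn every $pos into a [start, end] window, sort the windows by start, and merge any window that begins before the previous one ends. The helper name merge_windows() and the $pad parameter are hypothetical, made up for this illustration.

```php
<?php
// Hypothetical helper: pad each hit position into a [start, end] window,
// sort the windows by start, and merge any that overlap or touch.
function merge_windows(array $positions, $pad, $full_len) {
    $windows = array();
    foreach ($positions as $p) {
        $windows[] = array(max(0, $p - $pad), min($full_len, $p + $pad));
    }
    usort($windows, function ($a, $b) { return $a[0] - $b[0]; });
    $merged = array();
    foreach ($windows as $w) {
        $last = count($merged) - 1;
        if ($last >= 0 && $w[0] <= $merged[$last][1]) {
            // Overlaps the previous window: extend it instead of adding a new one.
            $merged[$last][1] = max($merged[$last][1], $w[1]);
        } else {
            $merged[] = $w; // far enough away: start a new window
        }
    }
    return $merged;
}

$simple_string = 'I am stuck trying to cover multiple occurences of keywords in proximity';
$pos = array(strpos($simple_string, 'stuck'), strpos($simple_string, 'multiple')); // 5 and 27
$snippets = array();
foreach (merge_windows($pos, 15, strlen($simple_string)) as $w) {
    $snippets[] = substr($simple_string, $w[0], $w[1] - $w[0]);
}
```

With a pad of 15 the two windows overlap, so $snippets holds a single string; with a small pad they stay separate. The sketch cuts at raw character offsets, so snapping the ends to word boundaries would still be needed for tidy output.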

Here is my script start so far:
$page_dir = './files_here';
$file_array = array();
$dh = opendir($page_dir); // just one DIR for now
while (($file = readdir($dh)) !== false) {
    if (preg_match('/.\.htm$/', $file)) { // test htm files (ereg() is deprecated)
        $file_array[] = $file;            // all files as array
    }
}
closedir($dh);
$rec_num = count($file_array) - 1;
$one_bit = ''; $hits = 0; $text = ''; $runner = ''; $plod = '';
// $str_pop (query words), $str_num (last word index) and $long (context length)
// are assumed to be set earlier; that part is not shown here
foreach ($file_array as $incl) {
    $str = file_get_contents("$page_dir/$incl");
    $linker = str_replace('.htm', '', $incl); // for result reporting
    $tagless = strtolower(strip_tags($str));
    $full_len = strlen($tagless);             // total string length
    for ($s = 0; $s <= $str_num; $s++) {      // loop each word of string
        $p = strpos($tagless, $str_pop[$s]);  // where is it
        if ($p !== false) {                   // found one word of string
            $hits++;
            $pos[] = $p;                      // keep every hit position
            $x = max(0, $p - $long);          // snippet start, clamped to 0
            $y = min($full_len, $p + $long);  // snippet end, clamped to string length
            $one_bit .= substr($tagless, $x, $y - $x)."<BR>"; // got one hit = one string bit
        }
    }
}

I tried an array of $pos and then tried to compare proximity of results.
if else if or and...
I have tried several other different approaches to get tidy result strings but it gets very convoluted and fuzzy in my head.
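
A side note on the $pos array itself: strpos() only reports the first occurrence of a word, so repeated keywords never reach the array. A minimal sketch for collecting every occurrence, using the offset argument of strpos(), might look like this; all_positions() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return every position at which $word occurs in $text.
function all_positions($text, $word) {
    $positions = array();
    if ($word === '') {
        return $positions; // an empty needle would never advance
    }
    $offset = 0;
    while (($p = strpos($text, $word, $offset)) !== false) {
        $positions[] = $p;
        $offset = $p + strlen($word); // continue searching after this hit
    }
    return $positions;
}

$found = all_positions('stuck and stuck again', 'stuck'); // array(0, 10)
```

The resulting list is already in ascending order, which makes comparing neighbouring positions straightforward.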

Or,
maybe there is another approach to this?

The site is relatively simple, so I am not particularly concerned with weighting, although suggestions on how to track this would also be welcome.
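
For the weighting, one simple option (a sketch under assumptions, not a proper ranking scheme) is to count how often each keyword occurs per page with substr_count() and sort the pages by that total. The helper score_text() and the sample $pages array are hypothetical and stand in for the stripped file contents.

```php
<?php
// Hypothetical helper: score a page by how often each query word occurs in it.
function score_text($text, array $words) {
    $score = 0;
    foreach ($words as $word) {
        if ($word !== '') {
            $score += substr_count($text, $word);
        }
    }
    return $score;
}

// Made-up sample data: page name => tag-stripped text.
$pages = array(
    'about' => 'stuck on multiple things, really stuck',
    'home'  => 'nothing relevant here',
    'faq'   => 'multiple answers',
);
$scores = array();
foreach ($pages as $name => $text) {
    $scores[$name] = score_text(strtolower($text), array('stuck', 'multiple'));
}
arsort($scores); // highest score first, page names kept as keys
```

Every occurrence counts equally here; giving extra weight to hits in titles or to keywords that sit close together would be a refinement on top of this.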

thanks - arfa
