search script pos muddle


arfa

I am building a wee search script but am stuck trying to cover multiple occurrences of the keywords, and I keep getting tangled in a myriad of nested if() checks.
I am searching standard HTML and text files, so no SQL.

Scenario:
$simple_string = 'I am stuck trying to cover multiple occurences of keywords in proximity';
$querry = 'stuck multiple';

open file(s)
strip tags
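foreach (explode(' ', $querry) as $word) {   // test each keyword on its own
    $p = strpos($simple_string, $word);
    if ($p !== false) { $pos[] = $p; }       // !== false because a hit at offset 0 is valid
}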

In the example above two values are returned for $pos, one per keyword.
I pad each $pos with a -$start / +$end offset to cut out a substring, so we end up with two strings, e.g.:
> am stuck trying
> cover multiple occurences

My ideal in the above example would be to return just one 'merged' string.
> am stuck trying to cover multiple occurences

######
How can one $pos be compared with another $pos to tell whether the two hits are close enough to merge?
######
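
One way this might be handled, as a rough sketch rather than a finished answer: sort the positions, turn each one into a [start, end] window, and extend the previous window whenever the next one overlaps it. The function name merge_windows() is hypothetical, and the context width of 20 characters is only an assumed value for $long.

```php
<?php
// Hypothetical helper: turn keyword positions into merged excerpt windows.
// $positions = character offsets of hits, $long = context kept either side (assumed value below).
function merge_windows(array $positions, int $long, int $full_len): array
{
    sort($positions);                          // neighbours can only be compared in ascending order
    $windows = array();
    foreach ($positions as $p) {
        $start = max(0, $p - $long);           // clamp to the start of the text
        $end   = min($full_len, $p + $long);   // clamp to the end of the text
        $last  = count($windows) - 1;
        if ($last >= 0 && $start <= $windows[$last][1]) {
            // overlaps (or touches) the previous window: extend that one
            $windows[$last][1] = max($windows[$last][1], $end);
        } else {
            $windows[] = array($start, $end);  // far enough away: start a new window
        }
    }
    return $windows;
}

$simple_string = 'I am stuck trying to cover multiple occurences of keywords in proximity';
$pos = array();
foreach (array('stuck', 'multiple') as $word) {
    $p = strpos($simple_string, $word);
    if ($p !== false) { $pos[] = $p; }
}
$windows = merge_windows($pos, 20, strlen($simple_string));
foreach ($windows as $w) {
    echo substr($simple_string, $w[0], $w[1] - $w[0]), "\n";  // one excerpt per merged window
}
```

With a window of 20 characters the two hits in the scenario overlap, so only one excerpt is printed. Hits that sit further apart than the combined padding stay as separate excerpts.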

Here is my script start so far:
$page_dir = './files_here';
$querry   = 'stuck multiple';                  // assumed: search terms as in the scenario above
$long     = 20;                                // assumed: characters of context either side of a hit
$str_pop  = explode(' ', strtolower($querry)); // query split into words
$file_array = array();
$dh = opendir($page_dir);                      // just one DIR for now
while (($files = readdir($dh)) !== false) {
    if (preg_match('/.\.htm$/', $files)) {     // test htm files (ereg() was removed in PHP 7)
        $file_array[] = $files;                // all files as array
    }
}
closedir($dh);
$one_bit = '';  $hits = 0;  $text = '';  $runner = '';  $plod = '';
foreach ($file_array as $incl) {
    $str      = file_get_contents("$page_dir/$incl");
    $linker   = str_replace('.htm', '', $incl);   // for result reporting
    $tagless  = strtolower(strip_tags($str));
    $full_len = strlen($tagless);                 // total string length
    foreach ($str_pop as $word) {                 // loop each word of the query
        $p = strpos($tagless, $word);             // where is it
        if ($p === false) { continue; }           // this word is not in this file
        $hits++;
        $pos[] = $p;
        $x = max(0, $p - $long);                  // excerpt start, clamped to 0
        $y = min($full_len, $p + $long);          // excerpt end, clamped to the string length
        $one_bit .= substr($tagless, $x, $y - $x) . "<BR>"; // got one hit = one string bit
    }
}

I tried an array of $pos and then tried to compare proximity of results.
if else if or and...
I have tried several other different approaches to get tidy result strings but it gets very convoluted and fuzzy in my head.

Or,
maybe there is another approach to this?
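
One alternative that might be worth trying, sketched here under assumptions: let preg_match_all() with PREG_OFFSET_CAPTURE collect the offset of every occurrence of every keyword in a single pass, instead of one strpos() call per word (which only reports the first occurrence). The function name find_all_positions() and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: byte offsets of every occurrence of every query word.
function find_all_positions(string $text, string $query): array
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) { return array(); }           // empty query: nothing to find
    $quoted = array();
    foreach ($words as $w) {
        $quoted[] = preg_quote($w, '/');       // escape regex metacharacters in user input
    }
    $pattern = '/' . implode('|', $quoted) . '/i';   // any keyword, case-insensitive
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
    $positions = array();
    foreach ($m[0] as $hit) {
        $positions[] = $hit[1];                // $hit is array(matched text, offset)
    }
    return $positions;                         // already in ascending order
}

// Made-up sample text with a repeated keyword.
$positions = find_all_positions('stuck here, stuck there, multiple times', 'stuck multiple');
print_r($positions);
```

The offsets come back in the order the matches occur in the text, so they can be compared with their neighbours directly to decide which hits belong in the same excerpt.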

The site is relatively simple, so I am not particularly concerned with weighting, although suggestions on how to track it would also be welcome.
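
For the weighting, a very simple scheme could be tried first: one point per keyword occurrence per file, then sort the files by score. This is only a sketch; score_text() and the page names are hypothetical, and the text passed in is assumed to be lowercased and tag-stripped already.

```php
<?php
// Hypothetical scoring: one point for each occurrence of each keyword.
function score_text(string $tagless, array $words): int
{
    $score = 0;
    foreach ($words as $word) {
        if ($word === '') { continue; }            // substr_count() does not accept an empty needle
        $score += substr_count($tagless, $word);   // counts non-overlapping occurrences
    }
    return $score;
}

$words  = array('stuck', 'multiple');
$scores = array(
    'page_b' => score_text('nothing relevant here', $words),
    'page_a' => score_text('stuck on multiple things, really stuck', $words),
);
arsort($scores);   // highest score first, page names kept as keys
```

Sorting with arsort() keeps the page name attached to its score, so the result list can be printed in ranked order.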

thanks - arfa