
[SOLVED] Filter out everything from string except keywords?



Let's say I have this array and string:

 

$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';

 

What is the best way to strip everything from $string except the $keywords, so that the end result is just a list of the keywords found in $string?

 

keyword1
keyword2
keyword3

 

$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';
$list = array(); // initialize so print_r works even when nothing matches
foreach($keywords as $word) {
   if (stristr($string,$word) !== false) // use strstr if you want case sensitivity
      $list[] = $word;
}
print_r($list);

stristr (case insensitive) or strstr (case sensitive) is a quick way to check whether one string is in another (stripos/strpos are lighter still if you only need a yes/no answer, since they return an offset rather than a substring). As for looping through each word: you have a list of words, so yes, you're going to have to loop through each one.
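If the goal is the bare list from the original question (keyword1, keyword2, keyword3 without the brackets), the same loop can strip them as it goes. This is only a minimal sketch: the function name find_keywords is made up for illustration, it assumes PHP 7 or later for the type declarations, and it uses stripos on the assumption that only a found/not-found answer is needed.

```php
<?php
// Hypothetical helper (not from the thread): returns the keywords found
// in $string, with the surrounding [ ] removed.
function find_keywords(array $keywords, string $string): array {
    $found = array();
    foreach ($keywords as $word) {
        // stripos returns the match offset (possibly 0) or false,
        // so the comparison has to be strict.
        if (stripos($string, $word) !== false) {
            $found[] = trim($word, '[]');
        }
    }
    return $found;
}

$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';

// Print one keyword per line.
echo implode("\n", find_keywords($keywords, $string)), "\n";
```

Note that the result follows the order of $keywords, not the order in which they appear in $string, and each keyword is listed at most once.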

 

I suppose you could do a preg_match_all with alternation but I don't know if that would necessarily be faster. 

If you keep the current [keyword] format (with the [..]), stristr is faster than preg_match_all. Why? Because [ and ] are significant in regex, so you have to take the extra step of escaping each of those brackets (see function preg_method below). I also have a preg2_method function that uses the same keywords without the brackets. Without the escaping step, preg_match_all pretty much evens out with the stristr method.
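For what it's worth, the escaping step can also be done one keyword at a time with preg_quote, passing the pattern delimiter as its second argument. A rough sketch, not a drop-in replacement: the function name match_keywords is hypothetical, the i flag is an assumption to mirror stristr's case insensitivity, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): finds the keywords with a
// single preg_match_all call and returns them in order of appearance.
function match_keywords(array $keywords, string $string): array {
    if (count($keywords) === 0) {
        return array(); // avoid building an empty pattern
    }
    $escaped = array();
    foreach ($keywords as $word) {
        // Escape regex metacharacters such as [ and ], plus the ~ delimiter.
        $escaped[] = preg_quote($word, '~');
    }
    // Alternation of all keywords; i = case-insensitive, like stristr.
    $pattern = '~' . implode('|', $escaped) . '~i';
    preg_match_all($pattern, $string, $matches);
    return $matches[0];
}

$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';

print_r(match_keywords($keywords, $string));
```

One behavioural difference from the stristr loop: this returns every occurrence in the order it appears in $string, so a keyword that shows up twice is listed twice.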

 

So, if the keywords have to have the brackets around them, I'd stick with stristr. If you can remove them, the two are about even speedwise. But with stristr you don't have to worry about altering code if you decide to change your keyword delimiters, whereas with preg_match_all you may or may not have to.

 

So overall, I'd stick with the stristr method.

 

<?php
$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$keywords2 = array('keyword1', 'keyword2', 'keyword3');

$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';

function preg_method($keywords, $string) {
   $keywords = preg_quote(implode(' ',$keywords));
   $keywords = explode(' ',$keywords);
   $keywords = implode('|',$keywords);
   preg_match_all("~$keywords~",$string,$matches);
}   

function preg2_method($keywords, $string) {
   $keywords = implode('|',$keywords);
   preg_match_all("~$keywords~",$string,$matches);
}   

function stristr_method($keywords, $string) {
   foreach($keywords as $word) {
      if (stristr($string,$word) !== false)
         $list[] = $word;
   }
}

echo "preg_match_all method: <br/>";
for($x = 1; $x <= 10; $x++) {
   $start = (float) microtime(true);
   preg_method($keywords, $string);
   $time = (float) microtime(true) - $start;
   echo "$time<br/>";
}

echo "preg2 method: <br/>";
for($x = 1; $x <= 10; $x++) {
   $start = (float) microtime(true);
   preg2_method($keywords2, $string);
   $time = (float) microtime(true) - $start;
   echo "$time<br/>";
}

echo "stristr method: <br/>";
for($x = 1; $x <= 10; $x++) {
   $start = (float) microtime(true);
   stristr_method($keywords, $string);
   $time = (float) microtime(true) - $start;
   echo "$time<br/>";
}

?>

 

output (seconds per call):

preg_match_all method:
0.000109910964966
1.4066696167E-5
1.19209289551E-5
1.12056732178E-5
1.09672546387E-5
1.09672546387E-5
1.09672546387E-5
1.00135803223E-5
1.09672546387E-5
1.09672546387E-5

preg2 method:
1.4066696167E-5
8.82148742676E-6
9.05990600586E-6
7.86781311035E-6
7.86781311035E-6
7.86781311035E-6
9.05990600586E-6
7.86781311035E-6
8.10623168945E-6
8.10623168945E-6

stristr method:
2.50339508057E-5
9.05990600586E-6
8.10623168945E-6
8.10623168945E-6
7.15255737305E-6
8.10623168945E-6
8.10623168945E-6
8.10623168945E-6
7.15255737305E-6
8.10623168945E-6
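Each figure above is a single call of roughly 10 microseconds, and the first call in each group is noticeably slower than the rest, so the differences are small enough to be worth re-checking with an average over many calls. A minimal sketch of that idea, assuming PHP 7 or later: the helper name average_time and the iteration count are made up for illustration, and the numbers it prints will vary from machine to machine.

```php
<?php
// Hypothetical helper (not from the thread): average seconds per call
// over $iterations runs of $fn.
function average_time(callable $fn, int $iterations = 100000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

$keywords = array('[keyword1]', '[keyword2]', '[keyword3]');
$string = '<h1><strong>[keyword1]</strong></h1><br /> not keyword [keyword2], or keyword <strong>[keyword3]</strong>';

// The stristr loop from the thread, wrapped in a closure so it can be timed.
$stristr_loop = function () use ($keywords, $string) {
    $list = array();
    foreach ($keywords as $word) {
        if (stristr($string, $word) !== false) {
            $list[] = $word;
        }
    }
    return $list;
};

printf("stristr loop: %.3e seconds per call\n", average_time($stristr_loop));
```

The preg variants can be wrapped in closures and passed to the same helper for a like-for-like comparison.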
