
simple regex question


dijona


I've got a PHP function that tries to make searches safer by stripping most non-essential characters, but I want to keep one character in there: the & sign.

 

originally I had my regex pattern as:

 

$patterns = '/[^a-zA-Z0-9\s]/';

 

but when I searched for Echo & the Bunnymen (for example) it returned search results for Echo.

 

then I thought I'd add the & like so:

 

$patterns = '/[^a-zA-Z0-9&\s]/'; // added the &

 

but that still doesn't do it. so how can I make sure users search for the whole string 'Echo & the Bunnymen'?

 


This worked for me: preg_replace('/[^a-zA-Z0-9&\s]/i', '', 'Echo! & t\/h/\e $Bunnymen%*()@ ');

 

The result is Echo & the Bunnymen

 

Perhaps the problem is with the ampersand itself when being passed to the search in your code? I suggest htmlentities() or htmlspecialchars(). It's also possible that the ampersand means bitwise AND or something in whatever context you're using it.

 

 


thanks beta0x64 - I'm still new to this but I was just using echo & the bunnymen as an example - not something to specifically escape. There are lots of bands that use an & in their name so I'm trying for a somewhat more global solution. I'd like to stick with preg_replace rather than htmlentities() or htmlspecialchars() because this function is already being used for most of my safe-search duties and working fine.


I'm not seeing a problem; using '/[^a-zA-Z0-9&\s]/' as the search pattern seems to work fine for me:

 

echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Echo & The Bunnymen--");
//Output: Echo & The Bunnymen
echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Hall & Oats--");
//Output: Hall & Oats

 

You should double-check the actual input data. Are you positive that the input is EXACTLY the ampersand and not the HTML entity for an ampersand, i.e. "&amp;"?
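
A quick way to check is to run both forms of the input through the pattern and compare the results. This is only a sketch: the variable names and sample strings are made up for illustration, and it assumes the search term might arrive HTML-encoded.

```php
<?php
// Illustrative check: how the whitelist pattern treats a literal ampersand
// versus the HTML entity form "&amp;".
$pattern = '/[^a-zA-Z0-9&\s]/';

$literal = 'Echo & the Bunnymen';     // plain ampersand
$encoded = 'Echo &amp; the Bunnymen'; // HTML-encoded ampersand

$cleanLiteral = preg_replace($pattern, '', $literal);
$cleanEncoded = preg_replace($pattern, '', $encoded);

echo $cleanLiteral, PHP_EOL; // Echo & the Bunnymen
echo $cleanEncoded, PHP_EOL; // Echo &amp the Bunnymen  (the ";" is stripped)
```

If the cleaned string comes back with a dangling "&amp", the input was the entity rather than the bare character.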


Well, now the problem becomes a little more interesting.

 

The easy solution would be to simply allow the semi-colon as well. But, then that would allow a semi-colon anywhere in the name.

 

Assuming you do not want to allow the semi-colon and that you want to keep "&amp;" (instead of converting it to just "&"), you could do this:

 

$patterns = array(
    "/&amp;/",
    "/[^a-z0-9&\s]/i",
    "/&/"
);

$replacements = array(
    "&",
    "",
    "&amp;"
);

echo preg_replace($patterns, $replacements, "--Echo &amp; The Bunnymen--");

 

Basically, it does the following in order:

 

1. Convert any "&amp;" to just "&"

2. Remove any character that is not a-z (upper or lower case), 0-9, & (ampersand), or white space

3. Convert any ampersands ("&") back to "&amp;"
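
The three steps above could be wrapped in a small helper so the existing safe-search function can call it in one place. This is a minimal sketch; the function name cleanSearchTerm is hypothetical and not from the original code.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the thread's code).
function cleanSearchTerm(string $term): string
{
    $patterns = array(
        '/&amp;/',         // 1. decode the entity to a bare ampersand
        '/[^a-z0-9&\s]/i', // 2. strip everything outside the whitelist
        '/&/',             // 3. re-encode every remaining ampersand
    );
    $replacements = array('&', '', '&amp;');

    // With array arguments, preg_replace applies the pairs in order,
    // each one working on the result of the previous step.
    return preg_replace($patterns, $replacements, $term);
}

echo cleanSearchTerm('--Echo &amp; The Bunnymen--'), PHP_EOL; // Echo &amp; The Bunnymen
echo cleanSearchTerm('--Hall & Oats--'), PHP_EOL;             // Hall &amp; Oats
```

Note that step 3 also encodes ampersands that arrived as a bare "&", so the output is consistently entity-encoded whichever form the input used.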


Thanks so much for your help, that's just what I needed.

 

Now, if it's not too much trouble.. ;) how would I write the rule for the band 50/50, where obviously the problem is the slash (this time an actual slash, not the HTML code!!)? I really need to write letters to all bands and ask them to use plain ol' vanilla characters. Once I figure that out (how to escape slashes) I think I can follow through to other examples on my own.


A better question is what is in the data that you need to get rid of? Why are you working with dirty data?

 

Anyway, to allow a forward slash just use "\/". I know it looks funny, but the backslash tells the regex processor to treat the slash as a literal character instead of as the pattern delimiter.

$patterns = array(
    "/&amp;/",
    "/[^a-z0-9&\/\s]/i",
    "/&/"
);

$replacements = array(
    "&",
    "",
    "&amp;"
);
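
As a side note, the escape is only needed because "/" is also the pattern delimiter. PCRE in PHP accepts other delimiters, so one possible alternative is to switch to "#" and leave the slash unescaped. A small sketch with illustrative variable names:

```php
<?php
// Same whitelist written two ways; both keep the forward slash.
$withEscapedSlash  = '/[^a-z0-9&\/\s]/i'; // "/" delimiter, so the slash is escaped
$withHashDelimiter = '#[^a-z0-9&/\s]#i';  // "#" delimiter, no escape needed

$a = preg_replace($withEscapedSlash, '', '--50/50--');
$b = preg_replace($withHashDelimiter, '', '--50/50--');

echo $a, PHP_EOL; // 50/50
echo $b, PHP_EOL; // 50/50
```

Which form to use is mostly a matter of taste; the "#" version tends to be easier to read once a pattern contains several slashes.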

 

You know there are going to be other exceptions too. It might be better if you post some of the input data showing what the problem is. There might be a better solution.

 

Ones that might be problematic are ones with accented characters such as "Björk". The script above will convert that to "Bjrk". I don't know of a modifier that is "accent insensitive" (if there is such a word).
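
One possible way around the accent problem, offered as a sketch rather than a drop-in fix, is to whitelist Unicode letters and digits instead of a-z and 0-9. This assumes the input is valid UTF-8 and that PCRE was built with Unicode support (the usual case); the variable names are illustrative.

```php
<?php
// ASCII-only whitelist versus a Unicode-aware one.
// \p{L} matches any letter, \p{N} any number; the "u" modifier makes
// PCRE treat the pattern and subject as UTF-8.
$asciiOnly    = '/[^a-z0-9&\/\s]/i';
$unicodeAware = '/[^\p{L}\p{N}&\/\s]/u';

$name = "--Bj\u{00F6}rk--"; // "--Björk--", written with an escape to avoid file-encoding issues

$stripped = preg_replace($asciiOnly, '', $name);
$kept     = preg_replace($unicodeAware, '', $name);

echo $stripped, PHP_EOL; // Bjrk
echo $kept, PHP_EOL;     // Björk
```

Be aware that \p{L} and \p{N} admit letters and numerals from every script, which is a wider whitelist than the original, and that preg_replace returns null with the "u" modifier if the input is not valid UTF-8.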
