dijona Posted May 18, 2010
I've got a PHP function that tries to make searches safer by stripping most non-essential characters, but I want to keep one character in there: the & sign. Originally my regex pattern was:
$patterns = '/[^a-zA-Z0-9\s]/';
but when I searched for Echo & the Bunnymen (for example) it returned search results for Echo. Then I thought I'd add the & like so:
$patterns = '/[^a-zA-Z0-9&\s]/'; // added the &
but that still doesn't do it. So how can I make sure users search for the whole string 'Echo & the Bunnymen'?
beta0x64 Posted May 18, 2010
This worked for me:
preg_replace('/[^a-zA-Z0-9&\s]/i', '', 'Echo! & t\/h/\e $Bunnymen%*()@ ');
The result is Echo & the Bunnymen
Perhaps the problem is with the ampersand itself when it is passed to the search in your code? I suggest htmlentities() or htmlspecialchars(). It's also possible that the ampersand means bitwise AND or something similar in whatever context you're using it.
dijona Posted May 18, 2010 (Author)
Thanks beta0x64 - I'm still new to this, but I was just using Echo & the Bunnymen as an example, not as something to escape specifically. Lots of bands use an & in their name, so I'm trying for a somewhat more global solution. I'd like to stick with preg_replace rather than htmlentities() or htmlspecialchars(), because this function is already being used for most of my safe-search duties and is working fine.
Psycho Posted May 19, 2010
I'm not seeing a problem. Using '/[^a-zA-Z0-9&\s]/' as the search parameter seems to work fine for me:
echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Echo & The Bunnymen--"); //Output: Echo & The Bunnymen
echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Hall & Oats--"); //Output: Hall & Oats
You should double-check the actual input data. Are you positive that the input is EXACTLY the ampersand and not the HTML code for an ampersand, i.e. "&amp;"?
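One way to run that check is to look at the raw string before any filtering. The following is only a minimal sketch: `has_encoded_ampersand()` is a hypothetical helper name, and the sample strings stand in for whatever the real input is. It also shows what the pattern does to the entity form, since the semicolon is not in the allowed set.

```php
<?php
// Hypothetical helper (not from the original code): reports whether the raw
// term carries the entity "&amp;" rather than a bare ampersand.
function has_encoded_ampersand(string $term): bool
{
    return strpos($term, '&amp;') !== false;
}

$pattern = '/[^a-zA-Z0-9&\s]/';

// A bare ampersand survives the filter untouched.
$plain = preg_replace($pattern, '', 'Echo & the Bunnymen');

// The entity loses its semicolon, because ";" is not an allowed character.
$encoded = preg_replace($pattern, '', 'Echo &amp; the Bunnymen');

var_dump(has_encoded_ampersand('Echo &amp; the Bunnymen')); // bool(true)
echo $plain, "\n";   // Echo & the Bunnymen
echo $encoded, "\n"; // Echo &amp the Bunnymen
```

If the filtered term comes out as "&amp" with no semicolon, that is a strong hint the input was entity-encoded to begin with.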
dijona Posted May 19, 2010 (Author)
Yes, actually the XML file (the input) has the &amp; and not & - I wasn't sure which regex to use. How would I allow &amp; in my search forms?
Psycho Posted May 19, 2010
Well, now the problem becomes a little more interesting. The easy solution would be to simply allow the semicolon as well, but that would allow a semicolon anywhere in the name. Assuming you do not want to allow the semicolon and that you want to keep "&amp;" (instead of converting it to just "&"), you could do this:
$patterns = array( "/&amp;/", "/[^a-z0-9&\s]/i", "/&/" );
$replacements = array( "&", "", "&amp;" );
echo preg_replace($patterns, $replacements, "--Echo &amp; The Bunnymen--");
Basically, it does the following in order:
1. Converts any "&amp;" to just "&"
2. Removes any character that is not a-z (upper or lower case), 0-9, & (ampersand), or white space
3. Converts any ampersands ("&") back to "&amp;"
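The three steps above could be wrapped in a function so that the rest of the search code only deals with one call. This is a sketch only: `clean_search_term()` is a hypothetical name, not something from the original code. Note that step 3 also encodes ampersands that arrived bare, so every ampersand in the result ends up as "&amp;".

```php
<?php
// Hypothetical wrapper around the three-step replacement described above.
// preg_replace() applies the pattern/replacement pairs in array order,
// each one working on the result of the previous pair.
function clean_search_term(string $term): string
{
    $patterns = array(
        '/&amp;/',          // 1. decode the entity to a bare ampersand
        '/[^a-z0-9&\s]/i',  // 2. strip everything outside the allowed set
        '/&/',              // 3. re-encode the remaining ampersands
    );
    $replacements = array('&', '', '&amp;');

    return preg_replace($patterns, $replacements, $term);
}

echo clean_search_term('--Echo &amp; The Bunnymen--'), "\n"; // Echo &amp; The Bunnymen
echo clean_search_term('--Hall & Oats;--'), "\n";            // Hall &amp; Oats
```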
dijona Posted May 19, 2010 (Author)
Thanks so much for your help, that's just what I needed. Now, if it's not too much trouble: how would I write the rule for the band 50/50, where the problem is obviously the slash (this time an actual slash, not the HTML code!)? I really need to write letters to all bands and ask them to use plain ol' vanilla characters. Once I figure out how to escape slashes, I think I can work through the other examples on my own.
Psycho Posted May 19, 2010
A better question is: what is in the data that you need to get rid of? Why are you working with dirty data? Anyway, to allow for a forward slash just use "\/". I know it looks funny; the backslash just tells the regex processor to treat the character as a literal rather than as the pattern delimiter.
$patterns = array( "/&amp;/", "/[^a-z0-9&\/\s]/i", "/&/" );
$replacements = array( "&", "", "&amp;" );
You know there are going to be other exceptions too. It might be better if you post some of the input data showing what the problem is; there might be a better solution. Names that could be problematic are ones with accented characters, such as "Björk". The script above will convert that to "Bjrk". I don't know of a modifier that is "accent insensitive" (if there is such a word).
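For the accented-character case, one option that was not raised in the thread is to match on Unicode properties instead of a-z: with the `u` modifier, PCRE understands `\p{L}` (any letter) and `\p{N}` (any number). The sketch below assumes the input is valid UTF-8 and that accented letters arrive precomposed; `clean_unicode_term()` is a hypothetical name. It also uses "#" as the delimiter, which is an alternative to escaping the forward slash.

```php
<?php
// Sketch: keep letters and digits from any script, plus "&", "/" and
// whitespace. Assumes valid UTF-8 input; with the /u modifier
// preg_replace() returns null for invalid UTF-8, so that case is
// mapped to an empty string here.
function clean_unicode_term(string $term): string
{
    // "#" as the delimiter means the forward slash needs no escaping.
    $result = preg_replace('#[^\p{L}\p{N}&/\s]#u', '', $term);

    return $result === null ? '' : $result;
}

echo clean_unicode_term('--Björk--'), "\n"; // Björk
echo clean_unicode_term('**50/50**'), "\n"; // 50/50
```

This widens the allowed set considerably, so whether it is acceptable depends on what the "safe search" function is protecting against.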