dijona Posted May 18, 2010

I've got a PHP function that tries to make searches safer by stripping most non-essential characters, but I wanted to keep one character in there: specifically, the & sign. Originally I had my regex pattern as:

    $patterns = '/[^a-zA-Z0-9\s]/';

but when I searched for Echo & the Bunnymen (for example) it returned search results for Echo. Then I thought I'd add the & like so:

    $patterns = '/[^a-zA-Z0-9&\s]/'; // added the &

but that still doesn't do it. So how can I make sure users search for the whole string 'Echo & the Bunnymen'?

Link to comment https://forums.phpfreaks.com/topic/202104-simple-regex-question/
beta0x64 Posted May 18, 2010

This worked for me:

    preg_replace('/[^a-zA-Z0-9&\s]/i', '', 'Echo! & t\/h/\e $Bunnymen%*()@ ');

The result is Echo & the Bunnymen

Perhaps the problem is with the ampersand itself when it is passed to the search in your code? I suggest htmlentities() or htmlspecialchars(). It's also possible that the ampersand means bitwise AND or something in whatever context you're using it.
dijona (Author) Posted May 18, 2010

Thanks beta0x64 - I'm still new to this, but I was just using Echo & the Bunnymen as an example, not as something to specifically escape. There are lots of bands that use an & in their name, so I'm trying for a somewhat more global solution. I'd like to stick with preg_replace rather than htmlentities() or htmlspecialchars() because this function is already being used for most of my safe-search duties and is working fine.
Psycho Posted May 19, 2010

I'm not seeing a problem; using '/[^a-zA-Z0-9&\s]/' as the search parameter seems to work fine for me:

    echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Echo & The Bunnymen--");
    //Output: Echo & The Bunnymen

    echo preg_replace('/[^a-zA-Z0-9&\s]/', '', "--Hall & Oats--");
    //Output: Hall & Oats

You should double check the actual input data. Are you positive that the input is EXACTLY the ampersand and not the HTML code for an ampersand, i.e. "&amp;"?
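One way to run that check is sketched below. The `$input` value is a hypothetical stand-in for whatever string the search function actually receives from the XML file or the form; the pattern is the one from the first post.

```php
<?php
// Hypothetical raw value; in practice this is the string your search function receives.
$input = "Echo &amp; the Bunnymen";

// True when the string carries the HTML/XML entity rather than a bare ampersand.
$hasEntity = strpos($input, '&amp;') !== false;

// The original pattern keeps '&' but strips ';', so the entity is left half-stripped.
$cleaned = preg_replace('/[^a-zA-Z0-9&\s]/', '', $input);

var_dump($hasEntity); // bool(true)
echo $cleaned, "\n";  // Echo &amp the Bunnymen
```

If the check comes back true, the character class is doing its job and the leftover "&amp" is what needs handling.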
dijona (Author) Posted May 19, 2010

Yes, actually the XML file (the input) has the &amp; and not & - I wasn't sure which regex to use - how would I allow &amp; in my search forms?
Psycho Posted May 19, 2010

Well, now the problem becomes a little more interesting. The easy solution would be to simply allow the semi-colon as well. But then that would allow a semi-colon anywhere in the name. Assuming you do not want to allow the semi-colon and that you want to maintain "&amp;" (instead of converting to just "&"), you could do this:

    $patterns = array(
        "/&amp;/",
        "/[^a-z0-9&\s]/i",
        "/&/"
    );
    $replacements = array(
        "&",
        "",
        "&amp;"
    );
    echo preg_replace($patterns, $replacements, "--Echo &amp; The Bunnymen--");

Basically, it does the following in order:
1. Converts any "&amp;" to just "&"
2. Removes any character that is not a-z (upper or lower case), 0-9, & (ampersand), or white space
3. Converts any ampersands ("&") back to "&amp;"
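The three steps above can be wrapped in a small function so they are easy to try against several names. This is only a sketch: the function name `clean_search_term` is made up for illustration and is not from the thread. It relies on preg_replace applying array patterns in order, each one to the result of the previous.

```php
<?php
// Hypothetical helper name; wraps the three preg_replace steps described above.
function clean_search_term(string $term): string
{
    $patterns = array(
        '/&amp;/',          // 1. collapse the entity to a bare ampersand
        '/[^a-z0-9&\s]/i',  // 2. strip everything except letters, digits, & and whitespace
        '/&/',              // 3. turn every remaining ampersand back into the entity
    );
    $replacements = array('&', '', '&amp;');

    return preg_replace($patterns, $replacements, $term);
}

echo clean_search_term("--Echo &amp; The Bunnymen--"), "\n"; // Echo &amp; The Bunnymen
echo clean_search_term("--Hall & Oats--"), "\n";             // Hall &amp; Oats
```

Note that a bare "&" in the input also comes out as "&amp;", since step 3 cannot tell which ampersands were entities to begin with.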
dijona (Author) Posted May 19, 2010

Thanks so much for your help, that's just what I needed. Now, if it's not too much trouble, how would I write the rule for the band 50/50, where obviously the problem is the slash (this time an actual slash, not the HTML code!)? I really need to write letters to all bands and ask them to use plain ol' vanilla characters. Once I figure that out (how to escape slashes) I think I can follow through to other examples on my own.
Psycho Posted May 19, 2010

A better question is: what is in the data that you need to get rid of? Why are you working with dirty data? Anyway, to allow for a forward slash just use "\/". I know it looks funny; the backslash just tells the regex processor to treat the slash as a literal character rather than as the pattern delimiter.

    $patterns = array(
        "/&amp;/",
        "/[^a-z0-9&\/\s]/i",
        "/&/"
    );
    $replacements = array(
        "&",
        "",
        "&amp;"
    );

You know there are going to be other exceptions too. It might be better if you post some of the input data showing what the problem is. There might be a better solution. Ones that might be problematic are names with accented characters such as "Björk". The script above will convert that to "Bjrk". I don't know of a modifier that is "accent insensitive" (if there is such a word).
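For the accented-character case, one option is PCRE's Unicode property classes together with the /u modifier, which makes the pattern work on UTF-8 characters instead of bytes. The sketch below assumes the input (and this source file) is UTF-8; it shows only the character-class step and leaves out the "&amp;" handling from the earlier posts.

```php
<?php
// Assumes valid UTF-8 input; with /u, preg_replace returns null on invalid UTF-8.
// \p{L} matches any Unicode letter, \p{N} any Unicode number.
$pattern = '/[^\p{L}\p{N}&\/\s]/u';

echo preg_replace($pattern, '', "--Björk--"), "\n"; // Björk
echo preg_replace($pattern, '', "--50/50--"), "\n"; // 50/50
```

This keeps accented letters rather than making the match accent-insensitive, so "Bjork" and "Björk" would still be different search terms.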
Archived
This topic is now archived and is closed to further replies.