Picabrillo Posted February 8, 2007

    <?php
    $val = "Normal (and bold) text!!";
    $newval = preg_replace('/[^A-z0-9 -\']/', '', $val);
    echo $newval; // Outputs "Normal and bold text!!"
    ?>

Why is it doing this? Have I missed something? The pattern should only allow what I've asked for, and I don't see any exclamation marks being allowed. Thanks in advance.

effigy Posted February 8, 2007

The sequence space-\' creates a range of characters from a space to a single quote, which also allows !, ", #, $, %, and &. If you want to include a literal hyphen, make sure it is either the first character in the character class (best practice) or escaped. The same applies to A-z, which includes characters you may not expect. For further reference, see the ASCII table.

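To make the accidental range concrete, here is a minimal sketch. It lists every character from a space (0x20) to a single quote (0x27), then reruns the original input with the hyphen moved to the front of the class. The variable name $accidental is only illustrative, and A-z is deliberately left unchanged here so that only the hyphen fix is shown.

```php
<?php
// Every character covered by the range "space to single quote" (0x20 to 0x27).
$accidental = implode('', array_map('chr', range(ord(' '), ord("'"))));
echo $accidental, "\n";

// Same input as the original post, with the hyphen placed first in the class
// so it is a literal hyphen rather than a range operator.
$val = "Normal (and bold) text!!";
$newval = preg_replace('/[^-A-z0-9 \']/', '', $val);
echo $newval, "\n";
```

With the hyphen first, the exclamation marks are no longer inside any range and are stripped along with the parentheses.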
Picabrillo (Author) Posted February 8, 2007

Thanks effigy. Replaced '/[^A-z0-9 -\']/' with '/[^A-z0-9-\'\s]/' and it's working fine now.

effigy Posted February 8, 2007

Picabrillo wrote: "Thanks effigy. Replaced '/[^A-z0-9 -\']/' with '/[^A-z0-9-\'\s]/' and it's working fine now."

I wouldn't do that; it can be confusing. Did you look into the A-z range? That's going to allow [, \, ], ^, _, and `.

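The A-z point can be checked the same way. This is a small sketch under the assumption of plain ASCII input; the sample string 'key[0]_^' and the variable names are made up for illustration.

```php
<?php
// Characters that sit between 'Z' (0x5A) and 'a' (0x61) in the ASCII table.
$between = implode('', array_map('chr', range(ord('Z') + 1, ord('a') - 1)));
echo $between, "\n";

// Hypothetical input containing some of those characters.
$val = 'key[0]_^';
$loose  = preg_replace('/[^A-z]/', '', $val);    // A-z also keeps the punctuation
$strict = preg_replace('/[^A-Za-z]/', '', $val); // A-Za-z keeps letters only
echo $loose, "\n", $strict, "\n";
```

Here the loose class removes only the digit, while the two explicit ranges leave just the letters.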
Picabrillo (Author) Posted February 9, 2007

I would be interested to know why it seems confusing. To me it looks fine, but then again I should get into the practice of making it understandable to others. What would you have done differently, effigy?

As for the A-z range, I see what you mean. Thanks for the ASCII link; that was much better than some I've been to. I assume that A-Za-z would work better?

obsidian Posted February 9, 2007

Picabrillo wrote: "I would be interested to know why it seems confusing. To me it looks fine, but then again I should get into the practice of making it understandable to others. What would you have done differently, effigy? As for the A-z range, I see what you mean. Thanks for the ASCII link; that was much better than some I've been to. I assume that A-Za-z would work better?"

I would personally go with one range of characters and simply make it case insensitive for readability, but effigy may have an even better idea:

    <?php
    $newval = preg_replace('|[^-a-z\d \']|i', '', $val);
    ?>

effigy Posted February 9, 2007

Picabrillo wrote: "I would be interested to know why it seems confusing.... What would you have done differently, effigy?"

The best practice I suggested: placing the hyphen first in the list. Your version can be confusing for others because you have back-to-back ranges with the "loose" hyphen mixed in. If someone has to change the pattern, they might create a subtle bug without realizing it.

Picabrillo wrote: "As for the A-z range, I see what you mean. Thanks for the ASCII link; that was much better than some I've been to. I assume that A-Za-z would work better?"

Yes.

obsidian wrote: "I would personally go with one range of characters and simply make it case insensitive for readability, but effigy may have an even better idea:"

Actually, for such a small, simple pattern, A-Za-z without the /i would be best. This saves the engine from doing the extra work of case conversion.

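Putting the suggestions in this thread together (hyphen first, explicit A-Za-z, no /i modifier) might look like the sketch below. The function name strip_disallowed and the second sample string are hypothetical, chosen only to exercise the hyphen and single quote.

```php
<?php
// Hypothetical helper: keeps letters, digits, spaces, hyphens and single quotes.
// The hyphen comes first in the class, so it cannot be read as a range operator.
function strip_disallowed(string $val): string
{
    return preg_replace('/[^-A-Za-z0-9 \']/', '', $val);
}

echo strip_disallowed("Normal (and bold) text!!"), "\n";
echo strip_disallowed("O'Neil-Smith [x_y] #42"), "\n";
```

Adding another allowed character later means appending it before the closing bracket, which leaves the hyphen's position and meaning untouched.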
Picabrillo (Author) Posted February 9, 2007

Thanks guys, that was really helpful. One more thing: other than obviously typing less, are there general performance benefits to using shorthand character classes? The impression I'm getting is that the gains are small until you start dealing with more complex patterns (validating e-mail addresses or URIs, maybe). Again, I would guess it's as much to do with whether the pattern is confusing to read or not.

effigy Posted February 9, 2007

Yes and no. Less typing means fewer chances to make an error and less code to look at. These are huge benefits when trying to minimize human error. (And don't forget the intuitive negated shorthands, \D and \S for instance.) The non-typing benefits come into play when Unicode and locales are considered; however, if you know you're working with Unicode, I think it's better to use the \p{...} properties.

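As a rough illustration of the \p{...} suggestion, the sketch below compares the ASCII-only class with one built from Unicode properties. It assumes the input is valid UTF-8, that the script file itself is saved as UTF-8, and that PHP's PCRE has Unicode property support; the sample string is made up.

```php
<?php
// Hypothetical UTF-8 input with accented letters.
$val = "Crème (brûlée)!!";

// ASCII-only class without /u: works byte by byte, so the bytes of the
// accented letters are stripped as well.
$ascii = preg_replace('/[^-A-Za-z0-9 \']/', '', $val);

// Unicode-aware class: \p{L} matches any letter, \p{N} any number.
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
$unicode = preg_replace('/[^-\p{L}\p{N} \']/u', '', $val);

echo $ascii, "\n", $unicode, "\n";
```

The first result loses the accented letters entirely, while the second keeps them and still drops the parentheses and exclamation marks.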