<?php
$val = "Normal (and bold) text!!";
$newval = preg_replace('/[^A-z0-9 -\']/', '', $val);
echo $newval;
// Outputs "Normal and bold text!!"  
?>

 

Why is it doing this? Have I missed something? It should only be allowing what I've asked for in the pattern - I don't see any exclamation marks being allowed.

 

Thanks in advance

space-\' creates a range of characters from the space (0x20) to the single quote (0x27), which allows !, ", #, $, %, and & through. If you want to include a literal hyphen, make sure it's either the first character in the character class (best practice) or escaped.
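To make that range concrete, here is a small sketch that lists the characters it covers and compares the original pattern with a hyphen-first version. The variable names ($rangeChars, $broken, $fixed) are just for illustration, and the corrected class is one possible fix, not the only one.

```php
<?php
// Every character in the range from space (0x20) to single quote (0x27).
$rangeChars = implode('', array_map('chr', range(ord(' '), ord("'"))));
echo $rangeChars, "\n"; //  !"#$%&'

$val = "Normal (and bold) text!!";

// Original class: " -\'" is parsed as a range, so "!" is allowed.
$broken = preg_replace('/[^A-z0-9 -\']/', '', $val);

// Hyphen moved to the front of the class, where it is a literal hyphen.
$fixed = preg_replace('/[^-A-Za-z0-9 \']/', '', $val);

echo $broken, "\n"; // Normal and bold text!!
echo $fixed, "\n";  // Normal and bold text
```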

 

The same applies to A-z, which also covers the characters that sit between Z and a in ASCII: [, \, ], ^, _, and `.

 

For further reference see the ASCII table.
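If you'd rather check this in code than in the table, a quick sketch like the following prints the extra characters that A-z lets in. The sample string is made up for the demonstration.

```php
<?php
// Characters between "Z" (0x5A) and "a" (0x61) in ASCII.
$extras = implode('', array_map('chr', range(ord('Z') + 1, ord('a') - 1)));
echo $extras, "\n"; // [\]^_`

// Hypothetical input, chosen to contain some of those characters.
$sample = 'snake_case[0]^2';

$loose  = preg_replace('/[^A-z]/', '', $sample);    // keeps _, [, ], ^
$strict = preg_replace('/[^A-Za-z]/', '', $sample); // letters only

echo $loose, "\n";  // snake_case[]^
echo $strict, "\n"; // snakecase
```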

I would be interested to know why it seems confusing - to me it looks fine, but then again I should be getting into the practice of making it understandable to others. What would you have done differently, effigy?

 

As for the A-z range, I see what you mean. Thanks for the ASCII link, that was much better than some I've been to. I assume that A-Za-z would work better?

I would personally go with one range of characters and simply make it case insensitive for readability, but effigy may have an even better idea:

<?php
$newval = preg_replace('|[^-a-z\d \']|i', '', $val);
?>

I would be interested to know why it seems confusing.... What would you have done differently, effigy?

 

The best practice I suggested: placing the hyphen first in the list. It can be confusing for others because you have back-to-back ranges with the "loose" hyphen mixed in. If someone has to change the pattern, they might create a subtle bug without realizing it.

 

As for the A-z range, I see what you mean. Thanks for the ASCII link, that was much better than some I've been to. I assume that A-Za-z would work better?

 

Yes.

 

I would personally go with one range of characters and simply make it case insensitive for readability, but effigy may have an even better idea:

 

Actually, for such a small, simple pattern, A-Za-z without the /i would be best. This saves the engine from doing the extra work of case conversion.
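For what it's worth, the two versions give the same result on the example from the first post. This is only a sketch to show they are interchangeable here; it doesn't measure the speed difference, and the variable names are illustrative.

```php
<?php
$val = "Normal (and bold) text!!";

// Case-insensitive version from the earlier reply.
$withFlag = preg_replace('|[^-a-z\d \']|i', '', $val);

// Explicit ranges, no modifier.
$explicit = preg_replace('/[^-A-Za-z0-9 \']/', '', $val);

var_dump($withFlag === $explicit); // bool(true)
echo $explicit, "\n";              // Normal and bold text
```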

Thanks guys, that was really helpful.

 

One more thing - other than obviously typing less, are there general performance benefits to using shorthand character classes? From the impression I'm getting, the gains are small until you start dealing with more complex patterns (validating e-mail addresses or URIs maybe). Again, I would guess it's as much to do with whether it's confusing to read or not.

Yes and no. Less typing means fewer chances to make an error and also less code to look at. These are huge benefits when trying to minimize human error. (And don't forget the intuitive negated shorthands, \D and \S for instance.) The non-typing benefits come into play when Unicode and locales are considered; however, if you know you're working with Unicode, I think it's better to use the \p{...} properties.
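A rough sketch of both points, assuming UTF-8 input and PHP 7 or later (for the "\u{...}" string escapes). The phone number and the accented sample string are made up for the example.

```php
<?php
// Negated shorthand: \D matches anything that is not a digit.
$digits = preg_replace('/\D/', '', '(555) 123-4567');
echo $digits, "\n"; // 5551234567

// Assumption: the input is UTF-8. "\u{e9}" is "é", "\u{e0}" is "à".
$val = "Caf\u{e9} (d\u{e9}j\u{e0} vu)!!";

// ASCII ranges strip the accented letters along with the punctuation...
$ascii = preg_replace('/[^-A-Za-z0-9 \']/', '', $val);
echo $ascii, "\n"; // Caf dj vu

// ...while Unicode properties with the u modifier keep them.
// \p{L} is any letter, \p{N} is any number.
$unicode = preg_replace('/[^-\p{L}\p{N} \']/u', '', $val);
echo $unicode, "\n"; // Café déjà vu
```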
