
[SOLVED] Alphanumeric plus hyphens, periods, and underscores.


bundyxc


I'm working on a CMS that needs to download MySpace profile pages. The URL format is standard, except for the username portion. The username can contain letters, numbers, underscores, hyphens, and periods.

 

Valid

_n.E.E.d.l.e_

_____NEEDLE-123

_-5678...needle-_

 

Invalid

n e e d l e

!@#$%^&*

needle'd!

 

I was using this regex:

^[a-zA-Z0-9_]*$

 

But it allowed ! (although I'm not sure how/why), and didn't allow hyphens/periods.

Thank you for your time.


^[a-zA-Z0-9_.-]*$

 

If the hyphen is placed as the first or last character in the character class, it is treated as a literal (as opposed to a range operator). Metacharacters like the dot also lose their special meaning within a character class, so they do not require escaping. So one solution could be:

 

#^[a-z0-9_.-]*$#i
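To make the pattern above concrete, here is a minimal sketch that runs it through preg_match against the valid and invalid usernames from the first post. The helper name isValidUsername is made up for illustration and is not from the thread.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread).
function isValidUsername(string $name): bool
{
    // The hyphen is last in the class, so it is a literal hyphen.
    // The dot is inside a class, so it needs no escaping.
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('#^[a-z0-9_.-]*$#i', $name) === 1;
}

$samples = [
    '_n.E.E.d.l.e_',
    '_____NEEDLE-123',
    '_-5678...needle-_',
    'n e e d l e',
    '!@#$%^&*',
    "needle'd!",
];

foreach ($samples as $s) {
    echo $s, ' => ', isValidUsername($s) ? 'valid' : 'invalid', PHP_EOL;
}
```

The first three samples should come out valid and the last three invalid. One thing to keep in mind is that in PCRE a plain $ also matches just before a trailing newline, so adding the D modifier may be worth considering if the input can contain one.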

 

Granted, the star quantifier (zero or more times) means it can also match an empty string, so if there is a minimum length for the username, you could use an interval instead:

 

#^[a-z0-9_.-]{3,}$#i    // minimum 3 characters long...

 

Also note that patterns like these would also match a username like ....----a---.. so I'm not sure what kind of restrictions you have in place to check for those kinds of wacky situations.
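A quick sketch of the interval version, showing both the minimum-length behaviour and the wacky-but-matching case mentioned above. The function name and the $min parameter are assumptions for illustration; the thread does not state what MySpace's real minimum is.

```php
<?php
// Hypothetical helper; $min is an assumed parameter, not a known MySpace rule.
function isValidUsernameMin(string $name, int $min = 3): bool
{
    // Build e.g. #^[a-z0-9_.-]{3,}$#i : at least $min allowed characters.
    $pattern = '#^[a-z0-9_.-]{' . max(0, $min) . ',}$#i';
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidUsernameMin(''));               // bool(false): too short
var_dump(isValidUsernameMin('ab'));             // bool(false): too short
var_dump(isValidUsernameMin('abc'));            // bool(true)
var_dump(isValidUsernameMin('....----a---..')); // bool(true): only the character set is checked
```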

 

 


I would actually consider ....----a---.. valid, so this isn't a problem. The application just has to request a page from MySpace, and if it were to request "myspace.com/....----a---..", then MySpace would just say:

 

Invalid Friend ID.
This user has either cancelled their membership, or their account has been deleted.

 

And I use that error message to tell if the username is invalid. The regex that I requested was simply so that users wouldn't use ! or anything similar, as that would just get a "Page cannot be found". The same is true when it comes to crazy ASCII characters.

 

With that said, I have a problem. I'm using the following snippet of code, and I'm sure that due to a syntactical error, it isn't working how it should.

 

$post = preg_replace('/#^[a-z0-9_.-]{1,}$#i/', '', 'd!@#agsd$%^&*');
print_r($post); // Outputs d!@#agsd$%^&* (unchanged)

 

As you can see, nothing has changed in the string. In my mind, the regex is supposed to do this:

 

Go to the string, and replace every character that is NOT alphanumeric/period/hyphen, with '' (deleting it completely). Once this is done, declare this new string as $post.

 

Obviously my logic is flawed. Thanks for the help guys!


Okay, so since you want to replace things, you need to remove the ^ and $. Also, {1,} is effectively the same as just using +, not that you need a quantifier at all in this instance, since preg_replace will replace all matches anyway. Finally, it needs to be a negated character class, since you want to replace anything that is not one of those characters.

 

$post = preg_replace('/#[^a-z0-9_.-]#i/', '', 'd!@#agsd$%^&*');


Except you'll also need to remove the delimiting / characters. nrg chose a hash sign as the delimiter. You can use a forward slash instead, but you can't just add it around the hashes; you have to replace them.

 

$post = preg_replace('#[^a-z0-9_.-]#i', '', 'd!@#agsd$%^&*');
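For comparison, a small sketch of the three variants discussed so far, run against the same test string. The variable names $broken, $post, and $same are only for illustration.

```php
<?php
$input = 'd!@#agsd$%^&*';

// Doubled delimiters: / is the delimiter here, so the # characters and
// the trailing i become literal parts of the pattern. Nothing matches,
// and the string comes back unchanged.
$broken = preg_replace('/#[^a-z0-9_.-]#i/', '', $input);
echo $broken, PHP_EOL; // d!@#agsd$%^&*

// Single # delimiter with the i modifier after it: every character that
// is NOT a letter, digit, underscore, period, or hyphen is removed.
$post = preg_replace('#[^a-z0-9_.-]#i', '', $input);
echo $post, PHP_EOL; // dagsd

// The same pattern using / as the delimiter instead of #.
$same = preg_replace('/[^a-z0-9_.-]/i', '', $input);
echo $same, PHP_EOL; // dagsd
```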


Honestly, I just had no clue what I was doing. I saw "/[regex]/" so much that I just thought the forward slash was part of the syntax of a regular expression.  :shrug:

 

Thanks for all the help guys. I'm trying to learn regex, but it's a slow process. haha.


Honestly, I just had no clue what I was doing. I saw "/[regex]/" so much that I just thought the forward slash was part of the syntax of a regular expression.

 

In PCRE, delimiters (typically seen as the forward slash) can be any non-alphanumeric, non-whitespace ASCII character (except for the backslash).

So you can use #[regex]#, or /[regex]/ or ~[regex]~, or ![regex]!, or.... you get the idea...

 

EDIT - just be mindful that if you use a character inside your pattern that is also your delimiter, you must escape it (using the backslash); otherwise, you'll run into "unknown modifier" warnings.. not pretty.
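Here is a rough sketch of that escaping rule. The URL and the /user/ path are placeholders invented for the example (they are not MySpace's real format); preg_quote with its second argument is the standard PHP way to escape the delimiter in text that comes from a variable.

```php
<?php
// Placeholder URL, assumed purely for illustration.
$url = 'http://example.com/user/needle';

// With / as the delimiter, every literal / in the pattern must be escaped.
$a = preg_match('/^http:\/\/[^\/]+\/user\/([a-z0-9_.-]+)$/i', $url, $m1);

// With # as the delimiter, the slashes need no escaping at all.
$b = preg_match('#^http://[^/]+/user/([a-z0-9_.-]+)$#i', $url, $m2);

// When part of the pattern comes from a variable, preg_quote() escapes
// regex metacharacters plus the delimiter passed as its second argument.
$prefix = 'http://example.com/user/';
$c = preg_match('/^' . preg_quote($prefix, '/') . '([a-z0-9_.-]+)$/i', $url, $m3);

echo $m1[1], ' ', $m2[1], ' ', $m3[1], PHP_EOL; // needle needle needle
```

Picking a delimiter that does not occur in the pattern, as in the second call, usually keeps things the most readable.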


Thanks! I'm currently trying to learn from this tut: http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html

 

It's the best I've found. Thanks for helping me understand regex delimiters. I didn't know that you'd need any, as it's a string. I figured that the beginning and end of the string would work as delimiters. ;)

 

But I'm just going to assume that you can put functions/operators outside of the delimiters.
