bundyxc Posted August 9, 2009

I'm working on a CMS that needs to download MySpace profile pages. The URL format is standard, except for the username portion. The username can contain letters, numbers, underscores, hyphens, and periods.

Valid:
_n.E.E.d.l.e_
_____NEEDLE-123
_-5678...needle-_

Invalid:
n e e d l e
!@#$%^&*
needle'd!

I was using this regex:

^[a-zA-Z0-9_]*$

But it allowed ! (although I'm not sure how/why), and didn't allow hyphens/periods. Thank you for your time.
Garethp Posted August 9, 2009

^[a-zA-Z0-9_\.\-]*$
nrg_alpha Posted August 9, 2009

^[a-zA-Z0-9_.-]*$

If the hyphen is placed as the first or last character in the character class, it is treated as a literal (as opposed to a range). Meta characters like the dot lose their special meaning within a character class, so they do not require escaping. So one solution could be:

#^[a-z0-9_.-]*$#i

Granted, the star quantifier (zero or more times) means it can match nothing, so if there is a minimum length for the username, you could use an interval instead:

#^[a-z0-9_.-]{3,}$#i // minimum 3 characters long...

Also note that patterns like these could also match a username like ....----a---.. so I'm not sure what kind of restrictions you have in place to check for those kinds of wacky situations.
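A minimal sketch of how the validation pattern above might be used with preg_match. The function name isValidUsername and the $minLength parameter are hypothetical, introduced only for illustration; the character class is the one from the post.

```php
<?php
// Hypothetical helper: returns true when $name consists only of letters,
// digits, underscores, periods and hyphens, and is at least $minLength long.
function isValidUsername(string $name, int $minLength = 1): bool
{
    // Hyphen is last in the class, so it is a literal; the dot needs no escape.
    $pattern = '#^[a-z0-9_.-]{' . $minLength . ',}$#i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidUsername('_n.E.E.d.l.e_'));   // bool(true)
var_dump(isValidUsername('n e e d l e'));     // bool(false)
var_dump(isValidUsername('ab', 3));           // bool(false)
```

One caveat: in PCRE, $ also matches just before a trailing newline, so a name ending in "\n" would pass. If that matters, the D modifier or \z in place of $ tightens the check.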
bundyxc Posted August 9, 2009 (Author)

I would actually consider ....----a---.. valid, so this isn't a problem. The application just has to request a page from MySpace, and if it were to request "myspace.com/....----a---..", then MySpace would just say:

Invalid Friend ID. This user has either cancelled their membership, or their account has been deleted.

And I use that error message to tell if the username is invalid. The regex that I requested was simply so that users wouldn't use ! or anything similar, as that would just get a "Page cannot be found". The same is true when it comes to crazy ASCII characters.

With that said, I have a problem. I'm using the following snippet of code, and I'm sure that due to a syntactical error, it isn't working how it should.

$post = preg_replace('/#^[a-z0-9_.-]{1,}$#i/', '', 'd!@#agsd$%^&*');
print_r($post); //Outputs d!@#agsd$%^&*

As you can see, nothing has changed in the string. In my mind, the regex is supposed to do this: go to the string, and replace every character that is NOT alphanumeric/period/hyphen with '' (deleting it completely). Once this is done, declare this new string as $post. Obviously my logic is flawed.

Thanks for the help guys!
.josh Posted August 9, 2009

Okay, so since you want to replace things, you need to remove the ^ and $. Also, {1,} is effectively the same as just using +, not that you need it at all in this instance, since preg_replace will replace all instances anyway. Also, it needs to be a negated character class, since you want to replace anything that is not one of those characters.

$post = preg_replace('/#[^a-z0-9_.-]#i/', '', 'd!@#agsd$%^&*');
Daniel0 Posted August 9, 2009

Except you'll also need to remove the delimiting / characters. nrg chose a hash sign as the delimiter. You can use a forward slash instead, but you can't just add it around the existing pattern; you have to replace the hash signs with it.

$post = preg_replace('#[^a-z0-9_.-]#i', '', 'd!@#agsd$%^&*');
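As a rough sketch of the corrected replacement, the snippet below wraps the pattern in a helper and shows that the hash-delimited and slash-delimited forms behave the same. The function name sanitizeUsername is hypothetical and not part of the thread.

```php
<?php
// Hypothetical helper: delete every character that is NOT a letter, digit,
// underscore, period or hyphen. No ^, $ or quantifier is needed, because
// preg_replace already replaces every match in the subject.
function sanitizeUsername(string $raw): string
{
    // preg_replace returns null on error; fall back to an empty string.
    return preg_replace('#[^a-z0-9_.-]#i', '', $raw) ?? '';
}

// Same pattern with "/" as the delimiter instead of "#".
$withSlashes = preg_replace('/[^a-z0-9_.-]/i', '', 'd!@#agsd$%^&*');

echo sanitizeUsername('d!@#agsd$%^&*'), "\n"; // prints "dagsd"
echo $withSlashes, "\n";                      // prints "dagsd"
```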
.josh Posted August 9, 2009

haha oops, can't believe I missed that
bundyxc Posted August 9, 2009 (Author)

Honestly, I just had no clue what I was doing. I saw "/[regex]/" so much, I just thought that the forward slash was part of the syntax of a regular expression.

Thanks for all the help guys. I'm trying to learn regex, but it's a slow process. haha.
nrg_alpha Posted August 9, 2009

In PCRE, delimiters (typically seen as the forward slash) can be any non-alphanumeric, non-whitespace ASCII character (except for the backslash). So you can use #[regex]#, or /[regex]/ or ~[regex]~, or ![regex]!, or.... you get the idea...

EDIT - just be mindful that if you use a character inside your pattern that is also your delimiter, you need to escape it (using the backslash). Otherwise, you'll run into "unknown modifier" messages.. not pretty.
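The delimiter rules above can be sketched as follows. The subject string and variable names are made up for illustration; preg_quote is the standard PHP function for escaping literal text, and its second argument names the delimiter to escape as well.

```php
<?php
// Illustrative subject only; not taken from real data.
$subject = 'myspace.com/needle-123';

// The same pattern with different delimiters. With "#" or "~" the literal
// slash in the pattern needs no escaping.
$withHash  = preg_match('#^myspace\.com/[a-z0-9_.-]+$#i', $subject);
$withTilde = preg_match('~^myspace\.com/[a-z0-9_.-]+$~i', $subject);

// With "/" as the delimiter, the literal slash must be escaped as "\/".
// Left unescaped, it would end the pattern early and the rest would be
// read as modifiers, which is where the "unknown modifier" warning comes from.
$withSlash = preg_match('/^myspace\.com\/[a-z0-9_.-]+$/i', $subject);

// preg_quote can do the escaping for literal text, delimiter included.
$built = '/^' . preg_quote('myspace.com/', '/') . '[a-z0-9_.-]+$/i';
$withQuoted = preg_match($built, $subject);

var_dump($withHash, $withTilde, $withSlash, $withQuoted); // int(1) four times
```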
bundyxc Posted August 9, 2009 (Author)

Thanks! I'm currently trying to learn from this tut: http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html

It's the best I've found. Thanks for helping me understand regex delimiters. I didn't know that you'd need any, as it's a string. I figured that the beginning and end of the string would work as delimiters. But I'm just going to assume that you can put functions/operators outside of the delimiters.
.josh Posted August 9, 2009

The main reason is that, for whatever reason, the PHP devs opted to have modifiers inside the regex string instead of as a separate argument, so it needs the delimiters to tell the pattern from the modifiers.
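A small sketch of that point: whatever follows the closing delimiter is read as modifiers, so the same pattern behaves differently with and without a trailing i. The subject string and variable names are illustrative only.

```php
<?php
$subject = 'NEEDLE';

// No modifiers after the closing "#": the match is case-sensitive.
$caseSensitive = preg_match('#^needle$#', $subject);

// The "i" after the closing "#" is a modifier, not part of the pattern.
$caseInsensitive = preg_match('#^needle$#i', $subject);

var_dump($caseSensitive);   // int(0)
var_dump($caseInsensitive); // int(1)
```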
Daniel0 Posted August 10, 2009

It's because they're PCRE (Perl Compatible Regular Expressions). In Perl you'd do:

if ($string =~ m/^[a-z0-9]+$/i) { print "It's an alphanumeric string!"; }

You also use that notation in other places, like in the editor called vi(m).
.josh Posted August 10, 2009

Well yeah, it's PCRE, but they could still have made the modifiers a separate argument and used the same syntax. They could easily have made the preg_xx functions take a pattern with no delimiters (just wrapped in quotes), take the modifiers as a separate argument, and build the delimited string internally for the regex engine.
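To make the idea concrete, here is a hedged sketch of what such a wrapper might look like in userland. Nothing like this exists in PHP itself; the function name pregMatchNoDelims and the list of candidate delimiters are assumptions made purely for illustration.

```php
<?php
// Hypothetical wrapper: takes an undelimited pattern and the modifiers as
// separate arguments, then builds the delimited string that preg_match expects.
function pregMatchNoDelims(string $pattern, string $modifiers, string $subject): bool
{
    // Pick the first candidate delimiter that does not occur in the pattern,
    // so nothing inside the pattern needs escaping.
    foreach (['#', '~', '/', '!', '%', '@'] as $delimiter) {
        if (strpos($pattern, $delimiter) === false) {
            $delimited = $delimiter . $pattern . $delimiter . $modifiers;
            return preg_match($delimited, $subject) === 1;
        }
    }

    throw new InvalidArgumentException('No unused delimiter available for this pattern.');
}

var_dump(pregMatchNoDelims('^[a-z0-9_.-]+$', 'i', 'NEEDLE-123')); // bool(true)
var_dump(pregMatchNoDelims('^[a-z0-9_.-]+$', '', 'NEEDLE-123'));  // bool(false)
```

Choosing an unused delimiter avoids having to escape anything inside the pattern, at the cost of failing for a pattern that happens to contain every candidate.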