
Easy regex


mikesta707


I know this is easy, but for some reason, regex just kicks my ass.

The pattern I currently have goes like this:

$pattern = "/[a-zA-Z]{2,3}\/web\/[0-9]{9,11}\/\.html/";

I'm trying to match URLs that look like this:

mnh/web/(some numbers).html
thc/web/(numbs).html
dcf/web/(numbs).html

Note that the first 3 letters are basically codes for certain areas (i.e. Manhattan is mnh, Queens is que or something, etc.). I used a letter character class with a {2,3} quantifier, so it matches any 2-3 letter code, to make it easier on myself.

I'm sure it's a simple fix, but I just can't seem to figure it out.


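Do you have an example of something that doesn't match that you think should? My first suggestion: since you are working with a path, don't use slashes as your delimiters; it just complicates things when you have to escape them in the pattern. I also don't think you need the last escaped slash (the \/ right before \.html) at all, as the number seems to be a filename, and that shouldn't contain a forward slash. Your example URLs have no slash between the number and .html, so that slash alone stops them from matching. With the i modifier, [a-z] also matches uppercase letters, just like your [a-zA-Z].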

$pattern = "#[a-z]{2,3}/web/[0-9]{9,11}\.html#i";
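To see the difference, here is a minimal PHP sketch that runs both patterns against a few sample paths. The paths are hypothetical: the thread doesn't give the real numbers, so values with 9, 10 and 11 digits are assumed to fit the {9,11} quantifier.

```php
<?php
// Hypothetical sample paths: the real numbers weren't posted, so these
// use 10, 9 and 11 digits to fit the {9,11} quantifier.
$paths = [
    'mnh/web/1234567890.html',
    'thc/web/123456789.html',
    'dcf/web/12345678901.html',
];

// Original pattern: demands a "/" between the number and ".html".
$original = "/[a-zA-Z]{2,3}\/web\/[0-9]{9,11}\/\.html/";

// Suggested pattern: "#" delimiters, so "/" needs no escaping;
// the i modifier lets [a-z] match uppercase letters too.
$fixed = "#[a-z]{2,3}/web/[0-9]{9,11}\.html#i";

foreach ($paths as $path) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    printf("%-26s original: %d  fixed: %d\n",
        $path, preg_match($original, $path), preg_match($fixed, $path));
}
```

Every line should report original: 0 and fixed: 1. Since neither pattern is anchored, both would also match these paths inside a longer URL (e.g. with a domain in front); add ^ and $ if the whole string must be just the path.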


OK, so I'm finding out I'm really bad at this. I'm trying to capture the stuff inside a div tag that looks like this:

<div id="userbody">
stuff stuff
</div>

My pattern looks like this:

$pattern = '#<div id="userbody">(.)+</div>#i';

I run the following with the above $pattern:

// $stuff is the HTML
if (preg_match($pattern, $stuff, $matches)) {
    print_r($matches);
} else {
    echo "Failure";
}

and I always get "Failure". When I change the pattern to just

$pattern = '#<div id="userbody">#i';

I seem to get a match, but when I print_r $matches, it's empty (I'm not really sure whether it should be empty, but since I have no capturing group, I'm assuming that's right).

Any idea what's wrong with my pattern?


By default the dot won't match newlines; to make it do so, you need to add the s modifier to the end of your pattern (e.g. #...#is). You'll also probably want to use (.+) rather than (.)+, as the latter repeats a one-character group, so it only captures the very last character it matched.
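A minimal PHP sketch of both points, using a hypothetical stand-in for $stuff (the real page wasn't posted). It also shows that even without a capturing group, $matches[0] holds the whole match; if print_r output is viewed in a browser, a matched tag like <div id="userbody"> is rendered as HTML rather than displayed, which could make the array look empty.

```php
<?php
// Hypothetical stand-in for $stuff: the real page was not posted.
$stuff = "<p>before</p>\n<div id=\"userbody\">\nstuff stuff\n</div>\n<p>after</p>";

// 1. No s modifier: "." cannot cross the "\n" right after the opening tag.
echo preg_match('#<div id="userbody">(.)+</div>#i', $stuff), "\n";   // 0

// 2. s modifier added, but (.)+ repeats a one-character group,
//    so $m[1] only keeps the last character matched before </div>.
preg_match('#<div id="userbody">(.)+</div>#is', $stuff, $m);
echo json_encode($m[1]), "\n";                                        // "\n"

// 3. s modifier with (.+): the group captures the whole body.
preg_match('#<div id="userbody">(.+)</div>#is', $stuff, $m);
echo json_encode($m[1]), "\n";                                        // "\nstuff stuff\n"

// 4. No capturing group at all: $m[0] still holds the full match.
preg_match('#<div id="userbody">#i', $stuff, $m);
echo htmlspecialchars($m[0]), "\n";   // escaped so a browser shows the tag
```

json_encode() is used here only to make the newline characters visible in the output.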


To add to that, be careful when using greedy quantifiers like .* or .+: they match (or in your case, capture) as much as they can, then backtrack only until what comes after them in the pattern can match. So if this div is nested inside others, or followed by more divs, you may end up matching up to the last </div> on the page, which is more than you bargained for. In cases like this, I would recommend using a lazy quantifier instead: (.+?)

This thread has discussion and explanations on this matter:

http://www.phpfreaks.com/forums/index.php/topic,236933.msg1103233.html (see my post, #11, which upon re-reading I probably would have worded differently but which still gets the point across, and cv's post, #14, which is more colourful and illustrative).

This is not to suggest that greedy quantifiers are inherently bad, but rather that they can produce undesirable results when employed improperly.
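A minimal PHP sketch of the greedy/lazy difference, assuming a hypothetical page where another div follows the userbody div. Note that the lazy version stops at the first </div>, so if the userbody div itself contained nested divs, the capture would end at the inner closing tag instead; a lazy quantifier alone can't balance nested tags.

```php
<?php
// Hypothetical page: another div follows the userbody div.
$stuff = "<div id=\"userbody\">\nstuff stuff\n</div>\n<div id=\"footer\">footer</div>";

// Greedy (.+) runs to the end of the string, then backtracks only as far
// as the LAST </div>, swallowing the footer div along the way.
preg_match('#<div id="userbody">(.+)</div>#is', $stuff, $greedy);

// Lazy (.+?) grows one character at a time and stops at the FIRST </div>.
preg_match('#<div id="userbody">(.+?)</div>#is', $stuff, $lazy);

// JSON_UNESCAPED_SLASHES keeps "</div>" readable instead of "<\/div>".
// Prints: "\nstuff stuff\n</div>\n<div id=\"footer\">footer"
echo json_encode($greedy[1], JSON_UNESCAPED_SLASHES), "\n";

// Prints: "\nstuff stuff\n"
echo json_encode($lazy[1], JSON_UNESCAPED_SLASHES), "\n";
```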


Thanks, guys! It was the newlines. I did read about . not matching newlines, but I tried to test whether there were newlines in the text (it seems I tested wrong). Again, thanks a lot.

And yeah, I also read about "greediness" vs. "laziness", but at this point I was just trying to get a working regex and probably would have optimized it afterwards. Thanks for the tips, though! Greatly appreciated.
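On testing for newlines: the thread doesn't say how the check was done, but one common PHP pitfall is searching for '\n' in single quotes, which is a literal backslash followed by n rather than a newline. A minimal sketch with hypothetical text:

```php
<?php
// Hypothetical text that really does contain line breaks.
$stuff = "<div id=\"userbody\">\nstuff stuff\n</div>";

// Single quotes: '\n' is two characters (a backslash and an "n").
var_dump(strpos($stuff, '\n') !== false);    // bool(false)

// Double quotes: "\n" is an actual newline character.
var_dump(strpos($stuff, "\n") !== false);    // bool(true)

// Regex alternative: \R matches any newline sequence (\n, \r\n, \r, ...).
var_dump(preg_match('/\R/', $stuff) === 1);  // bool(true)
```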
