Hate Posted August 1, 2012
I'm trying to parse HTML and get the usernames of registered users. I can get most of the usernames, but a few have a star image beside the name inside the a tag, which throws my expression off. Here's an example of what I mean:
<a href="user.php?i=252341">Marssz3<img src="/images/i67.gif"></a>
Here's my current code:
preg_match_all('/<a href=\"user\.php\?i=.*">(.*?)<\/a>/ism', $row, $title);
How could I change my expression to ignore the <img src="/images/i67.gif"> if it is present (it isn't always) and, in this particular case, grab the username "Marssz3" (though that's not static)?
Christian F. Posted August 1, 2012
First off, there's no need to escape that double quote, as it's not a metacharacter in either RegExp or single-quoted PHP strings. Then to your issue: what you want to look for is everything that's not a "<", instead of everything up to the "</a>" tag. In other words, [^<]+ is what you're looking for.
Hate Posted August 2, 2012
I'm still confused. Could you explain a bit better? I learn from example.
.josh Posted August 2, 2012
Swap out the part of your regex that matches the name with the regex ChristianF gave you. Do you know which part of your regex matches the username? Were you the one who wrote this code?
Hate Posted August 2, 2012
I wrote my current code. I tried swapping in what ChristianF gave me, but it's not working.
Christian F. Posted August 2, 2012
Think you could post the result, so that we can see exactly what you did?
Berre Posted August 2, 2012
This matches any single character listed inside the brackets, here a-z: [a-z]
This matches any single character except those listed inside the brackets, here a-z: [^a-z]
The code he showed you matches one or more characters that are not "<". The first "<" is where the next tag starts, which is also where the username in your example ends: [^<]+
If it's not working, it's probably because you still have "<\/a>" at the end. With that in place, the pattern cannot match your example: [^<]+ stops at the "<" of the <img> tag, and what follows is not "</a>". So you could either remove "<\/a>" from the end of your regex, or add a part that accounts for the img tag.
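To make the two options above concrete, here is a minimal sketch rather than a definitive fix. The sample string in $row is hypothetical (modelled on the link from the first post), and \d+ for the user id is an assumption that ids are always numeric.

```php
<?php
// Hypothetical sample markup: one starred name and one plain name.
$row = '<a href="user.php?i=252341">Marssz3<img src="/images/i67.gif"></a>'
     . '<a href="user.php?i=100">PlainUser</a>';

// [^<]+ followed directly by </a>: the starred name is skipped, because
// the class stops at the "<" of <img> and "</a>" does not come next.
preg_match_all('/<a href="user\.php\?i=\d+">([^<]+)<\/a>/i', $row, $strict);

// Option 1: drop the trailing </a>. Both names are captured.
preg_match_all('/<a href="user\.php\?i=\d+">([^<]+)/i', $row, $loose);

// Option 2: keep </a> but allow an optional <img ...> tag before it.
preg_match_all('/<a href="user\.php\?i=\d+">([^<]+)(?:<img[^>]*>)?<\/a>/i', $row, $withImg);

print_r($strict[1]);   // PlainUser only
print_r($loose[1]);    // Marssz3 and PlainUser
print_r($withImg[1]);  // Marssz3 and PlainUser
```

Option 2 is stricter about what it accepts, which may matter if other links on the page start with the same href prefix.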
Psycho Posted August 3, 2012
$regEx = '#<a href="user\.php\?i=[^>]+>([^<]*)#ism';
preg_match_all($regEx, $text, $title);
Although, I will proffer another suggestion. Sometimes I have had a similar problem and found that building the perfect regex is either not possible, too much work, or not efficient. In those cases, look to other means of post-processing the data. In this instance, if there were no good regex solution for what you needed, you could simply have used strip_tags() on the result to remove the image tag.
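The post-processing route could look roughly like the sketch below. The sample string in $text is hypothetical, and the capture group is deliberately loose (everything up to "</a>") so that strip_tags() has something to clean up.

```php
<?php
// Hypothetical sample markup, modelled on the link from the first post.
$text = '<a href="user.php?i=252341">Marssz3<img src="/images/i67.gif"></a>';

// Capture everything between the opening tag and </a>, markup included.
preg_match_all('#<a href="user\.php\?i=[^>]+>(.*?)</a>#is', $text, $title);

// Post-process: remove any tags left inside each captured name,
// then trim stray whitespace.
$names = array_map('trim', array_map('strip_tags', $title[1]));

print_r($names);  // Marssz3
```

This keeps the regex simple, at the cost of a second pass over the matches.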