Prismatic Posted July 19, 2009

OK, so I've got a difficult regex problem (well, difficult for me anyway). I'm trying to parse some HTML from a forum. The pattern successfully pulls out usernames and post counts, but I also need it to pull out the username of the person who edited a post, if there is one.

Is there a way to tell regex (preg) to "keep going over any text and newlines until we see the Edited By name, or stop if we find our original pattern again"?

Here is the pattern that successfully pulls out usernames and post counts:

    &which=boards">([a-zA-Z-_]*)<\/a> \s*.*?alt="(\d*) posts"

And here is what I tried in order to also match the block of HTML, which may or may not exist, that contains the Edited By text:

    &which=boards">([a-zA-Z-_]*)<\/a> \s*.*?alt="(\d*) posts"[.*\s*]*(?:<span(?:.*)>Edited By:<\/span>\s*<a(?:.*)>([a-zA-Z-_]*)<\/a>)?

It never matches the Edited By text. Any ideas? Note that the number of lines between the post count and the Edited By text can vary.

Huge kudos to anyone who can help me.

[attachment deleted by admin]
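A likely reason the second pattern never reaches the Edited By block is the `[.*\s*]*` part. Inside square brackets, `.` and `*` lose their special meaning, so the class only matches a literal dot, a literal asterisk, or whitespace. It stops at the first ordinary character, and the optional group is then skipped. The sketch below only illustrates that behaviour; the variable names and test strings are made up for the example.

```php
<?php
// Inside [...] the dot and star are literal, so this class matches
// only '.', '*', and whitespace characters.
$class = '/^[.*\s*]*$/';

$onlyClassChars = preg_match($class, " .*\n"); // every character is in the class
$markup         = preg_match($class, "<br/>"); // '<' is not in the class

// "Any character, including newlines" needs a real dot plus the s modifier.
$anyChar = preg_match('/^.*$/s', "<br/>\n<span>");

var_dump($onlyClassChars, $markup, $anyChar); // int(1) int(0) int(1)
```

So a construct such as `.*?` with the `s` modifier (or `[\s\S]*?`) is needed to skip over arbitrary markup between the post count and the Edited By text.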
thebadbad Posted July 19, 2009

Try

    '~&which=boards">([a-z_-]*)</a> .+?alt="([0-9]+) posts"(?:.+?<span[^>]*>Edited By:</span>\s*<[^>]+>([a-z_-]*)</a>)?~is'

Probably not the most elegant way to do it, but I hope it at least works ^^

Edit: Actually, I'm afraid my pattern captures the wrong "Edited By" username when there's no edit. Ugh. Bed time.
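One possible way around the "wrong Edited By username" problem is to stop the skip-ahead part from crossing the next poster's link. The sketch below adapts the pattern above by replacing the `.+?` inside the optional group with a dot guarded by a negative lookahead. It is an illustration under assumptions, not a tested fix for the real forum: the sample markup and the names Alice, Bob, Mod_One and Carol are hypothetical, and it assumes every poster link contains `&which=boards">`.

```php
<?php
// Hypothetical sample markup: only Bob's post has an "Edited By" block.
$html = <<<'EOD'
<a href="?id=1&which=boards">Alice</a>
<img alt="120 posts" src="rank.gif">
<p>First post.</p>
<a href="?id=2&which=boards">Bob</a>
<img alt="45 posts" src="rank.gif">
<p>Second post.</p>
<span class="edit">Edited By:</span> <a href="?id=9&which=boards">Mod_One</a>
<a href="?id=3&which=boards">Carol</a>
<img alt="7 posts" src="rank.gif">
EOD;

$pattern = '~&which=boards">([a-z_-]*)</a>'   // poster name
    . '.+?alt="([0-9]+) posts"'               // post count
    . '(?:(?:(?!&which=boards">).)*?'         // skip ahead, but never past the next user link
    . '<span[^>]*>Edited By:</span>\s*<a[^>]*>([a-z_-]*)</a>)?~is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = array();
foreach ($matches as $m) {
    $rows[] = array(
        'user'   => $m[1],
        'posts'  => (int) $m[2],
        // With PREG_SET_ORDER a trailing group that did not participate is absent.
        'editor' => isset($m[3]) ? $m[3] : null,
    );
}
print_r($rows);
```

With this sample, only Bob's row gets an editor (Mod_One); Alice and Carol get `null`. With the unguarded `.+?`, Alice's match would run on and pick up Mod_One.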
nrg_alpha Posted July 19, 2009

Hmmm, is this along the lines of what you are looking for?

Example:

    $html = <<<EOD
    <a href="&which=boards">RandomUsername</a>
    <span><a alt="16609 posts" border="0 hspace="0" vspace="0" align="absmiddle"></span>
    <br/>
    <span style="font-weight:bold;">Edited By:</span>
    <a href="&which=boards">RandomUsername</a>
    <a href="&which=boards">AnotherRandomUser</a>
    <span><a alt="9719 posts" border="0 hspace="0" vspace="0" align="absmiddle"></span>
    <br/>
    <a href="&which=boards">SomeGuy</a>
    <span><a alt="16609 posts" border="0 hspace="0" vspace="0" align="absmiddle"></span>
    <br/>
    EOD;

    // \s*\R+ allows optional trailing whitespace before the line break after each user link
    preg_match_all('#href="&which=boards">(.+?)</a>\s*\R+<span><a alt="(\d+) posts[^>]+></span>\R+<br/>(?:\R+<span.*?>Edited By:</span>\R+<a href="&which=boards">(.+?)</a>)?#si', $html, $matches, PREG_SET_ORDER);

    $total = count($matches);
    for ($a = 0; $a < $total; $a++) {
        unset($matches[$a][0]); // get rid of array[0] (which holds the full text of each match)
    }
    echo '<pre>'.print_r($matches, true);

Output:

    Array
    (
        [0] => Array
            (
                [1] => RandomUsername
                [2] => 16609
                [3] => RandomUsername
            )

        [1] => Array
            (
                [1] => AnotherRandomUser
                [2] => 9719
            )

        [2] => Array
            (
                [1] => SomeGuy
                [2] => 16609
            )

    )

Either way, for future reference, Prismatic: instead of including screenshots, consider copying and pasting the relevant portion of the sample code (it saves us from having to retype the sample into an IDE to test things out).