squiggerz Posted November 28, 2007

Ok, I've been at this for the better part of 8 hours now with no luck yet. Somebody please help if you can and are willing:

<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b> <a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>

Alright, as you can see, I'm trying to extract data from unordered lists. There are many of these lists, spread out across a thousand different files (yes, literally). What I'm trying to get into a subpattern is the last list item: the <li>Oh man, bah bah..... one. My current regex won't grab the data right, because I'm trying to get everything between the <li and the <b (if it's there; if not, the <font color tag). Here's my regex:

@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism

The only problem is that the second subpattern (.*?) finds everything after the first <li> in the string, which includes half of that paragraph from the first code I pasted above, up until the first <b or <font color tag it comes to. Is there any way I can anchor on the <li tag that belongs to the list item I want, or is there some other way to get that data? I hope the regex above more or less explains what I'm trying to do. Ultimately, what I want in that (.*?) area would be:

Oh man, bah bah

Any and all help would be GREATLY appreciated. Any questions will be promptly answered, as I'm sitting on this forum hoping for even a glimmer of hope; 8 hours straight and I still can't get it right.

Thanks,
Sq
effigy Posted November 28, 2007

This assumes there are no nested lists.

<pre>
<?php
$data = <<<DATA
<ul>
<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b> <a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

preg_match_all('%
	<li>
	### Capture everything up to <b, <font, or </li.
	((?:(?!<(?:b|font|/li)).)*)
	### Match everything up to </li.
	(?:(?!</li).)*
	### Match </li>, whitespace, and </ul> in order
	### to match the last list item.
	</li>
	\s*
	</ul>
%xs', $data, $matches);

print_r($matches);
?>
</pre>
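For anyone reading along, here is a minimal, self-contained sketch of the same idea that pulls out just the wanted text. It assumes the groups are written in full as `(?:(?!...).)*` and uses `preg_match` with `trim`; the variable names `$pattern`, `$m`, and `$text` are only for illustration and are not from the post above.

```php
<?php
// Sample input copied from the question.
$data = <<<DATA
<ul>
<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b> <a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

// x = ignore whitespace and allow comments in the pattern, s = dot matches newlines.
$pattern = '%
    <li>
    ((?:(?!<(?:b|font|/li)).)*)   # group 1: text up to <b, <font, or </li
    (?:(?!</li).)*                # rest of the item, up to </li
    </li> \s* </ul>               # only the last item is followed by </ul>
%xs';

$text = null;
if (preg_match($pattern, $data, $m)) {
    // Group 1 keeps the space before <b>, so trim it.
    $text = trim($m[1]);
}

echo $text, "\n"; // Oh man, bah bah
```

The first `<li>` cannot match because its `</li>` is followed by another `<li>` rather than `</ul>`, so the engine moves on to the last item.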
squiggerz Posted November 28, 2007

Ok... I kind of came up with the same thing, but I have another question about this part:

((?:(?!<(?:b|font|/li)).)*)

Can I use that type of setup to remove a set of tags in some code, but not the stuff between those tags? You see, I have these <a name="blah blah"><b><u>Stuff here</u></b></a> blocks of code. I'd like to remove just the <a name="blah blah"> and </a> tags, without affecting other 'a' type tags, like <a href tags.

Also, can someone point me to where I can learn more about constructs like ?: ?! and ?=? php.net doesn't seem to explain them in easy enough language for me to grasp.

Thanks,
Sq
effigy Posted November 29, 2007

"Can I use that type of setup to remove a set of tags in some code, but not the stuff between those tags?"

Yes.

"Also, can someone point me to where I can learn more about constructs like ?: ?! and ?=? php.net doesn't seem to explain them in easy enough language for me to grasp."

Lookaround.
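As a rough illustration of that "yes", here is one way the tag removal might look with `preg_replace`. This is a sketch under assumptions, not necessarily what effigy had in mind: the sample string, the `page.htm` link, and the variable names are made up for the example, and it assumes the `<a name="...">` blocks are not nested inside other `<a>` tags. The last few lines show what the three constructs from the question do.

```php
<?php
// Hypothetical sample: one <a name> wrapper to strip, one <a href> link to keep.
$html = '<a name="blah blah"><b><u>Stuff here</u></b></a> and <a href="page.htm">a link</a>';

$pattern = '%
    <a \s+ name="[^"]*" \s* >     # opening <a name="..."> tag
    ((?:(?!</?a\b).)*)            # group 1: contents, stopping at any <a or </a
    </a>                          # the closing tag that belongs to it
%xsi';

// Replace the whole block with just its contents (group 1).
$clean = preg_replace($pattern, '$1', $html);
echo $clean, "\n"; // <b><u>Stuff here</u></b> and <a href="page.htm">a link</a>

// (?:...) groups without capturing, so only (c) becomes group 1.
preg_match('/(?:ab)+(c)/', 'ababc', $g);

// (?=...) positive lookahead: "urv" only where a digit follows; the digit is not consumed.
preg_match('/urv(?=\d)/', 'urv_38000 urv302', $p, PREG_OFFSET_CAPTURE);

// (?!...) negative lookahead: "<a " tags whose next attribute is not href.
$named = preg_match_all('/<a (?!href)[^>]*>/', $html, $n);
```

Because the lookahead never consumes text, `(?:(?!X).)*` reads as "any characters, as long as X does not start here", which is what keeps the match from running past the closing tag.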