squiggerz
Posted November 28, 2007

Ok, I've been at this for the better part of 8 hours now with no luck yet. Somebody please help if you can and are willing:

<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b> <a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>

Alright, as you can see, I'm trying to extract data from unordered lists. There are many of these lists, spread out across a thousand different files (yes, literally). What I'm trying to get into a subpattern is the last list item: the <li>Oh man, bah bah..... one.

My current regex won't grab the data correctly. I'm trying to get everything between the <li and the <b (if it's there; if not, the <font color tag). Here's my regex:

@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism

The problem is that the second subpattern, (.*?), captures everything after the first <li> in the string, which includes half of the markup I pasted above, up to the first <b or <font color tag it comes to. Is there any way I can anchor on the <li tag that belongs to the list item I want, or is there some other way to get that data? I hope the regex above more or less explains what I'm trying to do. Ultimately, what I want in that (.*?) area is:

Oh man, bah bah

Any and all help would be GREATLY appreciated. Any questions will be answered promptly, as I'm sitting on this forum hoping for even a glimmer of hope. 8 hours straight and I still can't get it right.

Thanks,
Sq

Link to comment: https://forums.phpfreaks.com/topic/79206-pcre-bits_and_fragments-explodemyhead/
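To make the symptom described above concrete, here is a minimal sketch using a trimmed-down copy of the markup (the shortened href and the variable names are placeholders, not from the original files). It shows why the lazy (.*?) still starts at the first <li>, and then one possible workaround that assumes the wanted label never contains a tag of its own.

```php
<?php
// Trimmed-down version of the markup from the post above.
$data = <<<DATA
<li><a href="a.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b></li>
DATA;

// The engine takes the leftmost possible starting point, so the match
// begins at the FIRST <li>. The lazy .*? then grows until the red
// <font> tag appears, swallowing the whole first item on the way.
preg_match('@<li>(.*?)<font color="#FF0000">@s', $data, $lazy);
echo $lazy[1], "\n\n";

// Hypothetical alternative (not from the thread): [^<]+ cannot cross a
// tag, so only an <li> that is followed directly by plain text matches.
preg_match('@<li>([^<]+)(?:<b>)?<font color="#FF0000">@', $data, $fixed);
echo trim($fixed[1]), "\n";
```

A lazy quantifier only keeps the match short from wherever it started; it never moves the starting point forward, which is why the first pattern cannot skip the earlier list item.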
effigy
Posted November 28, 2007

This assumes there are no nested lists.

<pre>
<?php
$data = <<<DATA
<ul>
<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b> <a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;
preg_match_all('%
    <li>
    ### Capture everything up to <b, <font, or </li.
    ((?:(?!<(?:b|font|/li)).)*)
    ### Match everything up to </li.
    (?:(?!</li).)*
    ### Match </li>, whitespace, and </ul> in order
    ### to match the last list item.
    </li> \s* </ul>
%xs', $data, $matches);
print_r($matches);
?>
</pre>
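The pattern above can be wrapped in a small helper to see how it behaves across several lists. This is only a sketch: the function name last_item_labels and the shortened sample markup are made up for illustration, and the no-nested-lists assumption still applies.

```php
<?php
// Hypothetical helper: returns the leading text of the last <li> in
// each <ul>, using the pattern from the post above.
function last_item_labels(string $html): array
{
    $pattern = '%
        <li>
        ((?:(?!<(?:b|font|/li)).)*)   # group 1: one char at a time, as long as
                                      # no <b, <font, or </li starts here
        (?:(?!</li).)*                # rest of the item, not captured
        </li> \s* </ul>               # the item must close the list
    %xs';

    preg_match_all($pattern, $html, $matches);

    // Group 1 keeps the whitespace before the first tag, so trim it.
    return array_map('trim', $matches[1]);
}

$data = <<<DATA
<ul>
<li><a href="a.htm">First</a></li>
<li>Oh man, bah bah <b><font color="#FF0000">38999</font></b></li>
</ul>
<ul>
<li>Only item <font color="#FF0000">12</font></li>
</ul>
DATA;

$labels = last_item_labels($data);
print_r($labels);
```

The first <li> is rejected because neither repeated group is allowed to step over a </li, so the required </li> \s* </ul> tail can only be reached from the final item of a list.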
squiggerz
Posted November 28, 2007 (Author)

Ok... I kinda came up with the same thing, but I have another question about this part:

((?:(?!<(?:b|font|/li)).)*)

Can I use that type of setup to remove a set of tags in some code, but not the stuff between those tags? You see, I have these blocks of code:

<a name="blah blah"><b><u>Stuff here</u></b></a>

I'd like to remove just the <a name="blah blah"> and </a> tags, without affecting other 'a' tags, like <a href tags.

Also, can someone point me to where I can learn more about constructs like ?: ?! and ?=? php.net doesn't seem to explain them in language easy enough for me to grasp.

Thanks,
Sq
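One way the tag removal asked about here might look, as a rough sketch rather than a tested solution for the real files. It assumes the anchors to strip carry only a double-quoted name attribute and that anchors are never nested; the sample string and variable names are illustrative.

```php
<?php
$html = '<a name="blah blah"><b><u>Stuff here</u></b></a> and <a href="x.htm">a link</a>';

$pattern = '%
    <a\s+name="[^"]*"\s*>     # opening tag with a name attribute only
    ((?:(?!</a>).)*)          # group 1: everything up to the closing tag
    </a>
%xis';

// Replace the whole anchor with just its contents (group 1).
$clean = preg_replace($pattern, '$1', $html);
echo $clean, "\n";
```

Because the opening part of the pattern requires name= right after <a, the <a href ...> link is never matched and passes through unchanged.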
effigy
Posted November 29, 2007

squiggerz wrote: "Can I use that type of set up to remove a set of tags in some code but not the stuff between those tags?"

Yes.

squiggerz wrote: "Also can someone point me in the direction to where I can learn more about the assertions like ?: ?! and ?=, php.net doesnt seem to explain it in easy enough language for me to grasp."

Lookaround.
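For readers who want a quick feel for the three constructs mentioned, here is a small sketch with made-up inputs. Note that (?:...) is a non-capturing group rather than an assertion; only (?=...) and (?!...) are lookaheads, which test what follows without consuming it.

```php
<?php
// (?:...) groups without capturing, so it gets no group number.
preg_match('/(?:ab)+(c)/', 'ababc', $m);
print_r($m); // group 1 is the "c", not "ab"

// (?=...) positive lookahead: the digits must be followed by "px",
// but "px" itself is not part of the match.
preg_match('/\d+(?=px)/', '12px 34em', $ahead);
echo $ahead[0], "\n";

// (?!...) negative lookahead: match <a ...> tags whose first
// attribute is NOT name=.
preg_match_all(
    '/<a\s+(?!name=)[^>]*>/',
    '<a name="top"></a><a href="x.htm">x</a>',
    $notName
);
print_r($notName[0]);
```

The same lookahead idea drives the ((?:(?!...).)*) idiom used earlier in the thread: the assertion is checked before each single character is consumed.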
Archived
This topic is now archived and is closed to further replies.