
PCRE... $bits_and_fragments = explode($myHead);


squiggerz


OK, I've been at this for the better part of 8 hours now with no luck yet. Somebody please help if you can and are willing:

 

<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>

<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>

 

Alright, as you can see, I'm trying to extract data from unordered lists. There are many of these lists, spread out across a thousand different files (yes, literally).

 

What I'm trying to capture in a subpattern is the text of the last list item, the "<li>Oh man, bah bah..." one.

 

Now, my current regex won't grab the data correctly. I'm trying to get everything between the <li> and the <b> tag (or, if there is no <b>, the <font color> tag).

 

Here's my regex:

@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism

 

The problem is that the second subpattern (.*?) captures everything after the first <li> in the string, so it also pulls in the preceding list item from the code I pasted above, up until the first <b or <font color tag it reaches.
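To see why this happens, here is a small PHP sketch that runs the regex from the question, unchanged, against a trimmed copy of the sample HTML (the `$html` string is an assumption modelled on the snippet above, with a `<ul>` added). A lazy `.*?` only makes the match as short as possible from where it starts; the engine still starts at the leftmost `<li>` that can lead to a successful match, which is the first one.

```php
<?php
// Sample input modelled on the snippet in the question (assumed, trimmed).
$html = <<<'HTML'
<ul>
<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b></li>
</ul>
HTML;

// The regex from the question, unchanged.
$pattern = '@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism';

preg_match($pattern, $html, $m);

// The match begins at the FIRST <li>, so the lazy group still swallows
// the whole first list item before it reaches <b>.
var_dump($m[2]);
```

Making the quantifier lazy does not move the starting point; only something that forbids crossing `</li>` (or matching one item at a time) does.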

 

Is there any way I can still use the <li> tag that belongs to the list item I want, or is there some other way to get that data?
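One other way around it, sketched here under the assumption that lists are not nested: first capture the body of every `<li>...</li>` separately, take the last one, then cut it at the first `<b` or `<font` tag. The `$html` sample and variable names are hypothetical, chosen for illustration.

```php
<?php
// Sample input modelled on the question's HTML (assumed).
$html = <<<'HTML'
<ul>
<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
HTML;

// Step 1: capture the body of every <li>...</li>. The lazy (.*?) stops at the
// nearest </li>, so each capture holds exactly one item (no nested lists assumed).
preg_match_all('%<li>(.*?)</li>%is', $html, $items);

// Step 2: take the last item's body.
$last = end($items[1]);

// Step 3: keep only the text before the first <b or <font tag, if any.
// If neither tag is present, the whole item body is kept.
$parts = preg_split('%<(?:b|font)\b%i', $last, 2);
$text = $parts[0];

var_dump($text); // string(17) "Oh man, bah bah  "
```

Splitting the work into two simple patterns tends to be easier to debug than one pattern that has to both locate the right item and stop at the right tag.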

 

I hope the regex I have there kind of explains what I'm trying to do. Ultimately, what I want in that (.*?) area would be:

 

Oh man, bah bah  

 

Any and all help would be GREATLY appreciated. I'll answer any questions promptly, since I'm sitting on this forum for even a glimmer of hope. 8 hours straight and I still can't get it right.

 

Thanks,

          Sq

 

https://forums.phpfreaks.com/topic/79206-pcre-bits_and_fragments-explodemyhead/

This assumes there are no nested lists.

 

<pre>
<?php
$data = <<<DATA
<ul>
	<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
	<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
	<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

preg_match_all('%
	<li>
	### Capture everything up to <b, <font, or </li.
	((?:(?!<(?:b|font|/li)).)*)
	### Match everything up to </li.
	(?:(?!</li).)*
	### Match </li>, whitespace, and </ul> in order
	### to match the last list item.
	</li>
	\s*
	</ul>
%xs', $data, $matches);
print_r($matches);
?>
</pre>
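The capture group in this reply is meant to read ((?:(?!<(?:b|font|/li)).)*). At each position the negative lookahead checks that the text ahead is not "<b", "<font" or "</li", and only then does "." consume one character, so the group can never run past the end of the current item. A small sketch of that piece in isolation (the sample strings are made up for illustration):

```php
<?php
// The capture group from the reply, in isolation:
//   (?:(?!<(?:b|font|/li)).)*
// The lookahead is tested before EVERY character that "." consumes.
$token = '%<li>((?:(?!<(?:b|font|/li)).)*)%is';

$samples = [
    '<li>Oh man, bah bah  <b>38999</b></li>',      // stops at <b
    '<li>Red <font color="#FF0000">1</font></li>', // stops at <font
    '<li>plain item</li>',                         // stops at </li
];

$captured = [];
foreach ($samples as $s) {
    preg_match($token, $s, $m);
    $captured[] = $m[1];
}

print_r($captured);
```

One caveat: "<b" with no word boundary also matches the start of tags such as <br> or <blockquote>, so if those can appear inside an item, something like `<(?:b\b|font|/li)` is likely the safer alternation.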

Ok... I kinda came up with the same thing but I have another question...

 

((?:(?!<(?:b|font|/li)).)*)

 

Can I use that type of set up to remove a set of tags in some code but not the stuff between those tags?

 

You see, I have these

 

<a name="blah blah"><b><u>Stuff here</u></b></a>

 

blocks of code... I'd like to remove just the <a name="blah blah"> and </a> tags, without affecting other 'a' type tags, like <a href> tags.

 

Also, can someone point me to where I can learn more about assertions like ?:, ?! and ?=? php.net doesn't seem to explain them in language easy enough for me to grasp.

 

Thanks,

        Sq

Can I use that type of set up to remove a set of tags in some code but not the stuff between those tags?

 

Yes.
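A minimal sketch of that, assuming the anchors always look like <a name="..."> with a double-quoted name as the first attribute and are never nested inside each other: match the opening tag, capture everything up to the next </a> with the same kind of tempered group, and replace the whole match with just the capture. The sample `$html` is hypothetical.

```php
<?php
// Assumption: anchors look like <a name="..."> with a double-quoted name as
// the first attribute, and they are not nested inside each other.
$html = '<a name="blah blah"><b><u>Stuff here</u></b></a> '
      . '<a href="page.htm">keep me</a>';

// Match <a name="...">, capture everything up to the next </a>, and replace
// the whole thing with just the captured inner content ($1).
// <a href=...> tags never match because "name=" must follow "<a ".
$cleaned = preg_replace(
    '%<a\s+name="[^"]*"\s*>((?:(?!</a>).)*)</a>%is',
    '$1',
    $html
);

echo $cleaned, "\n";
// <b><u>Stuff here</u></b> <a href="page.htm">keep me</a>
```

If the name attribute can appear after other attributes or use single quotes, the opening-tag part of the pattern would need loosening.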

 

Also, can someone point me to where I can learn more about assertions like ?:, ?! and ?=? php.net doesn't seem to explain them in language easy enough for me to grasp.

 

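Look up "lookaround" (lookahead and lookbehind assertions). (?= is a positive lookahead and (?! is a negative lookahead; (?: is not an assertion at all, just a group that doesn't capture.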
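A few one-line PHP examples of these constructs, as a quick sketch (the subject strings are made up for illustration). The key point is that lookarounds test the text around the current position without consuming it.

```php
<?php
// (?=...)  positive lookahead: the text must follow, but is not consumed.
preg_match('/foo(?=bar)/', 'foobar', $m);
$lookahead = $m[0];                 // "foo" ("bar" is not part of the match)

// (?!...)  negative lookahead: succeeds only if the text does NOT follow.
$negHitsBar = preg_match('/foo(?!bar)/', 'foobar'); // 0
$negHitsBaz = preg_match('/foo(?!bar)/', 'foobaz'); // 1

// (?<=...) positive lookbehind: the text must precede the match.
preg_match('/(?<=\$)\d+/', 'cost: $42', $m);
$lookbehind = $m[0];                // "42"

// (?:...) is NOT an assertion: it is a group that does not capture.
preg_match('/(?:ab)+(c)/', 'ababc', $m);
$groups = count($m);                // 2: the full match plus (c) only

var_dump($lookahead, $negHitsBar, $negHitsBaz, $lookbehind, $groups);
```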

Archived

This topic is now archived and is closed to further replies.
