
PCRE... $bits_and_fragments = explode($myHead);



Ok, I've been at this for the better part of 8 hours now with no luck yet. Somebody please help if you can and are willing:

 

<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>

<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>

 

Alright, as you can see, I'm trying to extract data from unordered lists. There are many of these lists, spread out across a thousand different files (yes, literally).

 

What I'm trying to get into a subpattern is the last list item: the one that starts with <li>Oh man, bah bah.

 

Now, my current regex won't grab the data right. I'm trying to get everything between the <li> and the <b> tag (or, if there is no <b>, the <font color> tag).

 

Here's my regex:

@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism

 

The only problem is that the second subpattern, (.*?), starts matching right after the first <li> in the string. So it picks up half of the markup I pasted above, all the way to the first <b> or <font color> tag it comes to.
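To see the over-match concretely, here is a minimal sketch that runs the regex from the post against the sample markup. The variable names are illustrative assumptions. The lazy `.*?` only controls where the match ends, not where it starts, so the match still begins at the first `<li>`.

```php
<?php
// Sample markup copied from the post; $html and $m are illustrative names.
$html = <<<DATA
<ul>
	<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
	<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
	<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

// The regex exactly as posted.
preg_match('@(<li>)+(.*?)(<b>)*(<font color="#FF0000">)+@ism', $html, $m);

// Group 2 begins inside the FIRST list item, not the last one.
echo $m[2], "\n";
```

The regex engine tries the leftmost starting position first, and a match from the first `<li>` succeeds, so the later `<li>` is never used as a starting point.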

 

Is there any way I can still use the <li tag that is relevant to the list item I want or is there some other way to get that data?

 

I hope the regex I have there kind of explains what I'm trying to do. Ultimately, what I want in that (.*?) area would be:

 

Oh man, bah bah  

 

Any and all help would be GREATLY appreciated. Any questions will be answered promptly, as I'm sitting on this forum waiting for even a glimmer of hope. 8 hours straight and I still can't get it right.

 

Thanks,

          Sq

 


This assumes there are no nested lists.

 

<pre>
<?php
$data = <<<DATA
<ul>
	<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
	<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
	<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

preg_match_all('%
	<li>
	### Capture everything up to <b, <font, or </li.
	((?:(?!<(?:b|font|/li)).)*)
	### Match everything up to </li.
	(?:(?!</li).)*
	### Match </li>, whitespace, and </ul> in order
	### to match the last list item.
	</li>
	\s*
	</ul>
%xs', $data, $matches);
print_r($matches);
?>
</pre>
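For reference, a minimal sketch of how the captured text could be read out of the match. The helper name `lastListItemText()` is a hypothetical wrapper, not something from the thread, and the same no-nested-lists assumption applies. Capture group 1 holds the text before the first `<b`, `<font`, or `</li`, including trailing spaces, so it is trimmed here.

```php
<?php
// Hypothetical helper (name is an assumption): returns the leading text of
// the last <li> before </ul>, or null when nothing matches.
function lastListItemText(string $html): ?string
{
    // Same idea as the pattern above, written on one line without the x flag.
    $pattern = '%<li>((?:(?!<(?:b|font|/li)).)*)(?:(?!</li).)*</li>\s*</ul>%s';
    if (preg_match($pattern, $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

$data = <<<DATA
<ul>
	<li><a href="../blah/bleh/96400_96549_bluh.htm">Bluh Administration</a></li>
	<li>Oh man, bah bah  <b><font color="#FF0000">38999</font></b>
	<a href="../urv/urv302/urv_38000-38999.htm#38700"><font color="#00FF00">(+)</font></a></li>
</ul>
DATA;

var_dump(lastListItemText($data)); // string(15) "Oh man, bah bah"
```

Note that the `b` alternative also stops at tags such as `<br>`, since it only checks the first letter after `<`.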

Ok... I kinda came up with the same thing but I have another question...

 

((?:(?!<(?:b|font|/li)).)*)

 

Can I use that type of set up to remove a set of tags in some code but not the stuff between those tags?

 

You see, I have these

 

<a name="blah blah"><b><u>Stuff here</u></b></a>

 

blocks of code... I'd like to remove just the <a name="blah blah"> and </a> tags, without affecting other 'a' type tags, like <a href> tags.

 

Also, can someone point me to where I can learn more about constructs like ?: ?! and ?=? php.net doesn't seem to explain them in easy enough language for me to grasp.

 

Thanks,

        Sq

Can I use that type of set up to remove a set of tags in some code but not the stuff between those tags?

 

Yes.
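A minimal sketch of one way to do it with `preg_replace`, under some assumptions: `name` is the only attribute and is double-quoted, and anchors are not nested. The function name `stripNamedAnchors()` is hypothetical.

```php
<?php
// Hypothetical helper: removes <a name="..."> and its closing </a>,
// keeping whatever sits between them. Other <a> tags are left alone.
function stripNamedAnchors(string $html): string
{
    $result = preg_replace(
        // Group 1: every character up to (not including) the next </a>.
        '%<a\s+name="[^"]*"\s*>((?:(?!</a>).)*)</a>%is',
        '$1',
        $html
    );
    // preg_replace returns null on error; fall back to the input then.
    return $result ?? $html;
}

$html = '<a name="blah blah"><b><u>Stuff here</u></b></a> <a href="x.htm">link</a>';
echo stripNamedAnchors($html), "\n";
// <b><u>Stuff here</u></b> <a href="x.htm">link</a>
```

The `(?:(?!</a>).)*` part is the same "match anything except this tag" setup as in the list pattern, just stopping at `</a>` instead.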

 

Also, can someone point me to where I can learn more about constructs like ?: ?! and ?=? php.net doesn't seem to explain them in easy enough language for me to grasp.

 

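Search for "lookaround". That is the general term for lookahead and lookbehind assertions such as (?=...) and (?!...). Note that (?:...) is not an assertion; it is a non-capturing group.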
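A small sketch of the three constructs from the question, using a made-up sample string (the text and variable names are illustrative only):

```php
<?php
$text = 'price: 100 USD, 200 EUR';

// (?=...) positive lookahead: match digits only if " USD" follows.
// The " USD" part is checked but not included in the match.
preg_match_all('/\d+(?= USD)/', $text, $usd);
print_r($usd[0]);    // 100

// (?!...) negative lookahead: match whole numbers NOT followed by " USD".
preg_match_all('/\b\d+\b(?! USD)/', $text, $notUsd);
print_r($notUsd[0]); // 200

// (?:...) non-capturing group: groups "ab" for the + quantifier
// without creating a numbered capture, so (c) is group 1.
preg_match('/(?:ab)+(c)/', 'ababc', $m);
print_r($m);         // [0] => ababc, [1] => c
```

Lookarounds check what is at the current position without consuming it, which is why putting `(?!...)` in front of a `.` and repeating the pair reads as "any character, as long as the unwanted text does not start here".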
