
[SOLVED] Parsing HTML Comments, But Harder! Can't Figure This One Out



I'm new to regexp so I'm having a really tough time with this one.

 

I need a regexp that will split a block of HTML into 3 parts around an HTML comment *that contains a certain keyword*:

 

(1) The part before the comment tag

(2) The comment tag itself

(3) The part after the comment tag

 

I've tried using this regexp with preg_match():

 

/(.*)(<!--.*MyKeyword.*-->)(.*)/s

 

but, it *does not* work with this HTML:

 

<p>stuff before</p>

<!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

<!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

<p>stuff after</p>

 

My 3 matches come out looking like this:

 

(1) <p>stuff before</p>

(2) <!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

    <!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

(3) <p>stuff after</p>

 

It's picking up the first "<!--" and then, because of the greedy ".*", expanding match (2) all the way to the "-->" that closes the second comment tag.
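The greedy behaviour described above can be reproduced on a smaller input. This is only a minimal sketch, not the code from the post: the `$html` string and the variable names are made up for illustration. A greedy `.*` runs on to the last "-->" in the input, while a lazy `.*?` stops at the first one.

```php
<?php
// Hypothetical two-comment input, made up for this illustration.
$html = '<!-- one --> text <!-- two -->';

// Greedy: ".*" consumes as much as possible, so the match ends at the LAST "-->".
preg_match('/<!--.*-->/s', $html, $greedy);

// Lazy: ".*?" consumes as little as possible, so the match ends at the FIRST "-->".
preg_match('/<!--.*?-->/s', $html, $lazy);

echo $greedy[0], "\n"; // <!-- one --> text <!-- two -->
echo $lazy[0], "\n";   // <!-- one -->
```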

 

What I really need is for the matches to look like this:

 

(1) <p>stuff before</p>

    <!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

(2) <!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

(3) <p>stuff after</p>

 

 

How do I do this?

 

Thanks in advance!

 

try

<?php
$text = '<p>stuff before</p>
<!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->
<!-- <p>stuff</p> MyKeyword <p>stuff</p> -->
<p>stuff after</p>';
preg_match('/^(.*)(<!--.*?MyKeyword.*?-->)(.*)$/s', $text, $out);
print_r($out);
?>

Many thanks for the suggestion. This is definitely closer, but I'm still having a false match...

 

$x = <<<eof

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">__BlurbBanner__</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

  <!--  -->

  </td>

  </tr>

  </table>

eof;

 

$re = '/^(.*)(<!--.*?MyKeyword.*?-->)(.*)$/s';

$MatchCount = preg_match($re, $x, $Matches);

 

Yields this...

 

 

$Matches[1]

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

 

 

$Matches[2]

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">MyKeyword</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

 

 

$Matches[3]

  <!--  -->

  </td>

  </tr>

  </table>

 

 

The requirement is that all matches of MyKeyword occur within HTML comment tags.

 

The problem is that $Matches[2] is matching the instance of "MyKeyword" which is not within a comment tag (the one inside <p class="ps_ListItem_Title_P">).

 

It seems I somehow have to tell the regexp: don't look past the next "-->" when searching for the keyword.

 

Thoughts? Ideas?

 

Correction to above code... (sorry)

 

$x = <<<eof

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">MyKeyword</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

  <!--  -->

  </td>

  </tr>

  </table>

eof;

try

<?php
$x = <<<eof
   <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">
   <tr>
   <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">
   <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>
   </td>
   <!-- <p class="Style_1"> </p> -->
   <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">
   <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">
   <p class="ps_ListItem_Title_P">MyKeyword</p>
   </a>
   <p class="ps_ListItem_Body_P">__BlurbDesc__</p>
   <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->
   <!-- b <p class="ps_ListItem_Title_P">MyKeyword</p> -->
   </td>
   </tr>
   </table>
eof;
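// (?:(?!-->).)*? advances one character at a time and refuses to step over a "-->",
// so MyKeyword has to be found inside the same comment that "<!--" opened.
// (A bare (?!-->) placed directly before MyKeyword can never fail, so it filters nothing.)
preg_match('/^(.*)(<!--(?:(?!-->).)*?MyKeyword.*?-->)(.*)$/s', $x, $out);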
print_r($out);
?>
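
If the lookahead feels fragile, another way to keep the keyword search inside a single comment is to collect the comments first and test each one with a plain string search. The sketch below assumes that approach. `splitAtKeywordComment` and `$sample` are hypothetical names, not something from the posts above. It returns the first comment containing the keyword, whereas the `^(.*)` patterns in this thread settle on the last one.

```php
<?php
/**
 * Hypothetical helper (not from the thread): split $html around the first
 * comment that contains $keyword. Returns [before, comment, after] or null.
 */
function splitAtKeywordComment(string $html, string $keyword): ?array
{
    // Collect every comment together with its byte offset in $html.
    preg_match_all('/<!--.*?-->/s', $html, $found, PREG_OFFSET_CAPTURE);

    foreach ($found[0] as [$comment, $offset]) {
        // Only the text from "<!--" to "-->" is inspected here.
        if (strpos($comment, $keyword) !== false) {
            return [
                substr($html, 0, $offset),                 // (1) before the comment
                $comment,                                  // (2) the comment itself
                substr($html, $offset + strlen($comment)), // (3) after the comment
            ];
        }
    }
    return null; // the keyword does not occur inside any comment
}

// Made-up input: the keyword appears outside a comment first, then inside one.
$sample = "<p>MyKeyword</p>\n<!-- other -->\n<!-- has MyKeyword -->\n<p>after</p>";
print_r(splitAtKeywordComment($sample, 'MyKeyword'));
```

It returns null when the keyword only appears outside comments, which is the case that produced the false match earlier in the thread.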
