[SOLVED] Parsing HTML Comments, But Harder! Can't Figure This One Out


mmem700

I'm new to regexp so I'm having a really tough time with this one.

 

I need a regexp that will parse an HTML comment *THAT contains a certain keyword*, into 3 parts:

 

(1) The part before the comment tag

(2) The comment tag itself

(3) The part after the comment tag

 

I've tried using this regexp with preg_match():

 

/(.*)(<!--.*MyKeyword.*-->)(.*)/s

 

but, it *does not* work with this HTML:

 

<p>stuff before</p>

<!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

<!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

<p>stuff after</p>

 

My 3 matches come out looking like this:

 

(1) <p>stuff before</p>

(2) <!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

    <!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

(3) <p>stuff after</p>

 

It's picking up the first "<!--" and then expanding the second match (2) all the way to the "-->" that closes the second comment tag, because of the greedy ".*".
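To see the greedy behaviour in isolation, here is a minimal sketch that matches only the comment part, without the surrounding groups. The shortened sample string is made up for illustration; it is not the HTML from the post.

```php
<?php
// Two comments on separate lines (shortened, hypothetical sample).
$html = "<!-- a SomeKeyword -->\n<!-- b MyKeyword -->";

// Greedy: .* runs on to the LAST "-->" in the string.
preg_match('/<!--.*-->/s', $html, $greedy);

// Lazy: .*? stops at the FIRST "-->" after the opening "<!--".
preg_match('/<!--.*?-->/s', $html, $lazy);

echo $greedy[0], "\n---\n", $lazy[0], "\n";
```

The greedy pattern swallows both comments as one match, while the lazy one returns only the first comment.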

 

What I really need is for the matches to look like this:

 

(1) <p>stuff before</p>

    <!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->

(2) <!-- <p>stuff</p> MyKeyword <p>stuff</p> -->

(3) <p>stuff after</p>

 

 

How do I do this?

 

Thanks in advance!

 

try

<?php
$text = '<p>stuff before</p>
<!-- <p>stuff</p> SomeKeyword <p>stuff</p> -->
<!-- <p>stuff</p> MyKeyword <p>stuff</p> -->
<p>stuff after</p>';
preg_match('/^(.*)(<!--.*?MyKeyword.*?-->)(.*)$/s', $text, $out);
print_r($out);
?>

Many thanks for the suggestion. This is definitely closer, but I'm still having a false match...

 

$x = <<<eof

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">__BlurbBanner__</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

  <!--  -->

  </td>

  </tr>

  </table>

eof;

 

$re = '/^(.*)(<!--.*?MyKeyword.*?-->)(.*)$/s';

$MatchCount = preg_match($re, $x, $Matches);

 

Yields this...

 

 

$Matches[1]

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

 

 

$Matches[2]

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">MyKeyword</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

 

 

$Matches[3]

  <!--  -->

  </td>

  </tr>

  </table>

 

 

The requirement is that all matches of MyKeyword occur within HTML comment tags.

 

The problem is that $Matches[2] is matching the instance of "MyKeyword" which is not within a comment tag (the one inside <p class="ps_ListItem_Title_P">).

 

It seems I somehow have to tell the regexp: don't look past the next "-->" for the next match.
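One common way to express "don't look past the next -->" is to put a negative lookahead in front of every character the pattern consumes. The sketch below assumes a small made-up input where the keyword appears only outside the comments. It compares the lazy pattern with a lookahead-guarded one.

```php
<?php
// Hypothetical input: the keyword appears only OUTSIDE the comments.
$outside = "<!-- one -->\n<p>MyKeyword</p>\n<!-- two -->";
// Hypothetical input: the keyword appears inside the second comment.
$inside = "<!-- one -->\n<!-- MyKeyword -->";

// Lazy .*? may still step over "-->" if that is the only way to reach the keyword.
$lazyPattern = '/<!--.*?MyKeyword.*?-->/s';

// (?:(?!-->).)* consumes one character at a time, but only while "-->" does not
// start at the current position, so the match cannot leave the comment it began in.
$guardedPattern = '/<!--(?:(?!-->).)*?MyKeyword(?:(?!-->).)*-->/s';

$lazyHit    = preg_match($lazyPattern, $outside, $m1);    // false match across comments
$guardedHit = preg_match($guardedPattern, $outside, $m2); // no match
$insideHit  = preg_match($guardedPattern, $inside, $m3);  // matches the second comment only

var_dump($lazyHit, $guardedHit, $insideHit);
```

The guarded form does more work per character than a plain ".*?", which is usually acceptable for template-sized input like the one in this thread.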

 

Thoughts? Ideas?

 

Correction to above code... (sorry)

 

$x = <<<eof

  <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">

  <tr>

  <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">

  <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>

  </td>

  <!-- <p class="Style_1"> </p> -->

  <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">

  <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">

  <p class="ps_ListItem_Title_P">MyKeyword</p>

  </a>

  <p class="ps_ListItem_Body_P">__BlurbDesc__</p>

  <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->

  <!--  -->

  </td>

  </tr>

  </table>

eof;

try

<?php
$x = <<<eof
   <table border="0" align="center" cellpadding="0" cellspacing="0" class="ps_ListItem_TABLE">
   <tr>
   <td width="120" align="center" valign="middle" class="ps_ListItem_TD-Pic">
   <a href="__BlurbProductURL__"><img src="__BlurbPicURL__" border="0" class="ps_ListItem_IMG"></a>
   </td>
   <!-- <p class="Style_1"> </p> -->
   <td width="365" align="left" valign="top" class="ps_ListItem_TD-Desc">
   <a href="__BlurbProductURL__" class="ps_ListItem_Title_A">
   <p class="ps_ListItem_Title_P">MyKeyword</p>
   </a>
   <p class="ps_ListItem_Body_P">__BlurbDesc__</p>
   <!-- <p class="Style_1"></p><p class="SomeOtherClass"></p> -->
   <!-- b <p class="ps_ListItem_Title_P">MyKeyword</p> -->
   </td>
   </tr>
   </table>
eof;
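// (?:(?!-->).)*? advances one character at a time and refuses to step over a "-->",
// so MyKeyword must lie inside the same comment that "<!--" opened.
preg_match('/^(.*)(<!--(?:(?!-->).)*?MyKeyword.*?-->)(.*)$/s', $x, $out);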
print_r($out);
?>
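An alternative that avoids the lookahead altogether is to collect every comment first and then check each one for the keyword with a plain string search. This is only a sketch: the helper name split_on_keyword_comment and the sample string are hypothetical and do not come from the thread. It assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: split $html around the first comment containing $keyword.
 * Returns [before, comment, after], or null if no comment contains the keyword.
 */
function split_on_keyword_comment(string $html, string $keyword): ?array
{
    // Collect every comment together with its byte offset in $html.
    preg_match_all('/<!--.*?-->/s', $html, $found, PREG_OFFSET_CAPTURE);

    foreach ($found[0] as [$comment, $offset]) {
        if (strpos($comment, $keyword) !== false) {
            return [
                substr($html, 0, $offset),                 // part before the comment
                $comment,                                  // the comment itself
                substr($html, $offset + strlen($comment)), // part after the comment
            ];
        }
    }
    return null; // the keyword never appears inside a comment
}

$sample = "<p>before</p>\n<!-- SomeKeyword -->\n<p>MyKeyword</p>\n<!-- x MyKeyword y -->\n<p>after</p>";
$parts = split_on_keyword_comment($sample, 'MyKeyword');
print_r($parts);
```

Because the keyword is compared with strpos() rather than spliced into the pattern, it needs no regex escaping, and a keyword that appears only outside comments yields null instead of a false match.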

