parsing nested divs

zkent · April 13, 2007

I am trying to parse an XHTML web page and extract the contents of a div element with an id of "Segments". I have successfully (I think) located the correct div, but I don't know the syntax to extract the HTML inside it. In JavaScript I would just use div.innerHTML, but I can't find the PHP equivalent.

<?php
$doc = new DomDocument;

// load the doc into the XML object
$doc->LoadHTMLFile("I-290-2.html");

$divs = $doc->getElementsByTagName('div');

foreach ($divs as $div) {
   $v = $div->getAttribute('id');
   if ($v == "Segments") {
      // THIS IS WHERE I WANT TO EXTRACT CONTENTS
   }
}
?>

boo_lolly · April 13, 2007

we'd have to see the class to tell you how to do it.

zkent · April 13, 2007

I haven't created any classes. I am using http://us2.php.net/dom_domelement_getelementsbytagname

per1os · April 13, 2007

Just a shot in the dark, but isn't innerHTML considered an attribute? I am not sure whether this is true, but maybe try this:

<?php
$doc = new DomDocument;

// load the doc into the XML object
$doc->LoadHTMLFile("I-290-2.html");

$divs = $doc->getElementsByTagName('div');

foreach ($divs as $div) {
   $v = $div->getAttribute('id');
   if ($v == "Segments") {
      $inner = $div->getAttribute('innerhtml');
      print $inner . "<br />";
   }
}
?>

Again I am not sure if that is true or not.

zkent · April 14, 2007

@frost110

No dice. Good thinking though.
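
For what it's worth, innerHTML is a property of the browser's DOM objects rather than an attribute written in the markup, so getAttribute() has nothing to return. A minimal sketch, using stand-in markup rather than the real I-290-2.html:

```php
<?php
$doc = new DOMDocument;

// Stand-in markup (an assumption; the real page is loaded with loadHTMLFile)
$doc->loadHTML('<div id="Segments"><p>one</p></div>');

$div = $doc->getElementsByTagName('div')->item(0);

// No attribute named "innerhtml" exists in the markup
$hasInner = $div->hasAttribute('innerhtml');   // false
$inner    = $div->getAttribute('innerhtml');   // '' (empty string)

// textContent gives the text only, with the tags stripped
$text = $div->textContent;                     // 'one'

var_dump($hasInner, $inner, $text);
?>
```

So the DOM extension can hand back the text easily, but the markup itself has to be serialized some other way.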

per1os · April 14, 2007

If it were me, I would look into regex: www.php.net/regex

If you look at www.php.net/ereg or www.php.net/preg, I am sure you will find something in the user contributions similar to what you want, which you can adapt to your needs.

Just an idea.

Glyde · April 14, 2007

<?php
// Remove below this line if you have PHP 4.3 or later (file_get_contents is built in)
if (!function_exists('file_get_contents')) {
     function file_get_contents($file) {
          // file() keeps each line's trailing newline, so join with an empty string
          $lineList = file($file);
          return implode('', $lineList);
     }
}
// Remove above this line if you have PHP 4.3 or later
$fileContents = file_get_contents("I-290-2.html");

// Get the div: $matchList[1] holds the text captured between the tags
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $fileContents, $matchList);
print_r($matchList);
?>

Untested...
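
A quick check of that pattern on made-up nested markup (the sample string below is an assumption, not the real page) suggests one thing to watch for: the lazy `(.+?)</div>` stops at the first closing tag it meets, so an inner div cuts the capture short.

```php
<?php
// Made-up sample: the target div sits inside another div and contains one itself
$sample = '<div id="Outer"><div id="Segments"><p>one</p><div>nested</div><p>two</p></div></div>';

// Same pattern as in the post above
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $sample, $matchList);

// The capture ends at the inner div's closing tag, so "<p>two</p>" is lost
$captured = $matchList[1];
print $captured;   // <p>one</p><div>nested
?>
```

On a page without nested divs inside the target, the pattern should capture the whole contents.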

zkent · April 17, 2007

Glyde,

I tried using a regex at first, but the divs in this document are heavily nested and I couldn't get the regex right. The div I want is neither the outermost nor the innermost one. There is, however, a unique comment at the end of most of the major divs. I modified your code to include that end comment, and it works. I would still like to figure out how to get at the content using the DOM instead.

<div id="Segments">

... HTML ...

</div><!-- end Segments -->

<?php

$fileContents = file_get_contents("I-290-2.html");

// Get the div
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div><!-- end Segments -->@is", $fileContents, $matchList);
print_r($matchList);

?>

Thanks,

Zach
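
On the DOM side, one possible approach is to serialize each child node of the matched div and join the results, which behaves roughly like innerHTML. This is only a sketch: innerHtml() is a hypothetical helper name, the markup is a stand-in for the real page, and it relies on DOMDocument::saveXML() accepting a node argument.

```php
<?php
// Hypothetical helper (not from the thread): returns the markup inside a node,
// similar in spirit to JavaScript's innerHTML.
function innerHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that node
        $html .= $node->ownerDocument->saveXML($child);
    }
    return $html;
}

$doc = new DOMDocument;

// Stand-in markup (an assumption); the thread uses loadHTMLFile("I-290-2.html")
$doc->loadHTML('<html><body><div id="Outer"><div id="Segments"><p>one</p><div>nested</div></div><!-- end Segments --></div></body></html>');

$inner = null;
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == 'Segments') {
        $inner = innerHtml($div);
        break;
    }
}

print $inner;   // <p>one</p><div>nested</div>
?>
```

Because the parser has already built the tree, nesting is not a problem here and no end comment is needed. Note that saveXML() writes XML-style output, so empty elements come out self-closed (for example `<br/>`), which should be fine for XHTML.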
