
parsing nested divs


zkent


I am trying to parse an XHTML web page and extract the contents of a div element with an id of "Segments". I have successfully (I think) located the correct div, but I don't know the syntax to extract the HTML inside it. In JavaScript I would just use div.innerHTML, but I can't find the PHP equivalent.

 

<?php
$doc = new DomDocument;

// load the doc into the XML object
$doc->LoadHTMLFile("I-290-2.html");

$divs = $doc->getElementsByTagName('div');

foreach ($divs as $div) {
   $v = $div->getAttribute('id');
   if ($v == "Segments") {
      // THIS IS WHERE I WANT TO EXTRACT CONTENTS
   }
}
?>

Link to comment
https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/

Just a shot in the dark, but isn't innerHTML considered an attribute? I am not sure whether that is true, but maybe try this:

 

<?php
$doc = new DomDocument;

// load the doc into the XML object
$doc->LoadHTMLFile("I-290-2.html");

$divs = $doc->getElementsByTagName('div');

foreach ($divs as $div) {
   $v = $div->getAttribute('id');
   if ($v == "Segments") {
      $inner = $div->getAttribute('innerhtml');
      print $inner . "<br />";
   }
}
?>

 

Again I am not sure if that is true or not.
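
For what it's worth, innerHTML is a JavaScript property rather than an attribute, so getAttribute('innerhtml') just returns an empty string. Below is a minimal sketch of one possible equivalent: serialize each child of the matched div and join the results. The innerHtml() helper name and the inline sample markup are made up for illustration (the real page would be loaded with loadHTMLFile as above), and passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper: joins the serialized child nodes of $node,
// which is roughly what innerHTML gives you in JavaScript.
function innerHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() accepts a node argument as of PHP 5.3.6
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Inline sample markup standing in for I-290-2.html (an assumption).
$doc = new DOMDocument;
$doc->loadHTML('<html><body><div id="Outer"><div id="Segments"><p>one</p><div>two</div></div></div></body></html>');

$inner = '';
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == 'Segments') {
        $inner = innerHtml($div);
    }
}

print $inner;
```

On older PHP 5 versions, saveXML($child) might be a workable substitute, at the cost of XML-style output for empty elements.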

Link to comment
https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/#findComment-228854

If it were me, I would look into regex: www.php.net/regex

 

If you look at www.php.net/ereg or www.php.net/preg, I am sure you will find something in the user contributions similar to what you want, which you can adapt to your needs.

 

Just an idea.

Link to comment
https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/#findComment-229032


<?php
// Remove below this line if you have PHP 5
if (!function_exists('file_get_contents')) {
     function file_get_contents($file) {
          // file() keeps each line's trailing newline, so join with an empty string
          $lineList = file($file);
          return implode('', $lineList);
     }
}
// Remove above this line if you have PHP 5
$fileContents = file_get_contents("I-290-2.html");

// Get the div. Note that the lazy (.+?) stops at the first </div> it finds,
// so a div nested inside "Segments" will cut the match short.
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $fileContents, $matchList);
print_r($matchList);
?>

Untested...
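
A quick way to see how the pattern above behaves on nested markup is to run it against a small string. This is only a sketch: the sample markup is made up, with one div nested inside "Segments".

```php
<?php
// Made-up sample markup: the target div contains one nested div.
$html = '<div id="Outer"><div id="Segments"><div class="row">a</div><p>b</p></div></div>';

// Same pattern as above, applied to the sample string.
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $html, $matchList);

// The lazy (.+?) ends at the first closing </div>, so only part of the
// inner HTML is captured and the <p> element is lost.
print $matchList[1];
```

The capture ends inside the nested div, which is the general weakness of matching nested tags with a single regex.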

Link to comment
https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/#findComment-229060

Glyde,

 

I tried using a regex at first, but the divs in this document are deeply nested and I couldn't get the regex right. The div I want is neither the outermost nor the innermost one. There is, however, a unique comment at the end of most of the major divs. I modified your code to include that end comment and it works. I would, however, still like to figure out how to get at the content using the DOM instead.

 

<div id="Segments">

... HTML ...

</div><!-- end Segments -->

 

<?php

$fileContents = file_get_contents("I-290-2.html");

// Get the div
preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div><!-- end Segments -->@is", $fileContents, $matchList);
print_r($matchList);

?>

 

Thanks,

Zach
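
On the DOM side of the question, one possible approach is to look the div up directly with DOMXPath and serialize its children, which does not depend on how deeply the divs are nested or on the trailing comment. This is a sketch under assumptions: the inline markup is a made-up stand-in for I-290-2.html, and passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Made-up markup standing in for the real page: "Segments" is neither
// the outermost nor the innermost div.
$doc = new DOMDocument;
$doc->loadHTML('<html><body><div id="Outer"><div id="Segments"><div class="row">a</div><p>b</p></div><!-- end Segments --></div></body></html>');

// Select the div by its id attribute instead of looping over every div.
$xpath = new DOMXPath($doc);
$segments = $xpath->query('//div[@id="Segments"]')->item(0);

$contents = '';
if ($segments !== null) {
    foreach ($segments->childNodes as $child) {
        // Serialize each child node; nested divs come out whole.
        $contents .= $doc->saveHTML($child);
    }
}

print $contents;
```

If the real page is not well formed, loadHTMLFile may emit warnings; calling libxml_use_internal_errors(true) beforehand is one way to keep them quiet.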

Link to comment
https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/#findComment-231485

Archived

This topic is now archived and is closed to further replies.
