zkent Posted April 13, 2007

I am trying to parse an XHTML web page and extract the contents of a div element with an id of "Segments". I have successfully (I think) located the correct div, but I don't know the syntax to extract the HTML inside it. In JavaScript I would just use div.innerHTML, but I can't find the PHP equivalent.

    <?php
    $doc = new DomDocument;
    // load the doc into the XML object
    $doc->LoadHTMLFile("I-290-2.html");
    $divs = $doc->getElementsByTagName('div');
    foreach ($divs as $div) {
        $v = $div->getAttribute('id');
        if ($v == "Segments") {
            // THIS IS WHERE I WANT TO EXTRACT CONTENTS
        }
    }
    ?>

Link to comment https://forums.phpfreaks.com/topic/46916-parsing-nested-divs/

boo_lolly Posted April 13, 2007

We'd have to see the class to tell you how to do it.

zkent Posted April 13, 2007

Quoting boo_lolly: "we'd have to see the class to tell you how to do it."

I haven't created any classes. I am using http://us2.php.net/dom_domelement_getelementsbytagname

per1os Posted April 13, 2007

Just a shot in the dark, but isn't innerHTML considered an attribute? I am not sure whether that is true, but maybe try this:

    <?php
    $doc = new DomDocument;
    // load the doc into the XML object
    $doc->LoadHTMLFile("I-290-2.html");
    $divs = $doc->getElementsByTagName('div');
    foreach ($divs as $div) {
        $v = $div->getAttribute('id');
        if ($v == "Segments") {
            $inner = $div->getAttribute('innerhtml');
            print $inner . "<br />";
        }
    }
    ?>

Again, I am not sure whether that is true.

zkent Posted April 14, 2007

@frost110 No dice. Good thinking though.

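A likely reason the attempt above came back empty: in JavaScript, innerHTML is a property of the element object, not an attribute written in the markup, so PHP's getAttribute() has nothing to look up. The small check below is only a sketch; the inline markup is a made-up stand-in for I-290-2.html.

```php
<?php
// Stand-in markup (an assumption for illustration, not the real page).
$doc = new DOMDocument;
$doc->loadHTML('<html><body><div id="Segments"><p>Hello</p></div></body></html>');
$div = $doc->getElementsByTagName('div')->item(0);

var_dump($div->hasAttribute('innerhtml'));  // is there such an attribute in the markup?
var_dump($div->getAttribute('innerhtml'));  // what getAttribute() hands back for it
var_dump($div->textContent);                // text of the div with the markup stripped
```

getAttribute() returns an empty string for an attribute that does not exist, which matches the "no dice" result. textContent is a real DOMNode property, but it drops the tags, so it is not an innerHTML replacement either.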
per1os Posted April 14, 2007

If it were me, I would look into regex: www.php.net/regex

If you look at www.php.net/ereg or www.php.net/preg, I am sure you will find in the user contributions something similar to what you want that you can adapt.

Just an idea.

Glyde Posted April 14, 2007

    <?php
    // Remove below this line if you have PHP 4.3.0 or later
    if (!function_exists('file_get_contents')) {
        function file_get_contents($file) {
            // file() keeps each line's trailing newline, so join with an empty string
            $lineList = file($file);
            return implode("", $lineList);
        }
    }
    // Remove above this line if you have PHP 4.3.0 or later

    $fileContents = file_get_contents("I-290-2.html");

    // Get the div
    preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $fileContents, $matchList);
    print_r($matchList);
    ?>

Untested...

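One caveat worth checking before relying on a pattern like this: the lazy group (.+?) ends at the first </div> it meets, so if the target div contains other divs, the capture is cut short. The snippet below is a sketch using a made-up fragment (the nested "row" div is an assumption for illustration), with the same pattern as above.

```php
<?php
// Hypothetical fragment: the target div contains a nested div.
$html = '<div id="Segments"><div class="row">A</div><p>B</p></div>';

preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div>@is", $html, $matchList);

// The capture stops at the inner div's closing tag, so "<p>B</p>" is lost.
echo $matchList[1], "\n";
```

Switching to a greedy (.+) does not fix this in general, because it then runs to the last </div> in the file instead.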
zkent Posted April 17, 2007

Glyde,

I tried using a regex at first, but the divs in this document are heavily nested and I couldn't get the regex right. The div I want is neither the outermost nor the innermost div. There is, however, a unique comment at the end of most of the major divs. I modified your code to include that end comment and it works. I would, however, still like to figure out how to get at the content using the DOM instead.

    <div id="Segments">
    ... HTML ...
    </div><!-- end Segments -->

    <?php
    $fileContents = file_get_contents("I-290-2.html");

    // Get the div
    preg_match("@<div[^>]+id=['\"]?Segments['\"]?[^>]+?>(.+?)</div><!-- end Segments -->@is", $fileContents, $matchList);
    print_r($matchList);
    ?>

Thanks,
Zach

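For the DOM route, one common approach is to serialize each child node of the div and concatenate the results. The sketch below assumes that approach; innerHtml() is a hypothetical helper name (PHP's DOM extension has no built-in innerHTML), and the inline markup is a made-up stand-in for I-290-2.html.

```php
<?php
// Hypothetical helper: approximate innerHTML by serializing each child node.
function innerHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that subtree.
        $html .= $node->ownerDocument->saveXML($child);
    }
    return $html;
}

// Stand-in markup (an assumption); the target div is nested inside another
// div and contains a div of its own.
$source = '<html><body><div id="Outer"><div id="Segments">'
        . '<div class="row"><p>First</p></div><p>Second</p>'
        . '</div><!-- end Segments --></div></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($source);

foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == 'Segments') {
        echo innerHtml($div), "\n";
    }
}
```

Because the parser has already worked out where each div closes, nesting depth does not matter here. saveXML() writes empty elements in XML style (for example <br/>); on PHP 5.3.6 and later, saveHTML() also accepts a node argument and produces HTML-style output instead.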
This topic is now archived and is closed to further replies.