count childNodes in domDocument


ludo1960

Hi guys,

Reading this from php.net has got me a wee bit confused. Trying to implement it has got me doubly confused! My code:

        $dom = new DOMDocument;
        libxml_use_internal_errors(true);
        $dom->loadHTMLFile($parent_node);

        if($dom->childNodes <>0) {
            $kids = array (
                'url' => $parent_node,
                'No_of_kids' => count($dom->childNodes)
            ); 
        }

Results in 'Notice: Object of class DOMNodeList could not be converted to int'

How the heck am I supposed to count the childNodes?

A (very) brief note about Countable objects

Classes implementing the Countable interface define and implement their own count() method.  The DOMNodeList class is one such class.

Instances of classes that implement the Countable interface can be passed to the count() function, and their own special count() method gets called.  In DOMNodeList's case, that method returns the number of nodes in the list.  There is nothing stopping you from calling the count() method on the object (e.g. $myobject->count()) rather than the count() function (e.g. count($myobject)), if that's what you want to do.
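
To make that concrete, here is a minimal sketch of a Countable class. The Playlist class and its contents are made up purely for illustration; only the Countable interface and the count() function come from PHP itself.

```php
<?php
// Hypothetical class, for illustration only: it wraps an array of track names.
class Playlist implements Countable
{
    private $tracks;

    public function __construct(array $tracks)
    {
        $this->tracks = $tracks;
    }

    // Called automatically when the object is passed to count().
    public function count(): int
    {
        return count($this->tracks);
    }
}

$playlist = new Playlist(['one', 'two', 'three']);

var_dump(count($playlist));   // the count() function, int(3)
var_dump($playlist->count()); // the count() method, int(3)
```

Both calls end up in the same method, so they always agree.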

 

How the heck am I supposed to count the childNodes?

Back to your original question.  There are several ways to get the number of nodes in a DOMNodeList (which is what your $dom->childNodes is).

1. $dom->childNodes->length
2. count($dom->childNodes)
3. $dom->childNodes->count()
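
A quick sketch showing that the three agree, using a made-up HTML string rather than your file. Note that options 2 and 3 rely on DOMNodeList implementing Countable, which it does as of PHP 7.2; the length property also works on older versions.

```php
<?php
// Made-up markup, for illustration only.
$dom = new DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><p>Hello</p></body></html>');

$nodes = $dom->childNodes;

$viaLength   = $nodes->length;   // 1. the length property
$viaFunction = count($nodes);    // 2. the count() function (PHP 7.2+)
$viaMethod   = $nodes->count();  // 3. the count() method (PHP 7.2+)

// All three report the same number of nodes.
var_dump($viaLength === $viaFunction && $viaFunction === $viaMethod);
```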

 

Hmm,

        $dom = new DOMDocument;
        libxml_use_internal_errors(true) ;
        $dom->loadHTMLFile($parent_node) ;

        if($dom->childNodes->length <>0) {
            $kids[] = array (
                'url' => $parent_node,
                'No_of_kids' => count($dom->childNodes) 
            );   
        }
        // Pass true so print_r() returns the string instead of printing it and returning true.
        echo '<pre>', print_r($kids, true), '</pre>';

Results in:

    [0] => Array
        (
            [url] => http://mysite.com/test/php/intro.pdo.html
            [No_of_kids] => 2
        )

    [1] => Array
        (
            [url] => http://mysite.com/test/php/pdo.setup.html
            [No_of_kids] => 2
        )

    [2] => Array
        (
            [url] => http://mysite.com/test/php/pdo.constants.html
            [No_of_kids] => 2

Pretty sure the answer ain't 2 every time, so something fishy is going on. Any ideas, guys?

1 hour ago, ludo1960 said:

Pretty sure the answer ain't 2 every time, so something fishy is going on. Any ideas, guys?

It looks like you're scraping pages from the PHP manual, so taking one of those as an example, the HTML looks like this (super-stripped down for simplicity):

<?php
$html = '<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <!-- lots more goes here -->
</html>';
$dom = new DOMDocument();
$dom->loadHTML($html);

var_dump($dom->childNodes->length);
foreach ($dom->childNodes as $childNode) {
    var_dump(get_class($childNode));
}

The above outputs the following:

int(2)
string(15) "DOMDocumentType"
string(10) "DOMElement"

This shows that the document ($dom) has two child nodes: 1. the document type (<!DOCTYPE html>) and 2. the "html" element.
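
If the number you are after is how many elements are inside the page, rather than how many nodes sit directly under the document, one option is to start from the "html" element or to count every element in the document. A minimal sketch, assuming made-up markup with no whitespace between tags (whitespace can show up as extra text nodes in childNodes):

```php
<?php
// Made-up markup, for illustration only.
$html = '<!DOCTYPE html><html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);

// Direct children of the <html> element: head and body.
$rootChildren = $dom->documentElement->childNodes->length;

// Every element in the document: html, head, title, body, p, p.
$allElements = $dom->getElementsByTagName('*')->length;

var_dump($rootChildren); // int(2)
var_dump($allElements);  // int(6)
```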

Hope that helps.

One last question: if I'm crawling a site, e.g. index page -> level 1 page -> level 2 page (with no more child pages after that), how do I know I've reached the end point? Should I expect the number of childNodes to be 0? Or have I got the wrong end of the stick?
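
For what it's worth, childNodes describes the nodes inside one document, not the pages it links to, so it is unlikely to hit 0 on a real page. One possible check, sketched here as an assumption rather than a definitive answer, is to collect the href values of the "a" elements and treat a page with no links left to follow as an end point. The extractLinks() helper and the two HTML strings are hypothetical.

```php
<?php
// Hypothetical helper: returns the href values of all <a> elements in an HTML string.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up pages, for illustration only.
$indexPage = '<html><body><a href="level1.html">Level 1</a></body></html>';
$leafPage  = '<html><body><p>No links here.</p></body></html>';

var_dump(count(extractLinks($indexPage))); // int(1), something left to crawl
var_dump(count(extractLinks($leafPage)));  // int(0), an end point
```

On a real site most pages link back to the index or to navigation, so a crawler would normally also keep a list of URLs already visited and stop when no unvisited links remain.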

