ludo1960

count childNodes in domDocument


Hi guys,

Reading this from php.net has got me a wee bit confused. Trying to implement it has got me doubly confused! My code:

        $dom = new DOMDocument;
        libxml_use_internal_errors(true);
        $dom->loadHTMLFile($parent_node);

        if($dom->childNodes <>0) {
            $kids = array (
                'url' => $parent_node,
                'No_of_kids' => count($dom->childNodes)
            ); 
        }

This results in 'Notice: Object of class DOMNodeList could not be converted to int'.

How the heck am I supposed to count the childNodes?


By reading the documentation?

DOMNodeList

Or by looking at your own code a couple lines lower.


A (very) brief note about Countable objects

Classes implementing the Countable interface define and implement their own count() method. The DOMNodeList class is one such class (as of PHP 7.2).

Instances of classes that implement the Countable interface can be passed to the count() function, and their own special count() method gets called.  In DOMNodeList's case, that method returns the number of nodes in the list.  There is nothing stopping you from calling the count() method on the object (e.g. $myobject->count()) rather than the count() function (e.g. count($myobject)), if that's what you want to do.
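To make the Countable idea concrete, here is a minimal sketch. The Playlist class and its add() method are made up for illustration and have nothing to do with the DOM extension; the point is only that count($object) ends up calling the object's own count() method.

```php
<?php
// Hypothetical class, for illustration only: not part of the DOM extension.
final class Playlist implements Countable
{
    /** @var string[] */
    private array $tracks = [];

    public function add(string $track): void
    {
        $this->tracks[] = $track;
    }

    // PHP calls this when the object is passed to the count() function.
    public function count(): int
    {
        return count($this->tracks);
    }
}

$playlist = new Playlist();
$playlist->add('first');
$playlist->add('second');

var_dump(count($playlist));   // function form: int(2)
var_dump($playlist->count()); // method form:   int(2)
```

DOMNodeList behaves the same way, except that its count() reports the number of nodes in the list.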

 

How the heck am I supposed to count the childNodes?

Back to your original question.  There are several ways to get the number of nodes in a DOMNodeList (which is what your $dom->childNodes is).

1. $dom->childNodes->length
2. count($dom->childNodes)
3. $dom->childNodes->count()
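As a quick check, the sketch below runs all three forms against the same list and confirms they agree. The HTML string is a made-up stand-in for a real page, and the count() forms assume PHP 7.2 or later.

```php
<?php
$dom = new DOMDocument();
// Made-up markup, standing in for a real page.
$dom->loadHTML('<!DOCTYPE html><html><body><p>one</p><p>two</p></body></html>');

$nodes = $dom->childNodes; // a DOMNodeList, not a number

// Comparing $nodes itself to 0 is what triggered the "could not be
// converted to int" notice; compare one of these integers instead.
$viaProperty = $nodes->length;
$viaFunction = count($nodes);    // needs PHP 7.2+
$viaMethod   = $nodes->count();  // needs PHP 7.2+

var_dump($viaProperty === $viaFunction && $viaFunction === $viaMethod); // bool(true)
```

The length property is the one that also works on older PHP versions, so it is the safest choice if you are unsure what the server runs.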

 


Hmm,

        $dom = new DOMDocument;
        libxml_use_internal_errors(true) ;
        $dom->loadHTMLFile($parent_node) ;

        if($dom->childNodes->length <>0) {
            $kids[] = array (
                'url' => $parent_node,
                'No_of_kids' => count($dom->childNodes) 
            );   
        }
        // Pass true so print_r() returns the string instead of printing it early.
        echo '<pre>', print_r($kids, true), '</pre>';

Results in:

    [0] => Array
        (
            [url] => http://mysite.com/test/php/intro.pdo.html
            [No_of_kids] => 2
        )

    [1] => Array
        (
            [url] => http://mysite.com/test/php/pdo.setup.html
            [No_of_kids] => 2
        )

    [2] => Array
        (
            [url] => http://mysite.com/test/php/pdo.constants.html
            [No_of_kids] => 2

Pretty sure the answer ain't 2 every time, so something fishy is going on. Any ideas, guys?

1 hour ago, ludo1960 said:

Pretty sure the answer ain't 2 every time, so something fishy is going on. Any ideas, guys?

It looks like you're scraping pages from the PHP manual, so taking one of those as an example, the HTML looks like this (super-stripped down for simplicity):

<?php
$html = '<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <!-- lots more goes here -->
</html>';
$dom = new DOMDocument();
$dom->loadHTML($html);

var_dump($dom->childNodes->length);
foreach ($dom->childNodes as $childNode) {
    var_dump(get_class($childNode));
}

The above outputs the following:

int(2)
string(15) "DOMDocumentType"
string(10) "DOMElement"

This shows that the document ($dom) has two child nodes: 1. the document type (<!DOCTYPE html>) and 2. the "html" element.
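If what you are really after is the number of elements inside the page rather than the children of the document itself, you have to start from a node further down the tree. The following is only a sketch on made-up markup, assuming you want the element children of the body tag; childNodes can also contain text nodes, so it filters for DOMElement.

```php
<?php
// Made-up markup, standing in for a scraped page.
$html = '<!DOCTYPE html>
<html lang="en">
    <head><title>Demo</title></head>
    <body><h1>Title</h1><p>First</p><p>Second</p></body>
</html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet, as in the original code
$dom->loadHTML($html);

$body = $dom->getElementsByTagName('body')->item(0);

// Count only element children; text and comment nodes are skipped.
$elementChildren = 0;
foreach ($body->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $elementChildren++;
    }
}

var_dump($elementChildren);                        // int(3): h1, p, p
var_dump($dom->getElementsByTagName('p')->length); // int(2): every p in the document
```

Note that getElementsByTagName() searches the whole subtree, while childNodes only lists direct children.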

Hope that helps. 🙂


Thanks for your reply, lots of stuff to read up on, then it's play time :)


One last question: if I'm crawling a site, e.g. index page -> level 1 page -> level 2 page (no more child pages after this), how do I know I've reached the end point? Should I expect the number of childNodes to be 0? Or have I got the wrong end of the stick?


Depends, but the answer is probably that you have to figure it out for yourself. Typically by storing a list of the URLs you've hit then checking it whenever you think you want to crawl a new page.
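A rough sketch of that idea follows. The $links array is a hypothetical stand-in for "the links found on each page" so the example runs without any network access; in a real crawler you would presumably fill it from the anchors of each loaded document. The crawl is finished when the queue of unvisited URLs is empty, not when childNodes is 0.

```php
<?php
// Hypothetical link map standing in for real pages: URL => links found on it.
$links = [
    '/index'  => ['/level1', '/level2'],
    '/level1' => ['/level2', '/index'],
    '/level2' => [],
];

$visited = [];         // URLs already crawled, stored as array keys
$queue   = ['/index']; // URLs still waiting to be crawled

while ($queue !== []) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue; // already crawled, skip it
    }
    $visited[$url] = true;

    // Queue every link that has not been crawled yet.
    foreach ($links[$url] ?? [] as $next) {
        if (!isset($visited[$next])) {
            $queue[] = $next;
        }
    }
}

print_r(array_keys($visited)); // /index, /level1, /level2
```

Storing the URLs as array keys and testing with isset() is a design choice here: it avoids scanning the whole list on every check, which matters once the list gets long.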


My crystal ball says that there are some in_array() gymnastics coming my way! Cheers!

