ludo1960 Posted March 7, 2019

Hi guys,

Reading this from php.net has got me a wee bit confused. Trying to implement it has got me doubly confused!

My code:

```php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile($parent_node);
if ($dom->childNodes <> 0) {
    $kids = array(
        'url' => $parent_node,
        'No_of_kids' => count($dom->childNodes)
    );
}
```

Results in:

Notice: Object of class DOMNodeList could not be converted to int

How the heck am I supposed to count the childNodes?
requinix Posted March 7, 2019

By reading the documentation? See the DOMNodeList page. Or by looking at your own code a couple of lines lower.
ludo1960 (thread author) Posted March 7, 2019

Surely you mean this link, http://php.net/manual/en/domnodelist.count.php, and not the link you sent, http://php.net/manual/en/class.domnodelist.php, which says the object is countable. But thanks for your non-answer anyway.
salathe Posted March 7, 2019

A (very) brief note about Countable objects

Classes implementing the Countable interface define and implement their own count() method. The DOMNodeList class is one such class. Instances of classes that implement the Countable interface can be passed to the count() function, and their own special count() method gets called. In DOMNodeList's case, that method returns the number of nodes in the list.

There is nothing stopping you from calling the count() method on the object (e.g. $myobject->count()) rather than the count() function (e.g. count($myobject)), if that's what you want to do.

> How the heck am I supposed to count the childNodes?

Back to your original question. There are several ways to get the number of nodes in a DOMNodeList (which is what your $dom->childNodes is):

1. $dom->childNodes->length
2. count($dom->childNodes)
3. $dom->childNodes->count()
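A minimal sketch of the three options side by side, assuming PHP 7.2 or later (the release in which DOMNodeList started implementing Countable, which is what makes count() and ->count() work on it). The HTML string is made up for illustration. The point is that the comparison against 0 has to use the number, not the DOMNodeList object itself; comparing the object with an int is what produced the original notice.

```php
<?php
// Sketch: three equivalent ways to count the nodes in a DOMNodeList.
// Assumes PHP 7.2+ (DOMNodeList implements Countable from 7.2 onwards).
$html = '<!DOCTYPE html><html><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml warnings about messy HTML quiet
$dom->loadHTML($html);

$nodes = $dom->childNodes; // a DOMNodeList object, not an int

$byProperty = $nodes->length;  // 1. the length property
$byFunction = count($nodes);   // 2. count() calls DOMNodeList::count() internally
$byMethod   = $nodes->count(); // 3. calling the Countable method directly

printf("%d %d %d\n", $byProperty, $byFunction, $byMethod);

// Compare the count, not the list, against 0:
if ($nodes->length > 0) {
    echo "document has child nodes\n";
}
```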
ludo1960 (thread author) Posted March 7, 2019

Hmm,

```php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile($parent_node);
if ($dom->childNodes->length <> 0) {
    $kids[] = array(
        'url' => $parent_node,
        'No_of_kids' => count($dom->childNodes)
    );
}
// print_r() with true as its second argument returns the string instead of printing it
echo '<pre>', print_r($kids, true), '</pre>';
```

Results in:

```
[0] => Array
    (
        [url] => http://mysite.com/test/php/intro.pdo.html
        [No_of_kids] => 2
    )
[1] => Array
    (
        [url] => http://mysite.com/test/php/pdo.setup.html
        [No_of_kids] => 2
    )
[2] => Array
    (
        [url] => http://mysite.com/test/php/pdo.constants.html
        [No_of_kids] => 2
```

Pretty sure the answer ain't 2 every time; something fishy is going on. Any ideas, guys?
salathe Posted March 7, 2019

1 hour ago, ludo1960 said:

> Pretty sure the answer ain't 2 every time; something fishy is going on. Any ideas, guys?

It looks like you're scraping pages from the PHP manual, so taking one of those as an example, the HTML looks like this (super-stripped down for simplicity):

```php
<?php
$html = '<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<!-- lots more goes here -->
</html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

var_dump($dom->childNodes->length);

foreach ($dom->childNodes as $childNode) {
    var_dump(get_class($childNode));
}
```

The above outputs the following:

```
int(2)
string(15) "DOMDocumentType"
string(10) "DOMElement"
```

This shows that the document ($dom) has two child nodes:

1. the document type (<!DOCTYPE html>) and
2. the "html" element.

Hope that helps.
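If the goal is a per-page number that actually varies, one option is to count something below the document node. The sketch below, assuming a made-up HTML string standing in for a scraped page, contrasts the document's own children with the number of <a> elements (via DOMDocument::getElementsByTagName(), which searches all descendants) and the children of <body>. Which of these is the "right" number depends on what you are trying to measure.

```php
<?php
// Sketch: counting below the document node so the result differs per page.
// The HTML string is a hypothetical stand-in for a fetched page.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head><body>'
      . '<a href="intro.pdo.html">Intro</a>'
      . '<a href="pdo.setup.html">Setup</a>'
      . '<a href="pdo.constants.html">Constants</a>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

// Children of the document itself: the doctype and the <html> element.
$documentChildren = $dom->childNodes->length;

// Every <a> element anywhere in the page.
$linkCount = $dom->getElementsByTagName('a')->length;

// Direct children of <body> only. Whitespace between tags in real markup
// would show up here as extra text nodes.
$body = $dom->getElementsByTagName('body')->item(0);
$bodyChildren = $body ? $body->childNodes->length : 0;

printf("document: %d, links: %d, body children: %d\n",
    $documentChildren, $linkCount, $bodyChildren);
```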
ludo1960 (thread author) Posted March 8, 2019

Thanks for your reply. Lots of stuff to read up on, then it's play time.
ludo1960 (thread author) Posted March 8, 2019

One last question: if I'm crawling a site, e.g. index page -> level 1 page -> level 2 page, etc. (no more child pages after this), how do I know I've reached the end point? Should I expect the number of childNodes to be 0? Or have I got the wrong end of the stick?
requinix Posted March 8, 2019

It depends, but the answer is probably that you have to figure it out for yourself, typically by storing a list of the URLs you've already visited and checking it whenever you're about to crawl a new page.
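A minimal sketch of the "remember where you've been" approach, as a breadth-first crawl. Assumptions: $pages is a hypothetical in-memory map of URL to HTML so the example runs without a network (a real crawler would load each URL instead), and extractLinks() and crawl() are illustrative helper names. Visited URLs are stored as array keys so the check is an isset() lookup rather than an in_array() scan. The crawl ends when the queue is empty, i.e. when no page links to anything not already seen; the childNodes count plays no part in that.

```php
<?php
// Hypothetical stand-in for fetched pages: URL => HTML.
$pages = [
    'index.html'  => '<html><body><a href="level1.html">1</a></body></html>',
    'level1.html' => '<html><body><a href="level2.html">2</a><a href="index.html">home</a></body></html>',
    'level2.html' => '<html><body><a href="index.html">home</a></body></html>',
];

// Collect the href of every <a> element in a page.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

function crawl(string $start, array $pages): array
{
    $visited = [];     // URL => true; isset() instead of in_array()
    $queue = [$start]; // URLs waiting to be crawled, oldest first

    while ($queue) {
        $url = array_shift($queue);
        if (isset($visited[$url]) || !isset($pages[$url])) {
            continue;  // already crawled, or nothing to fetch for this URL
        }
        $visited[$url] = true;
        foreach (extractLinks($pages[$url]) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    // Queue empty: every page reachable from $start has been visited.
    return array_keys($visited);
}

print_r(crawl('index.html', $pages));
```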
ludo1960 (thread author) Posted March 8, 2019

My crystal ball says that there are some in_array() gymnastics coming my way! Cheers!