
Everything posted by ludo1960
-
My crystal ball says that there are some in_array() gymnastics coming my way! Cheers!
-
One last question: if I'm crawling a site, e.g. index page -> level 1 page -> level 2 page etc. (no more child pages after this), how do I know I've reached the end point? Should I expect the number of childNodes to be 0? Or have I got the wrong end of the stick?
-
Thanks for your reply, lots of stuff to read up on, then it's play time.
-
Hmm,

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile($parent_node);
    if ($dom->childNodes->length <> 0) {
        $kids[] = array(
            'url' => $parent_node,
            'No_of_kids' => count($dom->childNodes)
        );
    }
    echo '<pre>', print_r($kids), '</pre>';

Results in:

    [0] => Array
        (
            [url] => http://mysite.com/test/php/intro.pdo.html
            [No_of_kids] => 2
        )
    [1] => Array
        (
            [url] => http://mysite.com/test/php/pdo.setup.html
            [No_of_kids] => 2
        )
    [2] => Array
        (
            [url] => http://mysite.com/test/php/pdo.constants.html
            [No_of_kids] => 2
        )

Pretty sure the answer ain't 2 every time, something fishy going on. Any ideas guys?
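A likely explanation for the constant 2, offered as a sketch rather than a diagnosis of the actual pages: the direct children of a DOMDocument are usually just the doctype and the <html> element, so `$dom->childNodes` counts those two and nothing deeper. Counting the links themselves with getElementsByTagName() gives a number that varies per page. The inline markup below is a made-up stand-in for a fetched page.

```php
<?php
// Hypothetical inline markup standing in for a fetched page.
$markup = '<!DOCTYPE html><html><body><ul class="chunklist">'
        . '<li><a href="pdo.setup.html">Setup</a></li>'
        . '<li><a href="pdo.constants.html">Constants</a></li>'
        . '<li><a href="pdo.drivers.html">Drivers</a></li>'
        . '</ul></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($markup);

// Direct children of the document itself: the doctype and <html>.
$topLevel = $dom->childNodes->length;

// Links anywhere in the page, which is closer to "number of kids".
$linkCount = $dom->getElementsByTagName('a')->length;

echo "top-level nodes: $topLevel, links: $linkCount", PHP_EOL;
```

If only the links inside one list matter, the same idea applies: locate that list element first and count within it instead of counting on the document.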
-
Surely you mean this link http://php.net/manual/en/domnodelist.count.php and not the link you sent http://php.net/manual/en/class.domnodelist.php which says the object is countable, but thanks for your non-answer anyway.
-
Hi guys,

Reading this from php.net has got me a wee bit confused. Trying to implement it has got me doubly confused! My code:

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile($parent_node);
    if ($dom->childNodes <> 0) {
        $kids = array(
            'url' => $parent_node,
            'No_of_kids' => count($dom->childNodes)
        );
    }

Results in 'Notice: Object of class DOMNodeList could not be converted to int'

How the heck am I supposed to count the childNodes?
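The notice most likely comes from the comparison `$dom->childNodes <> 0`, which compares the DOMNodeList object itself with an integer, rather than from count(). A minimal sketch, assuming PHP 7.2 or later (where DOMNodeList is Countable) and using made-up inline markup: compare the `length` property, or the result of count(), instead of the object.

```php
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Hypothetical markup: a body with exactly two child elements.
$dom->loadHTML('<!DOCTYPE html><html><body><p>One</p><p>Two</p></body></html>');

$body = $dom->getElementsByTagName('body')->item(0);

// Both forms yield an integer that can safely be compared with 0.
$viaLength = $body->childNodes->length;
$viaCount  = count($body->childNodes); // Countable since PHP 7.2

if ($viaLength > 0) {
    echo "body has $viaLength child nodes", PHP_EOL;
}
```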
-
Oops, all good now. Thanks for pointing me in the right direction, off to play now and I promise to read the manual.
-
Hi guys,

Just starting to play with PHP DOMDocument, only to fail at the very first step:

    <?php
    $html = 'test/php/somefile.html' ;
    if(!empty($html)){
        $dom_1 = new domDocument ;
        $dom_1->loadHTML($html) ;
        $links = $dom_1->getElementsByTagName('li') ;
        foreach ( $links as $link) {
            // echo $link ;
            echo $link->nodeValue, PHP_EOL;
        }
    }
    ?>

When I visit it in a browser I get a WSOD, what am I missing?
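One probable cause of the blank page: loadHTML() expects a string of markup, so a file path handed to it is parsed as literal text and no <li> elements are found. loadHTMLFile() is the method that reads from a path. The sketch below writes a small made-up sample file to a temporary location so it can run on its own; the file contents and variable names are illustrative only.

```php
<?php
// Hypothetical sample file, written to a temp path so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><ul><li>One</li><li>Two</li></ul></body></html>');

libxml_use_internal_errors(true);

// loadHTML() treats its argument as markup, so the path is parsed as plain text.
$asString = new DOMDocument();
$asString->loadHTML($path);
$fromString = $asString->getElementsByTagName('li')->length;

// loadHTMLFile() opens the file at the given path and parses its contents.
$asFile = new DOMDocument();
$asFile->loadHTMLFile($path);
$items = [];
foreach ($asFile->getElementsByTagName('li') as $li) {
    $items[] = trim($li->nodeValue);
}
unlink($path);

echo $fromString, ' <li> found via loadHTML, ', count($items), ' via loadHTMLFile', PHP_EOL;
```

Turning on error display while developing would also make this kind of silent failure easier to spot than a white screen.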
-
Good guess! Yeah I tried that, but some of the $lis2 don't have children, and I'm not sure how to deal with null results.

    $lis2 = $html2->find('.chunklist', 0)->children() ;

results in: Fatal error: Uncaught Error: Call to a member function children() on null

I've tried:

    for ( $j = 0 ; $j < count($lis2) ; $j++ ) {
        if ($lis2 > 0) { // and tried $lis2 <> 0
            $parent_term = $lis2[$j]->first_child()->innertext . ', ' ;
            $parent_node = $lis2[$j]->children[0]->attr['href'] . '<br>' ;
            // echo count( $parent_node ) ;
        } else {
            echo "no data found" ;
        }
    }

Kinda stuck as to what to try next?
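The fatal error suggests that find('.chunklist', 0) returned null on pages without that list, so the guard has to come before ->children() is called, not inside the loop. The sketch below shows the same guard pattern using PHP's built-in DOM extension as a stand-in for Simple HTML DOM, so it runs without the library; `chunklistLinks()` and both sample strings are hypothetical. An empty result is also one way to recognise a page with no further child pages.

```php
<?php
// Hypothetical helper: hrefs inside the first .chunklist, or [] when the page has none.
function chunklistLinks(string $markup): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($markup);
    $xpath = new DOMXPath($dom);

    // Match any element whose class attribute contains the token "chunklist".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " chunklist ")]';
    $list = $xpath->query($query)->item(0);

    // Guard first: item(0) is null when nothing matched.
    if ($list === null) {
        return [];
    }

    $hrefs = [];
    foreach ($xpath->query('.//a[@href]', $list) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Made-up pages: one with a chunklist, one leaf page without.
$withList = '<html><body><ul class="chunklist"><li><a href="a.html">A</a></li></ul></body></html>';
$leaf     = '<html><body><p>No further pages.</p></body></html>';

echo count(chunklistLinks($withList)), ' link(s), then ', count(chunklistLinks($leaf)), PHP_EOL;
```

With Simple HTML DOM the equivalent would presumably be to store the result of find('.chunklist', 0) in a variable and test it against null before calling children() on it.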
-
Hi guys,

I'm using PHP Simple DOM; thanks to the good folk here I'm making progress. The HTML I'm parsing has a bunch of links in a li/ul structure. I've managed to get the top layer of links extracted and I would like to have a count of the number of child nodes in the layer below the main links. Here is my code:

    $html = file_get_html('test/php/book.html');
    if(!empty($html)){
        $lis = $html->find('.chunklist', 0)->children() ;
        for ( $i = 0 ; $i < count($lis) ; $i++ ) {
            $parent_term = $lis[$i]->first_child()->innertext . ', ' ;
            $parent_node = $lis[$i]->children[0]->attr['href'] . '<br>' ;
            //echo count($parent_node->children()) ; this gives error Warning: count(): Parameter must be an array or an object that implements Countable
            echo $parent_term . $parent_node ;
            $parent_node = $const . $parent_node ;
            echo $parent_node ;
            $html2 = file_get_html($parent_node) ;
            $lis2 = $html2->find('.chunklist', 0)->children() ;
        }
    }

I don't see anything in the manual regarding counting nodes, any idea how to go about this?
-
Thanks again, changed a bit of your code and it works great:

    for ( $i = 0 ; $i < count($li) ; $i++ ) {
        echo $li[$i]->children[0]->attr['href'] . '<br>' ;
        //echo $li[$i]->children[0]->children[0]->_[4] . '<br>' ; This was my effort lol!
        echo $li[$i]->first_child()->innertext ;
    }

So now I have all I need to construct my associative array! Great answer! Thanks again Maxxd
-
Yeah, got lots to learn. Wouldn't want it any other way! At least I try, that's got to count for something?
-
Already tried that:

    echo $li[$i]->children[0]->["parent"]->["_"]->[1]->[4] . '<br>' ;

and

    echo $li[$i]->children[0]->["parent"]->['_']->[1]->[4] . '<br>' ;

and lots of other guesses. Thanks for chipping in though!
-
Trust me, I'm trying.

    $li[$i]->children[0]->attr['href']

is obvious now.

    $li[$i]->children[0]->children[0]->_[4]

ain't so obvious! I need a Prolific Member..... but I suppose we all do
-
Halle friggen lujah!!! Am I using the right approach?

    for ( $i = 0 ; $i < count($li) ; $i++ ) {
        echo $li[$i]->children[0]->attr['href'] . '<br>' ;
        //echo $li[$i]->children[0]->children[0] . '<br>' ;
    }

Gets me the child nodes on the page visited, all good and well, but I also need the text from the href; it's buried deeper in the array/object:

    object(simple_html_dom_node)#66 (9) {
      ["nodetype"]=> int(1)
      ["tag"]=> string(2) "li"
      ["attr"]=> array(0) { }
      ["children"]=> array(1) {
        [0]=> object(simple_html_dom_node)#67 (9) {
          ["nodetype"]=> int(1)
          ["tag"]=> string(1) "a"
          ["attr"]=> array(1) { ["href"]=> string(21) "pdo.requirements.html" }
          ["children"]=> array(0) { }
          ["nodes"]=> array(1) {
            [0]=> object(simple_html_dom_node)#68 (9) {
              ["nodetype"]=> int(3)
              ["tag"]=> string(4) "text"
              ["attr"]=> array(0) { }
              ["children"]=> array(0) { }
              ["nodes"]=> array(0) { }
              ["parent"]=> *RECURSION*
              ["_"]=> array(1) { [4]=> string(12) "Requirements" }

The last bit, "Requirements", comes just after the suspicious looking *RECURSION*. I can see now how the objects and arrays work at the top level, but how do I address the ["_"][4]?
-
Trying your code:

    echo "<pre>";
    print_r($html->find('li'));
    echo "</pre>";

results in: Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 1071648768 bytes)

whereas:

    echo "<pre>";
    var_dump($html->find('li'));
    echo "</pre>";

spits out the largest array known to mankind:

    array(9) {
      [0]=> object(simple_html_dom_node)#27 (9) {
        ["nodetype"]=> int(1)
        ["tag"]=> string(2) "li"
        ["attr"]=> array(1) { ["style"]=> string(12) "float: left;" }
        ["children"]=> array(1) {
          [0]=> object(simple_html_dom_node)#28 (9) {
            ["nodetype"]=> int(1)
            ["tag"]=> string(1) "a"
            ["attr"]=> array(1) { ["href"]=> string(14) "intro.pdo.html" }
            ["children"]=> array(0) { }
            ["nodes"]=> array(1) {
              [0]=> object(simple_html_dom_node)#29 (9) {
                ["nodetype"]=> int(3)
                ["tag"]=> string(4) "text"
                ["attr"]=> array(0) { }
                ["children"]=> array(0) { }
                ["nodes"]=> array(0) { }
                ["parent"]=> *RECURSION*
                ["_"]=> array(1) { [4]=> string(15) "« Introduction" }
                ["tag_start"]=> int(0)
                ["dom":"simple_html_dom_node":private]=> object(simple_html_dom)#2 (23) {
                  ["root"]=> object(simple_html_dom_node)#3 (9) {............ad infinitum!!
-
I know you guys are trying to point me in the right direction, but after hours going around in circles I just can't figure out how to access an object that is in an array. Must be obvious for you guys, but I can't see it. Remember when you were first learning the dark art of PHP and you had a WTF moment? Well, that's me right now.
-
Eek!!

    object(simple_html_dom_node)#27 (9) {
      ["nodetype"]=> int(1)
      ["tag"]=> string(2) "li"
      ["attr"]=> array(1) { ["style"]=> string(12) "float: left;" }
      ["children"]=> array(1) {
        [0]=> object(simple_html_dom_node)#28 (9) {
          ["nodetype"]=> int(1)
          ["tag"]=> string(1) "a"
          ["attr"]=> array(1) { ["href"]=> string(25) "function.odbc-tables.html" ......

Yes, you are 100% correct, I need to learn some debugging techniques. Never seen an array this big! So to access the children, I don't understand why

    $str1 = $li->children() ;

does not work. I thought that is how you access elements in an object. What am I missing?
-
Tried both your suggestions:

    foreach ($html->find('li') as $li) {
        $str1 = $li->find('a', 0)->first_child();
    }

Results in: Fatal error: Uncaught Error: Call to a member function first_child() on null

    $str2 = $html->find('a', 0);

Results in: Fatal error: Allowed memory size of 2147483648 bytes exhausted

I am missing something fundamental here, I'm all out of ideas. If I promise never to laugh again at your beloved President, would you help me out?
-
The way I read the docs, I need to find the child nodes of an element, and that is what I tried:

    foreach ($html->find('li') as $li) {
        $str1[] = $li->find('a')->first_child();
    }

I need a bigger hint! Come on guys, throw the dog a bone
-
first_child is on the "traverse the DOM tree" page, but I think my usage of it is wrong. A small hint or pointer would be of great help!
-
Hello again, yes I have read the docs, but there seems to be a tiny wee gap in my interpretation:

    foreach ($html->find('li') as $li) {
        $str1[] = $li->find('a')->first_child();
        // foreach ($li->find('a') as $a) {
        //     $a->find('#layout', 0)->children(1)->children(1)->children(2)->id ;
        // }
    }

All attempts end in abject failure. What am I doing wrong?
-
@requinix Sorry if my request wasn't clear, it's just that I am confused as to how to traverse the ul's and the li's to get the immediate children a's. The help for the DOM parser isn't clear to me on how to find child nodes, if indeed there are any. The idea behind me wanting the output to be text1... is that I want the structure of the array to reflect the structure of the ul's and li's, for simplicity's sake. Thank you both for taking the time to answer my post, your help is greatly appreciated.
-
Hi guys,

I'm trying to build an array to replicate the hierarchy in a menu:

    <ul>
        <li><a href="file1.html">text1</a></li>
        <ul>
            <li><a href="file2.html">text2</a></li>
            <li><a href="file3.html">text3</a></li>
            <li><a href="file4.html">text4</a></li>
            <li><a href="file5.html">text5</a></li>
        </ul>
    </ul>

And I would like the output to be:

    "text1"
    "text1", "text2"
    "text1", "text3"
    "text1", "text4"
    "text1", "text5"

Here is my loop to go through the HTML hierarchy:

    foreach ($html2->find('ul') as $ul) {
        foreach ($ul->find('li') as $li) {
            foreach($li->find('a') as $a) {
                // need to filter out empty and index.html, tried if(!$->href = 'index.html) {do stuff} but didn't work
                $links2[] = $a->href ;
                $taxo2[] = $a->plaintext ;
            }
        }
    }

This finds all the links but not the hierarchy. Any ideas how to approach this? And also how to filter out blanks and references to index.html?
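One possible approach, sketched with PHP's built-in DOM extension rather than Simple HTML DOM so that it runs on its own: walk the links in document order, use the number of <ul> ancestors as the nesting depth, and keep a trail of labels per depth. The filter uses a comparison (`===`) where the attempt in the comment used an assignment (`=`). The sample markup is modelled on the menu above with one index.html link added for illustration, and the sketch assumes every nested level has a labelled link at the level above it.

```php
<?php
// Sample markup modelled on the menu in the post, plus one index.html link to filter out.
$markup = '<ul>'
        . '<li><a href="file1.html">text1</a></li>'
        . '<ul>'
        . '<li><a href="index.html">home</a></li>'
        . '<li><a href="file2.html">text2</a></li>'
        . '<li><a href="file3.html">text3</a></li>'
        . '</ul>'
        . '</ul>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$trail = []; // labels of the current ancestors, one per nesting level
$rows  = []; // one label path per kept link

foreach ($xpath->query('//ul//a') as $a) {
    $href = trim($a->getAttribute('href'));
    if ($href === '' || basename($href) === 'index.html') {
        continue; // skip blanks and references to index.html
    }
    // Depth = number of <ul> ancestors, so links in a nested list sit one level deeper.
    $depth = (int) $xpath->evaluate('count(ancestor::ul)', $a);
    // Drop labels from this level and below, then append the current label.
    $trail   = array_slice($trail, 0, $depth - 1);
    $trail[] = trim($a->textContent);
    $rows[]  = $trail;
}

foreach ($rows as $row) {
    echo '"', implode('", "', $row), '"', PHP_EOL;
}
```

Each entry in `$rows` is an array such as the parent label followed by the child label, which can then be turned into whatever associative structure is needed.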
-
I managed it with str_replace(). Thought I was going to be forced to use preg_replace(), and that I don't know about.