ludo1960

array hierarchy and filter

Hi guys,

I'm trying to build an array to replicate the hierarchy in a menu:


<ul>
<li><a href="file1.html">text1</a></li>

<ul>
<li><a href="file2.html">text2</a></li>
<li><a href="file3.html">text3</a></li>
<li><a href="file4.html">text4</a></li>
<li><a href="file5.html">text5</a></li>
</ul>

</ul>

And I would like the output to be:

"text1"
"text1", "text2"
"text1", "text3"
"text1", "text4"
"text1", "text5"
 

Here is my loop to go through the html hierarchy:

        foreach ($html2->find('ul') as $ul) {
            foreach ($ul->find('li') as $li) {
                foreach($li->find('a') as $a) { 
                    // need to filter out empty and index.html, tried if(!$->href = 'index.html) {do stuff} but didn't work
                    $links2[] = $a->href ;
                    $taxo2[] = $a->plaintext ;                   
                }
            }
        }

This finds all the links but not the hierarchy, any ideas how to approach this? And also how to filter out blanks and references to index.html?
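On the filtering part of the question: the check tried in the comment uses `=` (assignment) where a comparison is needed, and leaves out the variable name after `$`. Below is a minimal sketch of one way to skip blank hrefs and references to index.html. `keepHref()` is a hypothetical helper name, not part of Simple HTML DOM, and the sample hrefs are made up for illustration.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns true when an
// href should be kept, false when it is blank or points at index.html.
function keepHref($href): bool
{
    $href = trim((string) $href);   // missing attributes become ''
    if ($href === '') {
        return false;               // blank or missing href
    }
    // basename() so that "docs/index.html" is skipped as well
    return basename($href) !== 'index.html';
}

// Made-up sample data standing in for the $a->href values from the loop.
$hrefs = ['file1.html', '', 'index.html', 'docs/index.html', 'file2.html'];

// array_filter() keeps the original keys, so reindex with array_values().
$kept = array_values(array_filter($hrefs, 'keepHref'));

print_r($kept);
```

Inside the original loop the same helper would be used as `if (!keepHref($a->href)) { continue; }` before the two array assignments.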


It is not clear what you mean. Your code certainly will not do what you ask. You don't save the text from the <li> that follows the first <ul>. It sounds like you want an associative array where the key is 'text1' and the value is an array of 'text2', 'text3', etc.


Yeah, unless there is a particular need for that 1/2, 1/3, 1/4, 1/5 list then it really should be an array containing one entry for 1, that itself has another array of 2-5.

array(
	"text1" => array(
		"text2",
		"text3",
		"text4",
		"text5"
	)
)

or

array(
	array(
		"name" => "text1",
		"items" => array(
			array(
				"name" => "text2",
				"items" => array()
			),
			array(
				"name" => "text3",
				"items" => array()
			),
			array(
				"name" => "text4",
				"items" => array()
			),
			array(
				"name" => "text5",
				"items" => array()
			)
		)
	)
)

Also your HTML is incorrect: the UL needs to be within the parent LI. Outside it is invalid.

Keep in mind that ->find() will find all descendants, not only immediate children, so ->find('a') on the text1 LI will find all five links. A better approach would be to find the A from the LIs set of immediate children, then from there go recursively into the immediate child UL if any.
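A rough sketch of that recursive idea follows. It deliberately uses PHP's bundled DOM extension (DOMDocument) rather than Simple HTML DOM, so it runs without the library; the approach should carry over, but the calls differ. `menuToTree()` and `treeToPaths()` are hypothetical helper names, and the sample HTML is a shortened, corrected version of the menu with the nested UL inside its parent LI.

```php
<?php
// Build a nested array from a <ul>, looking only at immediate children.
function menuToTree(DOMElement $ul): array
{
    $items = [];
    foreach ($ul->childNodes as $li) {
        // Only direct <li> children; this also skips whitespace text nodes.
        if (!($li instanceof DOMElement) || $li->tagName !== 'li') {
            continue;
        }
        $name = null;
        $children = [];
        foreach ($li->childNodes as $child) {
            if (!($child instanceof DOMElement)) {
                continue;
            }
            if ($child->tagName === 'a' && $name === null) {
                $name = trim($child->textContent);   // the LI's own link text
            } elseif ($child->tagName === 'ul') {
                // Recurse into the nested list, if any.
                $children = array_merge($children, menuToTree($child));
            }
        }
        $items[] = ['name' => $name, 'items' => $children];
    }
    return $items;
}

// Flatten the tree into the "text1" / "text1", "text2" style of paths.
function treeToPaths(array $items, array $prefix = []): array
{
    $paths = [];
    foreach ($items as $item) {
        $path = array_merge($prefix, [$item['name']]);
        $paths[] = $path;
        $paths = array_merge($paths, treeToPaths($item['items'], $path));
    }
    return $paths;
}

$html = '<ul><li><a href="file1.html">text1</a><ul>'
      . '<li><a href="file2.html">text2</a></li>'
      . '<li><a href="file3.html">text3</a></li>'
      . '</ul></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$root = $doc->getElementsByTagName('ul')->item(0);   // outermost <ul>

$tree = menuToTree($root);
foreach (treeToPaths($tree) as $path) {
    echo '"' . implode('", "', $path) . '"', "\n";
}
```

Keeping the nested tree as the primary structure and deriving the flat paths from it means either output format is available from one traversal.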


@requinix

Quote

A better approach would be to find the A from the LIs set of immediate children, then from there go recursively into the immediate child UL if any.

Sorry if my request wasn't clear. I am confused about how to traverse the ul's and li's to get the immediate child a's; the DOM parser documentation doesn't make it clear to me how to find child nodes, if indeed there are any. I want the output to be text1... because I want the structure of the array to reflect the structure of the ul's and li's, for simplicity's sake. Thank you both for taking the time to answer my post, your help is greatly appreciated.


Hello again,

yes I have read the docs, but there seems to be a tiny wee gap in my interpretation :)

    foreach ($html->find('li') as $li) {
       
        $str1[] = $li->find('a')->first_child();

       // foreach ($li->find('a') as $a) {
       // $a->find('#layout', 0)->children(1)->children(1)->children(2)->id ;
       // }
    }

All attempts end in abject failure, 

Quote

Fatal error: Uncaught Error: Call to a member function first_child() on array 

 

What am I doing wrong?


I don't see anything "first_child" on that doc page I linked, but that "How to traverse the DOM tree?" example looks highly relevant.


first_child is on the "How to traverse the DOM tree?" page, but I think my usage of it is wrong. A small hint or pointer would be of great help!


The way I read the docs, I need to find the child nodes of an element and that is what I tried

    foreach ($html->find('li') as $li) {
       
        $str1[] = $li->find('a')->first_child();

    }

I need a bigger hint! Come on guys, throw the dog a bone :)


Take a look at how you're making the call and how the documentation makes the call. I've updated the examples a bit to make the comparison more direct, I hope.

Yours:

$li->find('a')->first_child();

Theirs:

$li->find('a', 0)->first_child();

And the error message:

Call to a member function first_child() on array

And finally, the documentation itself states (modified for emphasis):

Quote

// Find all anchors, returns a array of element objects
$ret = $html->find('a');

// Find (N)th anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', n);

Hope that helps.
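To make the difference concrete without needing the library, here is a small stand-in that only mimics the behaviour described in the documentation quote: no index returns an array, an index returns one element or null. `findStandIn()` is a made-up function for illustration, not the library's code, and the anchor object is sample data.

```php
<?php
// Stand-in for the documented behaviour of find(); NOT the library's code.
function findStandIn(array $elements, ?int $index = null)
{
    if ($index === null) {
        return $elements;                 // all matches: an array
    }
    return $elements[$index] ?? null;     // one match, or null if absent
}

// Sample data: a single anchor-like object.
$anchors = [(object) ['href' => 'file1.html']];

$all   = findStandIn($anchors);      // array: calling a method on it is fatal
$first = findStandIn($anchors, 0);   // object: safe to use
$none  = findStandIn([], 0);         // null: must be checked before use

var_dump(is_array($all), is_object($first), $none === null);

// Guard against the null case before dereferencing.
$href = $none !== null ? $none->href : '(no anchor in this li)';
echo $href, "\n";
```

The same guard applies when looping over every li: an li without an anchor gives null for the indexed lookup, so the result needs checking before any method is called on it.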


Tried both your suggestions:

   foreach ($html->find('li') as $li) {
    
     $str1 = $li->find('a', 0)->first_child();

    }

Results in:  Fatal error: Uncaught Error: Call to a member function first_child() on null

$str2 = $html->find('a', 0);

Results in: 

Fatal error:  Allowed memory size of 2147483648 bytes exhausted 

I am missing something fundamental here, I'm all out of ideas.

If I promise never to laugh again at your beloved President, would you help me out?


You need to learn some debugging techniques so you can help yourself. First use:

echo "<pre>";
var_dump($html->find('li'));
echo "</pre>";

to see exactly what you are retrieving. From there you should be able to figure out what to use to parse that result or if you are not getting what you expect.

Edited by gw1500se


Eek!! 

object(simple_html_dom_node)#27 (9) {
  ["nodetype"]=>
  int(1)
  ["tag"]=>
  string(2) "li"
  ["attr"]=>
  array(1) {
    ["style"]=>
    string(12) "float: left;"
  }
  ["children"]=>
  array(1) {
    [0]=>
    object(simple_html_dom_node)#28 (9) {
      ["nodetype"]=>
      int(1)
      ["tag"]=>
      string(1) "a"
      ["attr"]=>
      array(1) {
        ["href"]=>
        string(25) "function.odbc-tables.html" ......

Yes, you are 100% correct, I need to learn some debugging techniques. Never seen an array this big! So to access the children, I don't understand why

 $str1 = $li->children() ;

does not work? I thought that is how you access elements in an object. What am I missing?


$str1 is not a string so you can't treat it that way. It is an array. You can see that $str1[0] is the element you are looking for.

20 hours ago, ludo1960 said:

Tried both your suggestions: 

They weren't intended to be copy and paste solutions. They were a mashup of the documentation and your code with the goal of showing you how the function call needs to be made to get you the results you want. Read the code, read the documentation, then apply logic. As @gw1500se says, $str1 isn't a string, it's an object with some properties that are arrays and it needs to be treated as such.


I know you guys are trying to point me in the right direction, but after hours of going around in circles I just can't figure out how to access an object that is in an array. It must be obvious to you guys, but I can't see it. Remember when you were first learning the dark art of PHP and you had a WTF moment? Well, that's me right now.


The structure may be clearer to you if you use print_r() instead of var_dump()

echo "<pre>";
print_r($html->find('li'));
echo "</pre>";

 


Trying your code:

    echo "<pre>";
    print_r($html->find('li'));
    echo "</pre>";

results in:

Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 1071648768 bytes) 

whereas:

    echo "<pre>";
    var_dump($html->find('li'));
    echo "</pre>";

spits out the largest array known to mankind:

array(9) {
  [0]=>
  object(simple_html_dom_node)#27 (9) {
    ["nodetype"]=>
    int(1)
    ["tag"]=>
    string(2) "li"
    ["attr"]=>
    array(1) {
      ["style"]=>
      string(12) "float: left;"
    }
    ["children"]=>
    array(1) {
      [0]=>
      object(simple_html_dom_node)#28 (9) {
        ["nodetype"]=>
        int(1)
        ["tag"]=>
        string(1) "a"
        ["attr"]=>
        array(1) {
          ["href"]=>
          string(14) "intro.pdo.html"
        }
        ["children"]=>
        array(0) {
        }
        ["nodes"]=>
        array(1) {
          [0]=>
          object(simple_html_dom_node)#29 (9) {
            ["nodetype"]=>
            int(3)
            ["tag"]=>
            string(4) "text"
            ["attr"]=>
            array(0) {
            }
            ["children"]=>
            array(0) {
            }
            ["nodes"]=>
            array(0) {
            }
            ["parent"]=>
            *RECURSION*
            ["_"]=>
            array(1) {
              [4]=>
              string(15) "« Introduction"
            }
            ["tag_start"]=>
            int(0)
            ["dom":"simple_html_dom_node":private]=>
            object(simple_html_dom)#2 (23) {
              ["root"]=>
              object(simple_html_dom_node)#3 (9) {............ad infinitum!!
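The dump appears to balloon because each node carries back-references (the "parent" and private "dom" entries visible above), and both print_r() and var_dump() follow them. One way around that is to print only the fields of interest. The sketch below assumes nodes expose `tag`, `attr`, and `children` as shown in the dump; `outline()` is a hypothetical helper, and the objects are plain stand-ins rather than real simple_html_dom_node instances.

```php
<?php
// Hypothetical helper: one indented line per node, showing tag and attributes
// only, and descending through "children" while ignoring everything else.
function outline(object $node, int $depth = 0): string
{
    $line = str_repeat('  ', $depth) . $node->tag;
    foreach ($node->attr as $key => $value) {
        $line .= " {$key}=\"{$value}\"";
    }
    $out = $line . "\n";
    foreach ($node->children as $child) {
        $out .= outline($child, $depth + 1);
    }
    return $out;
}

// Stand-in objects shaped like the first li in the dump above.
$a  = (object) ['tag' => 'a',  'attr' => ['href' => 'intro.pdo.html'], 'children' => []];
$li = (object) ['tag' => 'li', 'attr' => ['style' => 'float: left;'],  'children' => [$a]];

$summary = outline($li);
echo $summary;
```

Because the helper never touches the back-references, its output grows with the size of the subtree being inspected rather than with the whole document.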

 


Give this a try:

$li = $html->find('li');
print("<p>{$li[0]->children[0]->attr['href']}</p>");

and see if you can follow the track through the output of the var_dump() function.

Then try  this:

$li = $html->find('li', 0);
print("<p>{$li->children[0]->attr['href']}</p>");

and follow that as well. Coupled with the documentation and the comments above, things will hopefully start to look a little clearer...


Halle friggen lujah!!! Am I using the right approach?

    for ( $i = 0 ; $i < count($li) ; $i++ ) {
        echo $li[$i]->children[0]->attr['href'] . '<br>' ; 
        //echo $li[$i]->children[0]->children[0] . '<br>' ;
        
    } 

This gets me the hrefs of the child nodes on the page visited, which is all well and good, but I also need the link text, which is buried deeper in the array/object:

object(simple_html_dom_node)#66 (9) {
  ["nodetype"]=>
  int(1)
  ["tag"]=>
  string(2) "li"
  ["attr"]=>
  array(0) {
  }
  ["children"]=>
  array(1) {
    [0]=>
    object(simple_html_dom_node)#67 (9) {
      ["nodetype"]=>
      int(1)
      ["tag"]=>
      string(1) "a"
      ["attr"]=>
      array(1) {
        ["href"]=>
        string(21) "pdo.requirements.html"
      }
      ["children"]=>
      array(0) {
      }
      ["nodes"]=>
      array(1) {
        [0]=>
        object(simple_html_dom_node)#68 (9) {
          ["nodetype"]=>
          int(3)
          ["tag"]=>
          string(4) "text"
          ["attr"]=>
          array(0) {
          }
          ["children"]=>
          array(0) {
          }
          ["nodes"]=>
          array(0) {
          }
          ["parent"]=>
          *RECURSION*
          ["_"]=>
          array(1) {
            [4]=>
            string(12) "Requirements"
          }

The last bit is "Requirements", just after the suspicious-looking *RECURSION*. I can now see how the objects and arrays work at the top level, but how do I address the ["_"][4]?


["parent"]["_"][1][4]

I think. You may need to parse each element to drill down to what you want.

Edited by gw1500se


Trust me I'm trying, 

$li[$i]->children[0]->attr['href'] is obvious now

$li[$i]->children[0]->children[0]->_[4] ain't so obvious!

I need a Prolific Member.....but I suppose we all do :)

Edited by ludo1960


Already tried that:

 echo $li[$i]->children[0]->["parent"]->["_"]->[1]->[4] . '<br>' ; 

and

 echo $li[$i]->children[0]->["parent"]->['_']->[1]->[4] . '<br>' ; 

and lots of other guesses

Thanks for chipping in though!


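You need to understand the difference between accessing an object's property (->) and accessing an array element ([...]). The '=>' you see in the dump is only how var_dump() prints key/value pairs; it is never used to read a value.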
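As an illustration of mixing the two, the sketch below rebuilds the relevant part of the dump with plain stand-in objects (stdClass, not real simple_html_dom_node instances) and reads both values. Note that in the dump the text node sits under "nodes", not "children", which is empty for the anchor.

```php
<?php
// Stand-in objects mirroring the var_dump() output above.
$text = (object) ['tag' => 'text', '_' => [4 => 'Requirements']];
$a = (object) [
    'tag'      => 'a',
    'attr'     => ['href' => 'pdo.requirements.html'],
    'children' => [],          // empty in the dump
    'nodes'    => [$text],     // the text node lives here
];
$li = (object) ['tag' => 'li', 'attr' => [], 'children' => [$a]];

// "->" reads a property of an object; "[...]" reads an element of an array.
$href  = $li->children[0]->attr['href'];
$label = $li->children[0]->nodes[0]->_[4];

echo $href, ' => ', $label, "\n";   // prints "pdo.requirements.html => Requirements"
```

Reaching into `_[4]` depends on what looks like an internal detail of the library, so it may be fragile. The loop in the first post already reads `$a->plaintext`, which is probably the more robust way to get the link text.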
