array hierarchy and filter


ludo1960

Hi guys,

I'm trying to build an array to replicate the hierarchy in a menu:


<ul>
<li><a href="file1.html">text1</a></li>

<ul>
<li><a href="file2.html">text2</a></li>
<li><a href="file3.html">text3</a></li>
<li><a href="file4.html">text4</a></li>
<li><a href="file5.html">text5</a></li>
</ul>

</ul>

And I would like the output to be:

"text1"
"text1", "text2"
"text1", "text3"
"text1", "text4"
"text1", "text5"
 

Here is my loop to go through the html hierarchy:

        foreach ($html2->find('ul') as $ul) {
            foreach ($ul->find('li') as $li) {
                foreach ($li->find('a') as $a) {
                    // need to filter out empty hrefs and index.html; tried if (!$a->href = 'index.html') { do stuff } but it didn't work
                    $links2[] = $a->href;
                    $taxo2[] = $a->plaintext;
                }
            }
        }

This finds all the links but not the hierarchy. Any ideas how to approach this? And how do I filter out blanks and references to index.html?
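On the filtering half of the question: the attempted test uses a single `=`, which assigns rather than compares, so the branch would never run. A minimal sketch of the check on plain strings follows. `keepLink` is a hypothetical helper name, not part of Simple HTML DOM, and the sample hrefs are made up for illustration.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): decide whether an href
// should be kept. It works on plain strings, so it runs without the parser.
function keepLink(?string $href): bool
{
    $href = trim((string) $href);
    if ($href === '') {
        return false;                 // blank or missing href
    }
    // Compare with !== rather than assigning with a single =.
    return basename($href) !== 'index.html';
}

// Made-up sample data.
$hrefs = ['file1.html', '', 'index.html', 'docs/index.html', 'file2.html'];
$kept  = array_values(array_filter($hrefs, 'keepLink'));
print_r($kept);
```

Inside the loop from the post, the same test would presumably wrap the two assignments, e.g. `if (keepLink($a->href)) { ... }`.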


It is not clear what you mean. Your code certainly will not do what you ask. You don't save the text from the <li> that follows the first <ul>. It sounds like you want an associative array where the key is 'text1' and the value is an array of 'text2', 'text3', etc.


Yeah, unless there is a particular need for that 1/2, 1/3, 1/4, 1/5 list then it really should be an array containing one entry for 1, that itself has another array of 2-5.

array(
	"text1" => array(
		"text2",
		"text3",
		"text4",
		"text5"
	)
)

or

array(
	array(
		"name" => "text1",
		"items" => array(
			array(
				"name" => "text2",
				"items" => array()
			),
			array(
				"name" => "text3",
				"items" => array()
			),
			array(
				"name" => "text4",
				"items" => array()
			),
			array(
				"name" => "text5",
				"items" => array()
			)
		)
	)
)
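If the flat "text1", "text2" listing is still wanted, it can be derived from the second nested shape rather than stored. A rough sketch, assuming the name/items layout shown above; `flattenPaths` is a hypothetical helper name and the sample menu is shortened to two children.

```php
<?php
// Hypothetical helper: turn the nested name/items structure into one path
// (list of names from the root down) per node.
function flattenPaths(array $nodes, array $prefix = []): array
{
    $paths = [];
    foreach ($nodes as $node) {
        $path    = array_merge($prefix, [$node['name']]);
        $paths[] = $path;
        // Recurse into the children, carrying the current path as the prefix.
        foreach (flattenPaths($node['items'], $path) as $childPath) {
            $paths[] = $childPath;
        }
    }
    return $paths;
}

$menu = [
    ['name' => 'text1', 'items' => [
        ['name' => 'text2', 'items' => []],
        ['name' => 'text3', 'items' => []],
    ]],
];

foreach (flattenPaths($menu) as $path) {
    echo '"' . implode('", "', $path) . '"', PHP_EOL;
}
```

Keeping the tree as the stored form and generating the path list on demand means deeper menus need no extra code.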

Also your HTML is incorrect: the UL needs to be within the parent LI. Outside it is invalid.

Keep in mind that ->find() will find all descendants, so ->find('a') on the text1 LI will find all five links. A better approach would be to find the A from the LI's set of immediate children, then from there go recursively into the immediate child UL, if any.
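The immediate-children-then-recurse idea can be sketched as below. To keep it runnable without any third-party file, this uses PHP's built-in DOM extension (DOMDocument) instead of Simple HTML DOM, so it illustrates the approach rather than the library's API. `childElements` and `menuToArray` are hypothetical names, and the sample markup uses the corrected nesting (UL inside the parent LI).

```php
<?php
// Sketch with PHP's built-in DOM extension, not Simple HTML DOM.

// Return only the direct child elements of $node with the given tag name.
function childElements(DOMNode $node, string $tag): array
{
    $found = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement && strtolower($child->tagName) === $tag) {
            $found[] = $child;
        }
    }
    return $found;
}

// Build the nested name/href/items array for one UL.
function menuToArray(DOMElement $ul): array
{
    $items = [];
    foreach (childElements($ul, 'li') as $li) {
        $a   = childElements($li, 'a')[0] ?? null;   // immediate child link only
        $sub = childElements($li, 'ul')[0] ?? null;  // immediate child list, if any
        $items[] = [
            'name'  => $a ? trim($a->textContent) : '',
            'href'  => $a ? $a->getAttribute('href') : '',
            'items' => $sub ? menuToArray($sub) : [],
        ];
    }
    return $items;
}

$html = '<ul><li><a href="file1.html">text1</a>'
      . '<ul><li><a href="file2.html">text2</a></li>'
      . '<li><a href="file3.html">text3</a></li></ul></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$tree = menuToArray($doc->getElementsByTagName('ul')->item(0));
print_r($tree);
```

Because each level looks only at direct children, the text1 entry picks up its own link and nothing from the nested list.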


@requinix

Quote

A better approach would be to find the A from the LI's set of immediate children, then from there go recursively into the immediate child UL, if any.

Sorry if my request wasn't clear. I'm confused about how to traverse the ul's and the li's to get the immediate child a's; the DOM parser's help doesn't make it clear to me how to find child nodes, if indeed there are any. The reason I want the output to be text1... is that I want the structure of the array to reflect the structure of the ul's and li's, for simplicity's sake. Thank you both for taking the time to answer my post; your help is greatly appreciated.


Hello again,

yes I have read the docs, but there seems to be a tiny wee gap in my interpretation :)

    foreach ($html->find('li') as $li) {
       
        $str1[] = $li->find('a')->first_child();

       // foreach ($li->find('a') as $a) {
       // $a->find('#layout', 0)->children(1)->children(1)->children(2)->id ;
       // }
    }

All attempts end in abject failure:

Quote

Fatal error: Uncaught Error: Call to a member function first_child() on array 

 

What am I doing wrong?


Take a look at how you're making the call and how the documentation makes the call. I've updated the examples a bit to make the comparison more direct, I hope.

Yours:

$li->find('a')->first_child();

Theirs:

$li->find('a', 0)->first_child();

And the error message:

Call to a member function first_child() on array

And finally, the documentation itself states (modified for emphasis):

Quote

// Find all anchors, returns a array of element objects
$ret = $html->find('a');

// Find (N)th anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', n);

Hope that helps.
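To make the two return shapes concrete without loading the library, here is a small stand-in. `FakeNode` is purely hypothetical: it only mimics the behaviour described in the quoted documentation (an array when no index is given, one element or null when an index is given) and is not how the library is implemented.

```php
<?php
// Hypothetical stand-in that only mimics the two documented return shapes of
// find(); it is NOT the library's implementation.
final class FakeNode
{
    public $tag;
    private $matches;

    public function __construct(string $tag, array $matches = [])
    {
        $this->tag     = $tag;
        $this->matches = $matches;
    }

    public function find(string $selector, ?int $index = null)
    {
        $hits = [];
        foreach ($this->matches as $node) {
            if ($node->tag === $selector) {
                $hits[] = $node;
            }
        }
        if ($index === null) {
            return $hits;                 // no index: always an array
        }
        return $hits[$index] ?? null;     // index: one element, or null
    }
}

$withLink    = new FakeNode('li', [new FakeNode('a')]);
$withoutLink = new FakeNode('li');

// An array has no methods, hence "Call to a member function ... on array".
var_dump(is_array($withLink->find('a')));

// With an index the result is a single node, or null when nothing matched,
// so guard before chaining another call onto it.
$a     = $withoutLink->find('a', 0);
$label = $a !== null ? $a->tag : '(no link)';
echo $label, PHP_EOL;
```

The null case matters in a loop over every LI: any item without a link would otherwise stop the script at the chained call.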


Tried both your suggestions:

   foreach ($html->find('li') as $li) {
    
     $str1 = $li->find('a', 0)->first_child();

    }

Results in:  Fatal error: Uncaught Error: Call to a member function first_child() on null

$str2 = $html->find('a', 0);

Results in: 

Fatal error:  Allowed memory size of 2147483648 bytes exhausted 

I am missing something fundamental here, I'm all out of ideas.

If I promise never to laugh again at your beloved President, would you help me out?


You need to learn some debugging techniques so you can help yourself. First use:

echo "<pre>";
var_dump($html->find('li'));
echo "</pre>";

to see exactly what you are retrieving. From there you should be able to figure out what to use to parse that result or if you are not getting what you expect.
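One caveat: the dumps further down this thread show each node holding references to its parent and to the whole document, so a full dump can get enormous. A rough alternative is to summarise a single level at a time. `shallowDump` is a hypothetical helper, and the stdClass objects below are stand-ins for parser nodes, not real ones.

```php
<?php
// Hypothetical helper: summarise one level of an object instead of dumping
// the whole graph. Nested objects and arrays are described, never expanded.
function shallowDump(object $obj): array
{
    $summary = [];
    foreach (get_object_vars($obj) as $name => $value) {
        if (is_object($value)) {
            $summary[$name] = 'object(' . get_class($value) . ')';
        } elseif (is_array($value)) {
            $summary[$name] = 'array(' . count($value) . ')';
        } else {
            $summary[$name] = var_export($value, true);
        }
    }
    return $summary;
}

// Stand-in nodes with a parent back-reference, the kind of cycle that makes
// a full dump balloon.
$parent = new stdClass();
$child  = new stdClass();
$child->tag       = 'a';
$child->attr      = ['href' => 'intro.pdo.html'];
$child->parent    = $parent;
$parent->tag      = 'li';
$parent->children = [$child];

print_r(shallowDump($child));
```

Called from outside a class, `get_object_vars()` returns only the accessible (public) properties, so private members are skipped.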


Eek!! 

object(simple_html_dom_node)#27 (9) {
  ["nodetype"]=>
  int(1)
  ["tag"]=>
  string(2) "li"
  ["attr"]=>
  array(1) {
    ["style"]=>
    string(12) "float: left;"
  }
  ["children"]=>
  array(1) {
    [0]=>
    object(simple_html_dom_node)#28 (9) {
      ["nodetype"]=>
      int(1)
      ["tag"]=>
      string(1) "a"
      ["attr"]=>
      array(1) {
        ["href"]=>
        string(25) "function.odbc-tables.html" ......

Yes, you are 100% correct, I need to learn some debugging techniques. Never seen an array this big! So, to access the children, I don't understand why

 $str1 = $li->children() ;

does not work. I thought that is how you access elements in an object. What am I missing?


20 hours ago, ludo1960 said:

Tried both your suggestions: 

They weren't intended to be copy and paste solutions. They were a mashup of the documentation and your code with the goal of showing you how the function call needs to be made to get you the results you want. Read the code, read the documentation, then apply logic. As @gw1500se says, $str1 isn't a string, it's an object with some properties that are arrays and it needs to be treated as such.


I know you guys are trying to point me in the right direction, but after hours of going around in circles I just can't figure out how to access an object that is in an array. It must be obvious to you guys, but I can't see it. Remember when you were first learning the dark art of PHP and you had a WTF moment? Well, that's me right now.


Trying your code:

    echo "<pre>";
    print_r($html->find('li'));
    echo "</pre>";

results in:

Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 1071648768 bytes) 

whereas:

    echo "<pre>";
    var_dump($html->find('li'));
    echo "</pre>";

spits out the largest array known to mankind:

array(9) {
  [0]=>
  object(simple_html_dom_node)#27 (9) {
    ["nodetype"]=>
    int(1)
    ["tag"]=>
    string(2) "li"
    ["attr"]=>
    array(1) {
      ["style"]=>
      string(12) "float: left;"
    }
    ["children"]=>
    array(1) {
      [0]=>
      object(simple_html_dom_node)#28 (9) {
        ["nodetype"]=>
        int(1)
        ["tag"]=>
        string(1) "a"
        ["attr"]=>
        array(1) {
          ["href"]=>
          string(14) "intro.pdo.html"
        }
        ["children"]=>
        array(0) {
        }
        ["nodes"]=>
        array(1) {
          [0]=>
          object(simple_html_dom_node)#29 (9) {
            ["nodetype"]=>
            int(3)
            ["tag"]=>
            string(4) "text"
            ["attr"]=>
            array(0) {
            }
            ["children"]=>
            array(0) {
            }
            ["nodes"]=>
            array(0) {
            }
            ["parent"]=>
            *RECURSION*
            ["_"]=>
            array(1) {
              [4]=>
              string(15) "« Introduction"
            }
            ["tag_start"]=>
            int(0)
            ["dom":"simple_html_dom_node":private]=>
            object(simple_html_dom)#2 (23) {
              ["root"]=>
              object(simple_html_dom_node)#3 (9) {............ad infinitum!!

 


Give this a try:

$li = $html->find('li');
print("<p>{$li[0]->children[0]->attr['href']}</p>");

and see if you can follow the track through the output of the var_dump() function.

Then try  this:

$li = $html->find('li', 0);
print("<p>{$li->children[0]->attr['href']}</p>");

and follow that as well. Coupled with the documentation and the comments above, things will hopefully start to look a little clearer...


Halle friggen lujah!!! Am I using the right approach?

    for ( $i = 0 ; $i < count($li) ; $i++ ) {
        echo $li[$i]->children[0]->attr['href'] . '<br>' ; 
        //echo $li[$i]->children[0]->children[0] . '<br>' ;
        
    } 

That gets me the child nodes on the page visited. All well and good, but I also need the link text that goes with each href; it's buried deeper in the array/object:

object(simple_html_dom_node)#66 (9) {
  ["nodetype"]=>
  int(1)
  ["tag"]=>
  string(2) "li"
  ["attr"]=>
  array(0) {
  }
  ["children"]=>
  array(1) {
    [0]=>
    object(simple_html_dom_node)#67 (9) {
      ["nodetype"]=>
      int(1)
      ["tag"]=>
      string(1) "a"
      ["attr"]=>
      array(1) {
        ["href"]=>
        string(21) "pdo.requirements.html"
      }
      ["children"]=>
      array(0) {
      }
      ["nodes"]=>
      array(1) {
        [0]=>
        object(simple_html_dom_node)#68 (9) {
          ["nodetype"]=>
          int(3)
          ["tag"]=>
          string(4) "text"
          ["attr"]=>
          array(0) {
          }
          ["children"]=>
          array(0) {
          }
          ["nodes"]=>
          array(0) {
          }
          ["parent"]=>
          *RECURSION*
          ["_"]=>
          array(1) {
            [4]=>
            string(12) "Requirements"
          }

The last bit, "Requirements", comes just after the suspicious-looking *RECURSION*. I can see now how the objects and arrays work at the top level, but how do I address the ["_"][4]?
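The `["_"][4]` slot looks like parser-internal storage, so reading it directly is probably not the intended route. The first post in this thread already obtained link text through `$a->plaintext`, which is likely the simpler path here too. The same "ask the node for its text" idea, sketched with PHP's built-in DOM extension (an assumption made so the example runs without the library; the two sample links are taken from the hrefs seen in the dumps):

```php
<?php
// Sketch with PHP's built-in DOM extension, not Simple HTML DOM.
$html = '<ul><li><a href="intro.pdo.html">Introduction</a></li>'
      . '<li><a href="pdo.requirements.html">Requirements</a></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$links = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // First <a> below this <li>; note this searches descendants, which is
    // fine for a flat list like this one.
    $a = $li->getElementsByTagName('a')->item(0);
    if ($a === null) {
        continue;                         // list item without a link
    }
    // Ask the element for its text instead of reaching into internals.
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);
```

Pairing href and text in one array keeps the two values together, instead of maintaining two parallel lists.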

