Can't scrape/find 3 HTML tags with PHP Simple HTML DOM Parser



I want to scrape a website's content. Here is example HTML source from that site.

<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>

I want to extract the text of the <li> tags from the first <ul>. Here is what I've tried:

include('../simple_html_dom.php');
// get DOM from URL or file
//$html =  check above html code

$articles =  $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article) {
    $items = $article->find('ul',0) ? $article->find('ul',0) : false;
    if($items !==false){
        $lis = $item->find('li') ? $item->find('li') : [];
        foreach($lis as $b){ 
            $mcpcons .= $b->plaintext;
        }
    }
}


Can anyone tell me the correct way to do this?

$articles =  $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article) 
{
	$items = $article->find('ul',0) ? $article->find('ul',0) : false;
	if($items !==false)
	{
		$lis = $item->find('li') ? $item->find('li') : [];
		foreach($lis as $b)
		{ 
			$mcpcons .= $b->plaintext;
		}
	}
}

Not familiar with this type of exercise at all, but I ask this:

Since the first line that creates $articles apparently collects multiple div blocks and you are looping thru them, do you not also need to do that for the $items collection of ul blocks?  And just where is $item defined?  Perhaps that needs to be in this (missing) loop?

Edited by ginerjm

Yes. After I collect the data from $articles, I need only the first ul block, not the other ul blocks. Then I need to get all the li blocks from that ul block.
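
The steps described here (take each entry-content div, pick only its first ul, then read every li inside it) can be sketched roughly as below. This is a minimal sketch, not a definitive fix: it assumes simple_html_dom.php is at the include path used in the original post, it uses a shortened copy of the sample markup, and it collects results into an array rather than concatenating a string, which is a choice made here for illustration. The main change from the posted code is that the first ul is stored in one variable and that same variable is used for the li lookup, instead of the undefined $item.

```php
<?php
// Assumption: the library lives at the path used in the original post.
include('../simple_html_dom.php');

// Shortened copy of the sample markup; a real page would come from file_get_html($url).
$source = '<div class="entry-content">
<h2>hi tags?</h2>
<ul><li>some text</li><li>sometext</li></ul>
<h2>hi tags2 ?</h2>
<ul><li>other text</li></ul>
</div>';

$html = str_get_html($source);

$mcpcons = [];   // initialise before appending to it
foreach ($html->find('div[class="entry-content"]') as $article) {
    // Index 0 asks for the first <ul> only; find() gives null when there is none.
    $firstUl = $article->find('ul', 0);
    if ($firstUl === null) {
        continue;
    }
    // Same variable as above, so the <li> lookup runs on the <ul> just found.
    foreach ($firstUl->find('li') as $li) {
        $mcpcons[] = trim($li->plaintext);
    }
}

print_r($mcpcons);
```

Only the items of the first list end up in $mcpcons; the second ul is never visited.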


9 minutes ago, shaadamin44 said:

How can I find the div[class="entry-content"] element with Barand's code? I'm not sure about the simplexml load function.

Like this:

$html = '<html>
<body>
<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>
</body>
</html>';

$xml = simplexml_load_string($html);
$tags = (array) $xml->xpath("//div[@class='entry-content']/ul")[0]->li;

echo '<pre>$tags= ' . print_r($tags, 1) . '</pre>';
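
One caveat with the approach above: simplexml_load_string() needs well-formed XML, and scraped pages often are not (an unclosed <br>, for instance). A possible variant, sketched under the assumption that PHP's bundled DOM extension is available, is to parse with DOMDocument::loadHTML() and run the same kind of XPath query through DOMXPath. The markup below is a shortened, deliberately sloppy copy of the sample, not the real site.

```php
<?php
// Assumption: the input is ordinary HTML that may not be well-formed XML.
$html = '<html><body>
<div class="entry-content">
<h2>hi tags?</h2>
<ul><li>some text</li><li>sometext<br></li></ul>
<h2>hi tags2 ?</h2>
<ul><li>other text</li></ul>
</div>
</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (...)[1] selects the first matching <ul> in document order, then its <li> children.
$items = $xpath->query("(//div[@class='entry-content']/ul)[1]/li");

$tags = [];
foreach ($items as $li) {
    $tags[] = trim($li->textContent);
}

print_r($tags);
```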

 


Here's a simple_html_dom/simplexml hybrid solution (the one you can't do!)

$html = '<html>
<body>
<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>
</body>
</html>';

include('../simple_html_dom.php');   // path as in the original post; adjust as needed

$dom = new simple_html_dom();
$dom->load($html);

$div = $dom->find('div[class="entry-content"]')[0]->__toString();
$xml = simplexml_load_string($div);
$tags = (array) $xml->xpath("//ul")[0]->li;

echo '<pre>' . print_r($tags, 1) . '</pre>';

 
