I want to scrape a website's content. Here is an example of that site's HTML source code.

<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>

I want to extract the text of the <li> tags from the first <ul> in that HTML. Here is what I've tried:

include('../simple_html_dom.php');
// get DOM from URL or file
//$html =  check above html code

$articles =  $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article) {
    $items = $article->find('ul',0) ? $article->find('ul',0) : false;
    if($items !==false){
        $lis = $item->find('li') ? $item->find('li') : [];
        foreach($lis as $b){ 
            $mcpcons .= $b->plaintext;
        }
    }
}


Can someone tell me the correct way to do this?

$articles =  $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article) 
{
	$items = $article->find('ul',0) ? $article->find('ul',0) : false;
	if($items !==false)
	{
		$lis = $item->find('li') ? $item->find('li') : [];
		foreach($lis as $b)
		{ 
			$mcpcons .= $b->plaintext;
		}
	}
}

I'm not familiar with this type of exercise at all, but I ask this:

Since the first line that creates $articles apparently collects multiple div blocks and you are looping through them, do you not also need to do that for the $items collection of ul blocks?  And just where is $item defined?  Perhaps that needs to be in this (missing) loop?
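
To make the point about $item concrete, here is a minimal sketch of the same loop with one consistently named variable and an initialised accumulator. It is written against PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than simple_html_dom, purely so it runs without the third-party file; the names $firstUl and $li and the newline separator are my own choices, not from the original post. Asking for the first <ul> yields a single element, so no extra loop over ul blocks is needed.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);              // the HTML parser accepts a fragment like this
$xpath = new DOMXPath($doc);

$mcpcons = '';                      // initialise before appending with .=
foreach ($xpath->query('//div[@class="entry-content"]') as $article) {
    // item(0) is the first <ul> inside this div, or null when there is none.
    $firstUl = $article->getElementsByTagName('ul')->item(0);
    if ($firstUl === null) {
        continue;
    }
    // Read the <li> elements from the variable assigned just above.
    foreach ($firstUl->getElementsByTagName('li') as $li) {
        $mcpcons .= trim($li->textContent) . "\n";
    }
}

echo $mcpcons;
```

With simple_html_dom itself, the equivalent change would be to call find('li') on $items, the variable that was actually assigned.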

Edited by ginerjm

Maybe

$xml = simplexml_load_string($html);
$ul = $xml->xpath("//ul");
$tags = (array) $ul[0]->li;

 

$tags= Array
(
    [0] => some text
    [1] => sometext
    [2] => sometext
    [3] => sometext
)
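
One caveat with this approach: simplexml_load_string() expects well-formed XML, and scraped pages frequently are not (unclosed <br> tags, bare ampersands and so on). A hedged sketch of one workaround is to let DOMDocument::loadHTML() do the parsing and then hand the result to simplexml_import_dom(). The markup below, with its unclosed <br>, is a hypothetical example and not taken from the original site.

```php
<?php
// Hypothetical markup: the unclosed <br> is fine as HTML but is not well-formed XML.
$html = '<div class="entry-content">'
      . '<ul><li>some text</li><li>sometext<br></li></ul>'
      . '<ul><li>other</li></ul>'
      . '</div>';

libxml_use_internal_errors(true);         // collect parser errors instead of printing them

// Direct XML parsing fails here because <br> is never closed.
$direct = simplexml_load_string($html);
libxml_clear_errors();

// The HTML parser is tolerant; convert its tree to SimpleXML afterwards.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$ul = $xml->xpath('//ul');                // all <ul> elements in document order
$tags = [];
foreach ($ul[0]->li as $li) {             // <li> children of the first <ul> only
    $tags[] = trim((string) $li);
}

var_dump($direct);
print_r($tags);
```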

 

Yes. After I collect the data in $articles, I need only the first ul block, not the other ul blocks. Then I need to get all the li blocks from that ul block.

9 minutes ago, shaadamin44 said:

How can I find this div[class="entry-content"] element with Barand's code? I'm not sure about the simplexml load function.

Like this:

$html = '<html>
<body>
<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>
</body>
</html>';

$xml = simplexml_load_string($html);
$tags = (array) $xml->xpath("//div[@class='entry-content']/ul")[0]->li;

echo '<pre>$tags= ' . print_r($tags, 1) . '</pre>';
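
A possible variation, offered as a sketch: the predicate [@class='entry-content'] compares the whole attribute value, so it would stop matching if the div carried a second class name. The version below matches entry-content as one token of the class attribute and selects the <li> elements of the first <ul> in a single XPath expression. The extra clearfix class and the variable names $div and $nodes are assumptions for illustration only.

```php
<?php
// Hypothetical variant of the sample: the div carries a second class name.
$html = '<html>
<body>
<div class="entry-content clearfix">
<ul>
     <li>some text</li>
     <li>sometext</li>
</ul>
<ul>
     <li>To ometext</li>
</ul> </div>
</body>
</html>';

$xml = simplexml_load_string($html);

// Match "entry-content" as a whole token within the class attribute.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' entry-content ')]";

// (.../ul)[1] keeps only the first matching <ul> in document order; /li selects its items.
$nodes = $xml->xpath("($div/ul)[1]/li");

// Convert each SimpleXMLElement to a plain trimmed string.
$tags = array_map(function ($li) {
    return trim((string) $li);
}, $nodes ?: []);

echo '<pre>$tags= ' . print_r($tags, 1) . '</pre>';
```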

 


Here's a simple_html_dom/simplexml hybrid solution (the one you can't do!):

$html = '<html>
<body>
<div class="entry-content">
<h2>hi tags?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>sometext</li>
     <li>sometext</li>
</ul>
<h2>hi tags2 ?</h2>
<ul>
     <li>some text</li>
     <li>sometext</li>
     <li>To ometext</li>
     <li>Theometext</li>
</ul> </div>
</body>
</html>';

$dom = new simple_html_dom();
$dom->load($html);

$div = $dom->find('div[class="entry-content"]')[0]->__toString();
$xml = simplexml_load_string($div);
$tags = (array) $xml->xpath("//ul")[0]->li;

echo '<pre>' . print_r($tags, 1) . '</pre>';

 
