shaadamin44 Posted May 29, 2022

I want to scrape content from a website. Here is an example of that site's HTML source:

<div class="entry-content">
    <h2>hi tags?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>sometext</li>
        <li>sometext</li>
    </ul>
    <h2>hi tags2 ?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>To ometext</li>
        <li>Theometext</li>
    </ul>
</div>

I want to extract the text of the <li> tags from the first <ul>. Here is what I've tried:

include('../simple_html_dom.php');
// get DOM from URL or file
//$html = check above html code
$articles = $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article) {
    $items = $article->find('ul',0) ? $article->find('ul',0) : false;
    if($items !==false){
        $lis = $item->find('li') ? $item->find('li') : [];
        foreach($lis as $b){
            $mcpcons .= $b->plaintext;
        }
    }
}

Can you help me do this correctly?

Link to comment: https://forums.phpfreaks.com/topic/314860-cant-scrapefind-3-html-tags-in-php-simple-html-dom-perser/

ginerjm Posted May 29, 2022

So what exactly are you able to retrieve?

shaadamin44 Posted May 29, 2022 (Author)

19 minutes ago, ginerjm said:
    So what exactly are you able to retrieve?

I am only able to get the first <ul> element. I need to build an array from the <li> items, which is why I need the data inside the li tags.

ginerjm Posted May 29, 2022 (edited)

$articles = $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
foreach($articles as $article)
{
    $items = $article->find('ul',0) ? $article->find('ul',0) : false;
    if($items !==false)
    {
        $lis = $item->find('li') ? $item->find('li') : [];
        foreach($lis as $b)
        {
            $mcpcons .= $b->plaintext;
        }
    }
}

I'm not familiar with this type of exercise at all, but I ask this: since the first line that creates $articles apparently collects multiple div blocks and you are looping through them, do you not also need to do that for the $items collection of ul blocks? And just where is $item defined? Perhaps that needs to be in this (missing) loop?

Edited May 29, 2022 by ginerjm

Barand Posted May 29, 2022

Maybe

$xml = simplexml_load_string($html);
$ul = $xml->xpath("//ul");
$tags = (array) $ul[0]->li;

$tags= Array
(
    [0] => some text
    [1] => sometext
    [2] => sometext
    [3] => sometext
)

shaadamin44 Posted May 30, 2022 (Author)

18 hours ago, ginerjm said:
    $articles = $html->find('div[class="entry-content"]') ? $html->find('div[class="entry-content"]') : [];
    foreach($articles as $article)
    {
        $items = $article->find('ul',0) ? $article->find('ul',0) : false;
        if($items !==false)
        {
            $lis = $item->find('li') ? $item->find('li') : [];
            foreach($lis as $b)
            {
                $mcpcons .= $b->plaintext;
            }
        }
    }
    Not familiar with this type of exercise at all but I ask this: Since the first line that creates $articles apparently collects multiple div blocks and you are looping through them, do you not also need to do that for the $items collection of ul blocks? And just where is $item defined? Perhaps that needs to be in this (missing) loop?

Yes. After I collect the data from $articles, I need only the first ul block, not the other ul blocks. Then I need all the li items from that ul block.

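The requirement described here (first ul only, then all of its li items) can be sketched with simple_html_dom alone. This is a sketch, not a confirmed fix from the thread: it assumes $html is an already-loaded simple_html_dom object, and the function name firstListItems is hypothetical. The main change from the posted snippet is that the same variable is used for the ul that was found (the snippet assigns $items but then reads $item), and the result is collected into an array instead of a concatenated string.

```php
<?php
// Hypothetical helper: returns the text of every <li> in the first <ul>
// of each div.entry-content. $root is assumed to be a simple_html_dom
// object (or node) that has already been loaded with the page.
function firstListItems($root): array
{
    $tags = [];

    // find() without an index returns an array of matches (possibly empty).
    foreach ($root->find('div[class="entry-content"]') as $article) {
        // find() with an index returns a single node, or null if none exists.
        $ul = $article->find('ul', 0);
        if (!$ul) {
            continue;
        }

        // Search inside the same $ul node that was just found.
        foreach ($ul->find('li') as $li) {
            $tags[] = trim($li->plaintext);
        }
    }

    return $tags;
}

// Usage, assuming simple_html_dom.php has been included and $html loaded:
// $tags = firstListItems($html);
```

With the example markup from the first post, this should collect only the four items of the first list, because find('ul', 0) never looks at the second ul.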
ginerjm Posted May 30, 2022

Have you not read Barand's post? I think he is suggesting to you a better way.

shaadamin44 Posted May 30, 2022 (Author)

50 minutes ago, ginerjm said:
    Have you not read Barand's post? I think he is suggesting to you a better way.

I can't use simple dom parser & simplexml_load_string at the same time.

ginerjm Posted May 30, 2022

If that were true, why would Barand suggest it to you?

Barand Posted May 30, 2022

I was offering an alternative

shaadamin44 Posted May 30, 2022 (Author)

6 minutes ago, ginerjm said:
    If that were true, why would Barand suggest it to you?

How can I find this div[class="entry-content"] element with Barand's code? I'm not sure about the simplexml load function.

shaadamin44 Posted May 30, 2022 (Author)

6 minutes ago, Barand said:
    I was offering an alternative

I know that, but I need to do it with simple html dom.

Barand Posted May 30, 2022

9 minutes ago, shaadamin44 said:
    How can I find this div[class="entry-content"] class with Barand code. I'm not sure about simplexml load function.

Like this:

$html = '<html>
<body>
<div class="entry-content">
    <h2>hi tags?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>sometext</li>
        <li>sometext</li>
    </ul>
    <h2>hi tags2 ?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>To ometext</li>
        <li>Theometext</li>
    </ul>
</div>
</body>
</html>';

$xml = simplexml_load_string($html);
$tags = (array) $xml->xpath("//div[@class='entry-content']/ul")[0]->li;
echo '<pre>$tags= ' . print_r($tags, 1) . '</pre>';

Barand Posted May 30, 2022

Here's a simple_html_dom/simplexml hybrid solution (the one you can't do!)

$html = '<html>
<body>
<div class="entry-content">
    <h2>hi tags?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>sometext</li>
        <li>sometext</li>
    </ul>
    <h2>hi tags2 ?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>To ometext</li>
        <li>Theometext</li>
    </ul>
</div>
</body>
</html>';

$dom = new simple_html_dom();
$dom->load($html);
$div = $dom->find('div[class="entry-content"]')[0]->__toString();
$xml = simplexml_load_string($div);
$tags = (array) $xml->xpath("//ul")[0]->li;
echo '<pre>' . print_r($tags, 1) . '</pre>';

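As a further alternative that neither poster used, PHP's bundled DOM extension can do the same lookup. simplexml_load_string() expects well-formed XML, which scraped pages often are not, whereas DOMDocument::loadHTML() tolerates typical HTML. The following is a minimal sketch under that assumption; firstListItemsDom is a hypothetical helper name, and the markup is the example from the first post.

```php
<?php
// Sample markup copied from the question.
$html = '<div class="entry-content">
    <h2>hi tags?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>sometext</li>
        <li>sometext</li>
    </ul>
    <h2>hi tags2 ?</h2>
    <ul>
        <li>some text</li>
        <li>sometext</li>
        <li>To ometext</li>
        <li>Theometext</li>
    </ul>
</div>';

// Hypothetical helper: text of each <li> in the first <ul> that is a
// direct child of div.entry-content.
function firstListItemsDom(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The parentheses make [1] select the first matching <ul> overall.
    $nodes = $xpath->query("(//div[@class='entry-content']/ul)[1]/li");

    $tags = [];
    foreach ($nodes as $li) {
        $tags[] = trim($li->textContent);
    }
    return $tags;
}

$tags = firstListItemsDom($html);
print_r($tags);
```

The XPath expression mirrors the one in Barand's SimpleXML version, with the index moved into the expression so that no array access on the query result is needed.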