shaadamin44 (posted June 17, 2021):

    require_once 'phpSimpleHtmlDomClass.php';

    // Parse the markup so that find() is available on the result.
    $html = str_get_html('<div>
    <div class="man">Name: madac</div>
    <div class="man">Age: 18
    <div class="man">Class: 12</div>
    </div>');

    $name = $html->find('div[class="man"]', 0)->innertext;
    $age  = $html->find('div[class="man"]', 1)->innertext;
    $cls  = $html->find('div[class="man"]', 2)->innertext;

I want to get the text from each div with class="man", but it doesn't work because the second of those divs (the "Age" one) is missing its closing </div> tag. Please help me fix this. Thanks in advance.

Thread: https://forums.phpfreaks.com/topic/312930-need-help-for-fixing-html-when-scrape-using-php-simple-html-dom-parser/

requinix (posted June 17, 2021):

Add the closing div tag? Where is the HTML coming from?

shaadamin44 (posted June 17, 2021), in reply to requinix ("Where is the HTML coming from?"):

From a website.

requinix (posted June 17, 2021), in reply to shaadamin44 ("From a website"):

A website that you control? Someone else's? Can you tell them to fix their markup so that it's, you know, syntactically valid HTML?

Barand (posted June 17, 2021, edited):

The only difference the missing </div> makes is that the second find() result runs on to the next </div>, so it swallows the third div, giving

    $name = "Name: madac"
    $age  = "Age: 18 <div class='man'>Class: 12</div> "
    $cls  = "Class: 12"

So you just need to look for and trim off the excess "<div> ... </div>". Perhaps...

    require_once 'phpSimpleHtmlDomClass.php';   // same include as in the question

    $html = str_get_html('<div>
    <div class="man">Name: madac</div>
    <div class="man">Age: 18
    <div class="man">Class: 12</div>
    </div>');

    $name = trim_html($html->find('div[class="man"]', 0)->innertext);
    $age  = trim_html($html->find('div[class="man"]', 1)->innertext);
    $cls  = trim_html($html->find('div[class="man"]', 2)->innertext);

    // Keep only the text before the first "<", then strip surrounding whitespace.
    function trim_html($str)
    {
        if (($p = strpos($str, '<')) !== false) {
            $str = substr($str, 0, $p);
        }
        return trim($str);
    }

shaadamin44 (posted June 17, 2021), in reply to requinix ("A website that you control? Someone else's?"):

It's a government site. I have no access to it.

requinix (posted June 17, 2021):

Regular expressions are also an option. Once you've drilled down as far into the HTML as you can, you can much more safely look for things like "Age: <number>".
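
A minimal sketch of that regex idea, assuming the fragment has already been narrowed down to the innertext of the "Age" div. The helper name extract_age() and the exact pattern are illustrative assumptions, not something posted in the thread, and the nullable return type needs PHP 7.1 or later:

```php
<?php
// Hypothetical helper: pull the number out of an "Age: <number>" fragment.
function extract_age(string $fragment): ?int
{
    // \s* tolerates extra whitespace after the colon; (\d+) captures the digits.
    if (preg_match('/Age:\s*(\d+)/', $fragment, $m)) {
        return (int) $m[1];
    }
    return null; // pattern not found in this fragment
}

// The over-long innertext that the unclosed div produces.
$inner = "Age: 18 <div class='man'>Class: 12</div> ";
var_dump(extract_age($inner)); // int(18)
```

Because the pattern names the label it expects, whatever trails the number (here the swallowed "Class" div) is simply ignored rather than having to be trimmed away.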

shaadamin44 (posted June 17, 2021), in reply to Barand's trim_html() suggestion:

I've tried your code. It worked amazingly well for me. Thank you so much.

Psycho (posted June 17, 2021):

You really need to analyze the source data to identify the different kinds of data and the types of issues that can exist before deciding on the appropriate solution. For example, the solution @Barand provided would be acceptable if:

1) The only issues are missing closing tags
2) None of the "data" contains a less-than sign (<)

If there are other types of issues, or if the data can contain the < symbol, then a different solution is in order.
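
One possible way to relax the second condition, sketched under the assumption that a stray "<" in the data is never immediately followed by a letter or a slash: cut only where something that looks like a real tag begins. The function name trim_at_first_tag() is hypothetical, and this only covers the string-trimming step; how the parser itself copes with a bare "<" in the page is a separate question that would need checking against the real source data.

```php
<?php
// Hypothetical variant of trim_html(): cut only where a tag appears to start
// ("<" or "</" followed by a letter), so a bare "<" in the data survives.
function trim_at_first_tag(string $str): string
{
    if (preg_match('/<\/?[a-zA-Z]/', $str, $m, PREG_OFFSET_CAPTURE)) {
        // $m[0][1] is the byte offset of the first tag-like match.
        $str = substr($str, 0, $m[0][1]);
    }
    return trim($str);
}

echo trim_at_first_tag("Age: 18 <div class='man'>Class: 12</div> "), "\n"; // Age: 18
echo trim_at_first_tag("Score: 3 < 5 <div class='man'>x</div>"), "\n";     // Score: 3 < 5
```

The trade-off is that data such as "a<b" would still be cut short, so this narrows the failure case rather than removing it.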