NotionCommotion Posted July 15, 2016
Thread: https://forums.phpfreaks.com/topic/301484-validating-html-and-counting-elements/

Given string $html, how can I:
1) Tell that it is valid HTML?
2) Count p.p for each given data-id (i.e. 1=1, 2=2, 3=1)

$html = <<<EOD
<p>Hello</p>
<div>
    <p class="p" data-id="3"></p>
    <p class="p" data-id="2"></p>
</div>
<div>
    <p class="p" data-id="1"></p>
    <div>
        <p class="p" data-id="2"></p>
    </div>
</div>
EOD;
ginerjm Posted July 15, 2016

Huh?
Barand Posted July 15, 2016

does this help?

// Wrap the fragment in a single root element so it parses as XML.
$html = '<html>';
$html .= <<<EOD
<p>Hello</p>
<div>
    <p class="p" data-id="3"></p>
    <p class="p" data-id="2"></p>
</div>
<div>
    <p class="p" data-id="1"></p>
    <div>
        <p class="p" data-id="2"></p>
    </div>
</div>
EOD;
$html .= '</html>';

$x = simplexml_load_string($html);

// Pre-fill the counters for data-id values 1 through 4.
$counts = array_fill_keys(range(1,4),0);
foreach ($x->xpath('//p[@class="p"]') as $p) {
    $id = intval($p['data-id']);
    $counts[$id]++;
}
echo '<pre>' . print_r($counts, 1) . '</pre>';
requinix Posted July 15, 2016

To validate the HTML... well, you know HTML is rather flexible. "string" is valid. "<p>string" is valid. Some stuff you might consider invalid is still acceptable to browsers.

I'm thinking (a) just load it with anything in PHP that supports HTML and see if it complains, or (b) try Tidy.

Maybe the question should be less about valid versus invalid and more about whether the markup is supposed to be valid already, so that you can just fix it if it has minor errors?
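As a rough sketch of option (a), assuming DOMDocument is the "anything in PHP that supports HTML": the hypothetical helper below (htmlParseErrors is not a name from the thread) switches libxml to internal error collection, loads the string, and returns whatever the parser complained about. The exact messages depend on the installed libxml2 version, and its HTML parser predates HTML5, so treat the result as a hint rather than a formal validation.

```php
<?php
// Hypothetical helper (not from the thread): parse $html with DOMDocument and
// return libxml's complaints instead of letting PHP emit warnings.
function htmlParseErrors(string $html): array
{
    // Collect parser errors internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html); // note: loadHTML() rejects an empty string

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the global libxml state for the rest of the application.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

foreach (['<p>Hello</p>', '<xp>Hello</p>'] as $candidate) {
    $errors = htmlParseErrors($candidate);
    echo $candidate, ' => ', $errors ? implode('; ', $errors) : 'no complaints', "\n";
}
```

An empty list only means the parser did not object; it is a much weaker guarantee than a pass from a real validator.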
Jacques1 Posted July 15, 2016

How on earth did you even end up with this weird problem? What's the overall goal you're trying to achieve?

The second part sounds like you're webscraping somebody else's site, but then why would you validate the markup? To file a complaint if it's invalid? Or is this your markup? Then why would you choose HTML as your data format? That's a terrible choice compared to pretty much anything else.
NotionCommotion (author) Posted July 16, 2016

Barand wrote:
> does this help?

Thank you, yes, it seems to help. I've used the DOMDocument class in the past, but never simplexml_load_string(), and need to spend a little more time playing around with it.
NotionCommotion (author) Posted July 16, 2016 (edited July 16, 2016 by NotionCommotion)

requinix wrote:
> To validate the HTML... well, you know HTML is rather flexible. "string" is valid. "<p>string" is valid. Some stuff you might consider invalid is still acceptable to browsers. I'm thinking (a) just load it with anything in PHP that supports HTML and see if it complains, or (b) try Tidy. Maybe your end result should be less valid/invalid but whether it's supposed to be valid already and you can just fix it if it has minor errors?

Regarding loading it with anything in PHP that supports HTML, do you mean something like simplexml_load_string()? Below is the output of a slightly modified version of Barand's script (changed <p> to <xp>). I suppose I can set a new error handler to "catch" the warning, and then restore the previous error handler afterwards.

Warning: simplexml_load_string(): Entity: line 1: parser error : Opening and ending tag mismatch: xp line 1 and p in /var/www/application/lib/testing/parse.php on line 20
Warning: simplexml_load_string(): <html><xp>Hello</p> in /var/www/application/lib/testing/parse.php on line 20
Warning: simplexml_load_string(): ^ in /var/www/application/lib/testing/parse.php on line 20
Fatal error: Call to a member function xpath() on a non-object in /var/www/application/lib/testing/parse.php on line 24

I've never used tidy() before. How do you envision this working?

Another option may be the W3C validator API (https://validator.w3.org/docs/api.html). I don't know how it would work yet.

Thanks
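One possible alternative to swapping error handlers, sketched here under the assumption that the fragment is still wrapped in <html> as in Barand's script: libxml_use_internal_errors() makes the parser queue its errors instead of raising PHP warnings, and checking for a false return avoids the fatal error on xpath(). The helper name loadFragment is made up for illustration.

```php
<?php
// Hypothetical helper: parse a fragment as XML and return
// [SimpleXMLElement|null, list of error messages] without emitting warnings.
function loadFragment(string $fragment): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    // Same wrapping trick as in the thread: give the fragment one root element.
    $xml = simplexml_load_string('<html>' . $fragment . '</html>');

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // simplexml_load_string() returns false on failure; normalise that to null.
    return [$xml === false ? null : $xml, $messages];
}

[$xml, $errors] = loadFragment('<xp>Hello</p>');
if ($xml === null) {
    echo "Markup rejected:\n" . implode("\n", $errors) . "\n";
} else {
    echo count($xml->xpath('//p[@class="p"]')) . " matching elements\n";
}
```

Keep in mind that this checks XML well-formedness, which is stricter than HTML: an unclosed <br> that browsers accept would be rejected here.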
NotionCommotion (author) Posted July 16, 2016

Jacques1 wrote:
> How on earth did you even end up with this weird problem? What's the overall goal you're trying to achieve? The second part sounds like you're webscraping somebody else's site, but then why would you validate the markup? To file a complaint if it's invalid? Or is this your markup? Then why would you choose HTML as your data format? That's a terrible choice compared to pretty much anything else.

I'm creating a simple Drupal module which allows the site's administrator to add multiple HTML strings, which will be stored in a DB and later displayed on the site. Special elements identified by a given class (I used class "p", but in practice will use something more descriptive) will be replaced with some other content (TBD: either client-side using JavaScript or server-side using PHP), and that content will be based on the data-id value. There is also a record corresponding to each data-id value, and I wish to prevent that record from being deleted if an element with that data-id value exists in any of the saved HTML strings.

So, I wish to validate the user's HTML, and to determine (i.e. count) whether an element with a given data-id attribute exists in any of the saved HTML strings.

Concerns?
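The delete check described above could look roughly like the following. This is only a sketch: countDataIds and isReferenced are hypothetical names, the saved strings are assumed to arrive as a plain PHP array, and the marker class is assumed to contain no quote characters because it is interpolated into the XPath expression. It uses DOMDocument and DOMXPath rather than SimpleXML so that ordinary, non-XML HTML is tolerated.

```php
<?php
// Hypothetical helper: count marker elements per data-id across all saved fragments.
function countDataIds(array $fragments, string $class = 'p'): array
{
    $counts = [];
    $previous = libxml_use_internal_errors(true); // silence parser warnings

    // Match the class as a whole word inside a possibly multi-valued class attribute.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")][@data-id]',
        $class
    );

    foreach ($fragments as $fragment) {
        if (trim($fragment) === '') {
            continue; // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $doc->loadHTML($fragment);

        $xpath = new DOMXPath($doc);
        foreach ($xpath->query($query) as $node) {
            $id = $node->getAttribute('data-id');
            $counts[$id] = ($counts[$id] ?? 0) + 1;
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    ksort($counts);
    return $counts;
}

// Hypothetical guard: a record may only be deleted when nothing references it.
function isReferenced(array $fragments, string $dataId): bool
{
    return isset(countDataIds($fragments)[$dataId]);
}

$saved = [
    '<p>Hello</p><div><p class="p" data-id="3"></p><p class="p" data-id="2"></p></div>',
    '<div><p class="p" data-id="1"></p><div><p class="p" data-id="2"></p></div></div>',
];
print_r(countDataIds($saved));
var_dump(isReferenced($saved, '4'));
```

Parsing every saved string on each delete may get slow as the number of strings grows; storing the referenced data-id values in their own table when a string is saved is one alternative worth considering.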
NotionCommotion (author) Posted July 17, 2016

NotionCommotion wrote:
> I've never used tidy() before. How do you envision this working?

I can envision how it will work, and think it is perfect. I might have some questions regarding doctype and the like, but will create a separate post if necessary.

Thanks
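The thread does not show the Tidy code that was settled on, so the following is only a guess at the general shape, assuming PHP's tidy extension is installed. The helper name tidyFragment and the choice of options are illustrative; 'show-body-only' is used because the stored strings are fragments rather than full documents, which also sidesteps most doctype questions.

```php
<?php
// Hypothetical helper; requires PHP's tidy extension (ext-tidy).
function tidyFragment(string $html): array
{
    $tidy = new tidy();
    // Treat the input as a body fragment and emit HTML rather than XHTML.
    $tidy->parseString($html, ['show-body-only' => true, 'output-html' => true], 'utf8');
    $tidy->cleanRepair();

    return [
        'errors'   => tidy_error_count($tidy),   // problems Tidy could not fix
        'warnings' => tidy_warning_count($tidy), // problems Tidy repaired or tolerated
        'report'   => (string) $tidy->errorBuffer,
        'html'     => tidy_get_output($tidy),    // repaired markup
    ];
}

if (extension_loaded('tidy')) {
    $result = tidyFragment('<p>Hello<div><p class="p" data-id="3"></p></div>');
    echo "errors: {$result['errors']}, warnings: {$result['warnings']}\n";
    echo $result['report'], "\n";
}
```

One way to use it would be to reject the string when 'errors' is non-zero and otherwise store the repaired 'html', which matches the earlier suggestion to fix minor problems rather than refuse them.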