rarebit Posted November 14, 2008
What's a standard, built-in way of parsing HTML / XML documents?
Link: https://forums.phpfreaks.com/topic/132645-solved-html-parsing/
.josh Posted November 14, 2008
Parse it for what?
rhodesa Posted November 14, 2008
If it's strict XML or XHTML, you can use any of the PHP XML parsing packages. SimpleXML is by far the easiest. If it's just HTML from some page, where you can't guarantee it's well formed, you will have to use regular expressions to strip out the data you want.
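A minimal sketch of the regular-expression route for markup that may not be well formed. The helper name `extract_seo_fields` and the sample string are hypothetical, not from the thread, and patterns like these are brittle (they ignore comments, scripts, and nested headings). Where the DOM extension is available, `DOMDocument::loadHTML()` tolerates malformed HTML and is usually the sturdier choice.

```php
<?php
// Hypothetical helper (not from the thread): pull a few SEO-relevant
// fields out of possibly malformed HTML with regular expressions.
function extract_seo_fields($html)
{
    $fields = array('title' => null, 'headings' => array());

    // First <title> element; "s" lets "." span newlines, "i" ignores case.
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
        $fields['title'] = trim(strip_tags($m[1]));
    }

    // Every <h1>..<h6> element, with any inner markup stripped.
    if (preg_match_all('~<h([1-6])[^>]*>(.*?)</h\1>~is', $html, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $fields['headings'][] = array(
                'level' => (int) $match[1],
                'text'  => trim(strip_tags($match[2])),
            );
        }
    }

    return $fields;
}

// Sample input is deliberately sloppy: mixed-case tags, unclosed <b> and <p>.
$sample = '<html><head><TITLE>Test page</title></head>'
        . '<body><h1>Main <b>heading</h1><p>Unclosed paragraph<h2>Sub</h2></body>';
$fields = extract_seo_fields($sample);
```

The result is a plain nested array, so it can be inspected with `print_r($fields)` or fed into whatever scoring is done afterwards.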
rarebit (Author) Posted November 14, 2008
Basically into some form of array, so it can be studied for SEO quality... DOMDocument, Tidy and a few others don't seem to be installed on my server, so I'm currently having a look at XML Parser and SimpleXML, but I have also grabbed a few preg_split concoctions. Any suggestions?
rhodesa Posted November 14, 2008
If you are just reading the content, SimpleXML is probably best. DOMDocument becomes the better choice when you are creating or manipulating complex XML files.
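As a rough illustration of read-only access with SimpleXML, the sketch below loads a small, well-formed sample (made up for this example, not from the thread) and collects the title and links into an array. It assumes the markup carries no default namespace; a real XHTML document with an `xmlns` attribute would need `registerXPathNamespace()` before the XPath query matches anything.

```php
<?php
// Made-up, well-formed sample; real input would come from file_get_contents().
$xhtml = '<html><head><title>Test page</title></head>'
       . '<body><a href="/a">First</a><p><a href="/b">Second</a></p></body></html>';

$doc = simplexml_load_string($xhtml);

// Child elements are reachable as properties; cast to string to get the text.
$title = (string) $doc->head->title;

// xpath() returns matching elements in document order; attributes are
// read with array syntax.
$links = array();
foreach ($doc->xpath('//a') as $a) {
    $links[] = array('href' => (string) $a['href'], 'text' => (string) $a);
}
```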
rarebit (Author) Posted November 14, 2008
How do you parse a page that is definitely HTML using SimpleXML? I keep getting errors...

    $fn = "../test_pages/lq/index.html";
    $s = file_get_contents($fn);
    $dom = null;
    try {
        // The constructor throws an Exception when $s is not well-formed XML;
        // @ only hides the accompanying warnings.
        $dom = @new SimpleXMLElement($s);
    } catch (Exception $e) {
        echo "error";
    }
    print_r($dom);
rarebit (Author) Posted November 14, 2008
I used this as a base...
rhodesa Posted November 14, 2008

    $fn = "../test_pages/lq/index.html";
    $s = file_get_contents($fn);
    // Returns a SimpleXMLElement on success, or false if $s is not well-formed XML.
    $dom = simplexml_load_string($s);
    print_r($dom);

...but again, unless it's strict XHTML, it will error out.
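To see why a given page is rejected rather than just that it failed, the libxml errors can be collected instead of being emitted as warnings. This is a sketch only: the wrapper name `parse_strict_xml` and both sample strings are hypothetical, while `libxml_use_internal_errors()`, `libxml_get_errors()` and `libxml_clear_errors()` are the standard PHP functions.

```php
<?php
// Hypothetical helper (not from the thread): parse a string as XML and
// return the libxml error messages instead of emitting warnings.
function parse_strict_xml($source)
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $xml = simplexml_load_string($source);        // SimpleXMLElement or false

    $errors = array();
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    return array($xml, $errors);
}

// Well-formed markup parses; an unclosed <br> makes the parser give up.
list($ok, $okErrors)   = parse_strict_xml('<html><body><p>Hi</p></body></html>');
list($bad, $badErrors) = parse_strict_xml('<html><body><p>Hi<br></body></html>');
```

Printing `$badErrors` for a real page shows which tags break well-formedness, which helps in deciding whether the page can be fixed up or needs a more tolerant approach.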