rarebit (posted November 14, 2008): What's a standard, built-in way of parsing HTML / XML documents?
.josh (posted November 14, 2008): Parse it for what?
rhodesa (posted November 14, 2008): If it's strict XML or XHTML, you can use any of the PHP XML parsing packages. SimpleXML is by far the easiest. If it's just HTML from some page, where you can't guarantee it's well formed, you will have to use regular expressions to strip out the data you want.
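As a rough illustration of the regular-expression route mentioned above, here is a minimal sketch that pulls a few SEO-relevant pieces out of markup a strict XML parser would reject. The sample `$html` string and the variable names are made up for the example, and the meta pattern assumes `name` comes before `content` in the tag, which real pages do not guarantee.

```php
<?php
// Hypothetical sample markup: the unclosed <p> and <br> would make a strict XML parser fail.
$html = '<html><head><title>Test Page</title>'
      . '<meta name="description" content="A sample page">'
      . '</head><body><h1>Welcome</h1><p>Unclosed paragraph<br><h2>Details</h2></body></html>';

// Page title (case-insensitive, allowed to span newlines).
$title = preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) ? trim($m[1]) : null;

// Meta description, assuming the name attribute precedes the content attribute.
$description = preg_match(
    '/<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']/is',
    $html,
    $m
) ? $m[1] : null;

// All headings h1..h6, with their level and tag-stripped text.
preg_match_all('/<h([1-6])[^>]*>(.*?)<\/h\1>/is', $html, $matches, PREG_SET_ORDER);
$headings = array();
foreach ($matches as $match) {
    $headings[] = array(
        'level' => (int) $match[1],
        'text'  => trim(strip_tags($match[2])),
    );
}

print_r(array('title' => $title, 'description' => $description, 'headings' => $headings));
```

Patterns like these are brittle against unusual attribute order or nested markup, so treat them as a starting point for a quick study rather than a general HTML parser.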
rarebit (author, posted November 14, 2008): Basically into some form of array, so it can be studied for SEO quality... DOMDocument, Tidy and a few others don't seem to be installed on my server, so I'm currently having a look at XML Parser and SimpleXML, but have also grabbed a few preg_split concoctions. Any suggestions?
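One way to confirm which of the parsers named above are actually available on a server is to test for a class or function that each extension provides. This is only a sketch; the `$available` array and its labels are invented for the example.

```php
<?php
// Check for one well-known class or function from each extension mentioned in the thread.
$available = array(
    'DOMDocument' => class_exists('DOMDocument'),
    'Tidy'        => function_exists('tidy_parse_string'),
    'XML Parser'  => function_exists('xml_parser_create'),
    'SimpleXML'   => function_exists('simplexml_load_string'),
);

foreach ($available as $name => $isAvailable) {
    echo $name . ': ' . ($isAvailable ? 'available' : 'not available') . "\n";
}
```

The output depends entirely on how PHP was built on the machine in question, so no expected result is shown here.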
rhodesa (posted November 14, 2008): If you are just reading the content, SimpleXML is probably best. DOMDocument becomes the better choice when you are creating or manipulating complex XML files.
rarebit (author, posted November 14, 2008): How do you parse a page that is definitely HTML using SimpleXML? I keep getting errors...

    $fn = "../test_pages/lq/index.html";
    $s = file_get_contents($fn);
    $dom = null;
    try {
        // The constructor throws an Exception when $s is not well-formed XML
        $dom = @new SimpleXMLElement($s);
    } catch (Exception $e) {
        echo "error: " . $e->getMessage();
    }
    print_r($dom);
rarebit (author, posted November 14, 2008): I used this as a base...
rhodesa (posted November 14, 2008):

    $fn = "../test_pages/lq/index.html";
    $s = file_get_contents($fn);
    // simplexml_load_string() returns false and emits warnings if $s is not well-formed
    $dom = simplexml_load_string($s);
    print_r($dom);

...but again, unless it's strict XHTML, it will error out.
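To see why a given page "errors out", it can help to collect libxml's parse errors instead of letting them surface as warnings. The following is a sketch only: `load_xml_or_errors()` is a hypothetical helper written for this example, and the two input strings are made-up test cases rather than the page from the thread.

```php
<?php
// Hypothetical helper: returns array(SimpleXMLElement|false, list of error strings).
function load_xml_or_errors($source)
{
    // Buffer libxml errors internally rather than emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_string($source);

    $errors = array();
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    // Reset libxml state so later calls are unaffected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($xml, $errors);
}

// Well-formed markup parses; the unclosed <br> in the second string does not.
list($ok, $okErrors)   = load_xml_or_errors('<html><body><p>Closed</p></body></html>');
list($bad, $badErrors) = load_xml_or_errors('<html><body><p>Unclosed<br></body></html>');

echo ($ok !== false) ? "first document parsed\n" : "first document failed\n";
echo ($bad !== false) ? "second document parsed\n" : "second document failed\n";
print_r($badErrors);
```

The collected messages point at the line where the parser gave up, which is usually enough to tell whether a page is close to XHTML or needs the regex approach instead.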