ajit_india Posted September 25, 2015

Hi everybody,

I am looking for the best and fastest HTML parser. Can anyone please suggest which HTML parser I should use?

https://en.wikipedia.org/wiki/Comparison_of_HTML_parsers

Regards,
Ajit
QuickOldCar Posted September 25, 2015 (edited)

What language are you most comfortable with, or which one are you going to be using?

It depends on what you really want to parse: the data you are trying to get, whether you want cleaned HTML (which may or may not work better), and how much control you need. That list is mostly third-party, premade classes or applications that parse however their authors saw fit. I suppose you can extend those classes if you are willing to study them for a while.

If you want to do it directly and have control over what gets parsed and how it is output, use DOM or SimpleXML. For anything malformed or not within tags, you can use preg_match / preg_match_all with some regex.

As far as I know, there is no single complete solution that handles every document type and everything within the document, let alone handles malformed data well. Most of the time you have to make your own, or learn to embrace errors. I know this because I had to make a universal website, page, document and media parser using the above methods.

EDIT: Another suggestion is to use curl and follow any redirects, including JavaScript ones (curl does not run JavaScript, so those have to be detected in the page and followed by your own code). If you use anything else, make sure the URL includes a protocol and create a stream context, or the connection can fail easily.

Edited September 26, 2015 by QuickOldCar
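To illustrate the DOM plus preg_match_all approach suggested in the reply, here is a minimal PHP sketch. It is not the poster's code: the helper names extractLinks() and regexLinks() and the sample markup are made up for illustration, and the regex only covers quoted href values.

```php
<?php
// Sketch only: extractLinks() and regexLinks() are hypothetical helper names.

// Parse (possibly malformed) HTML with DOM and collect href values via XPath.
function extractLinks(string $html): array
{
    // Collect libxml warnings about bad markup internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);          // note: expects a non-empty string
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Regex fallback for text that is not inside well-formed tags.
// Only matches href values wrapped in single or double quotes.
function regexLinks(string $text): array
{
    preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $text, $matches);
    return $matches[1];
}

// Hypothetical sample input with unclosed <p> and <b> tags.
$html = '<h1>Title</h1><p>First <b>bold<p>Second <a href="/a">A</a> <a href="/b">B</a>';
print_r(extractLinks($html));
print_r(regexLinks('stray text href="/c" outside any tag'));
```

The DOM pass should be the first choice because it understands nesting and attributes; the regex pass is a last resort for fragments the parser cannot see, and it will also match href-like text that is not a real link.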