
Which is the best HTML parser: SimpleHtmlDom, PHPQuery or Ganon?



What language are you more comfortable with or going to be using?

 

It depends on what you really want to parse: the data you're trying to get, whether you want cleaned-up HTML, and how much control you want. Any of them might work better for you, or might not.

Those are all third-party, prebuilt classes or applications that parse the way their authors decided to. I suppose you could extend those classes further if you're willing to study them for a while.

 

If you want to do it directly and control both what gets parsed and the output, use DOM or SimpleXML. For anything malformed, or text that isn't inside tags, you can use preg_match / preg_match_all with some regex.
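A minimal sketch of the DOM route plus a regex fallback, assuming PHP's bundled DOM extension. The helper names extract_links() and extract_emails() are hypothetical, and the email pattern is a deliberately simple example, not a full validator.

```php
<?php
// Sketch: DOM + XPath for HTML, regex for loose text outside tags.
// extract_links() and extract_emails() are hypothetical helper names.

function extract_links(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    // Collect libxml warnings internally so sloppy markup doesn't spam output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // XPath lets you pick exactly the nodes you care about.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Regex fallback for text that isn't inside tags (simple pattern only).
function extract_emails(string $text): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $matches);
    return $matches[0];
}

// extract_links('<a href="/a">A</a>') returns ['/a']
```

The XPath query is where you get the "control" over what gets parsed; swap '//a[@href]' for whatever nodes you actually need.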

 

As far as I know there isn't one complete solution that handles every document type and everything within the document, let alone handles malformed data well. Most of the time you have to make your own or learn to embrace errors. I know this because I had to make a universal website, page, document and media parser using the above methods.
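One way to "embrace errors" with PHP's DOM, sketched under the assumption that you want to keep parsing and just record what libxml complained about: turn on internal error handling, parse, then read the collected errors. parse_lenient() is a hypothetical name.

```php
<?php
// Sketch: parse possibly malformed HTML without aborting, and keep the
// libxml error messages for logging. parse_lenient() is a hypothetical helper.

/**
 * @return array{0: DOMDocument, 1: string[]} the best-effort document and error messages
 */
function parse_lenient(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing them
    libxml_clear_errors();

    $doc = new DOMDocument();
    if ($html !== '') {
        $doc->loadHTML($html); // libxml still builds a tree from broken markup
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return [$doc, $messages];
}
```

The exact messages depend on the libxml2 version installed, so treat them as diagnostics rather than something to match on.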

 

EDIT:

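Another suggestion is to use curl and follow any redirects, including JavaScript ones. curl itself only follows HTTP redirects (CURLOPT_FOLLOWLOCATION); it doesn't run JavaScript, so meta-refresh and JavaScript redirects have to be detected in the returned page and followed yourself.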
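A rough sketch of that with PHP's cURL extension. fetch_page() and find_client_redirect() are hypothetical helpers, the timeouts, redirect limit and User-Agent string are assumed values, and the client-side patterns only cover a few common forms (meta refresh, location = '...', location.replace(...)), not every way a script can redirect.

```php
<?php
// Sketch: fetch with cURL following HTTP redirects, then look for
// client-side redirects that cURL can't follow because it doesn't run JS.
// Helper names and option values are assumptions.

function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow 3xx Location headers
        CURLOPT_MAXREDIRS      => 10,    // guard against redirect loops
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleParser/0.1)',
    ]);
    $body = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}

function find_client_redirect(string $html): ?string
{
    // <meta http-equiv="refresh" content="0; url=...">
    if (preg_match('/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\d+\s*;\s*url=([^"\'>\s]+)/i', $html, $m)) {
        return $m[1];
    }
    // window.location = '...', location.href = "...", location.replace('...')
    if (preg_match('/(?:window\.)?location(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i', $html, $m)
        || preg_match('/location\.replace\(\s*["\']([^"\']+)["\']\s*\)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}
```

If find_client_redirect() returns a relative URL, resolve it against the page URL before calling fetch_page() again, and cap how many times you loop.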

If you use anything else, make sure the URL includes a protocol (http:// or https://) and create a stream context; otherwise the connection can fail easily.
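A sketch of that for file_get_contents(), assuming the http:// stream wrapper with allow_url_fopen enabled. with_scheme(), http_context() and fetch_with_streams() are hypothetical names, and the option values are assumptions to tune.

```php
<?php
// Sketch: ensure a scheme and pass an explicit HTTP stream context.
// Helper names and option values are assumptions.

function with_scheme(string $url): string
{
    // Without a scheme, file_get_contents('example.com') is treated as a local file path.
    return parse_url($url, PHP_URL_SCHEME) === null ? 'http://' . $url : $url;
}

function http_context()
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => 15,   // seconds before giving up on a slow server
            'follow_location' => 1,    // follow Location redirects
            'max_redirects'   => 10,
            'user_agent'      => 'Mozilla/5.0 (compatible; ExampleParser/0.1)',
            'ignore_errors'   => true, // still return the body on 4xx/5xx
        ],
    ]);
}

function fetch_with_streams(string $url): ?string
{
    // @ hides the warning; the false return is handled instead.
    $body = @file_get_contents(with_scheme($url), false, http_context());
    return $body === false ? null : $body;
}
```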

Edited by QuickOldCar