
PHP DOM Object


drewbee


I was wondering whether anyone here has much experience with the PHP DOM object in PHP 5.

 

I am going to be parsing some data off of any submitted site, and was originally developing regular expressions to handle this. However, I ran across the PHP DOM object (I never knew it existed), and I am wondering how forgiving it is of poorly formatted HTML/XHTML.

 

Does the document have to be perfect for this to work well? Or does it cope with all the random garbage that makes up a poorly coded page?

 

I am just looking for some insight on this...

 

Thanks for your comments / thoughts.


Wow... it definitely isn't THAT strict.

 

<html>
<head>
<title>title test</title>
<body><a href="link.html">Normal Link</a>
<a href=link.html>Link no Quotes</a>
<a href=link.html rel=nofollow>Rel no follow no quotes</a>
<a rel=nofollow href=link.html>Rel no follow first no quotes</a>
</body>
</html>

 

Output once parsed with DOM and DOMXPath (link text, then the rel attribute in parentheses):

 

Normal Link ()

Link no Quotes ()

Rel no follow no quotes (nofollow)

Rel no follow first no quotes (nofollow)
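The script that produced this output was not posted. A minimal sketch that should give lines of the same shape, assuming the markup is loaded with DOMDocument::loadHTML() and the links are selected with the XPath expression //a (the variable names here are made up for illustration, and the nowdoc string needs PHP 5.3 or later):

```php
<?php
// The sample markup from the post, including the unclosed <head>
// and the unquoted attribute values.
$html = <<<'HTML'
<html>
<head>
<title>title test</title>
<body><a href="link.html">Normal Link</a>
<a href=link.html>Link no Quotes</a>
<a href=link.html rel=nofollow>Rel no follow no quotes</a>
<a rel=nofollow href=link.html>Rel no follow first no quotes</a>
</body>
</html>
HTML;

// Collect parser complaints internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // the lenient HTML parser, not the strict loadXML()
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$lines = array();
foreach ($xpath->query('//a') as $a) {
    // getAttribute() returns an empty string when the attribute is absent,
    // which is why links without rel print as "()".
    $lines[] = trim($a->textContent) . ' (' . $a->getAttribute('rel') . ')';
}

echo implode("\n", $lines), "\n";
```

The important choice is loadHTML() over loadXML(): the XML loader would reject the unquoted attributes outright, while the HTML loader repairs them.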

 

As far as I am concerned, if the HTML is bad enough not to be picked up by this, then I don't need to be scraping their page :D This is more than satisfactory.
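For deciding when a page is "bad enough", one option is to look at what libxml reports while parsing. This is only a sketch under assumptions: parse_report() is a hypothetical helper, not part of PHP, and what counts as too many warnings is left to the caller.

```php
<?php
/**
 * Hypothetical helper: parse an HTML string leniently and report what
 * the parser complained about. Returns null when no document could be
 * built at all.
 */
function parse_report($html)
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept empty input
    }

    // Buffer libxml errors so they can be inspected, and remember the
    // previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok  = $doc->loadHTML($html);

    $warnings = array();
    foreach (libxml_get_errors() as $error) {
        $warnings[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return null;
    }

    return array(
        'links'    => $doc->getElementsByTagName('a')->length,
        'warnings' => $warnings,
    );
}

// Deliberately sloppy input: unclosed <p>, unquoted href, stray </b>.
$report = parse_report('<p>Unclosed <a href=x.html>link</b>');

if ($report === null) {
    echo "Could not parse, skipping this page\n";
} else {
    echo $report['links'], " link(s), ", count($report['warnings']), " parser warning(s)\n";
}
```

A page that comes back as null, or with no links at all, could simply be skipped, which matches the approach described above.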


