haku: I think his problem is specific. The one who doesn't understand is you.
I'm also getting annoyed by the way the DOM functions in PHP validate content.
Example:
I want to parse a website, and I know that a <p> tag with the "id" attribute set to "caption" holds user input.
But the user input can be ANYTHING, including those '<' and '>' characters (no htmlentities(), of course).
So, if I get the HTML:
<html>
<head>
</head>
<body>
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</p>
</body>
</html>
I can use the C14N() method to get the raw data from a node, but the problem comes when I use the loadHTML() method.
That method adds some extra information and parses the <1337> tag, turning the markup into something like:
<html>
<head>
</head>
<body>
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</1337></1337></p>
</body>
</html>
It adds the closing tags. Yes, the correct form would be to change <1337> to </1337>, but the user doesn't know this.
And the website doesn't use htmlentities() (yes, the tags are not displayed on screen; you can only see them in the source code).
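To reproduce what I describe, something like the following should do. It is only a minimal sketch: the variable names are mine, and the exact serialized output may differ depending on the libxml version PHP was built against, so I'm not pasting an expected result here.

```php
<?php
// The page source exactly as the site delivers it (user input is not escaped).
$html = <<<'HTML'
<html>
<head>
</head>
<body>
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</p>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // the parser repairs the markup while loading
libxml_clear_errors();

// Look up the <p id="caption"> node and serialize it.
$p = $doc->getElementById('caption');
$serialized = $p !== null ? $p->C14N() : null;

// Compare this with the original line in $html to see what the parser changed.
var_dump($serialized);
```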
So, how can I get the REAL raw data?
I mean
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</p>
Instead of
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</1337></1337></p>
Thanks in advance.
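One possible workaround, as far as I can tell, is to skip the DOM parser for this one node and cut the text straight out of the source string with preg_match(). This is only a sketch under some assumptions: extractRawCaption() is a hypothetical helper name, the page has exactly one <p> with id "caption", and the user input never contains a literal "</p>" (if it does, the match stops too early).

```php
<?php
/**
 * Hypothetical helper: returns the untouched text between
 * <p id="caption"> and the first following </p>, or null if not found.
 */
function extractRawCaption(string $html): ?string
{
    // \1 matches the same quote character that opened the attribute value.
    // (.*?) is non-greedy, so it stops at the first </p>.
    $pattern = '~<p\b[^>]*\bid\s*=\s*(["\'])caption\1[^>]*>(.*?)</p>~is';

    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2];
    }
    return null;
}

$html = <<<'HTML'
<html>
<head>
</head>
<body>
<p id="caption">hi guyzzzz my name iz h4x0000r and my clan tag iz <1337> soo im <1337>h4x0000r =)</p>
</body>
</html>
HTML;

$raw = extractRawCaption($html);
echo $raw, "\n";
```

Because nothing is parsed, nothing is repaired: the '<' and '>' characters come back exactly as they appear in the source. The rest of the page can still go through loadHTML() as usual.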