
Problem with DOMDocument->DOMDocumentType


dave1


I assumed that the doctype property would be null if no doctype exists, but it is never null.

If no doctype is present, it still returns a DOMDocumentType object for:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

 

Example:

$html = '<p>No doctype here</p>'; // sample markup without a doctype
$dom = new DOMDocument();
$dom->loadHTML($html);
$doctype = $dom->doctype;
echo $doctype->internalSubset;

 

Does anyone know what's going on, or an alternative way to parse the HTML and extract the doctype? Can this be done with DOMXPath?

 

Thanks.

 

As you can see, when DOMDocument::loadHTML is given markup with no doctype, it automatically adds the HTML 4.0 Transitional doctype to the document. On PHP 5.4+ built against libxml 2.7.8 or later, passing LIBXML_HTML_NODEFDTD as the options argument turns this behaviour off; on older versions there is no flag for it. In that case, finding out whether a doctype is present means manually inspecting the source HTML (a doctype has to come at the beginning of the source, so it's not too difficult).
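
For anyone landing here later, a rough sketch of both approaches follows. The function names findSourceDoctype and parsedDoctype are made up for illustration and are not part of the DOM extension. The first inspects the raw source with a regular expression; the second assumes PHP 7.1+ built against libxml 2.7.8 or later, where the LIBXML_HTML_NODEFDTD option should stop the parser from adding a default doctype. Treat it as a starting point rather than a complete solution.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the doctype declaration found at the start of
 * the raw source, or null if there is none.
 *
 * Assumption: a UTF-8 BOM, whitespace, an XML declaration, and comments may
 * precede the doctype. A doctype containing '>' (e.g. inside an internal
 * subset) would be cut short by this simple pattern.
 */
function findSourceDoctype(string $html): ?string
{
    $pattern = '/\A(?:\xEF\xBB\xBF)?(?:\s|<\?xml.*?\?>|<!--.*?-->)*(<!DOCTYPE\b[^>]*>)/is';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

/**
 * Hypothetical helper: parses the markup without letting libxml add a
 * default doctype, then returns whatever doctype node the document has.
 *
 * Assumption: $html is a non-empty string.
 */
function parsedDoctype(string $html): ?DOMDocumentType
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->doctype;
}

// Usage: a fragment with no doctype, and a document that declares one.
$fragment = '<p>No doctype here</p>';
$document = "<!-- generated -->\n<!DOCTYPE html><html><body><p>Hi</p></body></html>";

var_dump(findSourceDoctype($fragment));
var_dump(findSourceDoctype($document));
var_dump(parsedDoctype($fragment) === null);
var_dump(parsedDoctype($document) !== null);
```

The regular expression only looks at the start of the source, so it does not depend on the libxml version at all. The parser-based version has the advantage of handing back a real DOMDocumentType node, whose name, publicId and systemId properties can then be read.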

Archived

This topic is now archived and is closed to further replies.
