
Getting doctype from HTML string


Andy123
Solved by Andy123


Hello,

 

I have an HTML page from which I want to extract the doctype. I tried the following:

 

$doc = new DOMDocument();
$doc->loadHTML($html);
$doctype = $doc->doctype->name;

 

However, DOMDocument seems to insert a default doctype when the loaded HTML doesn't contain one. I have looked around but didn't find any other options in this class.

 

Is there a better alternative to writing a regular expression? My regex skills are quite rusty, so I'd prefer an approach that doesn't rely on one. ;) Or, if anyone can help me out with the regex, that would be fine too. :)

 

Thanks in advance.
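For the DOMDocument route, the default doctype comes from libxml, which adds an HTML 4.0 Transitional doctype when the parsed HTML has none. A minimal sketch, assuming PHP 7.1+ built against libxml 2.7.8 or later so that the LIBXML_HTML_NODEFDTD option is available; the function name getDoctypeName is only illustrative:

```php
<?php
// Sketch: read the doctype name without libxml's default doctype.
// LIBXML_HTML_NODEFDTD stops the parser from adding a default doctype
// when the source HTML doesn't declare one.
function getDoctypeName(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect (and then discard) warnings about malformed HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // With the flag set, $doc->doctype is null if the page had no doctype.
    return $doc->doctype !== null ? $doc->doctype->name : null;
}

var_dump(getDoctypeName('<!DOCTYPE html><html><body>Hi</body></html>'));
var_dump(getDoctypeName('<html><body>No doctype</body></html>'));
```

The first call should report "html" and the second null, which lets you tell a missing doctype apart from a declared one.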


The regex would look like this...

 

/(?=<\!DOCTYPE\shtml(\s[public])?[\s]?[('|")(.*)('|")]?[('|")(.*).('|")]?>)/i

 

This should match the first string starting with <!DOCTYPE html, followed by an optional ' public', another optional whitespace and an optional occurrence of any string within quotes, which is then repeated to match an optional URL, and a final greater-than sign.

The whole search is case-insensitive.
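As written, the pattern above uses square brackets, which define character classes (each matches a single character from the set) rather than optional groups, and the whole expression sits inside a (?=...) lookahead, so even a successful match consumes nothing. A sketch of the same idea using optional (?:...)? groups, assuming only the PUBLIC form needs handling; parseDoctype is a hypothetical helper name:

```php
<?php
// Sketch: <!DOCTYPE html, then optionally PUBLIC "public id" "system URL".
// (["\']) captures the opening quote; \1 and \3 require the same quote to close.
function parseDoctype(string $html): ?array
{
    $pattern = '/<!DOCTYPE\s+html(?:\s+PUBLIC\s+(["\'])(.*?)\1(?:\s+(["\'])(.*?)\3)?)?\s*>/i';

    if (!preg_match($pattern, $html, $m)) {
        return null; // no doctype found
    }

    // Unmatched trailing groups are left out of $m, hence the ?? null.
    return [
        'public' => $m[2] ?? null, // formal public identifier, e.g. -//W3C//DTD ...
        'system' => $m[4] ?? null, // DTD URL, if present
    ];
}
```

For an HTML 4.01 Strict doctype this returns the public identifier and the DTD URL; for a plain <!DOCTYPE html> both entries are null.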

 

With HTML5, though, it would simply be this:

/^<\!DOCTYPE\shtml>/i, because you don't need to declare PUBLIC, the -// identifier, or the URL.
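In PHP, the HTML5 check could look roughly like this. The leading \s* is an addition to the pattern above so that whitespace or blank lines before the doctype don't defeat the ^ anchor; a byte-order mark or a comment before the doctype would still prevent a match. hasHtml5Doctype is an illustrative name:

```php
<?php
// Sketch: true only if the string starts (after optional whitespace)
// with the bare HTML5 doctype, in any letter case.
function hasHtml5Doctype(string $html): bool
{
    return preg_match('/^\s*<!DOCTYPE\s+html\s*>/i', $html) === 1;
}

var_dump(hasHtml5Doctype("<!DOCTYPE html>\n<html></html>"));
var_dump(hasHtml5Doctype('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'));
```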

 

Hope this works.


  • Solution

Hello and sorry for the late reply.

 

Thank you for your post. I couldn't get your regex to work, at least not on regexpal.com. I wrote a very simple regex myself that will match almost any doctype. :)

 

 

<!DOCTYPE\s.*>
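One caveat with this catch-all pattern: .* is greedy, so on a single-line (for example minified) page it runs on to the last > on that line, and because . doesn't match newlines it can't match a doctype that is split across lines. It is also case-sensitive, so it won't match <!doctype html>. A sketch comparing it with a [^>]* version that adds the i modifier:

```php
<?php
// Sketch comparing the greedy catch-all with a bounded alternative.
$html = '<!DOCTYPE html><html><head><title>Test</title></head></html>';

// Greedy: .* runs to the last ">" on the line, so this captures the whole string.
preg_match('/<!DOCTYPE\s.*>/i', $html, $greedy);

// Bounded: [^>]* cannot pass a ">", so the match ends with the doctype.
// [^>] also matches newlines, so doctypes split across lines still work.
preg_match('/<!DOCTYPE\s[^>]*>/i', $html, $bounded);

echo $greedy[0], "\n";  // prints the entire $html string
echo $bounded[0], "\n"; // prints <!DOCTYPE html>
```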
