Andy123 Posted April 29, 2013

Hello,

I have an HTML page from which I want to extract the doctype. I tried the following:

$doc = new DOMDocument();
$doc->loadHTML($html);
$doctype = $doc->doctype->name;

However, the DOMDocument class seems to insert a default doctype when the loaded HTML has none. I have looked around and didn't find any other options in this class. Are there better alternatives than writing a regular expression? My regex skills are quite rusty, so I would prefer an approach that does not rely on one. Or, if anyone can help me out with the regex, that would be fine too.

Thanks in advance.
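For anyone landing here later: if your PHP is built against libxml 2.7.8 or newer (PHP 5.4+), loadHTML() accepts the LIBXML_HTML_NODEFDTD option, which stops libxml from adding its default HTML 4.0 Transitional doctype. With it, $doc->doctype is simply null when the markup has none. Note also that the doctype name is "html" for practically every HTML doctype, so publicId and systemId are what tell HTML5 apart from HTML 4.01 or XHTML. A minimal sketch, assuming those versions; getDoctype() is a hypothetical helper name:

```php
<?php
// Sketch: read the doctype with DOMDocument without letting libxml invent one.
// Assumes PHP >= 5.4 linked against libxml >= 2.7.8 (for LIBXML_HTML_NODEFDTD).
// getDoctype() is a hypothetical helper, not part of PHP.
function getDoctype($html)
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings (e.g. unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc->doctype === null) {
        return null; // the markup really had no doctype
    }

    return [
        'name'     => $doc->doctype->name,
        'publicId' => $doc->doctype->publicId, // empty for <!DOCTYPE html>
        'systemId' => $doc->doctype->systemId,
    ];
}
```

Without the flag, the same input with no doctype would report libxml's default doctype instead of null, which is the behaviour described in the question.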
Irate Posted April 30, 2013

The regex would look like this:

/(?=<\!DOCTYPE\shtml(\s[public])?[\s]?[('|")(.*)('|")]?[('|")(.*).('|")]?>)/i

This should match the first string starting with <!DOCTYPE html, followed by an optional " public", another optional whitespace character, and an optional string within quotes, which is then repeated to match an optional URL, and finally a closing greater-than sign. The whole search is case-insensitive.

With HTML5, however, it would simply be /^<\!DOCTYPE\shtml>/i, because you don't need to declare PUBLIC with its "-//..." identifier, nor the URL.

Hope this works.
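The structure described above (doctype keyword, optional PUBLIC identifier in quotes, optional URL in quotes, closing >) can be written with groups instead of character classes, since [public] matches only a single character. A minimal sketch, assuming PHP's preg_match(); getDoctypeParts() is a hypothetical helper name, and the extra SYSTEM branch is an assumption added to cover doctypes such as <!DOCTYPE html SYSTEM "about:legacy-compat">:

```php
<?php
// Sketch: capture the public identifier and system URL of an HTML doctype.
// getDoctypeParts() is a hypothetical helper, not part of PHP.
function getDoctypeParts($html)
{
    $pattern = '/<!DOCTYPE\s+html'                            // "<!DOCTYPE html", any case (see /i)
             . '(?:\s+(?:PUBLIC\s+(["\'])(.*?)\1|SYSTEM))?'   // optional PUBLIC "id" (groups 1-2) or SYSTEM
             . '(?:\s+(["\'])(.*?)\3)?'                       // optional "URL" (groups 3-4)
             . '\s*>/i';

    if (!preg_match($pattern, $html, $m)) {
        return null; // no HTML doctype found
    }

    // Groups that did not take part in the match may be missing from $m.
    return [
        'public' => isset($m[2]) ? $m[2] : '',
        'system' => isset($m[4]) ? $m[4] : '',
    ];
}
```

The backreferences \1 and \3 make each quoted value end with the same quote character it started with, so both "..." and '...' work.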
Andy123 (author; marked as the solution) Posted May 2, 2013

Hello, and sorry for the late reply. Thank you for your post. I couldn't get your regex to work, at least not on regexpal.com. I wrote a very simple regex myself that will match almost anything:

<!DOCTYPE\s.*>
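One caveat with .* is that it is greedy: on minified markup such as <!DOCTYPE html><html><head>... it runs on to the last > on the line. A minimal PHP sketch of the same catch-all idea, with two assumed tweaks: [^>]* so the match stops at the first >, and the /i flag so a lowercase <!doctype html> is found too. extractDoctype() is a hypothetical helper name:

```php
<?php
// Sketch: return the whole doctype declaration, or null if there is none.
// extractDoctype() is a hypothetical helper, not part of PHP.
function extractDoctype($html)
{
    // [^>]* cannot cross a ">", so the match ends with the doctype itself.
    if (preg_match('/<!DOCTYPE\s[^>]*>/i', $html, $m)) {
        return $m[0];
    }
    return null;
}
```

For example, extractDoctype('<!DOCTYPE html><html></html>') gives '<!DOCTYPE html>', where the original pattern would have matched the entire string.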