web_craftsman Posted September 8, 2016

In my CMS I want to give site moderators the ability to associate any meta information with a page. Meta keywords and description have their own fields, but everything else is entered as raw HTML, like this:

<meta name="Generator" content="SomeCMS" />
<meta name="robots" content="nofollow" />
<link rel="canonical" href="http://example.com/content/poisk-i-upravlenie-kontentom" />

This HTML will be echoed to the page. Mostly it will contain only meta tags and link (rel=canonical) elements. Now I think I have to make sure there is no XSS attack in this code, so I need to filter it before saving it to the database. HtmlPurifier and http://github.com/voku/anti-xss don't work with meta tags. So what would you advise? Should I parse the text with a regexp for meta tags and then check every meta tag found for any style or on* attributes or http-equiv="refresh" (to reject malicious meta tags)?

Link to topic: https://forums.phpfreaks.com/topic/302100-how-to-filter-meta-tags-from-xss/

Jacques1 Posted September 8, 2016 (edited)

What exactly prevents you from storing the meta elements as key/value pairs rather than raw HTML? This will drastically reduce the risk of XSS vulnerabilities.

In any case, allowing arbitrary meta elements is a risk, no matter how hard you try to blacklist dangerous combinations. There will always be a problem you haven't considered yet (for example: using a <meta charset> element to break HTML-escaping). Link elements are even worse, because now we're talking about external resources like stylesheets (which can be used for attacks).

requinix Posted September 8, 2016

DTD, XSD, or Relax NG schema validation could do a lot of the work, but AFAIK it wouldn't be able to validate actual attribute values (e.g., that the canonical URL has the right domain). That basically leaves you with your own validation routines. As long as you approach it like a whitelist - specifically allow certain structures, disallow everything else - this is quite possible. However, it gets exponentially more difficult as you allow more complex HTML; just that example there would be fine, but I'm more concerned about what else would be possible.

Unless you want to get really sophisticated with this, you should do the specific key/value meta pairs thing (and a separate entry for the canonical URL, plus whatever else). It's the safest course of action, and it doesn't require that the user understand writing HTML markup.

As a secondary option for the user, you could allow them to input HTML and then scan it for particular elements to keep. As in: load the string into DOMDocument (no regular expressions), search for <meta> and <link> tags, then extract the data into that key/value system.

I could write a proof of concept for the "sophisticated" approach, if someone asks for it, but I just started playing FFXIV and right now I'd rather do that.

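The "parse, then extract into key/value pairs" idea above can be sketched roughly as follows. This is only an illustration under assumptions: it is written in Python with the standard html.parser module rather than PHP's DOMDocument, and the names MetaExtractor, extract_meta and ALLOWED_META_KEYS are hypothetical, not part of any CMS or library mentioned in the thread. It keeps only <meta name/property + content> pairs and a single canonical <link>, and silently drops everything else.

```python
from html.parser import HTMLParser

# Hypothetical whitelist: attributes that may act as the "key" of a meta pair.
ALLOWED_META_KEYS = ("name", "property")


class MetaExtractor(HTMLParser):
    """Collects whitelisted <meta> pairs and a canonical URL; ignores the rest."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # list of (key_attr, key, content) tuples
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta":
            # Drop the element if it carries anything outside the whitelist
            # (http-equiv, charset, style, on* handlers, ...).
            if set(attrs) - {"name", "property", "content"}:
                return
            content = attrs.get("content")
            if content is None:
                return
            for key_attr in ALLOWED_META_KEYS:
                if attrs.get(key_attr):
                    self.pairs.append((key_attr, attrs[key_attr], content))
                    break
        elif tag == "link":
            # Only a bare rel="canonical" link is kept; stylesheets etc. are dropped.
            rel = (attrs.get("rel") or "").lower()
            if set(attrs) == {"rel", "href"} and rel == "canonical" and attrs["href"]:
                self.canonical = attrs["href"]


def extract_meta(markup):
    """Return (pairs, canonical) extracted from user-supplied markup."""
    parser = MetaExtractor()
    parser.feed(markup)
    parser.close()
    return parser.pairs, parser.canonical
```

The extracted values are still untrusted strings: they have to be escaped when rendered, and the canonical URL would additionally need its scheme and domain checked, which this sketch does not do.
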
Jacques1 Posted September 8, 2016

Avoiding raw HTML is also a matter of usability. I do know HTML, and I would still very much prefer a proper GUI with a combobox over having to manually write down tags, some of which I would first have to look up. Now imagine a layman struggling with a syntax error somewhere in a big block of markup.

requinix Posted September 8, 2016 (edited)

The only reason I can think of that someone would have the markup is because they're copying it from somewhere, and that's an instance where not being familiar with HTML causes the opposite problem. For example, Google Analytics gives you some

On that note,
1. Stuff like the generator should be automatic anyways - not manually written out by someone.
2. The robots thing should be a global- or page-level option that a user enables in some configuration area, then rendered into HTML appropriately - not manually written out by someone.
3. The canonical URL should definitely be automatic - unless you want someone to be able to say that a particular page is derivative of some other page on some other website (which would be quite suspicious).

web_craftsman (Author) Posted September 8, 2016

Quoting Jacques1: "What exactly prevents you from storing the meta elements as key/value pairs rather than raw HTML? This will drastically reduce the risk of XSS vulnerabilities."

I would need to create a whole meta tag constructor for this, with features like reordering, adding and deleting. It is a big piece of work, and there is one crucial problem: it looks like SEO specialists like to add some very specific meta tags, so how could I guess what they need? For example, googling turns up the information that a meta tag can have the following attributes: name, content, scheme, http-equiv. It says nothing about the charset attribute, in which case it is the meta tag's only attribute. And after looking at a few web sites I very soon found a meta tag like this:

<meta property="fb:app_id" content="966242223397117" />

So it looks a bit complicated to create a constructor for all cases.

web_craftsman (Author) Posted September 8, 2016 (edited)

Quoting requinix: "The only reason I can think of that someone would have the markup is because they're copying it from somewhere, and that's an instance where not being familiar with HTML causes the opposite problem."

When people use all kinds of WYSIWYG editors, they are working with raw HTML too.

web_craftsman (Author) Posted September 8, 2016

Quoting Jacques1: "Avoiding raw HTML is also a matter of usability. I do know HTML, and I would still very much prefer a proper GUI with a combobox over having to manually write down tags, some of which I would first have to look up. Now imagine a layman struggling with a syntax error somewhere in a big block of markup."

Web site content managers are supposed to know HTML.

Jacques1 Posted September 8, 2016 (Solution)

Quoting web_craftsman: "So it looks a bit complicated to create a constructor for all cases"

How many cases are there in reality? You definitely don't want the admin to mess with the document encoding, so the charset attribute is out of the question. Setting arbitrary HTTP options also isn't recommended, so http-equiv is irrelevant as well.

That leaves you with exactly two cases: <meta name="..." content="..."> (HTML) and <meta property="..." content="..."> (RDFa).

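With only those two cases left, rendering the stored key/value pairs back into markup becomes a small, escapable operation. The following is a minimal sketch under assumptions, not the CMS's actual code: it is in Python (in PHP the corresponding escaping function would be htmlspecialchars), and render_meta and ALLOWED_KEY_ATTRS are hypothetical names chosen for illustration.

```python
from html import escape

# The two shapes named above: <meta name=...> (HTML) and <meta property=...> (RDFa).
ALLOWED_KEY_ATTRS = {"name", "property"}


def render_meta(key_attr, key, content):
    """Render one stored (key_attr, key, content) triple as a <meta> element.

    The attribute name comes from a fixed whitelist and is never taken
    verbatim from user input; both values are HTML-escaped, quotes included.
    """
    if key_attr not in ALLOWED_KEY_ATTRS:
        raise ValueError("unsupported meta attribute: {!r}".format(key_attr))
    return '<meta {}="{}" content="{}">'.format(
        key_attr,
        escape(key, quote=True),
        escape(content, quote=True),
    )
```

Because the markup is generated from a fixed template, a value such as `"><script>` ends up as inert text inside the content attribute rather than as a new element, and charset or http-equiv cannot be produced at all.
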
web_craftsman (Author) Posted September 8, 2016

Jacques1, thanks, I will follow your advice.