how to filter meta tags from xss

web_craftsman · September 8, 2016

In my CMS I want to give site moderators the ability to attach arbitrary meta information to a page. Meta keywords and description have their own fields, but everything else is entered as raw HTML, like this:

<meta name="Generator" content="SomeCMS" />
<meta name="robots" content="nofollow" />
<link rel="canonical" href="http://example.com/content/poisk-i-upravlenie-kontentom" />

This html will be echoed to the page.

Mostly only meta tags and link (rel=canonical) will go here. Now I think I have to make sure this code cannot carry an XSS attack, so I need to filter it before saving it to the database.

HtmlPurifier and http://github.com/voku/anti-xss don't work with meta tags. So what would you advise? Should I parse the text with a regexp for meta tags and then check every meta tag found for style or on* attributes or http-equiv="refresh" (to reject malicious meta tags)?

Jacques1 · September 8, 2016

What exactly prevents you from storing the meta elements as key/value pairs rather than raw HTML? This will drastically reduce the risk of XSS vulnerabilities.

In any case, allowing arbitrary meta elements is a risk, no matter how hard you try to blacklist dangerous combinations. There will always be a problem you haven't considered yet (for example: using a <meta charset> element to break HTML-escaping).

Link elements are even worse, because now we're talking about external resources like stylesheets (which can be used for attacks).

Edited September 8, 2016 by Jacques1

requinix · September 8, 2016

DTD, XSD, or Relax NG schema validation could do a lot of work, but AFAIK wouldn't be able to validate actual attribute values (eg, that the canonical URL has the right domain).

That basically leaves you with your own validation routines. As long as you approach it like a whitelist - specifically allow certain structures, disallow everything else - then this is quite possible. However it gets exponentially more difficult as you allow more complex HTML; just that example there would be fine, but I'm more concerned about what else would be possible.

Unless you want to get really sophisticated with this, you should do the specific key/value meta pairs thing (and a separate entry for the canonical URL, plus whatever else). It's the safest course of action, and it doesn't require that the user understand writing HTML markup.

As a secondary option for the user, you could allow them to input HTML and then scan it for particular elements to keep. As in: load the string into DOMDocument (no regular expressions), search for <meta> and <link> tags, then extract the data into that key/value system.
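A minimal sketch of that scan-and-extract idea, for illustration only. It uses Python's standard html.parser rather than PHP's DOMDocument, and the names MetaExtractor and extract_meta are hypothetical. It assumes only two meta shapes are accepted (name+content and property+content) plus a canonical link with an http or https URL; anything else, including elements with extra attributes, is silently dropped.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def is_web_url(url):
    """True only for absolute http/https URLs (assumed policy)."""
    try:
        return urlsplit(url).scheme in ("http", "https")
    except ValueError:
        return False


class MetaExtractor(HTMLParser):
    """Collects whitelisted <meta> and <link rel="canonical"> data."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = []         # (kind, key, content) tuples
        self.canonical = None  # href of the first accepted canonical link

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        names = [name for name, _ in attrs]
        if len(names) != len(set(names)):
            return  # duplicated attribute: reject the whole element
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            for kind in ("name", "property"):
                # Exactly two attributes, nothing extra like onclick.
                if set(a) == {kind, "content"} and a[kind]:
                    self.meta.append((kind, a[kind], a["content"]))
        elif tag == "link" and set(a) == {"rel", "href"}:
            is_canonical = a["rel"].strip().lower() == "canonical"
            if is_canonical and is_web_url(a["href"]) and self.canonical is None:
                self.canonical = a["href"]


def extract_meta(html_text):
    """Return (meta_pairs, canonical_url) extracted from user-supplied HTML."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.meta, parser.canonical
```

The extracted pairs are data, not markup: they would be stored as key/value rows and escaped again when the page is rendered, so nothing the user typed is ever echoed verbatim.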

I could write a proof of concept for the "sophisticated" approach, if someone asks for it, but I just started playing FFXIV and right now I'd rather do that.

Jacques1 · September 8, 2016

Avoiding raw HTML is also a matter of usability. I do know HTML, and I would still very much prefer a proper GUI with a combobox over having to manually write down tags, some of which I would first have to look up. Now imagine a layman struggling with a syntax error somewhere in a big block of markup.

requinix · September 8, 2016

The only reason I can think of that someone would have the markup is because they're copying it from somewhere, and that's an instance where not being familiar with HTML causes the opposite problem. For example, Google Analytics gives you some

On that note,

1. Stuff like the generator should be automatic anyways - not manually written out by someone.

2. The robots thing should be a global- or page-level option that a user enables in some configuration area, then rendered into HTML appropriately - not manually written out by someone.

3. The canonical URL should definitely be automatic - unless you want someone to be able to say that a particular page is derivative of some other page on some other website (which would be quite suspicious).
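To illustrate the three points above, here is a rough sketch in Python (not the CMS's own language) of generating these tags from settings instead of from user-written markup. SITE_ROOT, GENERATOR, and automatic_head_tags are hypothetical names, and the assumption is that each page has a known path and a pair of boolean robots options.

```python
from html import escape
from urllib.parse import quote

SITE_ROOT = "http://example.com"  # assumption: fixed by site configuration
GENERATOR = "SomeCMS"             # assumption: set by the CMS, not by users


def automatic_head_tags(path, noindex=False, nofollow=False):
    """Build generator, robots, and canonical tags from page settings."""
    tags = ['<meta name="generator" content="{}">'.format(
        escape(GENERATOR, quote=True))]

    # Robots directives come from checkboxes, never from free text.
    directives = [name for name, enabled in
                  (("noindex", noindex), ("nofollow", nofollow)) if enabled]
    if directives:
        tags.append('<meta name="robots" content="{}">'.format(
            ",".join(directives)))

    # The canonical URL is always on this site: root plus the page's own path.
    canonical = SITE_ROOT + quote(path)
    tags.append('<link rel="canonical" href="{}">'.format(
        escape(canonical, quote=True)))
    return tags
```

With this shape the user only toggles options; the domain of the canonical URL cannot be changed from the editing form.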

Edited September 8, 2016 by requinix

web_craftsman · September 8, 2016

What exactly prevents you from storing the meta elements as key/value pairs rather than raw HTML? This will drastically reduce the risk of XSS vulnerabilities.

I would need to build a whole meta tag constructor for this, with features like reordering, adding, and deleting. It is a big piece of work, and there is one crucial problem:

It looks like SEO specialists like to add some very specific meta tags. How could I guess what they need?

For example, from googling I found that a meta tag can have the following attributes: name, content, scheme, http-equiv.

That list doesn't mention the charset attribute, which appears as the only attribute of its meta tag.

And after looking at web sites I very soon found meta tag like:

<meta property="fb:app_id" content="966242223397117" />

So it looks a bit complicated to create a constructor for all cases.

web_craftsman · September 8, 2016

The only reason I can think of that someone would have the markup is because they're copying it from somewhere, and that's an instance where not being familiar with HTML causes the opposite problem.

When people use all kinds of WYSIWYG editors, they are working with raw HTML too.

Edited September 8, 2016 by web_craftsman

web_craftsman · September 8, 2016

Avoiding raw HTML is also a matter of usability. I do know HTML, and I would still very much prefer a proper GUI with a combobox over having to manually write down tags, some of which I would first have to look up. Now imagine a layman struggling with a syntax error somewhere in a big block of markup.

Website content managers are supposed to know HTML.

Jacques1 · September 8, 2016

So it looks a bit complicated to create a constructor for all cases.

How many cases are there in reality? You definitely don't want the admin to mess with the document encoding, so the charset attribute is out of the question. Setting arbitrary HTTP options also isn't recommended, so http-equiv is irrelevant as well.

That leaves you with exactly two cases: <meta name="..." content="..."> (HTML) and <meta property="..." content="..."> (RDFa).
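As a rough sketch of how small the constructor becomes with just these two cases, the following Python function (a stand-in for whatever the CMS is written in; render_meta and ALLOWED_KINDS are hypothetical names) renders one stored key/value pair. The attribute name is picked from a fixed set, and both values are HTML-escaped. This assumes the page encoding is fixed by the application, since escaping is only reliable when the encoding cannot be changed by the user.

```python
from html import escape

# The only attribute names a stored entry may use as its key attribute.
ALLOWED_KINDS = {"name", "property"}


def render_meta(kind, key, content):
    """Render one stored meta entry as an HTML element."""
    if kind not in ALLOWED_KINDS:
        raise ValueError("unsupported meta kind: {!r}".format(kind))
    # quote=True also escapes double and single quotes, so a value
    # cannot break out of the attribute.
    return '<meta {}="{}" content="{}">'.format(
        kind, escape(key, quote=True), escape(content, quote=True))


print(render_meta("name", "robots", "nofollow"))
print(render_meta("property", "fb:app_id", "966242223397117"))
```

Because charset and http-equiv are not in ALLOWED_KINDS, they cannot be produced at all, whatever the user enters.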

web_craftsman · September 8, 2016

Jacques1, thanks, I will follow your advice
