
html entities


digitalecartoons


To make my php mailform script secure I've used some basic security measures:

 

1. Regex to check the From field for a normal name

2. Regex to check the Email field for a valid email address

3. htmlentities() to encode any HTML tags in the message body text, to protect it from malicious scripts
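The first two checks in the list above could look roughly like the sketch below. The function names and the name pattern are illustrative assumptions, not the regexes actually used in the original script, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helpers; the name pattern is an assumption chosen for illustration.

function isPlausibleName(string $name): bool
{
    // One leading letter, then letters, spaces, dots, apostrophes or hyphens.
    // \z (not $) is used so a trailing newline is rejected as well.
    return preg_match('/^\p{L}[\p{L} .\'\-]{0,59}\z/u', $name) === 1;
}

function isValidEmail(string $email): bool
{
    // filter_var() returns the address on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

Both functions reject values containing line breaks, which matters as soon as the values end up in mail headers.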

 

About htmlentities: a name like "René Janssen" would indeed show up in my email as "Ren&eacute; Janssen". I know that by sending it as text/html with mail() it would show as "René Janssen" again, and that when I choose the 'show as plain text mail' option in my webmail I would get "Ren&eacute; Janssen" again.

 

But what if I wanted to have emails sent to me as text/plain mails with mail(), filter the input with htmlentities first, but still receive the original characters in the text/plain message body? Can that be done?

 

So basically it would have to do this:

1. input text field is posted

2. input text is filtered with htmlentities: no tags in message body text

3. mail is sent with mail() as a text/plain mail

4. mail is received as a text/plain mail

5. any entity codes like &eacute; for é show up in the message body text as their original characters again

 

Is there a simple way to do this in php?

Link to comment
Share on other sites

Yes, as wheelerc said, once you decode it again, you're decoding the bad stuff too.
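A small sketch of why the encode-then-decode round trip gains nothing. The function names are hypothetical, and UTF-8 is assumed here rather than the iso-8859-1 charset used elsewhere in this thread.

```php
<?php
// Hypothetical helper names; UTF-8 is an assumption of this sketch.

function encodeEntities(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

$input   = 'René <script>alert(1)</script>';
$encoded = encodeEntities($input);   // no literal < or > left in the string
$decoded = decodeEntities($encoded); // identical to $input, tags included
```

Decoding restores the accented characters, but it restores the tags along with them, so the body that gets mailed is the same as if no filtering had happened.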

 

You shouldn't need to strip out HTML/tags for emails with headers set to plaintext, unless you are using an insecure email client.

 

The only thing you have to worry about is email injection. Make SURE that your header fields, like to, subject, and from, will NOT accept any sort of new line characters. If you google around for "php email injection" there are a bunch of regular expressions around for this sort of thing.
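As a minimal sketch of that check, assuming a hypothetical helper name: reject any header value that contains a carriage return or line feed before it is concatenated into the headers passed to mail().

```php
<?php
// Hypothetical helper; it only guards against line breaks in header values.

function requireSingleLine(string $value): string
{
    // A CR or LF would let an attacker start a new header line (e.g. "Bcc: ...").
    if (preg_match('/[\r\n]/', $value) === 1) {
        throw new InvalidArgumentException('Line breaks are not allowed in header fields.');
    }
    return $value;
}
```

Usage would be along the lines of `$subject = requireSingleLine($_POST['subject'] ?? '');` for each of the to, subject and from values. The message body does not need this check, since line breaks are expected there.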

Link to comment
Share on other sites

But what's the point of first stripping the (bad) HTML tags, then decoding them back again and then sending the result with mail()?

That would still send them as potentially bad tags?

 

I've always set the headers to text/plain with iso-8859-1.

 

Until I read about the possibility of malicious scripts and making the form safe by stripping the tags with htmlentities. That's why I asked.

 

But you're saying that this would only be a problem when sending text/html mails? And that when people talk about making your PHP script secure, that only applies to mails sent as text/html, not as text/plain?

 

So what would happen if a hacker put a bad script in the message part, and it was sent as text/plain? Then there wouldn't be a security risk?

Link to comment
Share on other sites

Up till now I've always used this part to specify the message and headers:

 

$message = "Naam:\r\n".$naam."\r\n\r\n";
$message .= "Emailadres:\r\n".$email."\r\n\r\n";
$message .= "Bericht:\r\n".$bericht."\r\n";

$headers = "MIME-Version: 1.0\r\n";  
$headers .= "Content-type: text/plain; charset=iso-8859-1\r\n";
$headers .= "From: ".mb_encode_mimeheader($naam, "iso-8859-1", "Q")." <".$email.">\r\n";  

mail($to, $subject, $message, $headers);

 

Which gave me the emails as text/plain emails, while keeping international characters like in "René O'Janssen".

 

But I was wondering, in relation to PHP script security: do I need to use htmlentities to strip HTML tags from the message body text? Or is that only necessary when sending text/html mail?

Link to comment
Share on other sites

I've tested it with several options.

 

I've made an input field and tested it entering the following line:

rene<janssen>sman

 

This is the code I use to both send it to an email address and display the resulting code:

 

<?php
$to = "mail@myemail.com";
$subject = "Sending a test mail";
$mijnnaam = ($_POST["tekst"]);

$message = "1. ".$mijnnaam."\r\n";
$message .= "2. ".htmlentities($mijnnaam)."\r\n";
$message .= "3. ".strip_tags($mijnnaam)."\r\n";
$message .= "4. ".htmlentities( strip_tags($mijnnaam))."\r\n";
$message .= "5. ". html_entity_decode (htmlentities($mijnnaam));

$headers = "MIME-Version: 1.0\r\n";
$headers .= "Content-type: text/plain; charset=iso-8859-1\r\n";

mail($to, $subject, $message, $headers);
echo $message;
?>

 

What I get in my mail is this:

 

1. rene<janssen>sman
2. rene&lt;janssen&gt;sman
3. renesman
4. renesman
5. rene<janssen>sman

 

What is displayed in a browser window so I can see the html is this:

1. Rene<Janssen>Sman
2. Rene&lt;Janssen&gt;Sman
3. ReneSman
4. ReneSman
5. Rene<Janssen>Sman

 

What I want, to make my PHP script secure against injected scripts, is to input:

Rene<Janssen>Sman

have that line filtered for HTML tags that could form such a script, e.g. with htmlentities, so that it becomes:

Rene&lt;Janssen&gt;Sman

arriving in a text/plain email as:

Rene&lt;Janssen&gt;Sman

(so no HTML tags are present)

but displayed in the message part of the text/plain email as:

Rene<Janssen>Sman

 

Is such a thing possible?

 

Link to comment
Share on other sites

Couldn't find an edit button, so here is an updated version of the post above. I've put this post between code tags because I noticed that when I entered HTML tags they were processed as HTML tags instead of being displayed as characters. So everything between <b> </b> just became bold. Anyway:

I've tested it with several options.

I've made an input field and tested it entering the following line:
Dit is <b>vette</b> tekst?

This is the code I use to both send it to an email address and display the resulting code:

<?php
$to = "mail@myemail.com";
$subject = "Sending a test mail";
$mijnnaam = ($_POST["tekst"]);

$message = "1. ".$mijnnaam."\r\n";
$message .= "2. ".htmlentities($mijnnaam)."\r\n";
$message .= "3. ".strip_tags($mijnnaam)."\r\n";
$message .= "4. ".htmlentities( strip_tags($mijnnaam))."\r\n";
$message .= "5. ". html_entity_decode (htmlentities($mijnnaam));

$headers = "MIME-Version: 1.0\r\n";  
$headers .= "Content-type: text/plain; charset=iso-8859-1\r\n";

mail($to, $subject, $message, $headers);
echo $message;
?>

What I get in my mail is this:

1. Dit is <b>vette</b> tekst?
2. Dit is &lt;b&gt;vette&lt;/b&gt; tekst?
3. Dit is vette tekst?
4. Dit is vette tekst?
5. Dit is <b>vette</b> tekst?

What is displayed in a browser window so I can see the html is this:
1. Dit is vette tekst?
("vette" being displayed bold)
2. Dit is <b>vette</b> tekst?
3. Dit is vette tekst?
4. Dit is vette tekst?
5. Dit is vette tekst?
("vette" being displayed bold)

What I would like to have is to input:

"Dit is <b>vette</b> tekst?"

showing in my text/plain email as:

"Dit is <b>vette</b> tekst?"

and when I choose text/html (so I can check for any HTML tags) showing as:
"Dit is <b>vette</b> tekst?"

not:

"Dit is vette tekst?" (with "vette" displayed in bold)

because that would mean that HTML tags have been sent with mail() after all.

Is such a thing possible? 

Link to comment
Share on other sites

What I want, to make my PHP script secure against injected scripts, is to input:

Rene<Janssen>Sman

have that line filtered for HTML tags that could form such a script, e.g. with htmlentities, so that it becomes:

Rene&lt;Janssen&gt;Sman

arriving in a text/plain email as:

Rene&lt;Janssen&gt;Sman

(so no HTML tags are present)

but displayed in the message part of the text/plain email as:

Rene<Janssen>Sman

 

Is such a thing possible?

 

So what you want is to send an email as plain text, but have html entities interpreted?  I don't think that's possible.  Either you send it as html (in which case you must encode html entities), or you send it as text (in which case you must NOT encode html entities).
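A rough sketch of that either/or choice, with a hypothetical function name and UTF-8 assumed as the charset: user text is entity-encoded only when the body is declared as HTML, and left untouched when it is declared as plain text.

```php
<?php
// Hypothetical helper; returns [content-type header line, body].

function buildBody(string $userText, bool $asHtml): array
{
    if ($asHtml) {
        // HTML mail: special characters must be encoded so tags are shown, not interpreted.
        $safe = htmlspecialchars($userText, ENT_QUOTES, 'UTF-8');
        return ['Content-Type: text/html; charset=UTF-8', '<p>' . nl2br($safe) . '</p>'];
    }

    // Plain-text mail: no encoding, otherwise the reader sees literal "&lt;" sequences.
    return ['Content-Type: text/plain; charset=UTF-8', $userText];
}
```

With `$asHtml` set to false, "Dit is <b>vette</b> tekst?" is mailed unchanged and a well-behaved client shows the tags as plain characters.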

 

If the mail reader interprets plain text as html, then that's a problem with the reader.  There's not much you can do about that.

 

Just to clarify, the security issue here is the possible display of html in the recipient's email client, right?

Link to comment
Share on other sites

Just to clarify, the security issue here is the possible display of html in the recipient's email client, right?

 

That's right. I want to prevent people from inserting possibly bad scripts into the message box, and if possible without using htmlentities. I've read that as long as I close my headers properly and insert an empty line after the headers, the body text isn't interpreted as HTML. Is that true?

 

But how about webmail? When I send myself an HTML mail with a JavaScript alert box in the body text, I can see it in my ISP's webmail as an HTML mail without the JavaScript being executed. And when I switch to "view as plain text email" I get the same line without HTML entities. So I'm thinking there must be a way to have something sent as HTML without the mail reader interpreting it as HTML and executing it?

Link to comment
Share on other sites

If the mail reader interprets plain text as html, then that's a problem with the reader.  There's not much you can do about that.

 

So as long as I read my emails through my ISP's webmail and the mails are sent as text/plain, the emails aren't in danger of being interpreted as HTML? I've tried sending a JavaScript (alert box) to myself and it arrived as plain text as it should, and then there isn't a "show it as html mail" option anyway. And when I send it as an HTML mail it arrives as an HTML mail, but the JavaScript isn't executed either.

 

As email reader I'm using Thunderbird which I think doesn't show arriving plain text mails as html anyway?

 

I don't know much about scripts being inserted in mail forms, but I wasn't able to get my JavaScript executed, neither when sending it as text/plain nor when sending it as text/html. So how do these things work anyway? Maybe it's just as I described above: when you close and format your headers properly and add an empty line after them, the message variable isn't interpreted as HTML?

Link to comment
Share on other sites

Again, any email client that renders HTML in an email with a plaintext content-type is considered insecure. Any mainstream client will not do it. Thunderbird will not.

 

Even webmail clients SHOULD encode the characters, like htmlentities does, before displaying them.

 

 

How much HTML is run in an email with an HTML content-type is up to the email client's HTML processor.

Link to comment
Share on other sites

Again, any email client that renders HTML in an email with a plaintext content-type is considered insecure. Any mainstream client will not do it. Thunderbird will not.

 

So when I'm using Thunderbird (which doesn't show plaintext content as html) or webmail (which shows plaintext content as plaintext), I'm in no danger?

 

Even webmail clients SHOULD encode the characters, like htmlentities does, before displaying them.

 

So when somebody sends HTML tags or scripts in the message content, even a webmail client filters them through htmlentities first before passing them on? So why, if I send HTML content and open the mail in webmail, does it show the message content as HTML text (without executing the tags), and when I switch to plaintext view, show the same text without entity codes?

 

Link to comment
Share on other sites

So why, if I send HTML content and open the mail in webmail, does it show the message content as HTML text (without executing the tags), and when I switch to plaintext view, show the same text without entity codes?

 

If you send a plaintext email to a webbased email client, it should run it through something like htmlentities. You don't see it because your browser decodes it. The email itself still has those real characters (like <, >, and such).
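What a web-based client does with a plaintext body can be sketched roughly as below. This is an illustration with a hypothetical function name, not SquirrelMail's actual code: the raw body is escaped just before it is embedded in the HTML page, and the stored email itself is left unchanged.

```php
<?php
// Hypothetical helper; UTF-8 is an assumption of this sketch.

function renderPlainTextBodyForWeb(string $rawBody): string
{
    // Escape at output time only; the browser turns the entities back into
    // visible <, > and & characters without treating them as markup.
    $escaped = htmlspecialchars($rawBody, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<pre>' . $escaped . '</pre>';
}
```

Viewing the page source shows the entities, while the rendered page shows the original characters, which is why the encoding is invisible to the person reading the mail.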

 

The forum we are posting on does something similar. Like if you look at the HTML source, you'll notice all of the <, >, and such characters are all html encoded, even though we the end users never see the encoding when we edit our posts or quote someone.

Link to comment
Share on other sites

I went to my webmail (which uses squirrelmail), sent myself an email (webmail uses "plain text" I guess cause when it arrives it doesn't show as html mail with a 'switch to plain text view' option) with the text "René". When I look at the source it doesn't show Ren&#233;, but still René.

 

Sending an html mail does show Ren&#233; so I guess htmlentities are used there.

 

But why does it show René in my webmail when I send myself a plain text mail then?

Link to comment
Share on other sites

When I use my webmail (squirrelmail) to send a plain text mail with this line:

<script>alert('xss')</script>

 

it arrives as (fake) plain text mail in my webmail displaying:

<pre>&lt;script&gt;alert('xss')&lt;/script&gt;</pre>

 

But webmail sees it as plain text mail. If it were seeing it as html mail it would show a 'view as plain text mail' option. Like it does when I send myself a html mail.

 

So how can I have mail() send it as a plain text mail, and make webmail see it as a plain text mail and show it as a plain text mail (but with htmlentities used)?

 

Should I perhaps have it send as html (with htmlentities used) and use some tags like <pre> to trick webmail into displaying it as plain text mail?

Link to comment
Share on other sites

No, I guess I should keep sending it as text/plain, because in the webmail mail info it says (when I send myself a plain text mail):

 

User-Agent: SquirrelMail/1.4.8

MIME-Version: 1.0

Content-Type: text/plain;charset=iso-8859-1

Content-Transfer-Encoding: 8bit

 

Which puzzles me even more.

I sent myself a plain text mail (see the content-type), and it arrives as plain text but with htmlentities used (having webmail show it as <script>alert('xss')</script>). And if I open the plain text mail in Thunderbird it displays <script>alert('xss')</script> too. Or, if I used an email reader which displays plain text as HTML, would that actually execute the alert box script?

 

So, correct me if I'm wrong, but this is how I understand it so far when sending as text/plain without using htmlentities:

1. opened in webmail: tags in mail are processed with htmlentities anyway so original characters are shown.

2. opened in thunderbird: mail is shown in plain text so no html is executed anyway

3. opened in email reader which by default shows plain text as html: tags are executed, so security risk

 

Which email readers show plain text mails as html?

Link to comment
Share on other sites
