This topic is now archived and is closed to further replies.


After 30 minutes, I'm in even more trouble! Help, please?


I need a simple, all-inclusive solution that stops hackers and bots from entering scripts into my form text fields.

I just want to stop anyone from entering any scripts or HTML. After reading here and following thread links for half an hour, I finally ended up at the manual page for htmlspecialchars [url=http://us3.php.net/manual/en/function.htmlspecialchars.php]http://us3.php.net/manual/en/function.htmlspecialchars.php[/url], which made my head swim with all the various chars from various language encodings that could slip through.

If I allow plain text and numbers only:

[code]$text = $_POST['description'];

if (!preg_match("/[a-z0-9]$/i", $text))
    die ("HTML not allowed");[/code]

Would that prohibit punctuation? And if I allow punctuation, do I specifically have to disallow potential HTML symbols? Like " < > ’ ^ & * ( ) < > õ ø" etc.?

Anyone have a safe form expression?

As you have it right now
[code]preg_match("/[a-z0-9]$/i", $text)[/code]
only checks that $text ends with a single alphanumeric character (without the /m modifier, $ anchors at the very end of the string, or just before a final newline). Input that ends in punctuation gets rejected, while HTML anywhere earlier in the string passes straight through.

Tell us a bit more about the form input. Are these one-liners? Or a whole bunch of text? Does this go into a database? (In which case I use PEAR::DB to mitigate those risks.)

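From what you've posted:
[code]preg_match('/^[a-z0-9]+\z/i', $text)[/code]
will match only if $text consists entirely of alphanumeric characters (no punctuation), with at least one present. It fails if $text contains whitespace, and it also fails if $text contains anything else that's not alphanumeric. Without the ^ and \z anchors the pattern would succeed as soon as a single alphanumeric character appeared anywhere in the string.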
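If punctuation and spaces need to be allowed, one option is to widen the whitelist rather than try to list every dangerous symbol. Here is a minimal sketch of that idea; the function name is_plain_text() and the exact set of permitted punctuation are my own assumptions, not something from this thread, so adjust them to the form in question.

```php
<?php
// Hypothetical helper: accepts letters, digits, whitespace and a small
// whitelist of punctuation; rejects everything else, including < > & " '.
function is_plain_text($text)
{
    // ^ and \z anchor the pattern to the whole string, so a single
    // disallowed character anywhere makes the match fail.
    return preg_match('/^[a-z0-9\s.,!?:;()-]+\z/i', $text) === 1;
}

var_dump(is_plain_text('Hello, world (v2)!'));        // bool(true)
var_dump(is_plain_text('<script>alert(1)</script>')); // bool(false)
var_dump(is_plain_text(''));                          // bool(false), empty input is rejected
```

Note that this whitelist is ASCII-only, so accented or non-Latin letters would be rejected too. That may or may not be what you want for a description field.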

I've been thinking about how to reply.

I realize that simply restricting chars doesn't solve the problem of XSS (cross-site scripting) via form injections (though it helps). Someone hacked my forms and was sending spam directly from my own IP, using my own IP headers!

There are so many encoded forms of the less-than and greater-than signs, in so many languages, that it is difficult to catch them all, and forcing a strict charset encoding doesn't help.

The key to safe forms seems to start in the headers on the action page. I found this online and it seems a good idea as a start:

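[code]// Legacy PHP 4 code: $HTTP_POST_VARS, each() and eregi() have been removed from current PHP.
$email = $HTTP_POST_VARS['email'];
$mailto = "email@address";
$mailsubj = "Form submission";
$mailhead = "From: $email\n";
reset ($HTTP_POST_VARS);
$mailbody = "Values submitted from web site form:\n";
while (list ($key, $val) = each ($HTTP_POST_VARS)) { $mailbody .= "$key : $val\n"; }
// Only send if the address contains no carriage return or line feed (header-injection check).
if (!eregi("[\r\n]",$HTTP_POST_VARS['email'])) { mail($mailto, $mailsubj, $mailbody, $mailhead); }[/code]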

[quote]If you're collecting an email address on your form (as we are above), it's important that this is checked within the php script for extra line feeds. One of the latest techniques used by spammers is to inject their own headers into the email. To do this, they enter a random email address followed by a line feed. This is then followed by a blind carbon copy (Bcc) containing many email addresses.
Using this technique, it's also possible for the spammer to insert their own email message and send it to many other addresses via your script.
In the above script, we're using the 'eregi' function to check the email address just before sending the email. Ideally, all data which may be used within the email headers should be checked.[/quote]
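For anyone on a newer PHP, the same header-injection check can be sketched roughly as below. This assumes PHP 5.2 or later (for filter_var()), and safe_header_email() is a name I made up for illustration. The mail() call is left commented out so the snippet runs without sending anything.

```php
<?php
// Hypothetical helper: returns an address that is safe to place in a
// From: header, or null if the input should be refused.
function safe_header_email($email)
{
    // Reject any CR or LF outright: these are what let an attacker
    // append extra headers such as Bcc.
    if (preg_match('/[\r\n]/', $email)) {
        return null;
    }
    // filter_var() returns the address on success and false on failure.
    $valid = filter_var($email, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

$email = safe_header_email(isset($_POST['email']) ? $_POST['email'] : '');
if ($email !== null) {
    // mail('email@address', 'Form submission', $mailbody, "From: $email\r\n");
}
```

As the quoted advice says, the same check should apply to every submitted value that ends up in a header, not only the address.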

[quote]I just want to stop anyone from entering any scripts or html.[/quote]

take a look at [url=http://www.php.net/strip_tags]strip_tags()[/url]

or you could just check for the tags < and >:
[code]if (preg_match('/<|>/', $string)) {
    echo "Scripts are not allowed!";
} else {
    // Do your stuff here
}[/code]
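To make the difference between the two suggestions concrete, here is a small sketch. The sample input is invented for illustration. strip_tags() silently cleans the input, while the preg_match() check refuses it.

```php
<?php
$input = 'Hi <b>there</b><script>alert(1)</script>';

// strip_tags() drops the tags but keeps the text between them,
// so the script body survives as harmless plain text.
$stripped = strip_tags($input);
echo $stripped, "\n"; // Hi therealert(1)

// The reject-on-sight approach: refuse anything containing < or >.
$rejected = preg_match('/[<>]/', $input) === 1;
var_dump($rejected); // bool(true)
```

Neither approach looks at entity-encoded input such as `&lt;`, so both are only a first line of defense.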


[quote]or you could just check for the tags < and >[/quote]

Actually, I had that, and it doesn't work at all. If you read the comment on the strip_tags link you gave from "dumb at coder dot com", he points out the flaw in this, which has to do with the various charsets and languages:

[quote]Within <textarea>, Browsers auto render & display certain "HTML Entities" and "HTML Entity Codes" as characters:
&lt; shows as <    --    &amp; shows as &    --    etc.

Browsers also auto change any "HTML Entity Codes" entered in a <textarea> into the resultant display characters BEFORE UPLOADING.  There's no way to change this, making it difficult to edit html in a <textarea>

"HTML Entity Codes" (ie, use of &#60 to represent "<", &#38 to represent "&" &#160 to represent "&nbsp;") can be used instead.  Therefore, we need to "HTML-Entitize" the data for display, which changes the raw/displayed characters into their HTML Entity Code equivalents before being shown in a <textarea>.

how would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"
&amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;. That's just how HTML entities work.

htmlspecialchars() is a subset of htmlentities();
the reverse (i.e., changing HTML entity codes into displayed characters) is done with html_entity_decode()[/quote]

Recently I also read another thread somewhere that said htmlentities is incomplete and does not cover the html characters for all languages. My form was hit with Chinese.
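One approach that sidesteps the "which characters in which language" question is to escape at output time with an explicit charset. Below is a minimal sketch, assuming the pages and the submitted data are UTF-8; the helper name h() is my own shorthand. htmlspecialchars() converts only the characters that are special in HTML (&, <, >, and both quote styles under ENT_QUOTES) and leaves Chinese or any other text alone.

```php
<?php
// Hypothetical helper: escape a value for display, assuming UTF-8 pages.
function h($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$submitted = '<script>alert("x")</script> 你好 &lt;';
echo h($submitted), "\n";
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; 你好 &amp;lt;
```

The last part of the output shows the double-encoding described in the quote: an already-typed `&lt;` becomes `&amp;lt;`, so it displays literally instead of turning into a `<`.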

That's what I meant: surely SOMEONE SOMEWHERE since 2003 has come up with a fairly solid solution that makes form insertion too hard to be worth the trouble. It should nip the attack in the bud during the hackers' initial, bot-automated recon spidering, which looks for an easy grab before committing time to further intrusion.

What are some of the non-standard characters that are breaking the script?

Actually, effigy, that's the issue. I'm wide open for XSS. I had a simple validate that only "disabled" the < and > tags. But they got around that easily.

There IS a long list of comments on the htmlentities thread [url=http://us3.php.net/manual/en/function.htmlentities.php]http://us3.php.net/manual/en/function.htmlentities.php[/url] and they all show various ways it breaks.

I just tried the really great suggestion there from "cameron at prolifique dot com", posted May 5 2006, who was annoyed as well and looking for a final solution. Unfortunately, his suggestion, which is quite sweet, requires the "mb_convert_encoding" function to be enabled.

In my PHP 4.01 I can't find that function (aren't included PHP functions either turned on or off? Do you have to download them like mods?)

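You'll need to reconfigure PHP to enable the suite of [url=http://us3.php.net/manual/en/ref.mbstring.php]Multibyte String Functions[/url]. They are not on by default; on a source build they are enabled with the --enable-mbstring configure option.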
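A quick way to see whether a given install already has them is to check at runtime. This is only a sketch: the sample string and the two encodings are illustrative assumptions, not taken from the comment mentioned above.

```php
<?php
// Check at runtime whether the mbstring extension is available.
$hasMbstring = extension_loaded('mbstring') && function_exists('mb_convert_encoding');

if ($hasMbstring) {
    // Convert a Latin-1 byte string to UTF-8 (encodings here are illustrative).
    echo mb_convert_encoding("caf\xE9", 'UTF-8', 'ISO-8859-1'), "\n";
} else {
    echo "mbstring is not enabled in this PHP build\n";
}
```

If the second message prints, the extension has to be compiled in or enabled by whoever administers the server.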
