
HTML Injection precautions


Joshua4550


Hey, it's me again - YES! ;o

From the title, you may be thinking "HTML? This is the PHP section!", but please read on...

I have (finally) created a basic dynamic page which users can edit using forms, and I have allowed HTML - because I want HTML! But I don't want them to be able to inject code that messes up the template's default code - only their own.

I was thinking of maybe making an application to check the HTML code and, if it contains errors, give them an error message and not proceed with the submission, but hey - this would probably be time-consuming to create. With this in mind, I decided to post here for you experts' advice :)

Okay, so basically I need to know whether it's possible, and the most logical way, to stop them injecting into the default page. What I mean is:

 

Template:

<?php
/*
* $input would be defined here, grabbing content from
* a database, but after knowing this - you should understand what I mean.
*/
  echo '
    <div id="class1">
      Hello there
      ' . $input . '
    </div>
  ';
?>

Okay, so if this was the case, what I'm asking for is the most logical/efficient way to let $input be printed without interfering with the rest of the code, so if $input = "</div>" it shouldn't end the <div id="class1">.

Get it?

Hope there's an answer, and looking forward to what you think :)
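One way to read the question: reject any submission whose tags don't nest properly, so a stray </div> can never close one of the template's elements (the "give them an error message" idea above). A minimal sketch, assuming a hypothetical helper name fragmentIsBalanced(), a simple regex tokenizer rather than a real HTML parser (it ignores comments, <script> contents and > inside attribute values), and a hand-picked list of void elements:

```php
<?php
// Hypothetical helper: returns true only if every closing tag in $html
// closes a tag opened earlier in $html itself (innermost first) and
// nothing is left open at the end.
function fragmentIsBalanced($html)
{
    // Elements that never take a closing tag (assumed list, not exhaustive).
    $void = array('area', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta');

    // [1] = "/" for closing tags, [2] = tag name, [3] = rest of the tag.
    preg_match_all('/<(\/?)([a-z][a-z0-9]*)\b([^>]*)>/i', $html, $tags, PREG_SET_ORDER);

    $stack = array(); // names of currently open tags, innermost last
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[1] === '/') {
            // A closing tag must match the most recently opened tag.
            if (empty($stack) || array_pop($stack) !== $name) {
                return false;
            }
        } elseif (substr($tag[3], -1) !== '/' && !in_array($name, $void)) {
            $stack[] = $name; // ordinary opening tag
        }
    }
    return empty($stack);
}

if (!fragmentIsBalanced('Hello</div>')) {
    echo 'Your HTML closes tags it never opened (or leaves some open). Please try again.';
}
```

Rejecting rather than repairing keeps the user's markup exactly as they typed it; the cost is that they have to fix it themselves.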


I don't want to strip tags, because the template I'm ACTUALLY using has almost every tag they'd want to use. I want them to still be able to use any tag, but without interfering with the rest of the page.

If only php/html had a way to print it as a "new page" inside this one, kind of like a tag.

 

Any other answers, or is this just not possible?
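For what it's worth, HTML5 has something close to the "new page inside this one" idea: the srcdoc attribute of <iframe> renders a complete, separate document, so nothing inside it can close the template's tags. A minimal sketch, assuming the visitor's browser supports srcdoc (older browsers ignore it) and a hypothetical helper name embedUserHtml(); the markup has to be escaped because it sits inside an attribute value:

```php
<?php
// Hypothetical helper: puts user HTML in the srcdoc attribute of an iframe,
// so the browser renders it as a separate document. The markup is escaped
// because it sits inside an attribute value; the browser decodes it again.
// "sandbox" with no value also stops scripts inside the frame from running.
function embedUserHtml($input)
{
    return '<iframe sandbox srcdoc="'
        . htmlspecialchars($input, ENT_QUOTES, 'UTF-8')
        . '"></iframe>';
}

$input = '<b>Hi</b></div>'; // example user input with a stray </div>
echo '
  <div id="class1">
    Hello there
    ' . embedUserHtml($input) . '
  </div>
';
```

An iframe doesn't grow to fit its content by default, so you would need to give it a size.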


Well, you would have to write a function to check how many <div> tags there are in the user's code. Then check how many </div> tags there are, and compare the two. If there are more </div> tags, then clip the code on the last one (or just remove any excess tags).

 

I'll give it a crack now.


Funny, when I said an application for an HTML code checker, that's EXACTLY what I had in mind. Thing is, there must be a way in PHP to check whether they use <"anything here">, </"anything here"> or <"anything here" />, right?

If so, it wouldn't be that hard to make, I suppose.


EDIT AGAIN: Ok this works.

 

<?php

if(!function_exists("stripos")) { // For PHP 4 (My Local Server)
    function stripos($haystack, $needle){
        return strpos($haystack, stristr($haystack, $needle));
    }
}

$data = 'aaa <div class="main">content</div> bbb</div>';
preg_match_all("/<div.*?>/", $data, $starttags);
preg_match_all("/<\/div>/", $data, $endtags);
$countstart = count($starttags[0]);
$countend = count($endtags[0]);

if($countstart > 0 || $countend > 0) {
    if($countstart > $countend) {
        $data .= str_repeat("</div>", ($countstart - $countend));
    } else if($countstart < $countend) {
        $reverse = strrev($data);
        $i = ($countend - $countstart);
        while($i > 0) {
            $nextdiv = stripos($reverse, ">vid/<"); // ">vid/<" is "</div>" reversed.
            $reverse = substr($reverse, 0, $nextdiv) . substr($reverse, ($nextdiv + 6)); // Get up to the ">vid/<" and after it.
            $i--;
        }
        $data = strrev($reverse);
    }
}

echo $data;

?>


Seems logical, and looks like it'd work - but what about other tags? I'm guessing there's something you can do: your code uses .*?, so instead of div, something that represents "anything", but maybe anything except a /.

My sentence may not make sense, but bear with me because I'm sleepy :P Do you get what I mean?


Well, the .*? actually allows for anything after the <div part, so things like <div id="asdf"> are matched as well as <div>.

 

To do it for other tags, you would need to put it in a loop and have the "div" replaced with your tag name. This is always the problem with the HTML validation systems, because a computer will never have human instinct. It can't know what tags to look for unless you specify them.
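The loop described above might look something like this. A minimal sketch, assuming a hypothetical helper name tagBalance() and a tag list you pick yourself; like the <div>-only version, it only compares counts per tag, so it can't tell which tag is out of place:

```php
<?php
// Hypothetical helper: for each tag name in $tagNames, report
// (number of opening tags) - (number of closing tags) found in $data.
// Positive = tags left open, negative = extra closing tags.
function tagBalance($data, array $tagNames)
{
    $balance = array();
    foreach ($tagNames as $tag) {
        $quoted = preg_quote($tag, '/');
        // "<tag" followed by a word boundary, so "<b" does not match "<br>".
        $opens  = preg_match_all('/<' . $quoted . '\b[^>]*>/i', $data, $unused);
        $closes = preg_match_all('/<\/' . $quoted . '\s*>/i', $data, $unused);
        $balance[$tag] = $opens - $closes;
    }
    return $balance;
}

print_r(tagBalance('aaa <div class="main">content</div> bbb</div>', array('div', 'span')));
```

For the example string this reports div as -1 (one extra </div>) and span as 0.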


But surely, just like you used .*?, there's something that matches any tag name. In fact, if .*? means anything, couldn't we use something like:

preg_match_all("/<.*?>/", $data, $starttags);
preg_match_all("/<\/.*?>/", $data, $endtags);

?

Also, maybe one for a tag that ends itself (eg: <div />)?


In theory, yes. The problem is that it will confuse tags. Say that you have '<div><a href="/"><span></span>'. The first array will have '<div><a><span>' and the second '</span>'.

It will see that there are fewer closing tags than opening tags and add two more </div>s, when what is actually needed is </a></div>. You need to know which end tags to add and where, and that is where human instinct is needed. The best you can do is to put it in a loop and go through all the tags you want to check.
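Knowing which end tags to add, and where, is what a stack gives you: push each opening tag, pop on the matching closing tag, drop closing tags that were never opened, and close whatever is still open at the end, innermost first. This is a different technique from counting; a minimal sketch, assuming a hypothetical helper name balanceTags(), a regex tokenizer rather than a real HTML parser, and an assumed list of void elements:

```php
<?php
// Hypothetical helper: returns $html with stray closing tags removed and
// any tags left open closed at the end, innermost first.
function balanceTags($html)
{
    // Elements that never take a closing tag (assumed list, not exhaustive).
    $void = array('area', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta');

    // Split into tags and text; the capture group keeps the tags.
    $parts = preg_split('/(<\/?[a-z][a-z0-9]*\b[^>]*>)/i', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $stack = array(); // names of currently open tags, innermost last
    $out = '';
    foreach ($parts as $part) {
        if (!preg_match('/^<(\/?)([a-z][a-z0-9]*)\b[^>]*>$/i', $part, $m)) {
            $out .= $part; // plain text
            continue;
        }
        $name = strtolower($m[2]);
        if ($m[1] === '/') {
            // Find the most recently opened tag with this name.
            $pos = array_search($name, array_reverse($stack, true), true);
            if ($pos === false) {
                continue; // closing tag that was never opened: drop it
            }
            // Close everything opened after it, then the tag itself.
            while (count($stack) > $pos) {
                $out .= '</' . array_pop($stack) . '>';
            }
        } elseif (substr($part, -2) === '/>' || in_array($name, $void)) {
            $out .= $part; // self-closing or void: nothing to track
        } else {
            $out .= $part;
            $stack[] = $name;
        }
    }
    // Close whatever is still open, innermost first.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo balanceTags('<div><a href="/"><span></span>');
// prints: <div><a href="/"><span></span></a></div>
```

Note that closing tags are re-emitted in lowercase, so the output may differ cosmetically from what the user typed.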


<?php
$data = 'lol <a dd="/"> <div> <gflool> </gflool> </div> </a> </lol>';
preg_match_all("/<.*?>/", $data, $starttags);
preg_match_all("/<\/.*?>/", $data, $endtags);
$countstart = count($starttags[0]);
$countend = count($endtags[0]);
$output = "";

if($countstart > 0 || $countend > 0) {
  if($countstart > $countend) {
    $output = "Theres more opening tags than closing tags"; 
  } else if($countstart < $countend) {
    $output = "Theres more closing tags than opening tags";
  } else {
    $output = "Your code is fine!";
  }
}

echo $output;
?>

It doesn't seem to work: it always says "Theres more opening tags than closing tags", but there are actually more closing tags.

I changed the last closing tag to an opening tag, but the output remained the same.

I also tried using elseif rather than else if, but still no workie! :(

Is there anything I'm doing wrong?


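Ooh, it would seem otherwise. The first pattern, /<.*?>/, also matches the closing tags, so it finds 7 tags against 4 closing ones: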

$data = 'lol <a dd="/"> <div> <gflool> </gflool> </div> </a> </lol>';
preg_match_all("/<.*?>/", $data, $starttags);
preg_match_all("/<\/.*?>/", $data, $endtags);
print_r($starttags[0]);
print_r($endtags[0]);

Array ( [0] => <a dd="/"> [1] => <div> [2] => <gflool> [3] => </gflool> [4] => </div> [5] => </a> [6] => </lol> )
Array ( [0] => </gflool> [1] => </div> [2] => </a> [3] => </lol> )
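Since /<.*?>/ matches closing tags as well, one fix is to exclude them from the opening pattern. A minimal corrected sketch of the counting script, assuming PCRE lookarounds: (?!\/) rejects tags that start with </, and (?<!\/)> rejects tags that end with />, which also deals with the self-closing question:

```php
<?php
$data = 'lol <a dd="/"> <div> <gflool> </gflool> </div> </a> </lol>';

// Opening tags only: "<" not followed by "/", and ">" not preceded by "/".
$countstart = preg_match_all('/<(?!\/)[^>]*(?<!\/)>/', $data, $starttags);
// Closing tags only.
$countend = preg_match_all('/<\/[^>]*>/', $data, $endtags);

if ($countstart > $countend) {
    $output = "There are more opening tags than closing tags";
} elseif ($countstart < $countend) {
    $output = "There are more closing tags than opening tags";
} else {
    $output = "Your code is fine!";
}

echo $output; // prints: There are more closing tags than opening tags
```

For this input the opening pattern now finds 3 tags and the closing pattern 4.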


Try these regular expressions:

 

preg_match_all("/<([A-Z][A-Z0-9]*)\b[^>]*>/i", $data, $starttags);
preg_match_all("/<\/([A-Z][A-Z0-9]*)\b[^>]*>/i", $data, $endtags);

If that fails, try these:

 

preg_match_all("/<\w+\b[^>]*>/", $data, $starttags);
preg_match_all("/<\/\w+\b[^>]*>/", $data, $endtags);

I actually just realised that image tags will be counted by the first but not by the second. You'll need to come up with a fix to exclude image tags. I can't help as I'm off now, but good luck!
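One way to handle <img> (and the other elements that never get a closing tag, such as <br> and <hr>) is to capture the tag name and skip it when counting. A minimal sketch, assuming the case-insensitive version of the first pattern, the HTML5 list of void elements, and a hypothetical helper name countOpeningTags():

```php
<?php
// Hypothetical helper: counts opening tags, ignoring void elements
// (which never have a closing tag) and anything written as <tag ... />.
function countOpeningTags($data)
{
    $void = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                  'link', 'meta', 'param', 'source', 'track', 'wbr');
    preg_match_all('/<([A-Z][A-Z0-9]*)\b[^>]*>/i', $data, $matches);

    $count = 0;
    foreach ($matches[1] as $i => $name) {
        $isSelfClosing = (substr($matches[0][$i], -2) === '/>');
        if (!$isSelfClosing && !in_array(strtolower($name), $void)) {
            $count++;
        }
    }
    return $count;
}

echo countOpeningTags('<p>Hi <img src="a.png"> <br/> <b>there</b></p>'); // 2
```

Only <p> and <b> are counted here, so the result can be compared directly with the number of closing tags.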


You can use the htmlentities() function. It converts characters such as < and > into entities, so the user's HTML is displayed as text on the page instead of being interpreted:

<?php
  echo '
    <div id="class1">
      Hello there
      ' . htmlentities($input). '
    </div>
  ';
?>
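A quick check of what the escaping produces. The trade-off is that none of the user's tags are rendered at all, including the ones you do want to allow, so this only fits if showing the markup as text is acceptable. A minimal sketch; the sample input is just an example, and UTF-8 is assumed to be the page encoding:

```php
<?php
$input = '<b>Hi</b></div>'; // example user input with a stray </div>

// ENT_QUOTES also escapes ' and "; UTF-8 is assumed to be the page encoding.
$safe = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo '
  <div id="class1">
    Hello there
    ' . $safe . '
  </div>
';
// $safe is &lt;b&gt;Hi&lt;/b&gt;&lt;/div&gt; so the browser shows the tags
// as text and the template's <div> is left intact.
```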


OK, I checked this, and it seems to work as expected. I also made it skip self-closing tags such as <img />, as they are standalone elements.

<?php

if(!function_exists("stripos")) { // For PHP 4 (My Local Server)
    function stripos($haystack, $needle){
        return strpos($haystack, stristr($haystack, $needle));
    }
}

$data = 'lol <a dd="/"> <div> <gflool> </gflool> </div> </a> </lol> <img src="" />';
preg_match_all("/<(?:\"[^\"]*\"['\"]*|'[^']*'['\"]*|[^'\">])+>/i", $data, $starttags);
$i = 0;
foreach($starttags[0] as $starttag) {
    if(substr($starttag, 0, 2) == "</") {
        unset($starttags[0][$i]);
    } else if(substr($starttag, -2) == "/>") {
        unset($starttags[0][$i]);
    }
    $i++;
}
//print_r($starttags);
preg_match_all("/<\/(?:\"[^\"]*\"['\"]*|'[^']*'['\"]*|[^'\">])+>/i", $data, $endtags);
//print_r($endtags);
$countstart = count($starttags[0]);
$countend = count($endtags[0]);

if($countstart > 0 || $countend > 0) {
    if($countstart > $countend) {
        /*
        $data .= str_repeat("</".$tag.">", ($countstart - $countend));
        */
        echo "You have too many opening tags! Please try again.";
    } else if($countstart < $countend) {
        /*
        $reverse = strrev($data);
        $i = ($countend - $countstart);
        while($i > 0) {
            $nextdiv = stripos($reverse, ">".$tag."/<"); // ">vid/<" is "</div>" reversed.
            $reverse = substr($reverse, 0, $nextdiv) . substr($reverse, ($nextdiv + 6)); // Get up to the ">vid/<" and after it.
            $i--;
        }
        $data = strrev($reverse);
        */
        echo "You have too many closing tags! Please try again.";
    } else {
        echo $data;
    }
} else {
    echo $data; // No tags at all, so there is nothing to balance.
}

?>
