Flinch

Regex for HTML

Hey all, long time no post.

So anyway, I'm writing a function that will parse user input from text forms based on a set of instructions passed to it in a multi-dimensional array. Right now I'm working on a section that lets me specify, through the array ($instruct), which HTML tags are allowed, and parse them accordingly. The function starts by checking which HTML is allowed and replaces the < & > of any allowed tag with [[ & ]], so that my next section, which converts the < & > of non-allowed tags into &lt; & &gt;, will not touch the wanted HTML. I'm not very good at explaining, so here's an example:

[code]//Wanted HTML: <br>
$text = "Here is a <br> <b>new</b> line!";

//my function will find any occurrences of <br> and replace them with [[br]].
$return = parser($text, $instruct);

echo $return; //prints the same string, with the <br> tag intact and the <b> & </b> tags escaped as &lt;b&gt; & &lt;/b&gt;.
[/code]

I think that demonstrates what I'm trying to do. So far I haven't had too many problems, but I fear that the regular expression I wrote to do this checking is a little sub-par and may not work.

Here's my regular expression being used in a foreach loop.

[code](<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)[/code]

The ".$k." parts are there because all the allowed tags are broken out into another array, which I cycle through, replacing matches in the input text where needed. So for the <br> in our example above, the pattern would look like this:

[code](<|</)(\s)*br(\s)*([^>br].+?)?(>|/>)[/code]

This seems to work like I want, but when it comes to closing tags (</td></tr>) they just get replaced with [[td]] and [[tr]] instead of [[/td]] and [[/tr]]. Does anyone have suggestions for tweaking this regex so my script works? It's been troubling me for a few days now.

Here's the concerned area of the script I'm talking about:

[code]
//use the normal regex
if(preg_match("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $txt, $matches)) {

    /*+------------THIS IS VERY IMPORTANT Y'ALL!-------------+
      + First and fifth elements are the < and > respectively
      + Second and third will ALWAYS be spaces
      + Fourth will be any markup inside the tag
      +------------------------------------------------------+*/

    //now, begin the replacement technique.
    print_r($matches);
    if(preg_match("#/#is", $matches[1]) ) {
        $add_in1 = "/";
    }

    $add_in2 = ($matches[4] != "") ? " ".$matches[4] : "";

    $this_tag = "[[".$add_in1.$k.$add_in2."]]";

    $txt = preg_replace("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $this_tag, $txt);
    unset($this_tag, $add_in1, $add_in2);

}
[/code]

Thanks.

I just did something very similar. Here's what I wrote. It's not well tested but should be working.

[code]<?php
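// Allowed tags, written as regex fragments ('h\d' is an h followed by one digit).
$tags = array('b', 'i', 'h\d');

// Swap the < > of allowed tags for [ ] so a later escaping pass leaves them alone.
function preserveHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/<(\/)?(' . $tag . '(\s.*?)?)>/i', '[$1$2]', $text);
    }
    return $text;
}

// Turn the bracketed placeholders back into real tags once escaping is done.
function restoreHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/\[(\/)?(' . $tag . '(\s.*?)?)\]/i', '<$1$2>', $text);
    }
    return $text;
}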
?>[/code]
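As for the closing tags in the first post: as far as I can tell, preg_match() only captures the first occurrence of the tag, and preg_replace() then swaps every match (opening and closing alike) for the single $this_tag string built from that first match, so the slash is lost. One way around it is to build the replacement per match with preg_replace_callback(). Here is a rough sketch of the whole protect, escape, restore round trip using the [[ ]] markers. The function names are made up for illustration, it assumes PHP 7 or later and tag names given as plain strings (not regex fragments), and it passes attributes through untouched, so it is not a complete sanitizer.

```php
<?php
// Sketch only: tagAlternation(), protectAllowedTags() and escapeExceptAllowed()
// are hypothetical names, not functions from either post above.

// Build a regex alternation such as "br|td|tr" from literal tag names.
function tagAlternation(array $allowed): string {
    $quoted = array();
    foreach ($allowed as $name) {
        $quoted[] = preg_quote($name, '#');
    }
    return implode('|', $quoted);
}

// Step 1: turn <tag ...> and </tag> into [[tag ...]] and [[/tag]].
function protectAllowedTags(string $text, array $allowed): string {
    if (count($allowed) === 0) {
        return $text; // nothing to protect
    }
    $pattern = '#<\s*(/?)\s*(' . tagAlternation($allowed) . ')\b([^>]*)>#i';
    // The callback runs once per match, so every tag keeps its own slash.
    return preg_replace_callback($pattern, function (array $m): string {
        $attrs = trim($m[3]);
        return '[[' . $m[1] . $m[2] . ($attrs !== '' ? ' ' . $attrs : '') . ']]';
    }, $text);
}

// Steps 2 and 3: escape whatever is left, then restore the protected tags.
function escapeExceptAllowed(string $text, array $allowed): string {
    $text = protectAllowedTags($text, $allowed);
    // ENT_NOQUOTES leaves quotes alone so attribute values survive the round trip.
    $text = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
    if (count($allowed) === 0) {
        return $text; // no placeholders to restore
    }
    $pattern = '#\[\[(/?(?:' . tagAlternation($allowed) . ')\b[^\]]*)\]\]#i';
    return preg_replace($pattern, '<$1>', $text);
}

echo escapeExceptAllowed("Here is a <br> <b>new</b> line!", array('br')), "\n";
// prints: Here is a <br> &lt;b&gt;new&lt;/b&gt; line!
echo protectAllowedTags("<td>cell</td></tr>", array('td', 'tr')), "\n";
// prints: [[td]]cell[[/td]][[/tr]]
```

Doing all allowed tags in one alternation also means a single pass over the text instead of one preg_replace() per tag. If you need pattern-style names like 'h\d', drop the preg_quote() call and make sure the fragments themselves are trusted.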
