
Regex for HTML


Flinch


Hey all, long time no post.

So anyway, I'm writing a function that parses user input from text forms based on a set of instructions passed to it in a multi-dimensional array. Right now I'm working on a section that lets me specify which HTML tags are allowed (through the array $instruct) and parse them accordingly. The function starts by checking which HTML is allowed and replaces the < & > of every allowed tag with [[ & ]], so that my next section, which converts the < & > of non-allowed tags into &lt; & &gt;, won't touch the wanted HTML. I'm not very good at explaining, so here's an example:

[code]//Wanted HTML: <br>
$text = "Here is a <br> <b>new</b> line!";

//my function will find any occurrences of <br> and replace them with [[br]].
$return = parser($text, $instruct);

echo $return; //will return the same string, with the <br> tag intact, and the <b> & </b> tags replaced with &lt;b&gt; & &lt;/b&gt;.
[/code]

I think that demonstrates what I'm trying to do. So far I haven't had too many problems, but I fear that the regular expression I wrote for this check is sub-par and may not work.
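For reference, the mark / escape / restore idea described above can be sketched in a few lines. This is a minimal sketch, not the poster's parser(): it assumes the allowed tags arrive as a flat list of plain tag names (the real layout of $instruct isn't shown), it drops attributes on allowed tags for simplicity, and parse_input is a hypothetical name.

```php
<?php
// Hypothetical sketch of the three passes. Assumes $allowed is a flat list of
// plain tag names; attributes on allowed tags are dropped for simplicity.
function parse_input(string $text, array $allowed): string
{
    // Pass 1: <tag>, <tag attr="...">, </tag> and <tag /> become [[tag]] / [[/tag]].
    // \b keeps "b" from also matching the start of <br> or <body>.
    foreach ($allowed as $tag) {
        $name = preg_quote($tag, '#');
        $text = preg_replace('#<\s*(/?)\s*(' . $name . ')\b[^>]*>#i', '[[$1$2]]', $text);
    }

    // Pass 2: escape everything that is left, so disallowed tags print as text.
    // htmlspecialchars() leaves the [[ ]] markers alone.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Pass 3: turn the markers back into real tags.
    foreach ($allowed as $tag) {
        $name = preg_quote($tag, '#');
        $text = preg_replace('#\[\[(/?)(' . $name . ')\]\]#i', '<$1$2>', $text);
    }
    return $text;
}

echo parse_input('Here is a <br> <b>new</b> line!', ['br']), "\n";
// Here is a <br> &lt;b&gt;new&lt;/b&gt; line!
```

Because the backreference $1 carries the "/" straight from each match into the marker, closing tags keep their slash. One side effect of the marker approach: anything a user types literally as [[br]] also comes out as a real tag.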

Here's the regular expression I'm using inside a foreach loop:

[code](<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)[/code]

The ".$k." parts are there because all the allowed tags are split into another array, which I cycle through, replacing matches in the input text where needed. So for the <br> in the example above, it would look like this:

[code](<|</)(\s)*br(\s)*([^>br].+?)?(>|/>)[/code]
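Two parts of that pattern don't do what they appear to: [^>br] is a character class, meaning one character that is not >, b or r (not "anything except the string br"), and nothing after the tag name stops b from matching the start of <br>. Together they let the b pattern run across several tags. The snippet below compares the question's pattern with a hypothetical tag_pattern() helper that puts \b after the name and passes the name through preg_quote(); the helper's name and group layout are assumptions, not part of the original script.

```php
<?php
// The pattern from the question, built for $k = 'b'.
$k = 'b';
$original = "#(<|</)(\s)*" . $k . "(\s)*([^>" . $k . "].+?)?(>|/>)#im";

// Hypothetical replacement: \b after the tag name, and preg_quote() so a
// tag name containing regex metacharacters can't break the pattern.
function tag_pattern(string $tag): string
{
    // Groups: 1 = "/" for a closing tag (or empty), 2 = tag name,
    //         3 = attribute text (or empty). A trailing "/" before ">" is dropped.
    return '#<\s*(/?)\s*(' . preg_quote($tag, '#') . ')\b([^>]*?)\s*/?\s*>#i';
}

$text = 'Here is a <br> <b>new</b> line!';

preg_match($original, $text, $m);
echo $m[0], "\n";   // <br> <b>   ("r" passes [^>b], then .+? runs on to the next ">")

preg_match(tag_pattern($k), $text, $m);
echo $m[0], "\n";   // <b>   (\b rejects the "b" at the start of "br")
```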

This seems to work the way I want, but closing tags (</td></tr>) just get replaced with [[td]] and [[tr]] instead of [[/td]] and [[/tr]]. Does anyone have help or suggestions for tweaking this regex to make my script work? It's been troubling me for a few days now.

Here's the concerned area of the script I'm talking about:

[code]
//use the normal regex
if(preg_match("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $txt, $matches)) {

    /*+------------THIS IS VERY IMPORTANT Y'ALL!-------------+
      + First and fifth elements are the < and > respectively
      + Second and third will ALWAYS be spaces
      + Fourth will be any markup inside the tag
      +------------------------------------------------------+*/

    //now, begin the replacement technique.
    print_r($matches);
    if(preg_match("#/#is", $matches[1]) ) {
        $add_in1 = "/";
    }

    $add_in2 = ($matches[4] != "") ? " ".$matches[4] : "";

    $this_tag = "[[".$add_in1.$k.$add_in2."]]";

    $txt = preg_replace("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $this_tag, $txt);
    unset($this_tag, $add_in1, $add_in2);
}
[/code]
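The lost slash comes from how the two calls in that block work together: preg_match() only reports the first match, so $this_tag is built once from that match, and preg_replace() then swaps every occurrence of the tag, opening and closing alike, for that one string. If the first <td> in the input is an opening tag, every </td> becomes [[td]] as well. (Separately, $add_in1 is only assigned inside the if, so it is undefined whenever the first match has no slash.) preg_replace_callback() avoids this by building the replacement from each match on its own. A sketch, assuming $k is a plain tag name; mark_allowed_tag is a hypothetical name, and the pattern is a tightened version of the question's rather than a drop-in copy.

```php
<?php
// Hypothetical replacement for the preg_match() + preg_replace() pair.
// preg_replace_callback() calls the closure once per match, so an opening
// <td> and a closing </td> each get their own replacement.
function mark_allowed_tag(string $txt, string $k): string
{
    // 1 = "/" on a closing tag (else empty), 2 = tag name, 3 = attribute text.
    $pattern = '#<\s*(/?)\s*(' . preg_quote($k, '#') . ')\b([^>]*?)\s*/?\s*>#i';

    return preg_replace_callback($pattern, function (array $m): string {
        $attrs = trim($m[3]);
        // Same shape the question builds: [[ + optional "/" + name + optional " attrs" + ]]
        return '[[' . $m[1] . $m[2] . ($attrs !== '' ? ' ' . $attrs : '') . ']]';
    }, $txt);
}

echo mark_allowed_tag('<table><tr><td>a</td><td>b</td></tr></table>', 'td'), "\n";
// <table><tr>[[td]]a[[/td]][[td]]b[[/td]]</tr></table>
```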

Thanks.

I just did something very similar. Here's what I wrote. It's not well tested, but it should work.

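[code]<?php
// Each entry is used as a regex fragment, so 'h\d' matches h1, h2, and so on.
$tags = array('b', 'i', 'h\d');

// Turn allowed tags into [tag] / [/tag], keeping any attributes.
// (No "global $tags;" here: it would override the $tags parameter.)
function preserveHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/<(\/)?(' . $tag . '(\s.*?)?)>/i', '[$1$2]', $text);
    }
    return $text;
}

// Turn [tag] / [/tag] back into real tags.
function restoreHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/\[(\/)?(' . $tag . '(\s.*?)?)\]/i', '<$1$2>', $text);
    }
    return $text;
}
?>[/code]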
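The two functions are meant to sit on either side of the escaping step from the question. A sketch of that wiring: it repeats the reply's functions without the `global $tags;` lines (which made the $tags parameter irrelevant) and adds a hypothetical filterHTML() wrapper that uses htmlspecialchars() for the escaping; neither the wrapper nor the choice of htmlspecialchars() comes from the thread.

```php
<?php
// The reply's functions, minus the "global $tags;" lines.
function preserveHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/<(\/)?(' . $tag . '(\s.*?)?)>/i', '[$1$2]', $text);
    }
    return $text;
}

function restoreHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/\[(\/)?(' . $tag . '(\s.*?)?)\]/i', '<$1$2>', $text);
    }
    return $text;
}

// Hypothetical wrapper: preserve allowed tags, escape the rest, restore.
function filterHTML($text, $tags) {
    $text = preserveHTML($text, $tags);
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return restoreHTML($text, $tags);
}

$tags = array('b', 'i', 'h\d');
echo filterHTML('<h1>Hi</h1> <script>x</script>', $tags), "\n";
// <h1>Hi</h1> &lt;script&gt;x&lt;/script&gt;
```

One caution: because (\s.*?) keeps whatever follows the tag name, attributes survive too. Input such as <b onmouseover=alert(1)>x</b> contains nothing htmlspecialchars() changes, so it comes back out with the event handler intact.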

