Flinch Posted March 9, 2006

Hey all, long time no post.

So anyway, I'm writing a function that will parse user input from text forms based on a set of instructions passed to it in a multi-dimensional array. Right now I'm working on a section that lets me specify, through the array ($instruct), which HTML tags are allowed, and parses them accordingly. The function starts by checking which HTML is allowed and replaces the < and > of any allowed tag with [[ and ]], so that my next section, which converts non-allowed HTML tags into &gt; and &lt;, will not touch the wanted HTML. I'm not very good at explaining, so here's an example:

[code]
//Wanted HTML: <br>
$text = "Here is a <br> <b>new</b> line!";
//my function will find any occurrences of <br> and replace them with [[br]].
$return = parser($text, $instruct);
echo $return; //will return the same string, with the <br> tag intact, and the <b> & </b> tags replaced with &lt;b&gt; & &lt;/b&gt;.
[/code]

I think that demonstrates what I'm trying to do. So far I haven't had too much trouble, but I fear that the regular expression I wrote to do this checking is a little sub-par and may not work.

Here's the regular expression, which is used in a foreach loop:

[code]
(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)
[/code]

The ".$k." parts are there because all the allowed tags are split into another array, and the loop cycles through them, replacing each one in the input text where needed. So for the <br> in the example above, it would look like this:

[code]
(<|</)(\s)*br(\s)*([^>br].+?)?(>|/>)
[/code]

This seems to work like I want, but closing tags (</td></tr>) get replaced with [[td]] and [[tr]] instead of [[/td]] and [[/tr]]. Does anyone have any help or suggestions for tweaking this regex to make my script work?
It's been troubling me for a few days now.

Here's the part of the script I'm talking about:

[code]
//use the normal regex
if(preg_match("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $txt, $matches))
{
    /*+------------THIS IS VERY IMPORTANT Y'ALL!-------------+
      + First and fifth elements are the < and > respectively
      + Second and third will ALWAYS be spaces
      + Fourth will be any markup inside the tag
      +------------------------------------------------------+*/
    //now, begin the replacement technique.
    print_r($matches);
    if(preg_match("#/#is", $matches[1]) )
    {
        $add_in1 = "/";
    }
    $add_in2 = ($matches[4] != "") ? " ".$matches[4] : "";
    $this_tag = "[[".$add_in1.$k.$add_in2."]]";
    $txt = preg_replace("#(<|</)(\s)*".$k."(\s)*([^>".$k."].+?)?(>|/>)#im", $this_tag, $txt);
    unset($this_tag, $add_in1, $add_in2);
}
[/code]

Thanks.
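The closing-tag symptom is most likely a consequence of how the snippet is structured, not only of the regex: preg_match() captures just the first occurrence of the tag, and preg_replace() then writes that single prebuilt $this_tag over every match, opening and closing alike. Below is a minimal sketch of one way around it, which builds the replacement per match with preg_replace_callback(). The helper name protectTag() and the simplified pattern are assumptions for illustration and are not part of the original script; the closure syntax assumes PHP 5.3 or later, and $k is assumed to be a plain tag name.

```php
<?php
// Hypothetical helper (not from the original script): wraps every occurrence
// of one allowed tag in [[ ]] and keeps the slash on closing tags.
function protectTag($txt, $k)
{
    // "<", optional "/", optional spaces, the tag name,
    // optional attributes (must start with whitespace), then ">" or "/>".
    $pattern = '#<(/?)\s*' . preg_quote($k, '#') . '(\s[^>]*?)?\s*(/?)>#i';

    // The callback runs once per match, so each tag gets its own replacement.
    return preg_replace_callback($pattern, function ($m) use ($k) {
        $slash = $m[1];                             // "/" for a closing tag, "" otherwise
        $attrs = isset($m[2]) ? trim($m[2]) : '';   // markup inside the tag, if any
        return '[[' . $slash . $k . ($attrs !== '' ? ' ' . $attrs : '') . ']]';
    }, $txt);
}

$txt = 'A <table><tr><td class="x">cell</td></tr></table> and a <br> break';
foreach (array('td', 'tr', 'br') as $k) {
    $txt = protectTag($txt, $k);
}
echo $txt, "\n";
```

Requiring whitespace before any attributes also keeps the pattern for b from matching <br>, which the character class [^>br] in the original expression does not reliably guarantee, because it excludes single characters and not the whole tag name.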
wickning1 Posted March 9, 2006

I just did something very similar. Here's what I wrote. It's not well tested but should be working.

[code]
<?php
// Each entry is a regex fragment, so 'h\d' covers h1 through h9.
$tags = array('b', 'i', 'h\d');

// Swap the < > of allowed tags for [ ] so they survive escaping.
function preserveHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/<(\/)?(' . $tag . '(\s.*?)?)>/i', '[$1$2]', $text);
    }
    return $text;
}

// Turn the bracketed tags back into real HTML.
function restoreHTML($text, $tags) {
    foreach ($tags as $tag) {
        $text = preg_replace('/\[(\/)?(' . $tag . '(\s.*?)?)\]/i', '<$1$2>', $text);
    }
    return $text;
}
?>
[/code]
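For context, here is a rough sketch of how the preserve and restore steps could sit around the escaping step described in the first post. The function name escapeExcept() is hypothetical, the [[ ]] markers follow the first post instead of the single brackets used above, and the sketch deliberately handles only bare tags without attributes, since escaping would also rewrite quotes inside preserved attributes.

```php
<?php
// Hypothetical wrapper (name and simplifications are assumptions):
// escape all HTML in $text except the bare tags listed in $tags.
function escapeExcept($text, array $tags)
{
    // 1. Hide allowed tags behind [[ ]] markers. Each $tag is used as a
    //    regex fragment, so entries such as 'h\d' still work.
    foreach ($tags as $tag) {
        $text = preg_replace('/<(\/?)(' . $tag . ')>/i', '[[$1$2]]', $text);
    }

    // 2. Escape everything that is still raw HTML.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 3. Turn the markers back into real tags.
    foreach ($tags as $tag) {
        $text = preg_replace('/\[\[(\/?)(' . $tag . ')\]\]/i', '<$1$2>', $text);
    }

    return $text;
}

echo escapeExcept('Here is a <br> <b>new</b> line!', array('br')), "\n";
// prints: Here is a <br> &lt;b&gt;new&lt;/b&gt; line!
```

A user who types [[br]] literally will also get a real <br> back in step 3. That is harmless here because only allowed tags can be restored, but it is worth knowing if the markers need to appear as text.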