paladin_sh Posted January 24, 2007

Hey there,

I'm having some trouble with preg_replace. I need it to remove everything from a string except the alphanumeric characters and the HTML. Anything between the HTML brackets should be left alone, even though everything else that is non-alphanumeric gets removed. That way I can sort it out afterwards with the tags() function to allow only the HTML I want.

I used this to remove the non-alphanumeric characters:

[code]$string = preg_replace("/[^A-Za-z0-9]/","",$string);[/code]

But I need that statement to ignore anything between HTML tags too.

Thanks for any help that is offered.

- Paladin

Link to comment https://forums.phpfreaks.com/topic/35464-preg_replace-to-remove-all-except-html-and-alphanumeric/
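To make the problem in the question concrete, here is a minimal sketch of what the one-line replacement does to markup. The sample string is made up for illustration and is not from the original post. Because the character class also rejects angle brackets, quotes, and slashes, the tags are destroyed along with the punctuation.

```php
<?php
// Hypothetical sample input, not taken from the original post.
$string = '<b class="x">Hi, there!</b>';

// Remove every character that is not an ASCII letter or digit.
// This has no notion of tags, so "<", ">", "/" and quotes go too.
$clean = preg_replace('/[^A-Za-z0-9]/', '', $string);

echo $clean, "\n"; // prints "bclassxHithereb"
```

The tag names and attribute text survive as plain letters, which is why the replacement has to be restricted to the text between tags.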
effigy Posted January 24, 2007

See [url=http://www.phpfreaks.com/forums/index.php/topic,122857.0.html]this[/url] topic. It covers how to distinguish between HTML and non-HTML.
paladin_sh Posted January 24, 2007 Author

Thanks,

I read through the topic, and while I can remove the HTML code and mess around with the stuff inside it, I can't get my script to remove everything except the HTML, let alone everything except the HTML and the alphanumerics. When I try removing everything but the HTML, it just removes all the spaces and leaves everything else. Hehe.

I admit regex isn't my strong suit, and I'm still learning quite a bit about it, especially how it is used in PHP. I learn mostly by example.

I do appreciate the help so far, though.

- Paladin
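For the "remove everything except the HTML" attempt described above, one possible approach is to collect the tags instead of deleting what lies between them. This is only a sketch: the sample string is hypothetical, and the simple pattern assumes no ">" appears inside an attribute value.

```php
<?php
// Hypothetical sample input for illustration.
$string = 'Some <b>bold</b> & <i>italic</i> text!';

// Collect every "<...>" run. Assumes attribute values contain no ">".
preg_match_all('/<[^>]*>/', $string, $matches);

// Join the matched tags back together, dropping everything between them.
$tagsOnly = implode('', $matches[0]);

echo $tagsOnly, "\n"; // prints "<b></b><i></i>"
```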
effigy Posted January 24, 2007

[code]<pre>
<?php
// Sample document: only the text between the tags should be cleaned.
$html = <<<HTML
<html>
<head><title>T%i#t*l@e!</title></head>
<body>
abcde 12345
<font color="red">|+_()*!@#$%</font>
</body>
</html>
HTML;

// Match each run of text that follows a ">" and runs up to the next "<",
// then strip every non-word character (\W) from it. The tags are untouched.
// A closure is used because create_function() was removed in PHP 8.0.
$html = preg_replace_callback(
    '/(?<=>)([^<]+)/',
    function ($matches) {
        return preg_replace('/\W/', '', $matches[0]);
    },
    $html
);

echo htmlspecialchars($html);
?>
</pre>[/code]

If you want to preserve the whitespace, change [tt]/\W/[/tt] to [tt]/(?![\s])\W/[/tt]. Also, note that the [tt]\w[/tt] shorthand includes the underscore, so [tt]\W[/tt] leaves underscores in place; you may want to change this.
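The two adjustments mentioned in the reply above (keeping whitespace, and not treating the underscore as a word character) can be combined by spelling out the character class instead of using the shorthand. The following is a minimal sketch under those assumptions; the function name clean_text_between_tags and the sample input are hypothetical, not from the thread.

```php
<?php
// Hypothetical helper name; not from the original thread.
function clean_text_between_tags($html)
{
    return preg_replace_callback(
        // Text that directly follows a ">" and runs up to the next "<".
        '/(?<=>)[^<]+/',
        function ($m) {
            // Drop anything that is not an ASCII letter, digit, or whitespace.
            // Unlike \W, this also removes underscores.
            return preg_replace('/[^A-Za-z0-9\s]/', '', $m[0]);
        },
        $html
    );
}

// Hypothetical sample input for illustration.
$html = '<p>a_b c!</p><font color="red">|+_()*</font>';
echo clean_text_between_tags($html), "\n";
// prints: <p>ab c</p><font color="red"></font>
```

Note that the lookbehind requires a preceding ">", so any text that appears before the first tag is not matched and is left as it is.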