ryy705 Posted September 26, 2008

Hello,

I am trying to strip out junk tags from HTML generated by Microsoft Office. It generates code like the following:

    <o:SmartTagType =namespaceuri=3D"urn:schemas-microsoft-com:office:smarttags" name=3D"place"/>

Sometimes there is a line break inside the tag and sometimes there isn't. The following is my preg_replace call:

    preg_replace("<\w:.*>", '', $str);

It returns:

    < =namespaceuri=3D"ur name=3D"place"/>

I'm trying my best to learn regex, but it seems as though I should qualify for a four-year degree by the time I finish. Please help. I don't think I can do this by myself. Thank you in advance.
nrg_alpha Posted September 26, 2008

I think one major issue is that the < and > in your pattern are being treated as the delimiters. Avoid this. Is this what you are looking for?

    $str = preg_replace('#<\w:[^>]+>#', '', $str);
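To make the delimiter point concrete, here is a minimal sketch. It relies on the fact that PHP accepts bracket-style delimiter pairs such as < and >, so the outer angle brackets in the original call never reach the regex engine. The sample string and the variable names $wrong and $right are made up for illustration and are not from the thread.

```php
<?php
// Hypothetical sample input, modelled on the tag quoted in the thread.
$str = 'before<o:SmartTagType name="place"/>after';

// "<" and ">" are consumed as delimiters, so the effective pattern is \w:.*
// It starts matching at "o:" and greedily runs to the end of the line.
$wrong = preg_replace('<\w:.*>', '', $str);

// With "#" as the delimiter, "<" and ">" are literal characters in the pattern,
// and [^>]+ stops at the first ">" (it also spans line breaks inside the tag).
$right = preg_replace('#<\w:[^>]+>#', '', $str);

var_dump($wrong); // string(7) "before<"
var_dump($right); // string(11) "beforeafter"
```

Note that \w matches a single word character, so this pattern only covers one-letter prefixes such as o: and leaves closing tags like </o:p> untouched. Whether that matters depends on what the Office output actually contains.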
ryy705 Posted September 28, 2008 (Author)

Thanks, it works. What is the purpose of #?

A little more help, please. I need to replace each pair of line breaks with a single line break. So something like the following needs to be replaced by three line breaks:

    <br />
    <br />
    <br />
    <br />
    <br />
    <br />
    <br />

That is, every pair of line breaks should become a single line break, as long as the two are separated only by a newline, a blank space, or nothing at all. The following is what I'm trying to use:

    $str = preg_replace("#<br />*\s$*<br />#", '<br />', $str)

This returns an error saying 'Compilation failed' and no string is returned. Kindly help me out.
nrg_alpha Posted September 28, 2008

> Thanks it works. What is the purpose of #?

No problem. As for your question, the # is a delimiter. It doesn't have to be a # character (most commonly, it is the / character). I suggest you read up on preg expressions and get up to speed on things. It really is worth taking the time to familiarize yourself with regex and learn the basics. While asking for solutions on these forums is the easy route, it still leaves you without a real understanding of the mechanics. If you learn regex, you'll be far more efficient at solving such problems in the future (I mean no offense by any of this, of course).

> A little more help please. I need to replace pairs of line break with a single line break. [...] The following is what I'm trying to use.
> $str = preg_replace("#<br />*\s$*<br />#", '<br />', $str)
> This is returning error saying 'Compilation failed' and no string is returned.

If I understand correctly, perhaps this is what you are looking for? (I'm not sure whether I got it right.)

    $str = <<<DATA
    <br />
    <br />
    <br />
    <br />
    <br />
    <br />
    <br />
    DATA;

    $str = preg_replace('#<br />(\r\n|\x20)<br />#', '<br />', $str);
    echo $str;

You would have to right-click and view source in your browser to see the replacement. I use \r\n (carriage return plus newline) and \x20 (the hex value for an explicit space).

Cheers,
NRG
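One thing worth checking with the pattern above is which line endings the input really uses. The group (\r\n|\x20) accepts exactly one separator, either a CRLF pair or a single space, so input with bare \n line endings would not match. A small sketch of that behaviour, with made-up variable names and inputs:

```php
<?php
// The pattern from the post above, unchanged.
$pattern = '#<br />(\r\n|\x20)<br />#';

// Hypothetical inputs: the same pair of tags with three different separators.
$crlf  = "<br />\r\n<br />"; // Windows-style line ending
$lf    = "<br />\n<br />";   // Unix-style line ending
$space = "<br /> <br />";    // single space

$fromCrlf  = preg_replace($pattern, '<br />', $crlf);
$fromLf    = preg_replace($pattern, '<br />', $lf);
$fromSpace = preg_replace($pattern, '<br />', $space);

var_dump($fromCrlf);        // string(6) "<br />"
var_dump($fromLf === $lf);  // bool(true): a bare \n is not in the alternation
var_dump($fromSpace);       // string(6) "<br />"
```

If the HTML may come from mixed sources, adding \n to the alternation or switching to \s is one way to cover both line-ending styles.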
ryy705 Posted September 28, 2008 (Author)

Thank you again. Should I put a * after (\r\n|\x20), in case there are multiple spaces or returns?
nrg_alpha Posted September 28, 2008

> Thank you again. Should I put a * after (\r\n|\x20) ? In case there are multiple spaces or returns?

You sure can. I neglected that part.
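For reference, a sketch of what the quantified version does. The first pattern is the thread's pattern with * added after the group. The second, using \s*, is an alternative assumed here rather than something suggested in the thread; it accepts any run of whitespace, including bare \n. All variable names and sample strings are illustrative.

```php
<?php
// The thread's pattern with * added: zero or more CRLF pairs or spaces.
$threadPattern = '#<br />(\r\n|\x20)*<br />#';
// Alternative (an assumption, not from the thread): any run of whitespace.
$broadPattern = '#<br />\s*<br />#';

// Hypothetical inputs.
$mixed    = "<br />\r\n\r\n  <br />"; // two CRLFs followed by two spaces
$lfOnly   = "<br />\n\n<br />";       // bare newlines only
$adjacent = '<br /><br />';           // nothing between the tags

$a = preg_replace($threadPattern, '<br />', $mixed);
$b = preg_replace($threadPattern, '<br />', $lfOnly);
$c = preg_replace($broadPattern, '<br />', $lfOnly);
$d = preg_replace($threadPattern, '<br />', $adjacent);

var_dump($a);             // string(6) "<br />"
var_dump($b === $lfOnly); // bool(true): \n alone is still not matched
var_dump($c);             // string(6) "<br />"
var_dump($d);             // string(6) "<br />"
```

Because * also allows zero repetitions, the quantified pattern covers the "no space" case from the original question as well.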