ianco Posted March 5, 2012

Hi,

I'm trying to turn

<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>

into

[infobox][infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext][/infobox]

so, using preg_replace, I have:

$pagecontent = preg_replace('#<div class="infoboxtitle">([^<]+)</div>#i', "[infoboxtitle]$1[\/infoboxtitle]", $pagecontent);
$pagecontent = preg_replace('#<div class="infoboxtext">([^<]+)</div>#i', "[infoboxtext]$1[\/infoboxtext]", $pagecontent);
$pagecontent = preg_replace('#<div class="infobox">([^<]+)</div>#i', "[infobox]$1[\/infobox]", $pagecontent);

But it's not giving me anything back. Any idea where I'm going wrong?

Thanks
Ian

Thread: https://forums.phpfreaks.com/topic/258339-bit-of-regex-trouble/

xyph Posted March 5, 2012

If you want to parse nested data, you're going to end up with a VERY complex parser. This is beyond the scope of RegEx alone.

If the examples are as simple as the ones you've provided, then I don't see what's going wrong.

<?php
$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';
$pagecontent = preg_replace('#<div class="infoboxtitle">([^<]+)</div>#i', "[infoboxtitle]$1[\/infoboxtitle]", $pagecontent);
$pagecontent = preg_replace('#<div class="infoboxtext">([^<]+)</div>#i', "[infoboxtext]$1[\/infoboxtext]", $pagecontent);
$pagecontent = preg_replace('#<div class="infobox">([^<]+)</div>#i', "[infobox]$1[\/infobox]", $pagecontent);
echo $pagecontent;
?>

Returns

[infobox][infoboxtitle]this is a title[\/infoboxtitle][infoboxtext]example text[\/infoboxtext][\/infobox]
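PHP already ships an HTML parser in its DOM extension, so the "parser" route does not have to be written from scratch. A minimal sketch, assuming ext-dom is enabled and that every div with a class attribute should become a tag named after that class; the function names htmlToBbcode and domToBbcode are hypothetical. Because it walks the parsed tree, nesting depth and <br> tags are not a problem.

```php
<?php
// Sketch: convert <div class="X">...</div> into [X]...[/X] by walking the
// DOM tree instead of matching the markup with regular expressions.
function domToBbcode(DOMNode $parent): string
{
    $out = '';
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement
            && strtolower($child->nodeName) === 'div'
            && $child->getAttribute('class') !== '') {
            $tag = $child->getAttribute('class');
            // Recurse into the div, so nested divs are converted at any depth.
            $out .= '[' . $tag . ']' . domToBbcode($child) . '[/' . $tag . ']';
        } else {
            // Text nodes and any other elements (e.g. <br>) are kept as HTML.
            $out .= $child->ownerDocument->saveHTML($child);
        }
    }
    return $out;
}

function htmlToBbcode(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    // libxml wraps an HTML fragment in implied <html><body> elements.
    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : domToBbcode($body);
}

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';
echo htmlToBbcode($pagecontent), "\n";
// prints [infobox][infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext][/infobox]
```

One caveat: without a charset declaration, loadHTML assumes ISO-8859-1, so UTF-8 content with non-ASCII characters may need extra handling before it is loaded.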

joe92 Posted March 5, 2012

You don't need to escape the forward slash on the replacement side. Inside a double-quoted PHP string, \/ is not an escape sequence, so the backslash stays in the output (hence [\/infoboxtitle]). That is not causing the problem though, just thought I'd mention it.

You also don't need to run three preg_replace calls for what could be achieved in one. I notice that the code you want to change it to uses the class name as the tag name, therefore the following will suffice for all three, and be quicker too:

$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent);

If you want the class name to be exact, then change the ([a-z]+) after class= into (infobox(?:text|title)?) (edit: forgot to make the "or" part non-capturing, fixed it). It will still be captured in the first set of parentheses.

In what way is it returning nothing? Are you making sure that the input is correct?

Hope this helps you,
Joe

xyph Posted March 5, 2012

Quoting joe92:
"You also don't need to run three preg_replace calls for what could be achieved in one. I notice that the code you want to change it to uses the class name as the tag name, therefore the following will suffice for all three, and be quicker too:"

<?php
$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';
$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent);
echo $pagecontent;
?>

Returns (as displayed in a browser; the outer <div class="infobox"> and its closing </div> are still in the source, unconverted):

[infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext]

... which is not the same. He needs all three for the RegEx to work as designed. They must also be executed in the reverse order of the nesting (innermost DIVs first, outermost last).

Again, if you want to parse nested tags, it's much more complex than simply executing a regular expression. You need to code a parser.
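The inside-to-outside order can also be had from joe92's single pattern by applying it repeatedly until it stops matching: each pass only converts divs whose content contains no '<', i.e. the innermost ones, which then exposes the next level. A minimal sketch; the function name infoboxToBbcode is hypothetical, and the pattern uses joe92's exact-class variant.

```php
<?php
// Sketch: repeat the combined replacement until nothing changes.
// Pass 1 converts infoboxtitle/infoboxtext; pass 2 can then match infobox,
// because its content no longer contains any '<'.
function infoboxToBbcode(string $html): string
{
    $pattern = '#<div class="(infobox(?:text|title)?)">([^<]+)</div>#i';
    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '[$1]$2[/$1]', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';
echo infoboxToBbcode($pagecontent), "\n";
// prints [infobox][infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext][/infobox]
```

This still relies on ([^<]+), so any other tag inside a box (a <br>, for instance) stops that box from ever matching.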

ianco Posted March 5, 2012 (thread author)

Thanks guys.

xyph, your solution works, but it can't handle line-break tags, e.g. <br> or <br />: it makes some of the tags disappear. Can you give me more info on the parser?

joe92, I need the nested tags, so I don't think your way will work.

xyph Posted March 5, 2012

If you read your RegEx, it's really no surprise that a line-break will cause it to function in ways you don't want it to. My guess is you didn't write that RegEx, or if you did, you've cobbled it together without actually understanding what it does.

Step 1 is to understand what you've coded, and understand why it's failing when you add a line-break tag.
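The culprit is the capture ([^<]+): it means "one or more characters that are not <", so a <br /> inside the text ends the run before </div> is reached and the whole pattern fails. A minimal sketch of one possible fix, assuming <br> is the only tag that can appear inside the box text (the variable names are illustrative):

```php
<?php
$text = '<div class="infoboxtext">line one<br />line two</div>';

// Original pattern: ([^<]+) stops at the '<' of <br />, so no match.
$old = preg_replace('#<div class="infoboxtext">([^<]+)</div>#i', '[infoboxtext]$1[/infoboxtext]', $text);

// Assumed fix: the content is a sequence of pieces, each either a single
// non-'<' character or a complete <br>, <br/> or <br /> tag.
$new = preg_replace('#<div class="infoboxtext">((?:[^<]|<br\s*/?>)+)</div>#i', '[infoboxtext]$1[/infoboxtext]', $text);

echo $old, "\n"; // unchanged, because the pattern did not match
echo $new, "\n"; // [infoboxtext]line one<br />line two[/infoboxtext]
```

Any other inline tag (<b>, <a>, ...) would need to be allowed the same way, which is where the regex approach starts to get unwieldy.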

joe92 Posted March 5, 2012

OK, well I am confused. Run this:

<?php
$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';
$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent, 1);
// escape the remaining tags so the browser shows them instead of rendering them
$pagecontent = preg_replace('#<#i', "&lt;", $pagecontent);
$pagecontent = preg_replace('#>#i', "&gt;", $pagecontent);
echo $pagecontent;
?>

And the result is:

<div class="infobox">[infoboxtitle]this is a title[/infoboxtitle]<div class="infoboxtext">example text</div></div>

Ahhhh, and as I typed it I just realised why mine wasn't working. The middle part, ([^<]+), matches only characters that aren't <, so it fails on the outer div because the next div starts straight away. Duh. And without checking the contents, you're never going to be able to match up the correct opening and closing tags, so mine will never work.

I suggest you look into making a recursive preg_replace_callback, where the callback checks the contents for any nested content and alters the search pattern accordingly. As Xyph said, you are going to need a parser.

Good luck, and if you get stuck, ask again!
Joe
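A recursive preg_replace_callback along the lines joe92 describes could look like the following. This is a sketch, not his code: it swaps in a PCRE recursive pattern ((?R)) so that one match covers a whole div including any nested divs, and the callback then calls the function again on the captured contents. It assumes divs are always written exactly as <div class="name"> with no other attributes; divsToBbcodeRecursive is a hypothetical name.

```php
<?php
// Sketch: match a complete <div class="...">...</div>, nested divs included,
// then recurse into the captured inner HTML to convert those as well.
function divsToBbcodeRecursive(string $html): string
{
    $pattern = '#<div class="([a-z]+)">'   // group 1: class name
        . '((?:(?>[^<]+)'                   // group 2: runs of plain text,
        . '|<(?!/?div\b)'                   // or a '<' that doesn't open/close a div (e.g. <br />),
        . '|(?R))*)'                        // or a complete nested div (recursive call)
        . '</div>#i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Captures set inside the recursion are restored afterwards, so
        // $m[1] is the outer class name and $m[2] the raw inner HTML.
        return '[' . $m[1] . ']' . divsToBbcodeRecursive($m[2]) . '[/' . $m[1] . ']';
    }, $html) ?? $html; // fall back to the input if PCRE reports an error
}

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">line one<br />line two</div></div>';
echo divsToBbcodeRecursive($pagecontent), "\n";
```

The atomic group (?>[^<]+) keeps the engine from retrying every possible split of a text run, which avoids runaway backtracking on long content that fails to match.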

ianco Posted March 5, 2012 (thread author)

xyph, you're not wrong. I previously had this for headings:

$pagecontent = preg_replace('#<h2>([^<]+)</h2>#i', "==$1==", $pagecontent);

and you wouldn't put a line break in a heading.

ianco Posted March 5, 2012 (thread author)

joe92, thanks, I'll look into preg_replace_callback.