
bit of regex trouble


ianco


Hi, I'm trying to turn

<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>

into

[infobox][infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext][/infobox]

 

So, using preg_replace, I have:

$pagecontent = preg_replace('#<div class="infoboxtitle">([^<]+)</div>#i', "[infoboxtitle]$1[\/infoboxtitle]", $pagecontent);
$pagecontent = preg_replace('#<div class="infoboxtext">([^<]+)</div>#i', "[infoboxtext]$1[\/infoboxtext]", $pagecontent);			    
$pagecontent = preg_replace('#<div class="infobox">([^<]+)</div>#i', "[infobox]$1[\/infobox]", $pagecontent);

 

But it's not giving me anything back. Any idea where I'm going wrong?

 

Thanks

 

Ian


If you want to parse nested data, you're going to end up with a VERY complex parser. This is beyond the scope of RegEx alone.

 

If the examples are as simple as the ones you've provided, then I don't see what's going wrong.

<?php

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';

$pagecontent = preg_replace('#<div class="infoboxtitle">([^<]+)</div>#i', "[infoboxtitle]$1[\/infoboxtitle]", $pagecontent);
$pagecontent = preg_replace('#<div class="infoboxtext">([^<]+)</div>#i', "[infoboxtext]$1[\/infoboxtext]", $pagecontent);
$pagecontent = preg_replace('#<div class="infobox">([^<]+)</div>#i', "[infobox]$1[\/infobox]", $pagecontent);

echo $pagecontent;

?>

 

Returns

 

[infobox][infoboxtitle]this is a title[\/infoboxtitle][infoboxtext]example text[\/infoboxtext][\/infobox]


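You don't need to escape the forward slash on the replacement side. In a double-quoted PHP string, \/ is not an escape sequence, so the backslash is kept and ends up in the output as [\/infoboxtitle]. That isn't what's causing your problem, though; I just thought I'd mention it.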
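A quick way to see the effect of the escaped slash: this sketch runs the title pattern from the original post twice, once with the double-quoted replacement from the post and once with a plain single-quoted one. The variable names are just for illustration.

```php
<?php
// The title pattern from the original post, applied to just the title DIV.
$html    = '<div class="infoboxtitle">this is a title</div>';
$pattern = '#<div class="infoboxtitle">([^<]+)</div>#i';

// Double-quoted, as in the original post: "\/" is not an escape sequence in PHP,
// so the backslash stays in the string and preg_replace copies it into the result.
$escaped = preg_replace($pattern, "[infoboxtitle]$1[\/infoboxtitle]", $html);

// Single-quoted, no escape: the closing tag comes out as intended.
$unescaped = preg_replace($pattern, '[infoboxtitle]$1[/infoboxtitle]', $html);

echo $escaped, "\n";   // [infoboxtitle]this is a title[\/infoboxtitle]
echo $unescaped, "\n"; // [infoboxtitle]this is a title[/infoboxtitle]
```

In both cases $1 is left alone by PHP itself (a variable name can't start with a digit), so preg_replace still sees it as a backreference.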

 

You also don't need to run three preg_replace calls for what could be achieved in one. I notice that the code you want to change it to uses the class name as the tag name, so the following will work for all three, and be quicker too:

$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent);

 

If you want the class name to match exactly, change the ([a-z]+) after class= to (infobox(?:text|title)?). (Edit: I forgot to make the alternation inside the capture non-capturing; fixed now.) The class name will still be captured in the first group, so $1 in the replacement still works.
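A minimal sketch of that exact-class variant, assuming the markup always uses a bare class="..." attribute exactly as in the example. The sidebar DIV is a made-up sample, there only to show that other classes are left alone.

```php
<?php
// Joe's one-pass pattern, but the first group only accepts the three infobox classes.
$pattern = '#<div class="(infobox(?:text|title)?)">([^<]+)</div>#i';

// "sidebar" is a made-up class, used only to show that other DIVs are left alone.
$html = '<div class="infoboxtitle">this is a title</div><div class="sidebar">left alone</div>';

// $1 is still the class name, so it doubles as the bracket tag name.
$result = preg_replace($pattern, '[$1]$2[/$1]', $html);

echo $result, "\n";
// [infoboxtitle]this is a title[/infoboxtitle]<div class="sidebar">left alone</div>
```

As with the ([a-z]+) version, this only converts DIVs whose contents contain no other tags, because of the ([^<]+) part.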

 

In what way is it returning nothing? Are you making sure that the input is correct?

 

Hope this helps you,

Joe


> You also don't need to run 3 preg_replace's for what could be achieved in one. [...] the following will suffice for all three, and be quicker too:

 

<?php

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';

$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent);

echo $pagecontent;

?>

 

Returns (view the page source; a browser hides the leftover outer div tags)

<div class="infobox">[infoboxtitle]this is a title[/infoboxtitle][infoboxtext]example text[/infoboxtext]</div>

 

... which is not the same: the outer infobox DIV is never converted. He needs all three calls for the RegEx to work as designed. They must also run from the inside out: the inner title and text DIVs first, the outer infobox DIV last.
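The ordering point can be checked with a small sketch. applyInOrder is a hypothetical helper that runs the same ([^<]+) pattern once per class name, in whatever order it is given, and writes an unescaped [/...] closing tag.

```php
<?php
// Hypothetical helper: one preg_replace per class name, in the order given.
// The class name is reused as the bracket tag name.
function applyInOrder(string $html, array $classes): string
{
    foreach ($classes as $class) {
        $pattern = '#<div class="' . preg_quote($class, '#') . '">([^<]+)</div>#i';
        $html = preg_replace($pattern, '[' . $class . ']$1[/' . $class . ']', $html);
    }
    return $html;
}

$input = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';

// Inside to outside: once the inner DIVs are gone, the outer one contains no "<".
echo applyInOrder($input, ['infoboxtitle', 'infoboxtext', 'infobox']), "\n";

// Outside to inside: the outer pattern runs while its DIV still starts with "<div",
// so ([^<]+) can't match and the outer DIV is left untouched.
echo applyInOrder($input, ['infobox', 'infoboxtitle', 'infoboxtext']), "\n";
```

The first call yields the full [infobox]...[/infobox] string; the second leaves <div class="infobox"> and its closing </div> around the converted inner tags.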

 

Again, if you want to parse nested tags, it's much more complex than simply executing a regular expression. You need to code a parser.
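As one possible starting point, PHP's built-in DOM extension already contains an HTML parser, so instead of writing the parsing by hand, a sketch can walk the parsed tree. divsToBbcode and convertInfobox are hypothetical names. The sketch assumes every DIV with a class attribute should become a tag, and ASCII input: DOMDocument::loadHTML treats input as ISO-8859-1 unless told otherwise, so non-ASCII text needs extra handling.

```php
<?php
// Hypothetical helper: rebuilds the children of $parent as a string, turning every
// <div class="name">...</div> into [name]...[/name] and recursing into it, so
// nesting is handled by the tree walk rather than by a regular expression.
function divsToBbcode(DOMNode $parent): string
{
    $out = '';
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'div' && $child->hasAttribute('class')) {
            $class = $child->getAttribute('class');
            $out  .= '[' . $class . ']' . divsToBbcode($child) . '[/' . $class . ']';
        } else {
            // Text, <br> and any other node are written back out as HTML.
            $out .= $child->ownerDocument->saveHTML($child);
        }
    }
    return $out;
}

// Hypothetical entry point: parse the fragment inside a wrapper element and
// convert everything within that wrapper.
function convertInfobox(string $html): string
{
    $doc = new DOMDocument();
    // NOIMPLIED stops libxml adding <html><body> around the fragment; the wrapper
    // gives the fragment the single root element that option expects.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    return divsToBbcode($doc->documentElement);
}

echo convertInfobox('<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>'), "\n";
```

Because the parser has already matched every opening tag with its closing tag, a <br> inside a DIV is just another child node and is passed through (serialized by saveHTML, so <br /> comes back as <br>).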


Thanks, guys.

 

Xyph, your solution works, but it can't handle line-break tags, i.e. <br> or <br />; it makes some of the tags disappear.

 

Can you give me more info on the parser?

 

joe92, I need the nested tags, so I don't think your way will work.

 

 

 

 

 

 


If you read your RegEx, it's really no surprise that a line-break tag will cause it to function in ways you don't want it to.

 

My guess is you didn't write that RegEx, or if you did, you've cobbled it together without actually understanding what it does. Step 1 is to understand what you've coded, and understand why it's failing when you add a line-break tag.
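One way to work through step 1 is to test the inner pattern on its own, with and without a <br />. This sketch uses the infoboxtext pattern from the original post and a made-up two-line sample.

```php
<?php
// The infoboxtext pattern from the original post. Its middle part, ([^<]+),
// only matches a run of characters that contains no "<" at all.
$pattern = '#<div class="infoboxtext">([^<]+)</div>#i';

$plain  = '<div class="infoboxtext">example text</div>';
$withBr = '<div class="infoboxtext">line one<br />line two</div>'; // made-up sample

var_dump(preg_match($pattern, $plain));  // int(1): the text has no "<"
var_dump(preg_match($pattern, $withBr)); // int(0): [^<]+ stops at "<br />", so "</div>" can't follow
```

When the inner DIV fails like this, it stays as HTML inside the outer DIV, so the outer infobox pattern fails too. A browser renders the leftover DIVs invisibly, which is presumably why some tags seem to disappear.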


OK, well, I'm confused. Run this:

 

<?php

$pagecontent = '<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>';

$pagecontent = preg_replace('#<div class="([a-z]+)">([^<]+)</div>#i', "[$1]$2[/$1]", $pagecontent, 1);

// Escape the markup so the browser displays the tags instead of rendering them.
$pagecontent = htmlspecialchars($pagecontent);

echo $pagecontent;

?>

 

And the result is:

<div class="infobox">[infoboxtitle]this is a title[/infoboxtitle]<div class="infoboxtext">example text</div></div>

 

Ahhhh, and as I typed it I just saw why mine wasn't working. The central part, ([^<]+), matches only characters that aren't <, so it fails on the outer infobox DIV because the next DIV starts straight away. Duh. And without checking the contents, you're never going to be able to pair up the correct opening and closing tags, so mine will never work. I suggest you look into a recursive preg_replace_callback, where the callback checks the matched contents for any nested content and alters the search pattern accordingly. As Xyph said, you are going to need a parser.
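A sketch of the recursive preg_replace_callback idea, with one swap: instead of altering the search pattern, it relies on PCRE's (?R) recursion so that a whole nested DIV is matched as one balanced unit, and the callback converts the inner HTML by calling the function again. convertDivs is a hypothetical name, and the sketch assumes every DIV is written exactly as <div class="name"> with balanced closing tags.

```php
<?php
// Hypothetical helper: converts <div class="name">...</div> into [name]...[/name],
// including nested DIVs. Inside a DIV the pattern accepts:
//   [^<]++          text with no "<" (possessive, to avoid runaway backtracking)
//   <(?!/?div\b)    a "<" that doesn't start a div tag, e.g. the one in <br />
//   (?R)            a whole nested DIV, matched by recursing into the full pattern
function convertDivs(string $html): string
{
    $pattern = '#<div class="([a-z]+)">((?:[^<]++|<(?!/?div\b)|(?R))*)</div>#i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is the class name and $m[2] the raw inner HTML; convert the inside too.
        return '[' . $m[1] . ']' . convertDivs($m[2]) . '[/' . $m[1] . ']';
    }, $html);
}

echo convertDivs('<div class="infobox"><div class="infoboxtitle">this is a title</div><div class="infoboxtext">example text</div></div>'), "\n";
echo convertDivs('<div class="infobox"><div class="infoboxtext">line one<br />line two</div></div>'), "\n";
```

The second call shows the <br /> case: the break tag is carried through unchanged instead of stopping the match. Anything that doesn't fit the assumed markup (other attributes, unbalanced DIVs) is simply left unconverted, which is where a real parser starts to pay off.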

 

Good luck and if you get stuck, ask again!

Joe

