Reordering non-XHTML compliant BB code with regex


jordanwb

So I found a good BB parser class online, and I noticed that it doesn't fix improperly nested BB code, so the output isn't XHTML compliant. Let's say I have the following BB code:

 

[b]Lorem ipsum [i]dolor[/b] sit amet.[/i]

 

In order to be XHTML compliant, it would have to turn into this:

 

[b]Lorem ipsum [i]dolor[/i][/b][i] sit amet.[/i]

 

But the thing is, it may not just be b and i tags; it could be:

 

[u][b]Lorem[/u] [i]ipsum[/b] dolor sit amet.[/i]

 

I hope this isn't too complicated to do with Regex. I'd like it to be a function that I can drop into the PHP class.


I've got an idea to make a recursive function and this is what I have so far:

 

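<?php

function BbCodeFixer ($bb_code)
{
    // Tag names the parser recognizes.
    $bb_code_tags = '(b|i|u|s|url|img|quote|code|size|color)';

    // Opening tag, optional attribute text, lazily matched body, closing tag.
    // Note: the closing tag may be any listed tag, not necessarily the one that was opened.
    $bb_start_finder = '#\['.$bb_code_tags.'(.*?)\](.*?)\[/'.$bb_code_tags.'\]#';

    preg_match ($bb_start_finder, $bb_code, $matches);

    // Print the first match found.
    print $matches[0];
}

BbCodeFixer ("[b]This [i]is[/i] text[/b]");

?>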

 

But I knew it would output

[b]This [i]is[/i]

instead of

[b]This [i]is[/i] text[/b]

 

How would I fix that?
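
One way to make the closing tag match the opening one is a backreference. The sketch below is a hedged variation on the snippet above, not a complete answer: `findFirstTagPair` is a hypothetical name, and the `(=[^\]]*)?` part is an assumption about what attributes such as `[size=3]` look like. It still breaks when a tag is nested inside the same tag (for example `[b]` inside `[b]`), because the lazy body stops at the first matching closer.

```php
<?php

// Hypothetical helper: returns the first complete tag pair, or null if none.
function findFirstTagPair(string $bbCode): ?string
{
    $tags = 'b|i|u|s|url|img|quote|code|size|color';

    // \1 forces the closing tag to repeat the name captured by group 1.
    // (=[^\]]*)? allows an optional attribute (assumed form: [size=3]).
    // The s flag lets . match newlines inside the tag body.
    $pattern = '#\[(' . $tags . ')(=[^\]]*)?\](.*?)\[/\1\]#s';

    if (preg_match($pattern, $bbCode, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

echo findFirstTagPair('[b]This [i]is[/i] text[/b]'), "\n";
```

With the example string from the post, this prints the whole `[b]...[/b]` pair instead of stopping at `[/i]`.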

 

I also realized that if I had

 

[b][i][u][s]This is text[/b][/i][/u][/s]

 

That would be a problem, because I don't see how I would fix the order.


What about this? I realize it's not exactly what you're after, but it does make mention of...

 

It does not simply do some regex calls, but is complete stack based parse engine. This ensures that all tags are properly nested, if not, extra tags are added to maintain the nesting. This parser should only produce xhtml 1.0 compliant code. All tags are validated and so are all their attributes. It should be easy to extend this parser with your own tags.

 

...so the code may give you some ideas.
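
To illustrate the stack idea in that description, here is a minimal sketch that only detects bad nesting. It is not the quoted parser's code: `isProperlyNested` is a hypothetical name, the tag list is borrowed from the snippet earlier in the thread, and the attribute form (`[size=3]`) is an assumption.

```php
<?php

// Hypothetical helper illustrating the stack idea; not taken from any parser.
function isProperlyNested(string $bbCode): bool
{
    $tags = 'b|i|u|s|url|img|quote|code|size|color';

    // Collect every opening and closing tag in document order.
    preg_match_all(
        '#\[(/?)(' . $tags . ')(?:=[^\]]*)?\]#i',
        $bbCode,
        $matches,
        PREG_SET_ORDER
    );

    $stack = [];
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $stack[] = $name;              // opening tag: push its name
        } elseif (array_pop($stack) !== $name) {
            return false;                  // closer does not match the innermost open tag
        }
    }
    return $stack === [];                  // leftover openers mean unclosed tags
}

var_dump(isProperlyNested('[b]This [i]is[/i] text[/b]'));   // bool(true)
var_dump(isProperlyNested('[i][b]This is[/i] text[/b]'));   // bool(false)
```

Every opening tag is pushed, and every closing tag must match the top of the stack. A fixer would use the same stack to decide which tags to close early.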


Yes, I saw that, but it turns

 

[i][b]This is[/i] text[/b]

 

into

 

[i][b]This is[/b][/i] text

 

instead of (IMO)

 

[i][b]This is[/b][/i][b] text[/b]

 

OR

 

[b][i]This is[/i] text[/b]

The last is what I'm trying to do.
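
The first alternative (close the inner tags, then reopen them) is the easier one to sketch with a stack. The code below is a rough sketch under stated assumptions, not a drop-in for any particular parser class: `fixBbNesting` is a hypothetical name, the tag list comes from the snippet earlier in the thread, attributes are assumed to look like `[size=3]`, and the contents of `[code]` or `[url]` get no special treatment. It does not reorder opening tags, so it does not produce the last form.

```php
<?php

// Hypothetical helper: closes misnested tags in the right order, then
// reopens the tags that were only closed to keep the nesting valid.
function fixBbNesting(string $bbCode): string
{
    $tags = 'b|i|u|s|url|img|quote|code|size|color';

    // Split into text and tag tokens; the tag delimiters are kept.
    $tokens = preg_split(
        '#(\[/?(?:' . $tags . ')(?:=[^\]]*)?\])#i',
        $bbCode,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $stack = [];   // open tags: ['name' => tag name, 'open' => full opening tag]
    $out = '';

    foreach ($tokens as $token) {
        if (!preg_match('#^\[(/?)(' . $tags . ')(?:=[^\]]*)?\]$#i', $token, $m)) {
            $out .= $token;                       // plain text
            continue;
        }
        $name = strtolower($m[2]);

        if ($m[1] === '') {                       // opening tag
            $stack[] = ['name' => $name, 'open' => $token];
            $out .= $token;
            continue;
        }

        // Closing tag: find the nearest matching opener on the stack.
        $pos = null;
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i]['name'] === $name) {
                $pos = $i;
                break;
            }
        }
        if ($pos === null) {
            continue;                             // orphan closing tag: drop it
        }

        // Close everything opened after the match, innermost first.
        $reopen = array_splice($stack, $pos + 1);
        foreach (array_reverse($reopen) as $t) {
            $out .= '[/' . $t['name'] . ']';
        }
        $out .= '[/' . $name . ']';
        array_pop($stack);

        // Reopen the interrupted tags in their original order.
        foreach ($reopen as $t) {
            $out .= $t['open'];
            $stack[] = $t;
        }
    }

    // Close anything still open at the end of the input.
    while ($t = array_pop($stack)) {
        $out .= '[/' . $t['name'] . ']';
    }
    return $out;
}

echo fixBbNesting('[i][b]This is[/i] text[/b]'), "\n";
```

For the example above this prints `[i][b]This is[/b][/i][b] text[/b]`. When several closing tags arrive in the wrong order, the result is validly nested but can contain empty pairs such as `[s][/s]`, so a cleanup pass would still be needed.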

 

I'm thinking of changing the DOCTYPE to HTML 4.01 Transitional, but that may be a last resort.
