[SOLVED] Breaking up long words without interfering with URLs

BluMess · August 10, 2009

Hey

I've been modifying an existing BBCode script for a forum I'm making, but I found that if someone puts in a really long word, maybe just as spam, it breaks the layout of the page.

I'm using these lines to stop that from happening:

$character_limit = 50;
$BBCode_Text = preg_replace('/([^\s]{'.$character_limit.'})(?=[^\s])/m', '$1 ', $BBCode_Text);

Which works great, but my problem is that if someone put in a URL, e.g.

http://www.a-very-long-website-name-maybe-a-huuuuuge-link-to-an-obscure-file.com/a-folder/some-other-folder-with-a-name-so-long-it-should-be-made-illegal/file.html

It gets broken up and doesn't work when it's clicked. I have the same problem with images, because their source is getting split apart. Is there any way of preventing this from happening, maybe by looking for a "www." or an ending such as ".com" or ".png"? I'm pretty stuck on this one.
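For reference, a minimal sketch of the problem described above, using the same pattern on a shortened stand-in URL (the URL here is an assumption chosen for illustration, not the one from the post):

```php
<?php
$character_limit = 50;

// Illustrative URL with no whitespace and more than 50 characters.
$url = 'http://www.a-very-long-website-name-maybe-a-huuuuuge-link-to-an-obscure-file.com/a-folder/file.html';

// Same pattern as above: after every run of 50 non-whitespace characters
// that is followed by another non-whitespace character, insert a space.
$broken = preg_replace('/([^\s]{' . $character_limit . '})(?=[^\s])/m', '$1 ', $url);

// The first inserted space lands right after the 50th character,
// so the URL no longer points anywhere useful.
var_dump(strpos($broken, ' ')); // int(50)
var_dump($broken === $url);     // bool(false)
```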

Thank you for your time and help

~Chris

rea|and · August 11, 2009

Hi,

I've added two patterns to your regex: one to match links, the other to match images. They are generic patterns, so you could replace the first with something more specific or add more image extensions to the second. Try it:

$BBCode_Text=preg_replace('/(https?:\/\/(www)?)\S+|\S+\.(jpe?g|gif|png)|\S{'.$character_limit.'}(?=\S)/im', "$0 ", $BBCode_Text);

BluMess · August 11, 2009

Thanks v much for the response

Unfortunately it's not made any difference. I've placed the code at the end of all the BBCode stuff so that it doesn't interfere with for example changing tags into HTML links.

This is what I've put in:

And this is what has been displayed:

<a href="http://www.hastheworldbeendestroyedbythelarg ehadroncollideryet.com">http://www.hastheworldbeen destroyedbythelargehadroncollideryet.com</a>

It hasn't really made any difference.

Would it be better / easier to write a regex that removes the spaces from the parts between a [url] or [img] tag?

I'm brushing up on my regex atm, this is quite a tricky problem to sort out because it interferes with all the tags if there are enough characters before it, causing them to be missed out of searches

BluMess · August 11, 2009

Sorry, this is what I put in:

[ url=http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com]http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com[/url ]

(without spaces around the url tags)

thebadbad · August 11, 2009

You can use this to only match 50 non-whitespace characters if they are found outside BB code tags:

$character_limit = 50;
$BBCode_Text = preg_replace('~\S{' . $character_limit. '}(?![^\[]*?\])~', '$0 ', $BBCode_Text);

I'm utilizing a negative lookahead.

The pattern searches for the 50 characters and, when they are found, checks the characters that follow. If zero or more characters other than [ are immediately followed by a ], the whole thing fails to match (= the 50 characters are inside a BB code tag, i.e. between [ and ]).

Hope that solves your problem.
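A small sketch of how that pattern behaves on the [url] example from earlier in the thread. The variable names are assumptions for illustration. It shows that the opening tag survives intact, while the text between the tags, which is not inside square brackets, is still split:

```php
<?php
$character_limit = 50;

$url = 'http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com';
$tag = '[url=' . $url . ']';
$in  = $tag . $url . '[/url]';

// 50 non-whitespace characters, rejected when a "]" can be reached
// from the end of the match without passing a "[" first.
$out = preg_replace('~\S{' . $character_limit . '}(?![^\[]*?\])~', '$0 ', $in);

// The opening tag, including the URL inside it, is untouched.
var_dump(strpos($out, $tag) === 0); // bool(true)

// The text between the opening and closing tags still gets spaces added.
$inner = substr($out, strlen($tag), -strlen('[/url]'));
var_dump(strpos($inner, ' ') !== false); // bool(true)
```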

BluMess · August 12, 2009

thebadbad, that script works brilliantly to prevent the tags from breaking up. Thanks very much!

The only problem I have, which I have been playing around with your script to fix, is this:

or

Just specifically for those tags, it would be great if the regex could look for either of the tags (w/o spaces) and not add any spaces to whatever is inside them.

Is that possible?

This is what I tried, but it failed miserably I'm afraid:

$BBCode_Text = preg_replace('~\S{' . $character_limit. '}(?![^\[]*?\])|~\S{' . $character_limit. '}(?![^\[](url|img)\].*?\[\/(url|img)\])~', '$0 ', $BBCode_Text);

The error was "an unknown modifier: / ", am I at least on the right line? :S
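A likely cause of that error, assuming the pattern ran as posted: it uses ~ as its delimiter, and a second ~ appears in the middle, right after the |. PHP ends the pattern at that second ~ and tries to read everything after it as modifiers, so the exact character named in the message depends on what follows the stray delimiter. A stripped-down sketch with toy patterns (not the ones from this thread):

```php
<?php
// Stray "~" in the middle: the pattern is just "a|" and "b~" is parsed
// as modifiers, which fails. The @ only hides the warning for this demo.
$bad = @preg_replace('~a|~b~', 'x', 'abc');
var_dump($bad); // NULL

// One pair of delimiters around the whole alternation works.
$good = preg_replace('~a|b~', 'x', 'abc');
var_dump($good); // string(3) "xxc"
```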

thebadbad · August 12, 2009

Oh, I see. That's a problem.

It would actually be easiest to do the replacing after you've translated the BB code to (X)HTML. 'Cause then you could just use

$character_limit = 50;
$html = preg_replace('~\S{' . $character_limit. '}(?![^<]*?>)~', '$0 ', $html);

to make sure that only long strings found outside tags are broken up.

If that's possible in your current setup.
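A minimal sketch of that approach applied to the link from earlier in the thread. The function name break_long_words is hypothetical, and the HTML is assumed to be what the BB code translation produces:

```php
<?php
// Hypothetical helper name; the pattern itself is the one posted above.
function break_long_words(string $html, int $limit = 50): string
{
    // Match $limit non-whitespace characters, unless a ">" can be reached
    // from the end of the match before any "<" (the match ends inside a tag).
    return preg_replace('~\S{' . $limit . '}(?![^<]*?>)~', '$0 ', $html);
}

$url  = 'http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com';
$open = '<a href="' . $url . '">';
$html = $open . $url . '</a>';
$out  = break_long_words($html);

// The href attribute is left in one piece, so the link still works.
var_dump(strpos($out, $open) === 0); // bool(true)

// The visible link text is outside any tag, so it still gets broken up.
$text = substr($out, strlen($open), -strlen('</a>'));
var_dump(strpos($text, ' ') !== false); // bool(true)
```

Note that the visible text of a link is still split, which is usually what you want for layout, since only the attribute has to stay whole for the link to work.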

BluMess · August 12, 2009

Wow, thank you so much - it works perfectly!

I think I understand how it works - It looks for 50 ($character_limit) non-whitespace characters and then looks back for "<tagname with properties etc>", that's actually genius, and so simple

Cheers, that's helped such a lot. Good luck in future!

~Chris

thebadbad · August 12, 2009

You're welcome, and thanks

But you haven't got it entirely; the tricky part (?![^<]*?>) is a negative lookahead, meaning that if the expression inside it matches, the overall pattern fails to match. And I already explained what the expression [^<]*?> (inside the lookahead) does:

... If zero or more characters other than < are immediately followed by a >, the whole thing fails to match (= the 50 characters are inside an HTML tag, i.e. between < and >) ...
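To see the lookahead on its own, here is a small sketch that swaps the 50-character part for the literal text "word" (an assumption made purely to keep the inputs short):

```php
<?php
// "word", but only when no ">" can be reached after it without passing a "<".
$pattern = '~word(?![^<]*?>)~';

// Inside a tag: after "word" comes `">`, so the lookahead rejects the match.
$inside = preg_match($pattern, '<a title="word">text</a>');

// Between tags: the next "<" is reached before any ">", so it matches.
$between = preg_match($pattern, '<b>word</b>');

// No tags at all: there is no ">" to find, so it matches.
$plain = preg_match($pattern, 'word');

// Caveat: a literal ">" later in plain text looks like the end of a tag.
$caveat = preg_match($pattern, 'a word > b');

var_dump($inside, $between, $plain, $caveat); // int(0) int(1) int(1) int(0)
```

The last case is a limitation of the approach: the lookahead cannot tell a real tag from a stray > in the text, so it relies on such characters being escaped (e.g. as &gt;) in the generated HTML.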

BluMess · August 12, 2009

Ahh, ok that makes more sense.

Thanks again!
