BluMess, posted August 10, 2009

Hey,

I've been modifying an existing BBCode script for a forum I am making, but I found that if someone puts in a really long word, maybe just as spam, it breaks the layout of the page. I'm using these lines to stop that from happening:

$character_limit = 50;
$BBCode_Text = preg_replace('/([^\s]{'.$character_limit.'})(?=[^\s])/m', '$1 ', $BBCode_Text);

This works great, but my problem is that if someone puts in a URL, e.g.

http://www.a-very-long-website-name-maybe-a-huuuuuge-link-to-an-obscure-file.com/a-folder/some-other-folder-with-a-name-so-long-it-should-be-made-illegal/file.html

it gets broken up and doesn't work when it's clicked. I have the same problem with images, because their source is getting split apart.

Is there any way of preventing this from happening, maybe by looking for a "www." or an ending such as ".com" or ".png"? I'm pretty stuck on this one.

Thank you for your time and help,
~Chris

Link to comment: https://forums.phpfreaks.com/topic/169655-solved-breaking-up-long-words-without-interfering-with-urls/
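To make the problem concrete, here is a minimal sketch of how the pattern from the post behaves. The helper name break_long_words and the sample strings are assumptions for illustration, not part of the original script. [^\s] is written as the equivalent \S, and the /m modifier is left out because the pattern has no ^ or $ anchors.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function break_long_words(string $text, int $limit): string
{
    // Insert a space after each run of $limit non-whitespace characters
    // that is directly followed by another non-whitespace character.
    return preg_replace('/(\S{' . $limit . '})(?=\S)/', '$1 ', $text);
}

$character_limit = 50;

// A 120-character "word" is split into chunks of 50, 50 and 20.
$spam = break_long_words(str_repeat('a', 120), $character_limit);

// A long URL (sample value) is treated like any other long word,
// so a space lands in the middle of it and the link breaks.
$url = 'http://www.example.com/' . str_repeat('long-folder-name/', 5) . 'file.html';
$brokenUrl = break_long_words($url, $character_limit);

echo $spam, PHP_EOL, $brokenUrl, PHP_EOL;
```

The pattern has no notion of what a URL is, which is why the rest of the thread looks for ways to skip over tags.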
rea|and, posted August 11, 2009

Hi,

I've added two patterns to your regex: one to match links, the other to match images. They are generic patterns; you could replace the first with something more specific, or add other image extensions to the second. Try it:

$BBCode_Text=preg_replace('/(https?:\/\/(www)?)\S+|\S+\.(jpe?g|gif|png)|\S{'.$character_limit.'}(?=\S)/im', "$0 ", $BBCode_Text);
BluMess (author), posted August 11, 2009

Thanks very much for the response.

Unfortunately it hasn't made any difference. I've placed the code at the end of all the BBCode processing so that it doesn't interfere with, for example, changing tags into HTML links.

This is what I've put in:

And this is what has been displayed:

<a href="http://www.hastheworldbeendestroyedbythelarg ehadroncollideryet.com">http://www.hastheworldbeen destroyedbythelargehadroncollideryet.com</a>

Would it be better or easier to write a regex that removes spaces from the parts between a [url] or [img] tag? I'm brushing up on my regex at the moment. This is quite a tricky problem to sort out, because if there are enough characters before a tag, the inserted space lands inside it and the tag is then missed by the later searches.
BluMess (author), posted August 11, 2009

Sorry, this is what I put in:

[ url=http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com]http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com[/url ]

(without spaces around the url tags)
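The displayed output can be reproduced with a short sketch. It assumes, as the post says, that the replacement runs after the BBCode has already been turned into an anchor element; the exact HTML produced by the forum script is an assumption here.

```php
<?php
$character_limit = 50;
$url = 'http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com';

// Assumed output of the BBCode-to-HTML step for the [url] tag above.
$html = '<a href="' . $url . '">' . $url . '</a>';

$pattern = '/(https?:\/\/(www)?)\S+|\S+\.(jpe?g|gif|png)|\S{' . $character_limit . '}(?=\S)/im';
$broken = preg_replace($pattern, '$0 ', $html);

// The non-whitespace run starts at "href=", not at "http", so the URL
// alternative cannot match at that position. The \S{50} alternative can,
// and it consumes the start of the URL before the link pattern is tried.
echo $broken, PHP_EOL;
```

Because the regex engine takes the first position where any alternative matches, the URL pattern only protects links that begin a whitespace-separated word, which is not the case inside href="...".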
thebadbad, posted August 11, 2009

You can use this to match 50 non-whitespace characters only if they are found outside BB code tags:

$character_limit = 50;
$BBCode_Text = preg_replace('~\S{' . $character_limit. '}(?![^\[]*?\])~', '$0 ', $BBCode_Text);

I'm using a negative lookahead. The pattern searches for the 50 characters and, when they are found, checks the characters that follow. If any character other than [ is found 0 or more times, immediately followed by a ], the whole thing fails to match (= the keyword is inside a BB code tag, i.e. between [ and ]).

Hope that solves your problem.
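A minimal sketch of that behaviour on sample input. The sample text is an assumption; the pattern is the one from the post.

```php
<?php
$character_limit = 50;
$url = 'http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com';

// Sample input: a tag with a long attribute, then a 60-character word.
$BBCode_Text = '[url=' . $url . ']my link[/url] ' . str_repeat('x', 60);

$result = preg_replace(
    '~\S{' . $character_limit . '}(?![^\[]*?\])~',
    '$0 ',
    $BBCode_Text
);

// Any 50-character window that ends inside [url=...] is rejected by the
// lookahead, so the tag stays in one piece. The first accepted window ends
// on the closing "]", so a space is added right after the tag instead.
// The 60-character word is outside any tag and is split after 50 characters.
echo $result, PHP_EOL;
```

Note that this only protects text between [ and ]. A long URL written between an opening and a closing tag is still ordinary text as far as the pattern is concerned.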
BluMess (author), posted August 12, 2009

thebadbad, that script works brilliantly to prevent the tags from breaking up, thanks very much.

The only problem I have, which I have been playing around with your script to fix, is this:

or

Just specifically for those tags, it would be great if the regex could look for either of the tags (w/o spaces) and not add any spaces to the things inside them. Is that possible? This is what I tried, but it failed miserably I'm afraid:

$BBCode_Text = preg_replace('~\S{' . $character_limit. '}(?![^\[]*?\])|~\S{' . $character_limit. '}(?![^\[](url|img)\].*?\[\/(url|img)\])~', '$0 ', $BBCode_Text);

The error was "an unknown modifier: / ". Am I at least on the right lines? :S
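One likely cause of the "unknown modifier" error is the second ~ in the middle of the pattern. PHP takes the first unescaped delimiter after the opening one as the end of the pattern and reads everything after it as modifiers. The sketch below only demonstrates that delimiter issue; it makes no claim that the corrected pattern matches what the post intended.

```php
<?php
$character_limit = 50;

// Pattern as posted: the "~" after "|" closes the pattern early.
$bad = '~\S{' . $character_limit . '}(?![^\[]*?\])|~\S{' . $character_limit
     . '}(?![^\[](url|img)\].*?\[\/(url|img)\])~';

// preg_replace() raises a warning and returns null when the pattern is
// invalid; "@" hides the warning so the return value can be inspected.
$failed = @preg_replace($bad, '$0 ', 'short text');

// Same pattern with the stray "~" removed: one opening and one closing
// delimiter. This compiles; whether it does the right thing is separate.
$fixed = '~\S{' . $character_limit . '}(?![^\[]*?\])|\S{' . $character_limit
       . '}(?![^\[](url|img)\].*?\[\/(url|img)\])~';
$ok = preg_replace($fixed, '$0 ', 'short text');

var_dump($failed, $ok);
```

If the delimiter character is needed inside a pattern, it has to be escaped with a backslash, or a different delimiter has to be chosen.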
thebadbad, posted August 12, 2009

Oh, I see. That's a problem. It would actually be easiest to do the replacing after you've translated the BB code to (X)HTML, if that's possible in your current setup. Then you could just use

$character_limit = 50;
$html = preg_replace('~\S{' . $character_limit. '}(?![^<]*?>)~', '$0 ', $html);

to make sure that only long strings found outside tags are broken up.
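A minimal sketch of the HTML variant, assuming the BBCode step has already produced an anchor element like the one shown earlier in the thread. The sample markup is an assumption.

```php
<?php
$character_limit = 50;
$url = 'http://www.hastheworldbeendestroyedbythelargehadroncollideryet.com';

// Assumed result of translating the [url] tag to HTML.
$html = '<a href="' . $url . '">' . $url . '</a>';

$html = preg_replace(
    '~\S{' . $character_limit . '}(?![^<]*?>)~',
    '$0 ',
    $html
);

// Windows ending inside <a href="..."> are rejected, so the href value is
// left untouched and the link still works. The visible link text is not
// inside a tag, so it is still broken up after 50 characters.
echo $html, PHP_EOL;
```

In this sample the first accepted window ends on the tag's closing >, so a space also appears between the opening tag and the link text.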
BluMess (author), posted August 12, 2009

Wow, thank you so much, it works perfectly!

I think I understand how it works: it looks for 50 ($character_limit) non-whitespace characters and then looks back for "<tagname with properties etc>". That's actually genius, and so simple.

Cheers, that's helped such a lot. Good luck in future!
~Chris
thebadbad, posted August 12, 2009

You're welcome, and thanks.

But you haven't got it entirely. The tricky part, (?![^<]*?>), is a negative lookahead, meaning that if the expression inside it matches, the overall pattern fails to match. And I already explained what the expression [^<]*?> (as part of the lookahead) does:

... If any character other than < is found 0 or more times, immediately followed by a >, the whole thing fails to match (= the keyword is inside an HTML tag, i.e. between < and >) ...
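The rule can be checked on a small example in which the same keyword appears once inside a tag and once in the text. The sample markup and the keyword "wide" are assumptions chosen for illustration.

```php
<?php
// "wide" occurs at offset 10 (inside the tag) and at offset 16 (in the text).
$html = '<b class="wide">wide</b>';

$count = preg_match_all(
    '~wide(?![^<]*?>)~',
    $html,
    $matches,
    PREG_OFFSET_CAPTURE
);

// Offset 10: what follows is '">', i.e. non-"<" characters and then ">",
//            so the lookahead's inner expression matches and the
//            candidate is rejected.
// Offset 16: what follows is '</b>'; a "<" comes before any ">", so the
//            inner expression cannot match and the candidate is accepted.
$offset = $matches[0][0][1];

var_dump($count, $offset);
```

So the lookahead never looks back at the opening <. It only asks whether a > can be reached going forward without crossing a <.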
BluMess (author), posted August 12, 2009

Ahh, OK, that makes more sense. Thanks again!