Regex For Link


Drongo_III

Hi Guys

 

I'm retrieving a Twitter feed as JSON but I have a slight issue.

 

The feed outputs as text and so links in tweets come through as plain text, e.g.:

 

More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW

 

I've used preg_replace to turn the links into HTML links using the following code:

 

$twitterFeed = file_get_contents('https://api.twitter.com/1/statuses/user_timeline.json?screen_name=twitter&count=4');
$feedArray = json_decode($twitterFeed, true);

//Pattern to correct date format
$pattern = '/\+[0-9\s\b]+$/';
$replacement = '';

//$linkPattern = '/(http:\/\/[^\s]+)(?!\.)$/'; // This was my shot at forward reference

// Pattern to match links
$linkPattern = '/(http:\/\/[^\s]+)/';
$linkReplace = "<a href=\"$1\">$1</a>"; // Replacement pattern to create a link

foreach ($feedArray as $k => $v) {
    $date = preg_replace($pattern, $replacement, $v['created_at']);
    $text = preg_replace($linkPattern, $linkReplace, $v['text']);

    echo $date . "<br/>" . $text . "<br/><br/>";
}

 

The problem is that the linkPattern also captures the trailing full stop at the end of the sentence (seen in example above). So the resultant link ends up as a 404 because that full stop shouldn't be part of the link.

 

Therefore can anyone suggest either:

 

1) how the linkPattern can be adjusted so that it doesn't capture the trailing full stop

2) how I can rtrim the capture reference if it's a full stop

 

Or do I just need to do another preg_replace? :)

 

Thanks!
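For option 1, one possible approach is a negative lookbehind that stops the match from ending in punctuation. This is only a sketch run against the sample tweet from this post: the `https?` alternative and the exact punctuation set `.,;:!?` are assumptions added here, not something from the original code.

```php
<?php
// Sample tweet text taken from the post above.
$text = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// \S+ grabs the whole run of non-space characters. The negative lookbehind
// then forces the match to back off until it no longer ends in one of the
// listed punctuation characters (an assumed set, adjust as needed).
$linkPattern = '/(https?:\/\/\S+(?<![.,;:!?]))/';
$linkReplace = '<a href="$1">$1</a>';

$linked = preg_replace($linkPattern, $linkReplace, $text);

echo $linked, "\n";
```

The full stop after the first link stays in the text but ends up outside the anchor tag.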

https://forums.phpfreaks.com/topic/268916-regex-for-link/

This is the Regular Expression I'm using for URIs, should be good for your purposes too:

$RegExp = '#^(?:(?:(?:f|ht)tps?|dchub)://)?((?:[\\w\\pL-]+\\.)+[a-z\\pL]{2,5})((?:/[\\w\\%-]*)+(?:(?:\\.\\w{1,6})?(\\?(?:[\\w-]+=[\\w-]+)(?:&[\\w-]+(?:=[\\w-]+)?)*&?)?)?)?\\z#ui';

 

It may be that you can use filter_var() for this too, but no guarantees there. Worth checking out at least.
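For reference, a minimal sketch of what the filter_var() route looks like. The helper name `isValidUrl` is made up for illustration. Note that this validates one string at a time rather than finding links inside a longer text, and since a trailing full stop is legal in a URL path, validation alone is unlikely to catch the problem described above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread):
// returns true when the whole string passes PHP's built-in URL filter.
function isValidUrl(string $candidate): bool
{
    // filter_var() returns the filtered value on success and false on failure.
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrl('http://t.co/coKFdEQL'));
var_dump(isValidUrl('not a link'));
```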

https://forums.phpfreaks.com/topic/268916-regex-for-link/#findComment-1381784


 

That doesn't seem to match or replace the plain-text links. And I won't lie, I don't follow half of that!

 

If I just wanted to rtrim the reference how can I do that? Is that possible to do?
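Trimming the match is possible if the replacement is done in a callback instead of a replacement string. A rough sketch, keeping the original link pattern from the first post: the punctuation list passed to rtrim() and the htmlspecialchars() escaping are assumptions added here, not part of the original code.

```php
<?php
// Sample tweet text taken from the first post.
$text = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// Same idea as the original pattern: http:// followed by non-space characters.
$linkPattern = '/http:\/\/[^\s]+/';

$linked = preg_replace_callback(
    $linkPattern,
    function (array $m): string {
        // Strip trailing punctuation from the matched URL (assumed set).
        $url = rtrim($m[0], '.,;:!?');
        // Whatever was stripped is put back after the closing tag.
        $trailing = substr($m[0], strlen($url));
        // Escape the URL before placing it in HTML (an added precaution).
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

        return '<a href="' . $safe . '">' . $safe . '</a>' . $trailing;
    },
    $text
);

echo $linked, "\n";
```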

https://forums.phpfreaks.com/topic/268916-regex-for-link/#findComment-1381786

You're right in that I don't use it for replacing URIs with HTML anchors, but I use it to validate them. To get it to replace, all you need to do is take out the RegExp anchors, so that it's not tied to the start and end of the string.

Next step is to put a pair of parentheses around the whole expression, to save the result in sub group 1, which you can then use in the replacement text.

 

Edit: In short, replace the first caret (^) with an opening parenthesis, and the "\\z" with a closing parenthesis, and you're set.

 

Upon re-reading your post, I see that you're only fetching links from Twitter feeds. Which means you can simplify the RegExp quite a bit. Mine above matches a complete URI, and isn't needed for your purposes. Sorry about missing that the first time around.

 

In short, what you need is the following:

$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
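Plugged into preg_replace() against the sample tweet from the first post, that would look roughly like the sketch below. The replacement string is the one from the original code. One caveat: the pattern only matches a single path segment of word characters, which fits t.co short links but not longer URLs.

```php
<?php
// Sample tweet text taken from the first post.
$text = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// Simplified pattern from above: host made of dot-separated word parts,
// then one path segment of word characters. A full stop is not a word
// character, so the match ends before it.
$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
$linkReplace = '<a href="$1">$1</a>';

$linked = preg_replace($RegExp, $linkReplace, $text);

echo $linked, "\n";
```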

https://forums.phpfreaks.com/topic/268916-regex-for-link/#findComment-1381790

Well that worked perfectly :) Thank you!

 

I will spend some time deciphering that regex too... I'll have it cracked by sometime next month ;)

 


https://forums.phpfreaks.com/topic/268916-regex-for-link/#findComment-1381793
