Regex For Link

Drongo_III · September 29, 2012

Hi Guys

I'm retrieving a Twitter feed as JSON, but I have a slight issue.

The feed outputs as text and so links in tweets come through as plain text, e.g.:

More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW

I've used preg_replace to turn the links into HTML links using the following code:

$twitterFeed = file_get_contents('https://api.twitter.com/1/statuses/user_timeline.json?screen_name=twitter&count=4');
$feedArray = json_decode($twitterFeed, true);

//Pattern to correct date format
$pattern = '/\+[0-9\s\b]+$/';
$replacement = '';

//$linkPattern = '/(http:\/\/[^\s]+)(?!\.)$/'; // This was my shot at forward reference

// Pattern to match links
$linkPattern = '/(http:\/\/[^\s]+)/';
$linkReplace = "<a href=\"$1\">$1</a>"; // Replacement pattern to create a link

foreach ($feedArray as $k => $v) {
    $date = preg_replace($pattern, $replacement, $v['created_at']);
    $text = preg_replace($linkPattern, $linkReplace, $v['text']);

    echo $date . "<br/>" . $text . "<br/><br/>";
}

The problem is that the linkPattern also captures the trailing full stop at the end of the sentence (seen in example above). So the resultant link ends up as a 404 because that full stop shouldn't be part of the link.

Therefore can anyone suggest either:

1) how the linkPattern can be adjusted so that it doesn't capture the trailing full stop

2) how I can rtrim the capture reference if it's a full stop

Or do I just need to do another preg_replace?

Thanks!

Jessica · September 29, 2012

rtrim()
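As a rough sketch of how rtrim() could be applied to each match: preg_replace() cannot call a function on a backreference, so this assumes preg_replace_callback() instead. The helper name linkify_tweet() and the set of trimmed characters are illustrative assumptions, not something from the original post.

```php
<?php
// Hypothetical helper: wraps http:// links in anchors, keeping trailing
// sentence punctuation outside the link.
function linkify_tweet($text)
{
    return preg_replace_callback(
        '/http:\/\/\S+/',
        function ($m) {
            // Strip trailing punctuation from the matched URL (assumed set: . , ! ?)
            $url = rtrim($m[0], '.,!?');
            // Whatever was trimmed is put back after the closing tag
            $tail = substr($m[0], strlen($url));
            return '<a href="' . $url . '">' . $url . '</a>' . $tail;
        },
        $text
    );
}

echo linkify_tweet('in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW');
```

The callback receives the full match in $m[0], so the trimming happens before the anchor is built and the full stop stays in the sentence rather than in the href.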

Christian F. · September 29, 2012

This is the Regular Expression I'm using for URIs, should be good for your purposes too:

$RegExp = '#^((?:(?:f|ht)tps?|dchub)://)?((?:[\\w\\pL-]+\\.)+[a-z\\pL]{2,5})((?:/[\\w\\%-]*)+((?:\\.\\w{1,6})?(\\?(?:[\\w-]+=[\\w-]+)(?:&[\\w-]+(?:=[\\w-]+)?)*&?)?)?)?\\z#ui';

You may be able to use filter_var() for this too, but no guarantees there. Worth checking out at least.
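For reference, a minimal sketch of the filter_var() route. FILTER_VALIDATE_URL only validates a whole string; it does not find URLs inside longer text, so it would have to be combined with some way of splitting out candidate URLs first. The helper name is_valid_url() is an assumption for illustration.

```php
<?php
// Hypothetical helper: true when the whole string is a syntactically valid URL.
function is_valid_url($candidate)
{
    // filter_var() returns the filtered value on success and false on failure
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_valid_url('http://t.co/coKFdEQL'));
var_dump(is_valid_url('not a link'));
```

Note that a full stop is a legal character in a URL path, so validation alone is unlikely to catch the trailing-punctuation problem.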

Drongo_III · September 29, 2012

That doesn't seem to match and replace the plain-text links with HTML links. And I won't lie, I don't follow half of that!

If I just wanted to rtrim the reference, how can I do that? Is that possible?

Christian F. · September 29, 2012

You're right that I don't use it for replacing URIs with HTML anchors; I use it to validate them. To get it to replace, all you need to do is take out the RegExp anchors, so that it's not tied to the start and end of the string.

Next step is to put a pair of parentheses around the whole expression, to save the result in sub group 1, which you can then use in the replacement text.

Edit: In short, replace the first caret (^) with an opening parenthesis, and the "\\z" with a closing parenthesis, and you're set.

Upon re-reading your post, I see that you're only fetching links from Twitter feeds, which means you can simplify the RegExp quite a bit. Mine above matches a complete URI, which isn't needed for your purposes. Sorry about missing that the first time around.

In short, what you need is the following:

$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
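A quick sketch of how this pattern could be dropped into the replacement from the original post, using the example tweet as input. It assumes links of the form http://host/path where the path consists only of word characters, which holds for t.co short links but not for URLs in general.

```php
<?php
$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
$linkReplace = '<a href="$1">$1</a>';

$text = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// \w does not match a full stop, so the match ends before the sentence punctuation
$linked = preg_replace($RegExp, $linkReplace, $text);
echo $linked;
```

Because the path part is \w+ rather than [^\s]+, the trailing full stop is never part of group 1 and no separate trimming step is needed.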

Drongo_III · September 29, 2012

Well, that worked perfectly. Thank you!

I will spend some time deciphering that regex too... I'll have it cracked by sometime next month.

Christian F. · September 29, 2012

You're welcome, glad I could help.

Good luck on the deciphering as well.
