Regex For Link


Drongo_III

Hi Guys

I'm retrieving a Twitter feed as JSON, but I have a slight issue.

The feed is output as text, so links in tweets come through as plain text, e.g.:

 

More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW

 

I've used preg_replace to turn the links into HTML links using the following code:

 

$twitterFeed = file_get_contents('https://api.twitter.com/1/statuses/user_timeline.json?screen_name=twitter&count=4');
$feedArray = json_decode($twitterFeed, true);

//Pattern to correct date format
$pattern = '/\+[0-9\s\b]+$/';
$replacement = '';

//$linkPattern = '/(http:\/\/[^\s]+)(?!\.)$/'; // This was my shot at a negative lookahead

// Pattern to match links
$linkPattern = '/(http:\/\/[^\s]+)/';
$linkReplace = "<a href=\"$1\">$1</a>"; // Replacement pattern to create a link

foreach ($feedArray as $k => $v) {
    $date = preg_replace($pattern, $replacement, $v['created_at']);
    $text = preg_replace($linkPattern, $linkReplace, $v['text']);

    echo $date . "<br/>" . $text . "<br/><br/>";
}

 

The problem is that $linkPattern also captures the full stop at the end of the sentence (as in the example above), so the resulting link returns a 404 because the full stop isn't part of the URL.

 

Therefore can anyone suggest either:

 

1) how the linkPattern can be adjusted so that it doesn't capture the trailing full stop

2) how I can rtrim the capture reference if it's a full stop

 

Or do I just need to do another preg_replace? :)

 

Thanks!

Edited by Drongo_III

This is the regular expression I use for URIs; it should be good for your purposes too:

$RegExp = '#^(?:(?:(?:f|ht)tps?|dchub)://)?((?:[\\w\\pL-]+\\.)+[a-z\\pL]{2,5})((?:/[\\w\\%-]*)+(?:(?:\\.\\w{1,6})?(\\?(?:[\\w-]+=[\\w-]+)(?:&[\\w-]+(?:=[\\w-]+)?)*&?)?)?)?\\z#ui';

 

You may be able to use filter_var() for this too, but no guarantees there. Worth checking out at least.
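For anyone wanting to try the filter_var() route, here is a minimal sketch. The helper name isUrl() is made up for illustration and is not from this thread. Keep in mind that filter_var() with FILTER_VALIDATE_URL only says whether a whole string is a URL; it does not search inside text, so the tweet has to be split into tokens first. It is also fairly lenient about what it accepts in the path, so don't count on it to reject a trailing full stop.

```php
<?php
// Hypothetical helper: true when the WHOLE string is a URL according to filter_var().
function isUrl(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

$tweet = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// filter_var() cannot find URLs inside a sentence, so check each
// whitespace-separated token on its own.
foreach (preg_split('/\s+/', $tweet) as $token) {
    if (isUrl($token)) {
        echo $token, "\n";
    }
}
```

This is validation rather than replacement, so on its own it does not replace the preg_replace step.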


That doesn't seem to match and replace the plain-text links. And I won't lie, I don't follow half of that!

 

If I just wanted to rtrim the reference how can I do that? Is that possible to do?
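On the rtrim question: preg_replace_callback() lets you run ordinary PHP, such as rtrim(), on each match before building the replacement. A minimal sketch, assuming the same loose http:// pattern as in the first post; the helper name linkify() is made up for illustration.

```php
<?php
// Hypothetical helper: wrap http:// URLs in anchors, keeping any
// trailing full stops outside the link.
function linkify(string $text): string
{
    return preg_replace_callback(
        '/http:\/\/[^\s]+/',
        function (array $m): string {
            $url      = rtrim($m[0], '.');            // URL without trailing full stop(s)
            $trailing = substr($m[0], strlen($url));  // the stripped full stop(s), if any
            return '<a href="' . $url . '">' . $url . '</a>' . $trailing;
        },
        $text
    );
}

echo linkify('Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW'), "\n";
```

The stripped full stop is put back after the closing tag, so the sentence punctuation survives while the href stays clean.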


You're right that I don't use it for replacing URIs with HTML anchors; I use it to validate them. To get it to replace, all you need to do is take out the RegExp anchors, so that it's not tied to the start and end of the string.

Next step is to put a pair of parentheses around the whole expression, which saves the match in sub group 1 so you can use it in the replacement text.

 

Edit: In short, replace the first caret (^) with an opening parenthesis, and the "\\z" with a closing parenthesis, and you're set.
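To show what the anchors change, here is a small sketch. It uses a deliberately tiny URL pattern ($core, an assumption made for illustration) rather than the full expression above, so only the anchoring differs between the two variants.

```php
<?php
// Deliberately tiny URL pattern, used only to illustrate the role of the anchors.
$core = 'http://\\w+\\.\\w+/\\w+';

$validate = '#^' . $core . '\\z#u';  // anchored: the whole string must be a URL
$replace  = '#(' . $core . ')#u';    // unanchored, wrapped in sub group 1

$text = 'See http://t.co/6OfRxQeW for more';

// The anchored pattern rejects text that merely contains a URL...
var_dump(preg_match($validate, $text));                   // int(0)
// ...and accepts a string that is nothing but a URL.
var_dump(preg_match($validate, 'http://t.co/6OfRxQeW'));  // int(1)

// The unanchored pattern finds the URL inside the text; $1 refers to sub group 1.
echo preg_replace($replace, '<a href="$1">$1</a>', $text), "\n";
```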

 

Upon re-reading your post, I see that you're only fetching links from Twitter feeds. Which means you can simplify the RegExp quite a bit. Mine above matches a complete URI, and isn't needed for your purposes. Sorry about missing that the first time around.

 

In short, what you need is the following:

$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
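A quick sketch of that pattern applied to the example tweet from the first post, using the same anchor replacement as the original code. The variable names $tweet and $html are only for illustration.

```php
<?php
$RegExp = '#(http://(?:\\w+\\.)+\\w+/\\w+)#u';
$tweet  = 'More Tweets to discover... in the Discover tab on http://t.co/coKFdEQL. http://t.co/6OfRxQeW';

// The path part is \w+, which cannot match a full stop, so the match
// ends before the sentence-ending full stop.
$html = preg_replace($RegExp, '<a href="$1">$1</a>', $tweet);

echo $html, "\n";
```

Because the path is limited to word characters, this suits short t.co style links; a URL with hyphens, further slashes or a query string would be cut short, and https:// links are not matched at all.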

Edited by Christian F.

Well that worked perfectly :) Thank you!

 

I will spend some time deciphering that regex too... I'll have it cracked by sometime next month ;)

 

