Need help picking links from strings of text (Tweets)


I'm using the Twitter streaming API to get JSON data for any tweet containing http.

 

I need to pull the individual link out of each tweet. My current check is only about 60% accurate, because other characters in the tweet can throw it off. This is what I'm currently doing:

$pos = substr($li, 0, 4);

if ($pos == 'http') {
	$link = $li;
	$long_url = '';
} else {
	$link = 'not found';
}

 

Basically, what I'm asking is: is there a better way to pick links (http://) out of a string of text and store them in a variable?

Are you aware PHP (>= 5.2) has a json_decode function that can put the data into arrays for you?

Yes, I'm using json_decode earlier in the script to separate the data into arrays. But how will this help me separate out the links?

It would help if I knew the array structure. I'm sure it's in a simple format. Could you post an example of what json_decode puts out (wrap it in print_r)?

 

{"favorited":false,"text":"biz mkan mLah ngantuk.... *_* http://myloc.me/2rmzG","in_reply_to_user_id":null,"in_reply_to_status_id":null,"in_reply_to_screen_name":null,"geo":null,"source":"<a href=\"http://ubertwitter.com\" rel=\"nofollow\">UberTwitter</a>","created_at":"Tue Dec 29 08:52:18 +0000 2009","user":{"profile_background_tile":false,"profile_sidebar_border_color":"87bc44","url":null,"verified":false,"followers_count":18,"friends_count":83,"description":null,"profile_background_color":"9ae4e8","geo_enabled":false,"favourites_count":0,"notifications":null,"created_at":"Mon Nov 09 05:50:20 +0000 2009","profile_text_color":"000000","time_zone":null,"protected":false,"profile_image_url":"http://s.twimg.com/a/1262036730/images/default_profile_3_normal.png","statuses_count":371,"profile_link_color":"0000ff","location":"Jakarta, Jakarta","name":"jeje","following":null,"profile_background_image_url":"http://s.twimg.com/a/1262036730/images/themes/theme1/bg.png","screen_name":"jejechappy","id":88602797,"utc_offset":null,"profile_sidebar_fill_color":"e0ff92"},"truncated":false,"id":7154163945}
{"favorited":false,"text":"Q:Numa briga entre Satan Goss e George Fo... A:Depende. Um juiz normal morreria durant... http://formspring.me/VVec/q/11135868 #formspringme","in_reply_to_user_id":null,"in_reply_to_status_id":null,"in_reply_to_screen_name":null,"geo":null,"source":"<a href=\"http://formspring.me\" rel=\"nofollow\">formspring.me</a>","created_at":"Tue Dec 29 08:52:19 +0000 2009","user":{"profile_background_tile":false,"profile_sidebar_border_color":"181A1E","url":"http://www.orkut.com.br/Main#Profile.aspx?rl=mp&uid=4664721737017959749","verified":false,"followers_count":11,"friends_count":14,"description":"When my uncle Ben was murdered, my father Oden banished me from Asgard as I was exposed to a gamma zombie. Um Nerd 5/Metaleiro 5/Artista 5, com spec de Gamer.","profile_background_color":"1A1B1F","geo_enabled":false,"favourites_count":1,"notifications":null,"created_at":"Thu Jul 30 05:44:35 +0000 2009","profile_text_color":"666666","time_zone":"Santiago","protected":false,"profile_image_url":"http://a3.twimg.com/profile_images/495142819/V_Avatar01_c_pia_normal.jpg","statuses_count":562,"profile_link_color":"2FC2EF","location":"In your head.","name":"V Dias","following":null,"profile_background_image_url":"http://s.twimg.com/a/1262047188/images/themes/theme9/bg.gif","screen_name":"_vec","id":61404047,"utc_offset":-14400,"profile_sidebar_fill_color":"252429"},"truncated":false,"id":7154163964}
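For reference, a minimal sketch of decoding one such line with json_decode. It assumes each line of the stream is one complete JSON object, as in the sample above; the `$line` value here is a shortened copy of the first sample that keeps only a few fields.

```php
<?php
// One line from the stream, shortened to a few fields (values copied from the sample above).
$line = '{"favorited":false,"text":"biz mkan mLah ngantuk.... *_* http://myloc.me/2rmzG","user":{"screen_name":"jejechappy","id":88602797},"truncated":false,"id":7154163945}';

// Without a second argument, json_decode() returns stdClass objects,
// which matches the $tweet->text access used elsewhere in this thread.
$tweet = json_decode($line);

if ($tweet === null) {
    // json_decode() returns null when the input cannot be parsed.
    echo "could not decode line\n";
} else {
    echo $tweet->text, "\n";               // the tweet body, link included
    echo $tweet->user->screen_name, "\n";  // nested objects are decoded recursively
}
```

The link is only present as part of the `text` string, so decoding alone does not isolate it; the string still has to be searched.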

What is $li defined as?

If it is the entire value of the text key, your current code will only find a link when the first 4 characters are http.

You would need to search the whole text value for http, using a regex or something similar.

Sorry, I just realized I need to post more of the code (hey, it's 4 am for me).

 

Basically, it explodes the string at the spaces and then checks whether the first four characters of each piece are http. If so, it's the link; if not, it isn't.

$link = explode(' ', $tweet->text);

foreach ($link as $li){
$pos = substr($li, 0, 4); 

if( $pos == 'http' ){
	$link = $li;
} else {
	$link = 'not found';
}

} // end foreach //

So what exactly is wrong with your method?

 

I would like it to be more accurate in retrieving the link. It's only successful about 60% of the time.

 

I've found that a common mistake it's making is when a link is followed by a hashtag, like this:

http://wiurl.com/twitlinks #php

 

For some reason it won't collect the link from something like the example above.  :-\
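One possible explanation, judging only from the foreach loop posted earlier in the thread: `$link` is reassigned on every word, so its final value depends on the last word alone. When `#php` comes after the URL, the last pass sets `$link` to 'not found'. A minimal sketch that keeps the explode-style approach but stops at the first match (the function name `first_link_by_words` is made up for illustration):

```php
<?php
// Hypothetical helper: returns the first word that starts with "http", or null.
function first_link_by_words($text)
{
    // Split on any run of whitespace, not just single spaces.
    $words = preg_split('/\s+/', trim($text));

    foreach ($words as $word) {
        if (substr($word, 0, 4) === 'http') {
            return $word; // stop here so later words cannot overwrite the result
        }
    }

    return null; // no word started with "http"
}

echo first_link_by_words('http://wiurl.com/twitlinks #php'), "\n"; // http://wiurl.com/twitlinks
```

This still misses links that are glued to other characters, such as `(http://...)`, so it is a patch for the overwrite problem rather than a complete fix.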

With that example, is it returning part of the URL and cutting off the #php part?

It will be doing that because you are exploding the string by ' '.

 

You would be better off using a regex function (IMO) and searching the whole string.
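The regex suggestion could be sketched with preg_match_all, roughly as follows. The pattern is deliberately simple and is an assumption, not a full URL validator: it takes `http://` or `https://` plus everything up to the next whitespace, then trims trailing punctuation. The function name `extract_links` is made up for illustration.

```php
<?php
// Hypothetical helper: returns every http/https URL found in $text.
function extract_links($text)
{
    // "http://" or "https://" followed by one or more non-whitespace characters.
    preg_match_all('~https?://\S+~i', $text, $matches);

    $links = array();
    foreach ($matches[0] as $url) {
        // Trim punctuation that usually belongs to the sentence, not the URL.
        $links[] = rtrim($url, '.,;:!?)"\'');
    }

    return $links;
}

print_r(extract_links('Read this (http://wiurl.com/twitlinks), then http://myloc.me/2rmzG #php'));
```

Because the whole string is searched, the position of the link and whatever follows it (a hashtag, a closing bracket) no longer matter, and tweets with more than one link return all of them.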
