curl? scrape? how what?

monkeytooth · October 28, 2008

OK, I know this is possible since others have done it, and I know there's no simple answer (but if there is one, lay it on me). Is cURL what I'd use to get information off a MySpace or Facebook profile? I'm mostly aiming at MySpace for now. I'm not looking to log in or change anything; I just want to pull things off any given profile and display them on another site. What should I be looking for? Are there any keywords I can google to find a tutorial to get me started, or can anyone here help me get started?

DarkWater · October 28, 2008

Well, you'd have to log in unless the profile is public, but otherwise cURL and regex could probably do whatever you're asking for.
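For the fetching half, here is a minimal sketch using PHP's cURL extension. It assumes ext/curl is installed; the function name `fetch_page`, the user-agent string and the 10-second timeout are made up for illustration. It returns the page HTML as a string (or false on failure), which you would then run your regexes against.

```php
<?php
// Hypothetical helper: fetch a page's HTML with PHP's cURL extension.
// Returns the HTML as a string, or false if the request fails.
function fetch_page($url) {
    $ch = curl_init($url);
    if ($ch === false) {
        return false; // could not create a cURL handle
    }
    // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // follow HTTP redirects to the final page
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // assumed: some sites serve different markup to clients without a browser-like user agent
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; profile-fetcher)');
    // give up after 10 seconds (arbitrary choice)
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // curl_exec() returns false on failure; the handle is released when $ch goes out of scope
    return curl_exec($ch);
}
```

Usage would be something like `$page = fetch_page("http://www.myspace.com/tom");`. For a plain GET like this, `file_get_contents()` also works when `allow_url_fopen` is enabled; cURL mainly pays off once you need headers, cookies or a login.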

monkeytooth · October 28, 2008

For my purposes I'm assuming public profiles for now. Do you know of any good tutorials on regex and cURL besides php.net? It doesn't have to deal with MySpace profiles (though that would be nice), but I'd like something on the more advanced side for learning either or both.

aseaofflames · October 29, 2008

A quick function (it will work until MySpace changes its profile layout):

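function get_myspace_info($url) {

    // get the contents of the page (needs allow_url_fopen enabled)
    $page = file_get_contents($url);
    if ($page === false) {
        return false; // page could not be fetched
    }

    // get the html of the info table (the one with the profile picture)
    if (!preg_match('#<table id="Table2"(.*?)</table>#msi', $page, $i)) {
        return false; // layout changed or not a profile page
    }
    // get the text block holding slogan, sex, age, location, etc.
    preg_match('#</td></td>(.*?)</td>#msi', $i[1], $in);
    // get profile picture
    preg_match('#<img border="0"(.*?)</a>#msi', $i[1], $pic);
    // get mood
    preg_match('#<span class="searchMonkey-mood">(.*?)</span>#msi', $i[1], $cm);

    $mood = isset($cm[1]) ? trim($cm[1]) : '';
    $picture = isset($pic[0]) ? str_replace("</a>", "", $pic[0]) : '';

    // split the text block on line breaks (only <br> and <img> tags survive strip_tags)
    $info = isset($in[1]) ? explode("<br />", strip_tags(trim($in[1]), "<br><img>")) : array();
    // trim stray newlines/tabs and pad so missing fields become '' instead of notices
    $info = array_pad(array_map('trim', $info), 10, '');

    $last_login = str_replace("Last Login: ", "", $info[9]);

    // build info array
    $myspace = array("url" => $url, "slogan" => $info[0], "sex" => $info[2], "age" => $info[3],
        "location" => $info[4], "country" => $info[5], "last_login" => $last_login,
        "picture" => $picture, "mood" => $mood);

    return $myspace;
}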

Example call:

$myspace_info = get_myspace_info("http://www.myspace.com/tom");

print_r($myspace_info) will then show an array similar to this:

Array
(
    [url] => http://www.myspace.com/tom
    [slogan] => ":-)"
    [sex] => Male
    [age] => 33 years old
    [location] => Santa Monica, CALIFORNIA
    [country] => United States
    [last_login] => 10/28/2008
    [picture] => <img border="0" alt="" src="http://b2.ac-images.myspacecdn.com/00000/20/52/2502_m.jpg" />
    [mood] => pinkfloyd
)

Test page: http://backup.aseaofflames.com/myspace.php

Hope this gets you started.

monkeytooth · October 29, 2008

That's definitely a good start. Out of curiosity, though: #msi shows up throughout. Is that something MySpace-specific?

Also, is this reading the meta tags somehow, or the actual page and breaking it down? Lastly, is there any way I can get raw output so I can work on pulling other information?

aseaofflames · October 30, 2008

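The #msi isn't MySpace-specific; it's ordinary PHP regex (PCRE) syntax. The two # characters are the pattern delimiters (any non-alphanumeric, non-backslash, non-whitespace character works; # is handy because it doesn't clash with the / in closing tags like </td>), and m, s and i are pattern modifiers: m makes ^ and $ match at line breaks, s lets . match newlines too, and i makes the match case-insensitive. The s is the one that matters here: the profile HTML spans many lines, and without it (.*?) can't cross a line break, which is why the patterns don't work without it.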
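A small demonstration of why the s modifier matters. The HTML string is made up; the two calls differ only in their modifiers.

```php
<?php
// Made-up HTML where the text to capture sits on its own line.
$html = "<span class=\"mood\">\n  happy\n</span>";

// Without s, '.' does not match "\n", so (.*?) cannot cross the line breaks.
$without_s = preg_match('#<span class="mood">(.*?)</span>#mi', $html, $m1);

// With s, '.' also matches newlines and the capture succeeds.
$with_s = preg_match('#<span class="mood">(.*?)</span>#msi', $html, $m2);

echo $without_s, "\n";   // 0
echo trim($m2[1]), "\n"; // happy
```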

The code reads the actual page, not the meta tags. It downloads the page, grabs the HTML of the info table (the one with the picture), then splits that up and puts the pieces into an array. To get more information from the page you have to find a unique bit of HTML next to the value you want and write a preg_match pattern around it. It's really a matter of trial and error.
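For the raw output, `echo htmlspecialchars($page);` prints the whole page's HTML with the tags visible in a browser (or use your browser's view-source). To narrow down the trial and error, a hypothetical helper like `show_context` below (the name, the `$radius` parameter and the sample fragment are made up for illustration) returns just the HTML around some text you can see on the profile, so you can spot tags unique enough to anchor a pattern on.

```php
<?php
// Hypothetical helper: return the raw HTML surrounding the first occurrence
// of $needle, so you can look for unique tags to build a preg_match pattern.
function show_context($page, $needle, $radius = 60) {
    $pos = stripos($page, $needle);
    if ($pos === false) {
        return null; // text not found in the page
    }
    $start = max(0, $pos - $radius);
    // substr() simply stops at the end of the string if this runs past it
    return substr($page, $start, strlen($needle) + 2 * $radius);
}

// Made-up page fragment for illustration.
$page = '<div><b>Mood:</b> <span class="searchMonkey-mood">pinkfloyd</span></div>';

// htmlspecialchars() makes the tags visible when echoed to a browser
echo htmlspecialchars(show_context($page, 'pinkfloyd'));
```

Here the span with class `searchMonkey-mood` around the value is what you would turn into a pattern, which is the same anchor the mood regex in the function uses.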

For example, to get the number of comments:

$i = preg_match('#<span class="redtext">\s*(\d+)\s*</span>comments#msi',$page,$comments);

Then $comments[1] will contain the total number of comments on the profile (only digits are captured, so the match lands on the span directly before "comments").

This works because the HTML around the comment count looks like this:

<b>Displaying<span class="redtext"> 50 </span>of<span class="redtext"> 792602 </span>comments
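A note on the capture group: a lazy (.*?) inside the span, as in `#<span class="redtext">(.*?)</span>comments#msi`, does not pick the nearest span. preg_match takes the leftmost possible match, so it starts at the first redtext span and captures everything up to the only `</span>comments`. The check below runs that lazy pattern and a digits-only `\s*(\d+)\s*` variant against the HTML line above.

```php
<?php
// The HTML quoted in the post above.
$html = '<b>Displaying<span class="redtext"> 50 </span>of<span class="redtext"> 792602 </span>comments';

// Lazy (.*?): the match starts at the FIRST redtext span, so the capture
// runs across "</span>of<span ...>" to reach "</span>comments".
preg_match('#<span class="redtext">(.*?)</span>comments#msi', $html, $lazy);

// Digits only: the first span is followed by "</span>of", not "</span>comments",
// so the engine moves on and matches the second span instead.
preg_match('#<span class="redtext">\s*(\d+)\s*</span>comments#msi', $html, $digits);

echo $lazy[1], "\n";   // contains "50 </span>of<span ..." as well as the total
echo $digits[1], "\n"; // 792602
```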
