
curl? scrape? how what?


monkeytooth


Ok, I know this is possible; it's been done by others. I also know there is probably no simple answer (if there is, lay it on me). Is cURL what I would use to get information off a MySpace or Facebook profile? I am mostly aiming at MySpace at the moment. I'm not looking to log in or change anything, just to pull things off any given profile and display them on another site. What am I looking for? Are there any keywords I can google to find a tutorial that gets me started, or can anyone here help me get started on what I need to do?

https://forums.phpfreaks.com/topic/130501-curl-scrape-how-what/

Well, for my purposes at the moment I am assuming public profiles. Do you know of any good tutorials on using regex and cURL, other than php.net? I don't even care whether it deals with MySpace profiles (though that would be nice), as long as it is a bit on the advanced side for either or both.


A quick function (it will work until MySpace changes the profile layout):

 

function get_myspace_info($url) {

    // get the contents of the page
    $page = file_get_contents($url);
    if ($page === false) {
        return null; // the page could not be fetched
    }
    // get the html of the info table
    if (!preg_match('#<table id="Table2"(.*?)</table>#msi', $page, $i)) {
        return null; // info table not found (e.g. the layout changed)
    }
    // get the cell holding the text details
    preg_match('#</td></td>(.*?)</td>#msi', $i[1], $in);
    // get profile picture
    preg_match('#<img border="0"(.*?)</a>#msi', $i[1], $pic);
    // get mood
    preg_match('#<span class="searchMonkey-mood">(.*?)</span>#msi', $i[1], $cm);

    $mood = isset($cm[1]) ? $cm[1] : "";
    $picture = isset($pic[0]) ? str_replace("</a>", "", $pic[0]) : "";

    // split the details on <br />, keeping only <br> and <img> tags
    $details = isset($in[1]) ? $in[1] : "";
    $info = explode("<br />", strip_tags(trim($details), "<br><img>"));
    // pad so the fixed indexes below always exist
    $info = array_pad($info, 10, "");

    $last_login = str_replace("Last Login: ", "", $info[9]);

    // build info array
    $myspace = array("url" => $url, "slogan" => $info[0], "sex" => $info[2], "age" => $info[3], "location" => $info[4], "country" => $info[5], "last_login" => $last_login, "picture" => $picture, "mood" => $mood);

    return $myspace;
}

 

call example:

 

$myspace_info = get_myspace_info("http://www.myspace.com/tom");

 

$myspace_info will contain an array similar to this:

 

Array
(
    [url] => http://www.myspace.com/tom
    [slogan] => ":-)"
    [sex] => Male
	                
    [age] => 33 years old
	                
    [location] => Santa Monica, CALIFORNIA
	                
    [country] => United States
	                
    [last_login] => 10/28/2008
	                
    [picture] => <img border="0" alt="" src="http://b2.ac-images.myspacecdn.com/00000/20/52/2502_m.jpg" />
				    
    [mood] => pinkfloyd
)

 

test: http://backup.aseaofflames.com/myspace.php

 

hope this can get you started.
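
Since the question asked about cURL: the function above fetches the page with file_get_contents, which only works on URLs when allow_url_fopen is enabled. A cURL-based fetch could be sketched roughly like this. The helper name fetch_page, the timeout value and the user-agent string are my own assumptions, not something from the posts above.

```php
<?php
// Hypothetical helper (not from the original post): fetch a page with cURL.
// Returns the response body as a string, or null on failure.
function fetch_page(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null; // cURL handle could not be created
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout,  // give up after this many seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)', // assumed value
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status >= 400) {
        return null; // transfer failed or the server answered with an error status
    }
    return $body;
}
```

Inside get_myspace_info, `$page = fetch_page($url);` would then take the place of the file_get_contents call, with a null result treated as a failed fetch.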


That is definitely a good start. Curiosity has me, though: #msi is seen throughout. Is that something MySpace-specific?

 

Also, is this reading the meta tags somehow, or the actual page and breaking the page down? Lastly, is there any way I can get raw output so I can work on getting other information, if possible?

 


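The #msi is not MySpace-specific; it is part of the pattern syntax that preg_match uses. The two # characters are the pattern delimiters, and the letters after the closing # are modifiers: m makes ^ and $ match at the start and end of each line, s lets . match newlines as well, and i makes the match case-insensitive. The first time I used preg_match the example had it and the pattern didn't work without it, so I use it. The s is the one that matters here: without it, (.*?) cannot span line breaks, so a match that crosses several lines of HTML fails. The m has no effect on these patterns because none of them use ^ or $.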
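
A small check of what the s and i modifiers change, using a made-up HTML fragment (the $html string and the mood class are illustrative only, not taken from a real profile):

```php
<?php
// Made-up sample: upper-case tags, and the content spans two lines.
$html = "<SPAN class=\"mood\">pink\nfloyd</SPAN>";

// No modifiers: the match is case-sensitive, so <span does not match <SPAN.
$plain = preg_match('#<span class="mood">(.*?)</span>#', $html, $m1);

// "i" only: the tags now match, but "." still cannot cross the newline.
$onlyI = preg_match('#<span class="mood">(.*?)</span>#i', $html, $m2);

// "s" and "i": "." also matches newlines, so the capture spans both lines.
$both = preg_match('#<span class="mood">(.*?)</span>#si', $html, $m3);

var_dump($plain, $onlyI, $both); // int(0), int(0), int(1)
var_dump($m3[1]);                // the two-line string "pink\nfloyd"
```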

 

The code reads information from the actual page, not from the meta tags. It fetches the page, pulls out the HTML of the info table (the one with the picture), then splits it up and pushes the pieces into an array. The raw HTML is what $page holds inside the function, so that is where to look for other markers. To get more information from the page, you have to find a unique element in the HTML and then use a preg_match to retrieve it. It's really guess and check.

 

for example to get number of comments:

 

// capture digits only, so the match cannot run from the first span through to the second
$i = preg_match('#<span class="redtext">\s*(\d+)\s*</span>comments#msi',$page,$comments);

 

then $comments[1] would contain the number of comments on the page.

 

this works because the html near the number of comments is:

 

<b>Displaying<span class="redtext"> 50 </span>of<span class="redtext"> 792602 </span>comments 
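
One thing to watch with this fragment: there are two redtext spans, and a lazy (.*?) capture starts at the first one and keeps growing until it reaches "</span>comments", so it picks up more than the count. A digits-only capture avoids that. A quick check on the fragment quoted above (the variable names $loose and $strict are just for this sketch):

```php
<?php
// HTML fragment quoted in the post above.
$page = '<b>Displaying<span class="redtext"> 50 </span>of<span class="redtext"> 792602 </span>comments';

// Lazy wildcard: the match begins at the FIRST span and grows until "</span>comments".
preg_match('#<span class="redtext">(.*?)</span>comments#msi', $page, $loose);

// Digits only: the first span is followed by "of", so only the second span can match.
preg_match('#<span class="redtext">\s*(\d+)\s*</span>comments#msi', $page, $strict);

var_dump($loose[1]);  // includes " 50 ", the markup in between, and " 792602 "
var_dump($strict[1]); // string(6) "792602"
```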

 

 

 

 


