
[SOLVED] screen scraping


acidglitter


Are you familiar with regular expressions?  They will make this task much easier.

 

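  // UNTESTED
  // Assumes $content has the content you wish to scrape
  // [^>]* consumes the tag's attributes; the lazy .*? stops at the first closing </a>
  $regexp_a = '/<a\b[^>]*>.*?<\/a>/is';
  // $regexp_a = '/(<a\b[^>]*>(.*?)<\/a>)/is'; // same match, but also captures the link contents
  preg_match($regexp_a, $content, $matches);
  // Escape the output so the matched markup is displayed rather than rendered
  echo '<pre style="text-align: left;">' . htmlspecialchars(print_r($matches, true)) . '</pre>';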

Thanks, that helped a little. :)

 

I'm not very good at regular expressions, but I'm trying to get the default image off my MySpace page. This is what I have so far:

 

$data = file_get_contents('http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=000');
$regexp_a = '/<a.*[^id="ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage"$].*[^><img].* \/><\/a>/';
preg_match($regexp_a, $data, $matches);
echo $matches[0];

 

But instead of getting the default picture, it gets the send-message link:

 

<a href="http://messaging.myspace.com/index.cfm?fuseaction=mail.message&friendID=000" id="ctl00_Main_ctl00_UserContactLinks1_MailLink"><img src="http://x.myspace.com/images/profile/mail_1.gif" border="0" align="middle" /></a>

 

??? :(

Okay, this is the code from my page:

 

<a type="text/javascript" id="ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage" href="http://viewmorepics.myspace.com/index.cfm?fuseaction=user.viewAlbums&friendID=0"><img border="0" alt="" src="http://a963.ac-images.myspacecdn.com/images01/64/m_dfd895b94371623d5059281421c137da.jpg" /></a>

 

I want to be able to pull this out to show just my default picture on my site. Every time I change my picture, the address in the above code will change too:

 

http://a963.ac-images.myspacecdn.com/images01/64/m_dfd895b94371623d5059281421c137da.jpg
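
One possible way to pull that address out is to anchor the pattern on the link's id and capture the src of the image inside it. This is only a sketch: the helper name defaultImageSrc is made up for illustration, and it assumes the id attribute comes before the closing > of the a tag and that the img follows it directly, as in the markup above. Note that square brackets in a regular expression form a character class (a set of single characters), so they cannot be used to match or exclude a whole string such as an id.

```php
<?php
// Sample markup copied from the post above; in practice $html would be the fetched page.
$html = '<a type="text/javascript" id="ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage" href="http://viewmorepics.myspace.com/index.cfm?fuseaction=user.viewAlbums&friendID=0"><img border="0" alt="" src="http://a963.ac-images.myspacecdn.com/images01/64/m_dfd895b94371623d5059281421c137da.jpg" /></a>';

// Hypothetical helper: returns the src of the <img> that directly follows
// the opening <a> tag with the given id, or null when nothing matches.
function defaultImageSrc(string $html, string $anchorId): ?string
{
    // Escape the id so any regex metacharacters in it are treated literally.
    $id = preg_quote($anchorId, '~');
    $pattern = '~<a\b[^>]*\bid="' . $id . '"[^>]*>\s*<img\b[^>]*\bsrc="([^"]+)"~i';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$src = defaultImageSrc($html, 'ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage');
echo $src === null ? 'not found' : $src, "\n";
```

Because the pattern captures only the src value, the result keeps working when the picture (and therefore the address) changes, as long as the id on the link stays the same.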

I finally got it to work :D

I looked up more code examples, changed a couple of things, and now have this:

 

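<?php
// curl_init() returns false on failure, so there is no handle to pass to curl_error() yet
$ch = curl_init() or die('Could not initialize cURL');
curl_setopt($ch, CURLOPT_URL, "http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=0");
// Return the page as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// curl_error() needs the handle to report what went wrong
$data1 = curl_exec($ch) or die(curl_error($ch));

// Matches from the first "<a type" to the last ">" on the same line
$okay = "/<a type.*>/";

if (preg_match($okay, $data1, $matches)) {
    echo $matches[0];
}

echo curl_error($ch);
curl_close($ch);
?>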
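
For anyone who wants only the image address rather than the whole link, a DOM parser is an alternative to the regular expression. The following is a rough sketch, not what was used above: it assumes PHP's dom extension is available, the helper name defaultImageSrcDom is hypothetical, and the sample markup is the snippet quoted earlier in the thread rather than a live page.

```php
<?php
// Hypothetical helper: finds the <img> inside the <a> with the given id
// using DOMDocument and XPath, and returns its src (or null if not found).
function defaultImageSrcDom(string $html, string $anchorId): ?string
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes the id itself contains no double quote character.
    $nodes = $xpath->query('//a[@id="' . $anchorId . '"]//img/@src');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->nodeValue;
}

// Sample markup from earlier in the thread; with cURL this would be $data1.
$sample = '<a type="text/javascript" id="ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage" href="http://viewmorepics.myspace.com/index.cfm?fuseaction=user.viewAlbums&friendID=0"><img border="0" alt="" src="http://a963.ac-images.myspacecdn.com/images01/64/m_dfd895b94371623d5059281421c137da.jpg" /></a>';

$imageUrl = defaultImageSrcDom($sample, 'ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage');
echo $imageUrl === null ? 'not found' : $imageUrl, "\n";
```

The XPath lookup does not depend on attribute order or on the link sitting on a single line, which the "/<a type.*>/" pattern does.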
