
Recover data from external website


daveh33


I am looking at creating a new project. The program is going to be dependent on extracting data from an external website over which I have no control.

 

The website with the information I want to recover is http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece

 

I want to get the time of the race, and the tipster's horse.

 

First of all - is this possible? If so, do I use file_get_contents?

 

Any opinions are appreciated


Yes, very possible to do.  You basically have to read through the html code of the page and then extract the data from there.  Here is some code I wrote before that should give you a quick example...

 

//set the url
$url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

//use curl to connect to the page and store its html in the "$data" variable
$c = curl_init();
curl_setopt($c, CURLOPT_URL,$url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_TIMEOUT, 3);
$data = curl_exec($c);

curl_close($c);

//this snippet originally checked a webmail log in by reading the html and looking for "cPanel Mail Management" in the title
//(using the incorrect password will yield a different title); here it simply pulls out whatever is between the title tags
$temp = "<title>";
$response_start = strpos($data, $temp);
$response_end = strpos($data, "</title>", $response_start);
//the + 7 and - 7 skip the 7 characters of "<title>" itself
$temp_code = substr($data, ($response_start + 7), ($response_end - $response_start - 7));

 

So basically what I did was look for the FIRST place "<title>" appears in the html code of the page, use that as a reference point, and then look for the first "</title>" AFTER that "<title>".  The substring between those 2 positions gets assigned to the variable "$temp_code".

 

If you ran the above script on the example page you gave it would yield something like...

 

"Today&#39;s Templegate racing tips | The Sun |HomePage|Sport|Other Sports|Templegate Racing Tips"

 

I like to use curl as opposed to just getting the file (e.g. with file_get_contents), as you can modify it to send data (to fill out forms and so on) if need be.

 

Basically just get the file first and then do a lot of work with "strpos()" and "substr()"
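
The strpos()/substr() pattern above can be wrapped in a small helper so the offsets are worked out from the marker length rather than typed in by hand. This is only a rough sketch: the function name extract_between() and the sample string are made up for illustration, and it returns null when a marker is missing, which the snippet above does not check for.

```php
<?php
// Hypothetical helper (not from the original post): returns the text between
// the first $start marker and the next $end marker, or null if either is missing.
function extract_between(string $html, string $start, string $end, int $offset = 0): ?string
{
    $from = strpos($html, $start, $offset);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start); // move past the marker itself

    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null; // end marker not found
    }

    return substr($html, $from, $to - $from);
}

// Made-up sample standing in for the downloaded page.
$sample = "<html><head><title>Racing tips</title></head><body></body></html>";
echo extract_between($sample, "<title>", "</title>"); // Racing tips
```

In real use you would pass in the "$data" string returned by curl_exec() instead of the sample, after checking that curl_exec() did not return false.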

As I mentioned in the post..

 

Basically just get the file first and then do a lot of work with "strpos()" and "substr()"

 

You will basically have to search the string for certain markings/html/key words etc and then go from there.  If you post a specific example I could show you how to do that.

 

If not just do lots of reading on the php string functions..

 

Go here..

http://us2.php.net/strings

 

Then go to the "table of contents" heading to see the many functions you can use to break up your string/data.
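
To give a feel for a few of those string functions, here is a short sketch using strip_tags(), explode() and trim(). The fragment, the horse name and the time in it are invented purely for illustration; the markup on the real page may look quite different, so treat this as a starting point only.

```php
<?php
// Made-up fragment; the real markup on the page is an assumption.
$fragment = "<b>2.10</b> <span>Example Horse</span>";

// strip_tags() removes the HTML tags and leaves only the text.
$text = strip_tags($fragment);

// explode() with a limit of 2 splits at the first space only,
// so the rest of the text stays together in the second element.
list($time, $horse) = explode(" ", $text, 2);

// trim() removes stray whitespace from both ends.
$time  = trim($time);
$horse = trim($horse);

echo "Time: " . $time . ", horse: " . $horse;
```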

OK - never used that before in my php so will have to look into it.

 

For an example for me to go off: on that example page, where you extracted data before, can you show me how you would assign the variable $location to the name on the page? The 1st one is Chepstow.

You have to look for something unique in the code itself to help narrow down all the text you are working with.  For example, if you view the source you can see that the very 1st occurrence of the word "red18" comes directly before the word that you are attempting to extract (in this case 'Chepstow').

 

So basically I'm gonna do this...

//set the url
$url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

//use curl to connect to page and store all information in the "$data" variable
$c = curl_init();
curl_setopt($c, CURLOPT_URL,$url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_TIMEOUT, 3);
$data = curl_exec($c);

curl_close($c);

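//break down the information
//"red18" is 5 characters; the + 7 also skips the 2 characters assumed to follow it in the source (the closing quote and ">")
$temp = "red18";
$response_start = strpos($data, $temp);
$response_end = strpos($data, "</span>", $response_start);
$temp_code = substr($data, ($response_start + 7), ($response_end - $response_start - 7));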

//display the information on the page
echo "The word you are looking for is ".$temp_code;

 

Try that out and let me know if that works for you.
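
If the page lists more than one meeting, the same search can be repeated from where the previous match ended, using the third (offset) argument of strpos(). The sketch below is hypothetical: the function name extract_all() is made up, the sample markup assumes each name sits in a span with class "red18", and "Newbury" is an invented second entry added only to show the loop.

```php
<?php
// Hypothetical helper: collects the text after every occurrence of $marker,
// up to the next $end.
function extract_all(string $html, string $marker, string $end): array
{
    $results = [];
    $offset = 0;

    // strpos() returns false once there are no more markers after $offset.
    while (($start = strpos($html, $marker, $offset)) !== false) {
        $start += strlen($marker); // move past the marker itself
        $stop = strpos($html, $end, $start);
        if ($stop === false) {
            break; // unterminated match, stop looking
        }
        $results[] = trim(substr($html, $start, $stop - $start));
        $offset = $stop + strlen($end); // continue after this match
    }

    return $results;
}

// Made-up sample; the markup of the real page is an assumption.
$sample = '<span class="red18">Chepstow</span><p>...</p><span class="red18">Newbury</span>';
print_r(extract_all($sample, 'red18">', '</span>'));
```

Including the closing quote and ">" in the marker means no hand-counted offset is needed, but it only works if the real source uses exactly that markup.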

 

 
