
Recover data from external website


daveh33


I am looking at creating a new project. The program is going to be dependent on extracting data from an external website over which I have no control.

 

The website with the information I want to recover is http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece

 

I want to get the time of the race and the tipster's horse.

 

First of all, is this possible? If so, do I use file_get_contents()?

 

Any opinions are appreciated.
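On the file_get_contents() question: for a plain GET request it does work, provided allow_url_fopen is enabled in php.ini. A minimal sketch, assuming a hypothetical helper name fetch_with_file_get_contents() and an illustrative User-Agent string; the timeout is passed through a stream context:

```php
<?php
// Hypothetical helper: download a page with file_get_contents().
// http:// URLs need allow_url_fopen=On. Returns the body, or null on failure.
function fetch_with_file_get_contents(string $url, int $timeout = 10): ?string
{
    // These options apply to the http:// wrapper; they are ignored for local files
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeout,
            // Illustrative User-Agent (an assumption); some sites refuse requests without one
            'header' => "User-Agent: Mozilla/5.0 (compatible; ExampleScraper/1.0)\r\n",
        ],
    ]);
    // @ silences the warning PHP emits on failure; failure is reported as null instead
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

Usage: `$html = fetch_with_file_get_contents($url);` and check for null before searching the result.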


Yes, it's very possible to do. You basically have to read through the page's HTML code and then extract the data from there. Here is some code I wrote before that should give you a quick example:

 

<?php
// Set the URL
$url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

// Use cURL to fetch the page and store its HTML in $data
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_TIMEOUT, 3);
$data = curl_exec($c);
curl_close($c);

if ($data === false) {
    die("Could not fetch the page.");
}

// Find the first "<title>" in the HTML, then the first "</title>" after it,
// and take the text in between
$temp = "<title>";
$response_start = strpos($data, $temp);
$response_end = ($response_start === false) ? false : strpos($data, "</title>", $response_start);
if ($response_end === false) {
    die("No <title> found.");
}
$response_start += strlen($temp); // skip past "<title>" itself (7 characters)
$temp_code = substr($data, $response_start, $response_end - $response_start);

echo $temp_code;

 

So basically what I did was find the FIRST occurrence of "<title>" in the page's HTML, use that as a reference point, and then look for the first "</title>" AFTER it. The text between those two is stored in the variable "$temp_code".
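The same find-start, find-end, substr() steps can be wrapped in a reusable function. A minimal sketch; the name extract_between() and its optional $offset parameter are hypothetical, added so you can keep searching further down the page:

```php
<?php
// Hypothetical helper: return the text between the first $start marker found at or
// after $offset and the next $end marker after it, or null if either is missing.
function extract_between(string $html, string $start, string $end, int $offset = 0): ?string
{
    if ($offset < 0 || $offset > strlen($html)) {
        return null; // strpos() rejects offsets past the end of the string
    }
    $from = strpos($html, $start, $offset);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // skip the start marker itself (7 chars for "<title>")
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}
```

Usage: `$title = extract_between($data, "<title>", "</title>");`. Using strlen() instead of a hard-coded 7 means the same function works for any pair of markers.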

 

If you ran the above script on the example page you gave, it would output something like:

 

"Today's Templegate racing tips | The Sun |HomePage|Sport|Other Sports|Templegate Racing Tips"
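The text substr() returns is raw HTML source, so depending on how the page is written it can still contain entities (such as `&#39;` for an apostrophe or `&amp;` for "&") or stray tags. A minimal clean-up sketch, using the hypothetical helper name clean_text():

```php
<?php
// Hypothetical helper: tidy a fragment cut out of raw HTML with substr().
function clean_text(string $fragment): string
{
    $text = strip_tags($fragment); // drop any tags caught inside the fragment
    // Decode entities; ENT_QUOTES makes sure &#39; (single quote) is decoded too
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/\s+/', ' ', $text); // collapse runs of spaces and newlines
    return trim($text);
}
```

Tags are stripped before entities are decoded, so an encoded `&lt;` in the text is not mistaken for a tag.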

 

I prefer cURL to just fetching the file (e.g. with file_get_contents()), because you can configure it to send data as well (for example, to fill out forms) if need be.
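To illustrate the form-submission point, here is a minimal sketch of a POST request with cURL. The helper name post_form() and any field names you pass are hypothetical; in practice the URL would be the form's action attribute and the field names its input names:

```php
<?php
// Hypothetical helper: submit a form by POSTing $fields to $url with cURL.
// Returns the response body, or null if the request failed.
function post_form(string $url, array $fields, int $timeout = 3): ?string
{
    $c = curl_init();
    curl_setopt_array($c, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => $timeout,
        CURLOPT_POST => true,
        // A query string is sent as application/x-www-form-urlencoded, like a browser form
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ]);
    $response = curl_exec($c);
    curl_close($c);
    return $response === false ? null : $response;
}
```

Passing an array to CURLOPT_POSTFIELDS instead of a query string would make cURL send multipart/form-data, which is what file-upload forms expect.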

 

Basically just get the file first and then do a lot of work with "strPos()" and "substr()"
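For the "get the file first" step, a slightly more defensive cURL fetch can be packaged as a function that reports failures instead of handing back `false`. A minimal sketch; the name fetch_page() is hypothetical, and treating any HTTP status of 400 or above as a failure is an assumption:

```php
<?php
// Hypothetical helper: fetch a page with cURL, returning the HTML or null on failure.
function fetch_page(string $url, int $timeout = 10): ?string
{
    $c = curl_init($url);
    curl_setopt_array($c, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT => $timeout,
    ]);
    $data = curl_exec($c);
    $status = curl_getinfo($c, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs
    if ($data === false) {
        error_log('cURL error: ' . curl_error($c)); // must be read before curl_close()
    }
    curl_close($c);
    return ($data === false || $status >= 400) ? null : $data;
}
```

Once `$data = fetch_page($url);` is non-null, the strpos()/substr() work can start.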


As I mentioned in my earlier post:

 

Basically just get the file first and then do a lot of work with "strPos()" and "substr()"

 

You will basically have to search the string for distinctive markers (HTML tags, attribute values, keywords, etc.) and work from there. If you post a specific example, I can show you how to do that.
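Since the goal is every race time and tip, not just the first one, the same search usually has to be repeated down the page, starting each search where the previous match ended. A minimal sketch with the hypothetical helper name extract_all_between():

```php
<?php
// Hypothetical helper: return every piece of text between $start and $end,
// scanning forward so each search begins where the previous match ended.
function extract_all_between(string $html, string $start, string $end): array
{
    $results = [];
    $offset = 0;
    while (($from = strpos($html, $start, $offset)) !== false) {
        $from += strlen($start);
        $to = strpos($html, $end, $from);
        if ($to === false) {
            break; // unterminated match: stop rather than read to the end of the page
        }
        $results[] = substr($html, $from, $to - $from);
        $offset = $to + strlen($end); // continue after this match's end marker
    }
    return $results;
}
```

The result is a plain array, so `foreach` over it to handle each race in turn.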

 

If not, just do plenty of reading on PHP's string functions.

 

Go here:

http://us2.php.net/strings

 

Then look under the "Table of Contents" heading to see the many functions you can use to break up your string/data.
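As a small taste of those functions: explode() with a limit, plus trim(), splits an extracted string at its first separator. The helper name split_once() is hypothetical, and a string like "2.10 - Chepstow" is an assumed format used only for illustration, not taken from the page:

```php
<?php
// Hypothetical helper: split $text at the first $separator into [before, after],
// trimming both parts, or return null if the separator does not occur.
function split_once(string $text, string $separator): ?array
{
    $parts = explode($separator, $text, 2); // limit 2: split at the first match only
    if (count($parts) < 2) {
        return null;
    }
    return array_map('trim', $parts);
}
```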


OK, I've never used that in my PHP before, so I will have to look into it.

 

As an example for me to work from: on that example page, where you extracted data before, can you show me how you would assign the name of the location on the page to the variable $location? The first one is Chepstow.


You have to look for something unique in the code itself to help narrow down all the text you are working with. For example, if you view the source, you can see that the very first occurrence of "red18" comes directly before the word you are attempting to extract (in this case, 'Chepstow').
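Before relying on "the first occurrence" of a marker, it can help to check how many times it appears and what text follows each one. A minimal sketch; the helper name marker_report() and the default of 30 characters of context are hypothetical choices:

```php
<?php
// Hypothetical helper: map each position of $marker in $html to the $context
// characters that follow it, so you can see what each occurrence labels.
function marker_report(string $html, string $marker, int $context = 30): array
{
    if ($marker === '') {
        return []; // an empty marker would match everywhere and never advance
    }
    $report = [];
    $offset = 0;
    while (($pos = strpos($html, $marker, $offset)) !== false) {
        $report[$pos] = substr($html, $pos + strlen($marker), $context);
        $offset = $pos + strlen($marker);
    }
    return $report;
}
```

Usage: `print_r(marker_report($data, "red18"));`. One entry means the marker is unique; several entries show whether it labels a repeating pattern, such as one per race.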

 

So basically, I'm going to do this:

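<?php
// Set the URL
$url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

// Use cURL to connect to the page and store all of its HTML in $data
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_TIMEOUT, 3);
$data = curl_exec($c);
curl_close($c);

if ($data === false) {
    die("Could not fetch the page.");
}

// Break down the information: find the first "red18", skip to the end of the
// tag it sits in (the next ">"), then read up to the next "</span>"
$temp = "red18";
$marker = strpos($data, $temp);
$response_start = ($marker === false) ? false : strpos($data, ">", $marker);
$response_end = ($response_start === false) ? false : strpos($data, "</span>", $response_start);
if ($response_end === false) {
    die("Could not find the location.");
}
$response_start += 1; // step past the ">"
$location = trim(substr($data, $response_start, $response_end - $response_start));

// Display the information on the page
echo "The word you are looking for is " . $location;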

 

Try that out and let me know if that works for you.
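If the string searching becomes fragile, PHP's DOM extension offers a different technique: parse the HTML and query it with XPath instead of using strpos()/substr(). This sketch assumes "red18" is a CSS class on the element wrapping each course name, which fits the "</span>" end marker in the code above but has not been checked against the live page; texts_by_class() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the trimmed text of every element whose class
// attribute contains $class as a whole word. Assumes $class contains no quotes.
function texts_by_class(string $html, string $class): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad with spaces so "red18" matches "red18 bold" but not "red180"
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

Usage: `$locations = texts_by_class($data, "red18");` would return every matching course name in page order, with Chepstow first if the assumption about the markup holds.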

 

 

