daveh33 Posted December 7, 2007

I am looking at creating a new project. The program is going to be dependent on extracting data from an external website over which I have no control. The website with the information I want to recover is http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece

I want to get the time of the race and the tipster's horse. First of all, is this possible? If so, do I use file_get_contents? Any opinions are appreciated.
cunoodle2 Posted December 7, 2007

Yes, very possible to do. You basically have to use something to read through the HTML code and then extract the data from there. Here is some code I wrote before that should give you a quick example...

    // set the url
    $url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

    // use curl to connect to the page and store the HTML in $data
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_TIMEOUT, 3);
    $data = curl_exec($c);
    curl_close($c);

    // in the original script this checked a webmail login: it reads the HTML and
    // looks for "cPanel Mail Management" in the title (a wrong password yields a
    // different title); here it simply pulls out whatever the title contains
    $temp = "<title>";
    $response_start = strpos($data, $temp);
    $response_end = strpos($data, "</title>", $response_start);
    // 7 is the length of "<title>"
    $temp_code = substr($data, ($response_start + 7), ($response_end - $response_start - 7));

So basically what I did was look for the FIRST time "<title>" appeared in the HTML code for the page I was searching, used that as a reference point, and then looked for the first "</title>" AFTER the original "<title>". From those two positions I made a substring and assigned it to the variable "$temp_code".

If you ran the above script on the example page you gave it would yield something like...

"Today's Templegate racing tips | The Sun |HomePage|Sport|Other Sports|Templegate Racing Tips"

I like to use curl as opposed to just getting a file, as you can modify it to send data (like filling out forms and stuff) if need be. Basically just get the file first and then do a lot of work with "strpos()" and "substr()".
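The "find the start marker, find the end marker, take the substring" step described above can be wrapped in a small helper. This is only a minimal sketch: the function name extract_between() and the sample HTML are made up for illustration and are not part of the original script. It uses strlen() instead of a hard-coded 7 and returns null when a marker is missing, since strpos() returns false in that case.

```php
<?php
// Hypothetical helper (not from the original post): returns the text between
// the first $start marker and the next $end marker, or null if either is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start); // skip past the marker itself

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null; // end marker not found after the start marker
    }

    return substr($haystack, $from, $to - $from);
}

// Made-up sample standing in for the downloaded page held in $data.
$html = '<html><head><title>Example racing tips</title></head><body></body></html>';

echo extract_between($html, '<title>', '</title>'), "\n"; // Example racing tips
```

With the real page, $html would be replaced by the $data string returned from curl_exec().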
daveh33 Posted December 7, 2007

Great! I have used that code to create a variable $temp_code which, when printed, shows the data I want to capture. But I need to break it down further. How can I edit a variable and target more specific information from this page full of data?
cunoodle2 Posted December 7, 2007

As I mentioned in the post: basically just get the file first and then do a lot of work with "strpos()" and "substr()".

You will basically have to search the string for certain markings, HTML, keywords etc. and then go from there. If you post a specific example I could show you how to do that. If not, just do lots of reading on the PHP string functions. Go here: http://us2.php.net/strings

Then go to the "table of contents" heading to see the many functions you can use to break up your string/data.
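As a rough illustration of what a few of those string functions do, here is a short sketch. The title string is the one quoted earlier in the thread; the span markup is an assumption about how the page wraps the course name, not something confirmed from the page source.

```php
<?php
// Title string as quoted earlier in the thread.
$title = "Today's Templegate racing tips | The Sun |HomePage|Sport|Other Sports|Templegate Racing Tips";

// explode() splits on a delimiter; trim() removes the stray spaces around each piece.
$parts = array_map('trim', explode('|', $title));

echo $parts[0], "\n";     // Today's Templegate racing tips
echo count($parts), "\n"; // 6

// strip_tags() drops HTML tags and keeps the text inside them.
// ASSUMPTION: this markup is a guess at how the page wraps the course name.
$fragment = '<span class="red18">Chepstow</span>';
$name = trim(strip_tags($fragment));

echo $name, "\n"; // Chepstow
```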
daveh33 Posted December 7, 2007

OK - I have never used that in my PHP before, so I will have to look into it. For an example to work from: on that example page, where you extracted data before, can you show me how you would assign the variable $location to the name on the page? The first one is Chepstow.
cunoodle2 Posted December 7, 2007

You have to look for something unique in the code itself to help narrow down all the text you are working with. For example, if you view the source you can see that the very first occurrence of the word "red18" is directly before the word that you are attempting to extract (in this case 'Chepstow').

So basically I'm gonna do this...

    // set the url
    $url = "http://www.thesun.co.uk/sol/homepage/sport/other_sports/templegate/article196230.ece";

    // use curl to connect to the page and store all information in the "$data" variable
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_TIMEOUT, 3);
    $data = curl_exec($c);
    curl_close($c);

    // break down the information
    $temp = "red18";
    $response_start = strpos($data, $temp);
    $response_end = strpos($data, "</span>", $response_start);
    // the offset of 7 skips the 5 characters of "red18" plus, presumably, the
    // 2 characters that close the tag; adjust it if the markup differs
    $temp_code = substr($data, ($response_start + 7), ($response_end - $response_start - 7));

    // display the information on the page
    echo "The word you are looking for is ".$temp_code;

Try that out and let me know if that works for you.
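The snippet above stops at the first "red18". If the same marker precedes every course name on the page (an assumption, not checked against the real source), the search can be repeated by passing an offset to strpos(). The sketch below uses a made-up function name, extract_all_between(), and made-up sample HTML; "Sample Course" is a placeholder, not a name from the page.

```php
<?php
// Hypothetical helper: collects every piece of text that sits between
// a $start marker and the next $end marker.
function extract_all_between(string $haystack, string $start, string $end): array
{
    $found = [];
    $offset = 0;

    // strpos() returns false once no further start marker exists.
    while (($from = strpos($haystack, $start, $offset)) !== false) {
        $from += strlen($start);
        $to = strpos($haystack, $end, $from);
        if ($to === false) {
            break; // unterminated marker: stop rather than guess
        }
        $found[] = trim(substr($haystack, $from, $to - $from));
        $offset = $to + strlen($end); // continue searching after this match
    }

    return $found;
}

// ASSUMPTION: sample markup only; the real page may be structured differently.
$html = '<p><span class="red18">Chepstow</span> first set of tips</p>'
      . '<p><span class="red18">Sample Course</span> second set of tips</p>';

$locations = extract_all_between($html, 'red18">', '</span>');
print_r($locations); // Chepstow, then Sample Course
```

Including the closing `">` in the start marker avoids the hard-coded offset of 7, at the cost of depending on the exact markup.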