mangy1983 Posted September 17, 2011

Hi all,

I am developing a website for a fishing organisation which was to include a weather widget. The only weather widget that blended into the colours of the website I am creating was the second transparent widget on this web page (http://www.weatherforecastmap.com/getwidget.phtml/). It has been fine for the last month, except that it is not as accurate as a local website. On top of this, members are complaining that wind speeds are shown in m/s instead of mph, which is the UK measurement.

I have been scouring the net for screen-scraping code that will pull the information I want from the local website, and found the code on this webpage: http://www.bradino.com/php/screen-scraping/

This looks like it does what I want, as the website I want to scrape displays the weather in a table with one row per 3-hour interval. I would only like to extract the information from the first row, but the code in the link does not work for me. If anyone could help me with this it would be great! Below is the code I have, as I wanted to get the example working before I customised it for my own use.

Thanks for any replies
Callum

<?php
$url = "http://www.nfl.com/teams/sandiegochargers/roster?team=SD";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw)); //strip whitespace so the page is one long line.
$start = strpos($content,'<table'); //start of the first table on the page.
$end = strpos($content,'</table>',$start) + 8; //8 is the length of '</table>'.
$table = substr($content,$start,$end-$start);
preg_match_all("|<tr(.*)</tr>|U",$table,$rows); //one match per table row.
foreach ($rows[0] as $row){
    if ((strpos($row,'<th')===false)){ //skip header rows.
        preg_match_all("|<td(.*)</td>|U",$row,$cells); //one match per cell.
        $number = strip_tags($cells[0][0]);
        $name = strip_tags($cells[0][1]);
        $position = strip_tags($cells[0][2]);
        echo "{$position} - {$name} - Number {$number} \n";
    }
}
?>

Link to comment: https://forums.phpfreaks.com/topic/247332-screen-scraping/
JKG Posted September 17, 2011

That's not a very good tutorial; he has a few errors in the code to begin with. Use cURL and go from there. Try this: http://devtrench.com/posts/screen-scrape-with-php-curl
jcbones Posted September 17, 2011

I would suggest using the DOMDocument object; I think this will be the easiest method for you. Since the weather doesn't change that often, I would also write the wanted contents to a file, and only update it once per day.

$doc = new DOMDocument();
$doc->loadHTMLFile('http://server.com/some/file');
$widget = $doc->getElementById('widgetId'); //returns a single element, or null if the id is not found.
if($widget !== null) {
    echo $widget->nodeValue;
}
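The once-a-day caching suggested above could look roughly like the sketch below. It is only an illustration: `cached_value()`, the cache file name and the 24-hour lifetime are assumptions rather than anything from the thread, and the callback stands in for whatever scraping code produces the text.

```php
<?php
/**
 * Return the contents of $cacheFile if it is younger than $maxAge seconds;
 * otherwise call $producer, store its result in the file and return it.
 * Hypothetical helper, named here for illustration only.
 */
function cached_value($cacheFile, $maxAge, callable $producer)
{
    clearstatcache(true, $cacheFile); // make sure the file's age is read fresh
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile); // cache is still fresh
    }
    $fresh = $producer();                           // e.g. run the scraper
    file_put_contents($cacheFile, $fresh, LOCK_EX); // overwrite the cache
    return $fresh;
}

// Usage: refresh at most once per day (86400 seconds).
$file = sys_get_temp_dir() . '/weather_cache.txt'; // assumed location
echo cached_value($file, 86400, function () {
    return 'scraped weather text goes here'; // placeholder for the real scraping
});
```

Passing the scraper in as a callback keeps the caching separate from the parsing, so the same helper works whichever scraping approach ends up being used.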
mangy1983 (Author) Posted September 17, 2011

I have done it this way before when trying to get a single element, which I did not have too much of a problem with. The problem with the content I need to grab is that it is inside a table with no name or id, and the tds all have the same class name. There are also several instances of these class names in different rows, as the weather is displayed for every three hours on a separate row, whereas I only need a few of the tds from the first row. For instance, I would like to grab the first row of the weather table at this link: http://www.metoffice.gov.uk/weather/uk/he/stornoway_forecast_weather.html

Hope this makes sense.

Thanks
Callum
mangy1983 (Author) Posted September 17, 2011

Thanks for your reply. Unfortunately I don't know how to use cURL, I am afraid, and it doesn't make much sense to me at the moment. I work from example code for now, and once I get it working I learn what each line does so that I understand the whole thing. I can understand bits of the code from the example in my first post, and I like the way it turns the contents of the td elements into separate variables to use as you wish later.

Thanks again
Callum
jcbones Posted September 17, 2011

Try this:

<?php
echo '<style type="text/css">
table { border-collapse: collapse; }
table, th, td { border: 1px solid black; padding: 2px; }
</style>';

$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.metoffice.gov.uk/weather/uk/he/stornoway_forecast_weather.html'); //load the file.
$desired_rows = 1; //How many rows you want from the table.
$table = $doc->getElementsByTagName('table'); //get our tables out, it should return 2 from the file, we only want the second.
$rows = $table->item(1)->getElementsByTagName('tr'); //pull the table rows from the second table (notice we select the second by item(1)).
$count = $rows->length; //returns a count of the table rows.

echo '<table id="weather"><tr>
<th rowspan="2">Date</th>
<th rowspan="2">Time</th>
<th rowspan="2">Weather</th>
<th rowspan="2">Temp</th>
<th colspan="3">Wind</th>
<th rowspan="2">Visibility</th>
</tr>
<tr>
<th>Dir</th>
<th>Speed</th>
<th>Gust</th>
</tr>'; //mock up of the original table headers.

for($i=2,$start=$i;$i<($start + $desired_rows);$i++) { //for loop, goes through the rows.
    echo '<tr>'; //start row.
    $columns = $rows->item($i)->getElementsByTagName('td'); //get columns for this row.
    $columnCount = $columns->length;
    for($n=0;$n<$columnCount;$n++) { //go through the columns.
        if($n == 2) {
            $img = $columns->item($n)->getElementsByTagName('img'); //the 3rd column is an image, so we must get the image title.
            $value = $img->item(0)->getAttribute('title');
        } else {
            $value = $columns->item($n)->nodeValue; //else we will just take what is in the column.
        }
        echo '<td>' . $value . '</td>'; //push the column to the screen.
    }
    echo '</tr>'; //end the row.
}
echo '</table>'; //end the table.
?>
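For a table with no id or name, an XPath query is another way to reach one particular row. The sketch below is an illustration under assumptions: the HTML is a made-up stand-in for the real page, `first_data_row()` is a hypothetical helper, and the table position would need checking against the actual markup.

```php
<?php
/**
 * Return the trimmed text of each <td> in the first row that has <td> cells,
 * taken from the $tableIndex-th <table> (1-based) found in $html.
 * Hypothetical helper for illustration.
 */
function first_data_row($html, $tableIndex)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings caused by sloppy markup
    $xpath = new DOMXPath($doc);

    // (//table)[N] is the Nth table in document order;
    // (...//tr[td])[1] is its first row that contains td cells.
    $query = sprintf('((//table)[%d]//tr[td])[1]/td', (int) $tableIndex);

    $cells = array();
    foreach ($xpath->query($query) as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

// Made-up markup: every td shares one class, as described in the thread.
$html = '<table><tr><td>layout</td></tr></table>
<table>
  <tr><th>Time</th><th>Temp</th><th>Speed</th></tr>
  <tr><td class="c">0900</td><td class="c">12 C</td><td class="c">15 mph</td></tr>
  <tr><td class="c">1200</td><td class="c">13 C</td><td class="c">18 mph</td></tr>
</table>';

print_r(first_data_row($html, 2));
```

Selecting rows with `tr[td]` skips header rows made only of `th` cells, so no hard-coded row offset is needed.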
mangy1983 (Author) Posted September 17, 2011

Thank you so much for the work you did on this, it is much appreciated! As in my previous post, I was wondering whether it is possible to have the contents of each td saved as a variable so that I can save them to a database. I would then run this script every 3 hours using a cron job to update the database table. If you or one of the other great members on here can help, I would be immensely grateful. Once the td elements have been turned into variables I will be on familiar territory for saving the information to the database.

Thanks again
Callum
mangy1983 (Author) Posted September 17, 2011

I managed to add incremental variables to the values in the code supplied by jcbones (thanks again) and echo them separately outside of the loop in order to insert them into my database, so here is the complete code with my own additions. Theoretically this topic is solved, unless anyone thinks of a more efficient way of producing my inserted code shown below.

Thanks again guys
Callum

<?php
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.metoffice.gov.uk/weather/uk/he/stornoway_forecast_weather.html'); //load the file.
$desired_rows = 1; //How many rows you want from the table.
$table = $doc->getElementsByTagName('table'); //get our tables out, it should return 2 from the file, we only want the second.
$rows = $table->item(1)->getElementsByTagName('tr'); //pull the table rows from the second table (notice we select the second by item(1)).
$count = $rows->length; //returns a count of the table rows.

for($i=2,$start=$i;$i<($start + $desired_rows);$i++) { //for loop, goes through the rows.
    $columns = $rows->item($i)->getElementsByTagName('td'); //get columns for this row.
    $columnCount = $columns->length;
    for($n=0;$n<$columnCount;$n++) { //go through the columns.
        if($n == 2) {
            $img = $columns->item($n)->getElementsByTagName('img'); //the 3rd column is an image, so we must get the image title.
            $value = $img->item(0)->getAttribute('title');
        } else {
            $value = $columns->item($n)->nodeValue; //else we will just take what is in the column.
        }
        ${'a' . $n} = $value; //creates $a0, $a1, $a2, ... one per column.
    }
}

$patterns[0] = '/[^0-9]/'; //anything that is not a digit.
$replacements[0] = '';
ksort($patterns);
ksort($replacements);
$a3 = preg_replace($patterns, $replacements, $a3);
$a5 = preg_replace($patterns, $replacements, $a5);
$a6 = preg_replace($patterns, $replacements, $a6);

echo $a0, '<br />', $a1, '<br />', $a2, '<br />', $a3, '<br />', $a4, '<br />', $a5, '<br />', $a6, '<br />', $a7, '<br />', $a8;
?>
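One alternative to numbered variables such as `$a0` and `$a1` is to keep the cells in an associative array keyed by field name. This is a sketch under assumptions: the field names are borrowed from the column headers in jcbones's table, the sample values are invented, and `name_cells()` is a hypothetical helper.

```php
<?php
/**
 * Pair each cell value with a field name and strip everything except
 * digits from the fields listed in $numeric.
 * Hypothetical helper for illustration.
 */
function name_cells(array $cells, array $fields, array $numeric)
{
    // Use only as many names/values as both lists have, so that
    // array_combine() always receives arrays of equal length.
    $n = min(count($cells), count($fields));
    $row = array_combine(array_slice($fields, 0, $n), array_slice($cells, 0, $n));

    foreach ($numeric as $field) {
        if (isset($row[$field])) {
            $row[$field] = preg_replace('/[^0-9]/', '', $row[$field]);
        }
    }
    return $row;
}

$fields = array('date', 'time', 'weather', 'temp', 'dir', 'speed', 'gust', 'visibility');
// Invented sample values standing in for one scraped row.
$cells = array('Sat 17 Sep', '0900', 'Light rain', '12 C', 'SW', '15 mph', '28 mph', 'Good');

$row = name_cells($cells, $fields, array('temp', 'speed', 'gust'));
echo $row['speed'];
```

With named keys, the database code can refer to `$row['speed']` rather than remembering that the speed lives in `$a5`.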
jcbones Posted September 18, 2011

I had some code for you that puts the information into a SQL query. The forums wouldn't work for me yesterday, but here it is anyway. This will give you a well-formed MySQL query. I just pushed every column into a separate database column.

<?php
$doc = new DOMDocument();
@$doc->loadHTMLFile('weather.html');
$desired_rows = 100; //maximum number of rows to take.
$table = $doc->getElementsByTagName('table');
$rows = $table->item(1)->getElementsByTagName('tr');
$count = $rows->length;
$sql = 'INSERT INTO weather (day,`time`,description,tempature,windDir,windSpeed,windGust,visibility) VALUES ';
$retainDate = false;
$date = '';
$queryValueArray = array();

for($i=2,$start=$i;$i<($start + $desired_rows) && $i < ($count - 1);$i++) {
    $values = array();
    $columns = $rows->item($i)->getElementsByTagName('td');
    $columnCount = $columns->length;
    if($columnCount == 8) { //a row with all 8 cells includes the date cell.
        $retainDate = true;
    }
    for($n=0;$n<$columnCount;$n++) { //go through the columns.
        $value = $columns->item($n)->nodeValue;
        $img = $columns->item($n)->getElementsByTagName('img');
        for($ii = 0; $ii < $img->length; $ii++) {
            $value = $img->item($ii)->getAttribute('title'); //image cells: use the image title instead.
        }
        if($retainDate == true && $n == 0) {
            $date = $value; //remember the date for the rows that follow.
        } elseif($n == 0) {
            $value = $date . '\',\'' . $value; //row without a date cell: put the remembered date in front.
        }
        $values[] = $value;
    }
    $queryValueArray[] = implode('\',\'',$values);
    $retainDate = false;
}
$sql .= '(\'' . implode("'),\n('",$queryValueArray) . '\')';
echo "<pre>$sql</pre>";
?>
mangy1983 (Author) Posted September 19, 2011

Thank you so much for the code, jcbones. It helped me immensely, and hopefully it will help others looking for something similar too.

Thanks again
Callum
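Because the scraped text ends up inside a SQL string, a value containing a quote would break the query. One hedged way around that is to build the statement with `?` placeholders and pass the values separately, for example through PDO. In the sketch below `build_insert()` is a hypothetical helper, the sample rows are invented, and the table and column names are copied from jcbones's query.

```php
<?php
/**
 * Build a multi-row INSERT with ? placeholders.
 * Returns array($sql, $params), where $params is the flat list of values
 * in the same order as the placeholders. Assumes at least one row.
 * Hypothetical helper for illustration. Table and column names are used
 * as-is, so they must come from trusted code, never from user input.
 */
function build_insert($table, array $columns, array $rows)
{
    // One "(?,?,...)" group per row, with one ? per column.
    $group = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $sql = 'INSERT INTO ' . $table . ' (' . implode(',', $columns) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $group));

    $params = array();
    foreach ($rows as $row) {
        foreach (array_values($row) as $value) {
            $params[] = $value;
        }
    }
    return array($sql, $params);
}

$columns = array('day', '`time`', 'description', 'tempature', 'windDir', 'windSpeed', 'windGust', 'visibility');
// Invented sample rows standing in for scraped data.
$rows = array(
    array('Sat 17 Sep', '0900', 'Light rain', '12', 'SW', '15', '28', 'Good'),
    array('Sat 17 Sep', '1200', 'Sunny', '13', 'W', '18', '30', 'Very good'),
);

list($sql, $params) = build_insert('weather', $columns, $rows);
echo $sql, "\n";

// With a PDO connection in $pdo (not created here), the query would be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

The database driver then handles quoting, so the values never need to be escaped by hand.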