jeffrydell Posted August 2, 2008

The goal is to produce a script which will monitor specified URLs and display a list of links to updated pages.

I've been trying to use file() and in_array(), but when I search for specified text, in_array() always returns FALSE. Same thing if I use file_get_contents() and strstr().

This is getting a bit silly, as it should be fairly straightforward to search for a specified string within a variable or an array ... but it isn't working.

Any thoughts on how I might check web pages (some are dynamic) to see if they have been updated?

Thanks in advance for any help you can come up with!

Jeff
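One likely reason for the in_array() result, assuming the search text is only part of a line: file() returns one array element per line, and in_array() compares the needle against whole elements rather than looking inside them. A minimal sketch with made-up sample data (the $lines array, the search string and the lines_contain() helper are hypothetical, for illustration only) shows the difference between the whole-element test and a substring test with strpos():

```php
<?php
// Hypothetical helper: true if any line contains $needle as a substring.
function lines_contain($lines, $needle)
{
    foreach ($lines as $line) {
        // strpos() returns 0 for a match at the very start of the line,
        // so the result must be compared with !== false.
        if (strpos($line, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Sample data standing in for what file() would return for a page:
// one element per line, each with its trailing newline still attached.
$lines = array(
    "<html>\n",
    "<p>Last updated: August 1, 2008</p>\n",
    "</html>\n",
);
$needle = 'Last updated';

// in_array() only matches when an entire element equals the needle.
$wholeElementMatch = in_array($needle, $lines);

// The substring search looks inside each line instead.
$substringMatch = lines_contain($lines, $needle);

var_dump($wholeElementMatch); // bool(false)
var_dump($substringMatch);    // bool(true)
```

This does not explain why strstr() on the result of file_get_contents() also fails. Possible causes, none of which can be confirmed from the post, include the fetch itself returning false or the text in the page source differing from the search string in case, whitespace or HTML entities.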
psunshine Posted August 2, 2008

Hi,

I'm pretty crap at PHP (more of a good googler and copy-and-paste guy), but this is something I have used for scraping websites:

    // Request headers that mimic an ordinary browser request.
    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,"
              . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: ";
    $header[] = "Content-Type:text/html; charset=UTF-8";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
    // Return the page as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
    // Accept compressed responses; cURL decodes them automatically.
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    // Give up after 10 seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    // $output holds the page body, or false if the request failed.
    $output = curl_exec($ch);
    if ($output === false) {
        echo 'Fetch failed: ' . curl_error($ch);
    }
    curl_close($ch);

I then use regular expressions to filter out the crap and finally insert specifics into MySQL tables. You could probably do something similar, but with two different columns (one for the existing page content and one for the most recently checked content), which you could use in a compare function.

This is probably a bad way of doing it, and I'm sure some peeps on here can definitely give a proper solution.
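The compare step suggested above could be sketched as follows, assuming it is enough to store a short fingerprint of each page rather than two full copies of its content. The helper names page_fingerprint() and page_has_changed() are hypothetical, and the sample strings stand in for the $output of separate curl_exec() calls; this is one possible approach, not the method used in the post.

```php
<?php
// Hypothetical helper: fingerprint page content so two fetches can be
// compared without keeping both full copies of the HTML.
function page_fingerprint($html)
{
    // Collapse runs of whitespace so trivial formatting changes are ignored.
    $normalized = trim(preg_replace('/\s+/', ' ', $html));
    return md5($normalized);
}

// Hypothetical helper: true when the fresh content no longer matches
// the fingerprint stored from the previous check.
function page_has_changed($storedFingerprint, $html)
{
    return $storedFingerprint !== page_fingerprint($html);
}

// Sample strings standing in for the results of three separate fetches.
$firstFetch  = "<p>Price: 10</p>\n";
$secondFetch = "<p>Price:   10</p>"; // same content, different whitespace
$thirdFetch  = "<p>Price: 12</p>";   // content actually changed

// In a real script this value would be saved (for example in a database column).
$stored = page_fingerprint($firstFetch);

var_dump(page_has_changed($stored, $secondFetch)); // bool(false)
var_dump(page_has_changed($stored, $thirdFetch));  // bool(true)
```

Dynamic pages often include content that differs on every request (timestamps, session tokens, rotating ads), so a whole-page fingerprint may report a change every time. In that case it would probably make sense to fingerprint only the part of the page that matters, for example the portion extracted with the regular expressions mentioned above.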