jim.dandy Posted April 6, 2009
What is the best method to scrape webpages?
unrelenting Posted April 6, 2009
I don't know about "best," but you can do it with file_get_contents or cURL.
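Here is a minimal PHP sketch of fetching a page each way. The function names, the user-agent string, and the 10-second timeout are illustrative assumptions that do not come from the thread. The cURL version also needs the curl extension to be enabled.

```php
<?php
// Minimal sketch: fetch a page's HTML with either built-in approach.
// The user-agent string and timeout are arbitrary example values.

function fetch_with_file_get_contents(string $url)
{
    // The 'http' context options apply to http:// URLs; local paths ignore them.
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
            'timeout'    => 10,
        ],
    ]);
    // Returns the body as a string, or false on failure (warning suppressed).
    return @file_get_contents($url, false, $context);
}

function fetch_with_curl(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
        CURLOPT_TIMEOUT        => 10,
    ]);
    // With CURLOPT_RETURNTRANSFER, curl_exec returns the body, or false on error.
    return curl_exec($ch);
}
```

file_get_contents is the shorter option for simple GET requests. cURL gives finer control over headers, redirects, cookies, and POST data.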
Maq Posted April 6, 2009
Quote: "What is the best method to scrape webpages?"
There are a lot of posts in the PHP section about this. It also depends on whether you need to post variables, log in, etc. We need more information.
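Some pages need posted form variables or a login. In those cases, cURL can send a POST body and store the session cookie in a file between requests. The following is a minimal sketch that assumes a plain HTML login form. `build_post_options`, `post_form`, the field names, and the cookie-jar path are all hypothetical.

```php
<?php
// Hypothetical sketch: POST form fields with cURL and keep session cookies
// in a file so later requests can reuse the login.

function build_post_options(array $fields, string $cookieJar): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // URL-encoded body
        CURLOPT_COOKIEJAR      => $cookieJar, // cookies are written here on cleanup
        CURLOPT_COOKIEFILE     => $cookieJar, // cookies are read from here and sent
    ];
}

function post_form(string $url, array $fields, string $cookieJar)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_post_options($fields, $cookieJar));
    $body = curl_exec($ch); // response body, or false on error
    unset($ch);             // destroying the handle flushes cookies to $cookieJar
    return $body;
}
```

For example, `post_form('http://example.com/login.php', ['user' => 'jim', 'pass' => 'secret'], $jar)` with placeholder values stores the session cookie. A later GET request with `CURLOPT_COOKIEFILE` set to the same `$jar` should then be sent while logged in, assuming the site uses ordinary cookie-based sessions.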
jim.dandy Posted April 6, 2009
<?php
// Return the text between the first $start marker and the next $end marker
// in $string, or "" if either marker is missing.
function get_string_between($string, $start, $end) {
    $ini = strpos($string, $start);
    if ($ini === false) {
        return "";
    }
    $ini += strlen($start);
    $fin = strpos($string, $end, $ini);
    if ($fin === false) {
        return ""; // no end marker after $start
    }
    return substr($string, $ini, $fin - $ini);
}

$data = file_get_contents('http://www.thepiratebay.org/top/201');
if ($data === false) {
    die("Could not fetch the page.");
}
// Split the page at every literal "<td>"; cells written with attributes,
// e.g. <td class="...">, are not split on.
$data2 = explode("<td>", $data);
echo count($data2);
foreach ($data2 as $chunk) {
    // Only the first "Y-day" ... ".torrent" match in each chunk is returned.
    echo get_string_between($chunk, "Y-day", ".torrent");
}
?>
I found a function someone wrote that grabs the text between two strings. It sort of works, but it doesn't grab everything I want, and I don't understand why.
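One likely reason for missing results is that the function returns only the first match in each chunk. Another is that `explode("<td>", ...)` splits only on bare `<td>` tags. Below is a minimal sketch of a variant that returns every match between two markers. `get_all_strings_between` is a hypothetical name, not a built-in PHP function.

```php
<?php
// Hypothetical helper: return every substring found between $start and $end,
// scanning left to right without overlapping matches.
function get_all_strings_between(string $string, string $start, string $end): array
{
    if ($start === '' || $end === '') {
        return []; // empty markers would never advance the search
    }
    $results = [];
    $offset = 0;
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $fin = strpos($string, $end, $ini);
        if ($fin === false) {
            break; // a start marker with no closing marker after it
        }
        $results[] = substr($string, $ini, $fin - $ini);
        $offset = $fin + strlen($end); // continue after this match
    }
    return $results;
}
```

Running this on the whole page, instead of on each `<td>` chunk, avoids depending on how the cells happen to be written. It still relies on the markers appearing exactly as typed.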
Daniel0 Posted April 6, 2009
Use DOM (PHP's DOMDocument class).
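The following is a minimal sketch of the DOM approach. It uses PHP's built-in DOMDocument and DOMXPath classes to collect link targets instead of matching raw strings. The function name is hypothetical. The assumption that the wanted links end in `.torrent` is only for illustration, since the real page's markup may differ.

```php
<?php
// Hypothetical helper: return the href of every <a> whose href ends with $suffix
// (an empty $suffix returns all hrefs), in document order.
function extract_links_ending_with(string $html, string $suffix): array
{
    if ($html === '') {
        return []; // loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $href = $a->getAttribute('href');
        if ($suffix === '' || substr($href, -strlen($suffix)) === $suffix) {
            $links[] = $href;
        }
    }
    return $links;
}
```

The parser understands tags, so a cell written as `<td class="...">` is handled the same way as a bare `<td>`. Splitting on strings does not give you that.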
This topic is now archived and is closed to further replies.