xsuck91 Posted December 17, 2011
I need some help scraping links from a specified page. For example, given a page like http://br.4ce.info/, I want to scrape all the links on that page and show them in a WordPress widget on another blog. Can you help me with this? I'd rather not use an iframe; I think cURL would be better. Thanks.
paparts Posted December 17, 2011
Here is how I crawl websites and extract the links; I think you can use this:

<?php
// Fetch the page; @ suppresses the warning if the request fails.
$input = @file_get_contents('http://www.icpep.org');

// Match <a ... href="...">...</a>; group 2 captures the href value.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

// Accept only absolute http, https, or ftp URLs (case-insensitive).
$urlregex = '~^(https?|ftp)://([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?[a-z0-9+$_-]+(\.[a-z0-9+$_-]+)*(:[0-9]{2,5})?(/([a-z0-9+$_-]\.?)+)*/?(\?[a-z+&$_.-][a-z0-9;:@/&%=+$_.-]*)?(#[a-z_.-][a-z0-9+$_.-]*)?$~i';

if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        if (preg_match($urlregex, $match[2])) {
            echo trim($match[2]) . "<br />";
        }
    }
}
?>
QuickOldCar Posted December 17, 2011
The code above only fetches the link URL itself, not the title of the link, and it can't tell whether the link was an image. It also would not handle any self links.

If your goal is just to display exactly what is on that page without using an iframe:

<?php
$input = @file_get_contents('http://br.4ce.info/');
if (!$input) {
    echo "No Recommended Sites";
} else {
    echo $input;
}
?>

This will not work for all pages, but for your example I believe it is the easiest route.

I do have piles of code for getting links in many different ways, fixing relative links, and parsing images/links/data. Using DOM or something like simplehtmldom would be good ways to do it.
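For anyone landing here later, the DOM route mentioned above could look roughly like the sketch below. It is only an illustration, not code from either poster: `fetch_html()` and `extract_links()` are hypothetical helper names, the cURL options are just reasonable defaults, and relative links are resolved naively by assuming `$baseUrl` is the site root (no `../` handling, no `<base>` tag support). It returns each link's URL together with its text, falling back to the `alt` of an image inside the link.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: return [['url' => ..., 'text' => ...], ...] for every <a href>.
// Assumption: $baseUrl is the site root, so relative links are simply appended to it.
function extract_links(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base  = rtrim($baseUrl, '/');
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty hrefs and in-page anchors
        }
        if (strpos($href, '//') === 0) {
            $href = 'http:' . $href; // protocol-relative link
        } elseif (!preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
            $href = $base . '/' . ltrim($href, '/'); // naive relative-link fix
        }

        // Use the link text, or the alt text of an image inside the link.
        $text = trim($a->textContent);
        if ($text === '') {
            $img = $a->getElementsByTagName('img')->item(0);
            if ($img !== null) {
                $text = trim($img->getAttribute('alt'));
            }
        }
        $links[] = ['url' => $href, 'text' => $text];
    }
    return $links;
}
```

In a widget you would call something like `extract_links(fetch_html('http://br.4ce.info/') ?? '', 'http://br.4ce.info/')` and echo each entry as an `<a>` tag, passing both the URL and the text through `htmlspecialchars()` first, since the scraped content is untrusted.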
This topic is now archived and is closed to further replies.