

I need some help scraping links from a specified page.

For example, if I have a page like this: http://br.4ce.info/

I want to scrape all the links on that page and show them in a WordPress widget on another blog.

Can you help me with this?

I don't want to use an iframe; I think using cURL would be better.

Thanks
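Since the question mentions cURL, here is a minimal sketch of fetching a page with PHP's cURL extension. `fetch_page()` is a hypothetical helper name, and the timeouts and user-agent string are arbitrary choices, not requirements. It returns the HTML as a string, or `false` on failure, so it can stand in for `file_get_contents()` in the answers that follow.

```php
<?php
/**
 * Fetch a URL with cURL and return the response body, or false on failure.
 * Hypothetical helper; the option values below are illustrative assumptions.
 */
function fetch_page(string $url)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,     // but not forever
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 20,    // seconds for the whole transfer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; LinkScraper/0.1)',
    ]);

    $body = curl_exec($ch);
    // HTTP status code; it is 0 for non-HTTP protocols such as file://
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // The handle is released when $ch goes out of scope.

    if ($body === false || $status >= 400) {
        return false;
    }
    return $body;
}
```

Usage would be along the lines of `$html = fetch_page('http://br.4ce.info/');` followed by a check for `false` before parsing.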

https://forums.phpfreaks.com/topic/253369-link-scraping/

Here is how I used to crawl websites and extract the links; I think you can use this:

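<?php
// Fetch the page; file_get_contents() returns false on failure.
$input = file_get_contents('http://www.icpep.org');
if ($input === false) {
    exit('Could not fetch the page.');
}

// Matches <a ... href=...>...</a>: group 2 is the href value, group 3 the link text.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

// Accepts only absolute http(s)/ftp URLs. The "~" delimiter is used because the
// pattern contains unescaped "/" and "#"; the "i" flag makes it case-insensitive.
$urlregex = "~^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$~i";

if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // preg_match() replaces eregi(), which was removed in PHP 7.
        if (preg_match($urlregex, $match[2])) {
            echo htmlspecialchars(trim($match[2])) . "<br />";
        }
    }
}
?>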


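The above code only fetches the link URL itself, not the link's text (its title), and it won't tell you whether the link was an image.

It also won't handle relative (same-site) links, because the URL check only accepts absolute http://, https:// or ftp:// addresses.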
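Assuming "self links" means relative hrefs such as `/about` or `page.html`, one way to keep them is to resolve each href against the URL of the page it came from. The sketch below uses a hypothetical `resolve_url()` helper; it is a simplified version of RFC 3986 reference resolution (it handles absolute, protocol-relative, root-relative, query-only, fragment-only and path-relative hrefs, including `.` and `..` segments) and ignores edge cases such as user info in the base URL.

```php
<?php
/**
 * Resolve a (possibly relative) href against the page URL it was found on.
 * Hypothetical helper; a simplified take on RFC 3986 section 5.2.
 */
function resolve_url(string $base, string $href): string
{
    $href = trim($href);
    // Already absolute: has a scheme such as http:, https: or mailto:
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $port = isset($b['port']) ? ':' . $b['port'] : '';
    $origin = $scheme . '://' . ($b['host'] ?? '') . $port;
    $basePath = $b['path'] ?? '/';
    $baseQuery = isset($b['query']) ? '?' . $b['query'] : '';

    if ($href === '') {
        return $origin . $basePath . $baseQuery;
    }
    if (strpos($href, '//') === 0) {   // protocol-relative: //cdn.example.com/x.js
        return $scheme . ':' . $href;
    }
    if ($href[0] === '#') {            // fragment only: same page
        return $origin . $basePath . $baseQuery . $href;
    }
    if ($href[0] === '?') {            // query only: same path
        return $origin . $basePath . $href;
    }

    // Split off ?query and #fragment so only the path part is normalised.
    $cut = strcspn($href, '?#');
    $relPath = substr($href, 0, $cut);
    $suffix = substr($href, $cut);

    if ($relPath[0] === '/') {
        $path = $relPath;              // root-relative replaces the whole path
    } else {
        // Path-relative: append to the base path's directory (up to the last "/").
        $slash = strrpos($basePath, '/');
        $dir = ($slash === false) ? '/' : substr($basePath, 0, $slash + 1);
        $path = $dir . $relPath;
    }

    // Remove "." and ".." segments; never climb above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    if (in_array(end($segments), ['.', '..'], true)) {
        $out[] = '';                   // "a/.." names a directory: keep trailing slash
    }

    return $origin . implode('/', $out) . $suffix;
}
```

For example, `resolve_url('http://br.4ce.info/dir/page.html', '../up.html?x=1')` gives `http://br.4ce.info/up.html?x=1`. Resolving every href this way before the absolute-URL check means relative links are kept instead of silently dropped.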

If your goal is just to display exactly what is on that page without using an iframe, you can output the fetched HTML directly:

<?php
$input = @file_get_contents('http://br.4ce.info/');
if (!$input) {
    echo "No Recommended Sites";
} else {
    echo $input;
}
?>

This will not work for all pages (relative links, images and stylesheets can break once the HTML is served from another domain), but for your example I believe it is the easiest route.

I have piles of code for getting links in many different ways, fixing relative links, and parsing images, links and data.

Using PHP's DOM extension, or a library like simplehtmldom, would be a better approach than regular expressions.
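As a minimal sketch of the DOM route mentioned above, the hypothetical `extract_links()` helper below parses an HTML string with PHP's built-in `DOMDocument` and returns, for every `<a href>`, the href, the visible link text, and whether the link wraps an image, which covers the information the regex version could not report. It does not resolve relative URLs; the simplehtmldom library is not shown here.

```php
<?php
/**
 * Extract every <a href="..."> from an HTML string using PHP's DOM extension.
 * Returns a list of ['href' => string, 'text' => string, 'has_image' => bool].
 * extract_links() is a hypothetical helper name.
 */
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return [];                    // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue;                 // skip named anchors such as <a name="top">
        }
        $links[] = [
            'href'      => trim($a->getAttribute('href')),
            // Collapse runs of whitespace in the visible link text.
            'text'      => trim(preg_replace('/\s+/', ' ', $a->textContent)),
            'has_image' => $a->getElementsByTagName('img')->length > 0,
        ];
    }
    return $links;
}
```

For the widget, each entry can then be echoed as a list item, escaping both the href and the text with `htmlspecialchars()`. Note that `loadHTML()` assumes ISO-8859-1 unless the page declares its character set, so non-ASCII link text may need extra handling.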

