
URL/Link Extracting


5 replies to this topic

#1 matfish

matfish
  • Members
  • Advanced Member
  • 242 posts
  • LocationUK

Posted 20 October 2006 - 09:42 PM

Hi, can anyone point me in the right direction for extracting a list of URLs/links from an external page (for example, a page of Google results related to a keyword)? I could then manipulate this data and put it into my database.

I just want to extract the links from a page I specify and maybe put the URLs into an array that I can then play with.

Many thanks for any help.



#2 matfish

matfish
  • Members
  • Advanced Member
  • 242 posts
  • LocationUK

Posted 24 October 2006 - 07:38 AM

OK, let's start again.

Does anyone know how to extract URLs from a specific site in PHP?

Thanks

#3 heckenschutze

heckenschutze
  • Members
  • Advanced Member
  • 257 posts
  • LocationAustralia

Posted 24 October 2006 - 08:03 AM

With regular expressions.

Here's my crappy attempt at a regex :D

<?php

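// Fetch the page at $url and print everything that looks like an http:// link.
function GetLinks($url)
{
	$aOut = array();
	// Each of the first four character classes matches a single character;
	// the final [^>]+ then runs up to the next ">", so a match can include
	// the rest of the tag after the URL.
	preg_match_all("/http:\/\/?[^ ][^\"][^'][^<][^>]+/i", file_get_contents($url), $aOut, PREG_PATTERN_ORDER);

	// With PREG_PATTERN_ORDER, $aOut[0] is the array of full matches.
	print_r($aOut);
}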

echo "<pre>";
GetLinks("http://google.com.au");
echo "</pre>";

?>

Hey, it's a start ;)
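
For comparison, here is a minimal sketch of the same idea using PHP's DOM extension instead of a regex. It is not what the post above uses, and it assumes PHP 7 or later with the DOM extension enabled. The function name `extractLinks` and the sample HTML are made up for illustration. It reads only the `href` attribute of each anchor, so the rest of the tag is not included in the result.

```php
<?php

// Hypothetical helper (not from the thread): return the href value of every
// <a> element found in an HTML string.
function extractLinks(string $html): array
{
    $links = [];
    if ($html === '') {
        return $links; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href; // skip anchors without an href
        }
    }

    return $links;
}

// Made-up sample markup, used here instead of fetching a live page.
$sample = '<html><body>'
    . '<a href="http://example.com/one">One</a> '
    . '<a href="/two">Two</a> '
    . '<a name="top">no href</a>'
    . '</body></html>';

print_r(extractLinks($sample));
```

To run it against a real page, the HTML string could come from `file_get_contents($url)`, as in the regex version. Relative links such as `/two` are returned as written and would still need resolving against the page URL.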

#4 matfish

matfish
  • Members
  • PipPipPip
  • Advanced Member
  • 242 posts
  • LocationUK

Posted 24 October 2006 - 09:04 AM

That's brilliant, thank you. It contains the whole a href tag, but from that I can pick out the URLs, which is what I need.

Many thanks!!!!

#5 matfish

matfish
  • Members
  • Advanced Member
  • 242 posts
  • LocationUK

Posted 24 October 2006 - 02:51 PM

Hey dude, I'm having a bit of trouble reading the array, for example picking out a single element, say number 4.

It just returns "Array".

#6 True`Logic

True`Logic
  • Members
  • Advanced Member
  • 59 posts

Posted 24 October 2006 - 03:21 PM

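random:

// $links stands for your array of matches. With preg_match_all() and
// PREG_PATTERN_ORDER, the full matches are in element 0 of the output array;
// echoing the outer array's elements is what prints "Array".
$num = rand(0, count($links) - 1);
echo $links[$num];

entire:

$total = count($links);
$i = 0;

while ($i < $total) {
	echo $links[$i] . "<br>\n";
	$i++;
}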


hope this helped =)
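
A self-contained sketch of reading the matches, assuming they come from `preg_match_all()` with `PREG_PATTERN_ORDER` as in the earlier post. The sample HTML, the simplified pattern and the variable names (`$matches`, `$urls`) are made up for illustration and are not the ones used in the thread.

```php
<?php

// Made-up sample input, so the example runs without fetching a page.
$html = '<a href="http://example.com/a">A</a> '
      . '<a href="http://example.com/b">B</a> '
      . '<a href="http://example.com/c">C</a>';

// Simplified pattern (an assumption, not the regex from the thread):
// match "http://" followed by anything up to whitespace, a quote, or an
// angle bracket.
preg_match_all('/http:\/\/[^\s"\'<>]+/i', $html, $matches, PREG_PATTERN_ORDER);

// $matches is an array of arrays: $matches[0] holds the full matches.
// Echoing $matches[0] itself would print "Array"; index into it instead.
$urls = $matches[0];

// One specific element (indexes start at 0).
echo $urls[1], "\n";

// One random element; array_rand() returns a valid key of $urls.
echo $urls[array_rand($urls)], "\n";

// Every element.
foreach ($urls as $url) {
    echo $url, "\n";
}
```

`foreach` avoids the manual counter entirely, so there is no upper bound to get wrong.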



