
[SOLVED] Stripping URLs from a page


NinaDizz


TBH it's pretty woeful so far.

 

What I've written (horrible as it is) returns the first URL in source.html but then doesn't output any more, even though I can see that it has found more.

 

<?php
$source_file = file_get_contents ('source.html');
$each_letter = str_split ($source_file);
for($i = 0; $i <count($each_letter); $i++) {
if ($each_letter[$i] == '<' && $each_letter[$i+1] == 'a' && $each_letter[$i+2] == ' ' && $each_letter[$i+3] == 'h' && $each_letter[$i+4] == 'r' && $each_letter[$i+5] == 'e' && $each_letter[$i+6] == 'f' && $each_letter[$i+7] == '=' && $each_letter[$i+8] == '"') {
	$count_urls++;
	$this_url = '';
	$j = 0;
	while($check_for_end != '"') {
		$this_url .= $each_letter[$i + 9 + $j];
		$j++;
		$check_for_end = $each_letter[$i + 9 + $j];
	}
	echo $count_urls.' - '.$this_url.'<hr />';
}
}
echo $count_urls;
?>
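The reason only the first URL appears is that `$check_for_end` is never reset. After the first link it still holds `"`, so for every later `<a href="` the `while` condition is already false on entry, `$this_url` stays empty, and only the counter keeps going up. `$count_urls` is also never initialised, and `$each_letter[$i+1]` and the following indexes can run past the end of the array near the end of the file. Below is a minimal sketch of the same character-by-character scan with those problems fixed. It tests the current character before appending it, so nothing carries over between links. The `extract_hrefs()` helper name and the inline sample string (used instead of `source.html`) are assumptions for illustration. Like the original, it only recognises links written exactly as `<a href="...">`.

```php
<?php
// Sketch: the loop from the question with the end-of-URL check fixed.
// extract_hrefs() is a hypothetical helper name, not an existing PHP function.
// Like the original, it only matches the literal text <a href="
function extract_hrefs(string $html): array
{
    $each_letter = str_split($html);
    $total = count($each_letter);
    $urls = [];
    // Stop early enough that the 9-character comparison never runs off the end
    for ($i = 0; $i + 9 <= $total; $i++) {
        // Compare the next 9 characters against the literal <a href="
        if (implode('', array_slice($each_letter, $i, 9)) === '<a href="') {
            $this_url = '';
            $j = $i + 9;
            // Check the current character *before* appending it. Nothing is
            // left over from the previous link, so this runs for every URL
            // (and an empty href="" simply yields an empty string).
            while ($j < $total && $each_letter[$j] !== '"') {
                $this_url .= $each_letter[$j];
                $j++;
            }
            $urls[] = $this_url;
        }
    }
    return $urls;
}

$html = '<p><a href="one.html">1</a> <a href="two.html">2</a></p>';
foreach (extract_hrefs($html) as $n => $url) {
    echo ($n + 1) . ' - ' . $url . "\n";
}
echo count(extract_hrefs($html)) . "\n";
```

Run as-is, that prints `1 - one.html`, `2 - two.html`, then `2`.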

 

Nina

OK, that's a long script.

Try something like this:

<?php
$URL = 'http://uk3.php.net/manual/en/';
$subject = file_get_contents($URL);
// Match href="..." or href='...': group 1 is the opening quote, \1 requires the
// same quote to close it, and group 2 is the URL itself
preg_match_all('/href\s*=\s*(["\'])(.*?)\1/i', $subject, $matches, PREG_PATTERN_ORDER);
$result = $matches[2]; // $result being the array list of URLs
echo "<pre>";
print_r($result);
echo "</pre>";
?>
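If the markup varies (single quotes, other attributes before `href`, entities such as `&amp;` inside the URL), string and regex matching gets fragile. An alternative, different from what was posted above, is to let PHP's DOM extension parse the page and read each `<a>` element's `href` attribute. This is a sketch that assumes ext-dom is available; `extract_hrefs_dom()` is a hypothetical helper name.

```php
<?php
// Sketch: extract link targets with DOMDocument instead of a regex.
// Requires the DOM extension; extract_hrefs_dom() is a hypothetical name.
function extract_hrefs_dom(string $html): array
{
    if (trim($html) === '') {
        return [];  // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid: collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Skip anchors such as <a name="..."> that have no href
        if ($link->hasAttribute('href')) {
            $urls[] = $link->getAttribute('href');
        }
    }
    return $urls;
}

$html = "<ul><li><a href='single.html'>A</a></li>"
      . "<li><a class=\"x\" href=\"double.html\">B</a></li>"
      . "<li><a name=\"top\">C</a></li></ul>";
echo "<pre>";
print_r(extract_hrefs_dom($html)); // single- and double-quoted hrefs both come back
echo "</pre>";
```

For a live page you would pass it the result of `file_get_contents($URL)` as in the reply. The trade-off is speed: parsing a whole document is slower than one `preg_match_all()` call. In return, the parser handles quoting styles and entity decoding for you.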

Archived

This topic is now archived and is closed to further replies.
