NinaDizz Posted May 27, 2007
Hi all,
I need a function or class that goes through an HTML document, pulls out all of the URLs, and stores them in an array. Has anyone got one, or does anyone know where I could get one?
TIA!
Nina
Link to comment: https://forums.phpfreaks.com/topic/53158-solved-stripping-urls-from-a-page/
MadTechie Posted May 27, 2007
You can find all the functions required here, or someone here could probably do the whole thing for you. If they don't work, try here.
NinaDizz (Author) Posted May 27, 2007
I tried there and didn't have any luck. I couldn't go there, as I have very few pennies. And I also tried there, as I still am, for the past two hours, without any joy.
Nina
MadTechie Posted May 27, 2007
What do you have so far?
MadTechie Posted May 27, 2007
Here's a basic start on the hard part:
<?php
preg_match_all('/href=["|\']+(.*)["|\']+/sim', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
?>
Good luck with opening the contents of the remote file, etc.
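A note on the pattern above, offered as a sketch and not as anyone's final answer: with the `s` flag the greedy `(.*)` can run from the first `href` to the last quote in the document, so several links collapse into one match. The sample string and the variable names `$greedy` and `$tight` below are made up for illustration; the second pattern swaps `.*` for a character class that stops at the next quote.

```php
<?php
// Sample input (an assumption, not from the thread).
$subject = '<a href="one.html">1</a> <a href="two.html">2</a>';

// Pattern from the post: greedy (.*) plus the s flag swallows both links.
preg_match_all('/href=["|\']+(.*)["|\']+/sim', $subject, $greedy);

// Tighter variant: [^"\']* cannot cross a quote, so each link matches separately.
preg_match_all('/href=["\']([^"\']*)["\']/i', $subject, $tight);

echo count($greedy[1]) . " greedy match(es)\n";
print_r($tight[1]); // index 1 holds the captured URLs; index 0 holds the full matches
```

Reading index 1 of the result instead of index 0 is what returns the bare URLs without the `href=` prefix.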
maxim Posted May 27, 2007
Easy: just use file_get_contents(), and it will read a local or a remote file into one big string.
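A minimal sketch of that call with a failure check, assuming nothing beyond core PHP. `load_page` is a hypothetical helper name, and the temporary file only exists to keep the example self-contained. For remote URLs, `file_get_contents()` needs the `allow_url_fopen` setting enabled.

```php
<?php
// Hypothetical helper (not from the thread): load a page into a string,
// or return null if it cannot be read.
function load_page(string $source): ?string
{
    // file_get_contents() returns false on failure; @ suppresses the warning
    // so the caller can decide how to report the error.
    $contents = @file_get_contents($source);
    return $contents === false ? null : $contents;
}

// Local file example, written to a temporary file first.
$tmp = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($tmp, '<a href="one.html">1</a>');
$html = load_page($tmp);
unlink($tmp);

echo $html === null ? "could not read source\n" : strlen($html) . " bytes read\n";
```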
NinaDizz (Author) Posted May 27, 2007
TBH it's pretty woeful so far. What I've written (despite being horrible) returns the first URL in the source.html file, but then doesn't output any more, even though I can see that it has found more.
<?php
$source_file = file_get_contents('source.html');
$each_letter = str_split($source_file);
for ($i = 0; $i < count($each_letter); $i++) {
    if ($each_letter[$i] == '<' && $each_letter[$i+1] == 'a' && $each_letter[$i+2] == ' ' && $each_letter[$i+3] == 'h' && $each_letter[$i+4] == 'r' && $each_letter[$i+5] == 'e' && $each_letter[$i+6] == 'f' && $each_letter[$i+7] == '=' && $each_letter[$i+8] == '"') {
        $count_urls++;
        $this_url = '';
        $j = 0;
        while ($check_for_end != '"') {
            $this_url .= $each_letter[$i + 9 + $j];
            $j++;
            $check_for_end = $each_letter[$i + 9 + $j];
        }
        echo $count_urls.' - '.$this_url.'<hr />';
    }
}
echo $count_urls;
?>
Nina
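One likely reason only the first URL prints: `$check_for_end` still holds `"` after the first link, so the inner `while` never runs again and every later URL comes out empty. Resetting it (for example `$check_for_end = '';`) before the `while` should address that. The sketch below shows the same scan with `strpos()` doing the searching; the sample `$html` string and the variable names are assumptions for illustration, and it only handles the exact `<a href="` form that the original loop looks for.

```php
<?php
$html = '<a href="one.html">1</a> <a href="two.html">2</a>'; // sample input (assumption)
$needle = '<a href="';
$urls = [];
$offset = 0;

// Find each '<a href="' and copy everything up to the next double quote.
while (($start = strpos($html, $needle, $offset)) !== false) {
    $start += strlen($needle);          // first character of the URL
    $end = strpos($html, '"', $start);  // closing quote of this href
    if ($end === false) {
        break;                          // unterminated attribute: stop scanning
    }
    $urls[] = substr($html, $start, $end - $start);
    $offset = $end + 1;                 // continue after this link
}

print_r($urls);
```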
NinaDizz (Author) Posted May 27, 2007
Oh, MadTechie, thanks for the regex, btw. I'm playing around with it now, but still losing hair by the second.
MadTechie Posted May 27, 2007
OK, that's a long script. Try something like this:
<?php
$URL = 'http://uk3.php.net/manual/en/';
$subject = file_get_contents($URL);
// Match href="..." or href='...': group 1 is the quote character, group 2 the URL
preg_match_all('/href=(["\'])(.*?)\1/i', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[2]; // keep only the captured URLs
echo "<pre>";
print_r($result); // $result is the array of URLs
?>
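The regex approach treats the page as plain text. As an alternative sketch (a different technique from the one in the post), PHP's DOM extension can parse the markup and read the `href` attribute directly. This assumes the `dom` extension is available; `hrefs_from_html` and `$sample` are hypothetical names for illustration. It collects only `<a>` tags, whereas the regex also picks up `href` on other tags such as `<link>`.

```php
<?php
// Hypothetical helper: collect href values from <a> tags using DOMDocument.
function hrefs_from_html(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $urls[] = $anchor->getAttribute('href');
        }
    }
    return $urls;
}

$sample = '<p><a href="one.html">1</a> <a href=\'two.html\'>2</a> <a name="x">no link</a></p>';
print_r(hrefs_from_html($sample));
```

Passing the string returned by `file_get_contents()` to this function in place of `$sample` would give the same kind of array for a real page.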
NinaDizz (Author) Posted May 27, 2007
Thanks for that. A lot more efficient! I'll work with that snippet from now on. Thanks again!