shivam0101 Posted October 28, 2008

```php
<?php
function getLinks4($url, $limit, $crawled)
{
    $fc = @file_get_contents( urldecode( $url ) );
    if( !$fc ) return false;

    preg_match_all( '/\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+/ims', $fc, $links );
    if ( $links )
    {
        foreach ( $links[1] as $url )
        {
            $limit--;
            if( ! $limit ) return;

            if(!in_array($url, $crawled))
            {
                $crawled[] = $url;
                print_r($crawled);
                echo '<br/>';
                echo '<br/>';
                echo '<br/>';
                print_r($url);
                echo '<br/>';
                echo "END OF THE CRAWL <br>";
                getLinks4( $url, $limit, $crawled);
            }
            else
            {
                echo "Already exists <br/>";
            }
        }
    }
}

$indexed_url = getLinks4('http://localhost/my_search/search/page_2.php', 50, '');
?>
```

I am trying to get all the links, but the arrays I get are not unique. That is, the if(!in_array($url, $crawled)) check is failing. I could use array_unique(), which is another option, but I do not want to use another function to filter the results.

Thanks
MadTechie Posted October 28, 2008

And the problem is?
shivam0101 Posted October 28, 2008 (Author)

I am trying to get all the links and their relevant contents and store them in a database. The problem is that I am getting the same URLs several times.
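A likely cause of the duplicates: PHP passes arrays by value, so each recursive getLinks4() call works on its own copy of $crawled. A URL appended deep in the recursion never reaches the caller's copy, so the in_array() check in the other calls cannot see it. Starting with '' instead of array() is also fragile (PHP 8 throws a TypeError when in_array() is given a string haystack). Below is a minimal sketch of a fix, assuming the same href regex as the original post: $crawled is passed by reference and $limit caps the size of that shared list. The $fetch parameter is a hypothetical addition, not part of the original code, so the crawler can be exercised with in-memory pages; it defaults to file_get_contents(). Relative links are still not resolved against the page URL.

```php
<?php
// Sketch of a patched getLinks4(). $crawled is passed BY REFERENCE (&), so
// every recursive call appends to the same array instead of a private copy.
// $limit caps the total number of collected URLs across the whole crawl.
// $fetch is a hypothetical parameter (not in the original post) that lets the
// page loader be swapped out; by default it is file_get_contents().
function getLinks4($url, $limit, array &$crawled, $fetch = 'file_get_contents')
{
    $fc = @call_user_func($fetch, urldecode($url));
    if (!$fc) {
        return $crawled; // page missing or empty: nothing more to collect here
    }

    // Same href pattern as in the original post.
    preg_match_all('/\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+/ims', $fc, $links);

    foreach ($links[1] as $link) {
        if (count($crawled) >= $limit) {
            break; // shared budget reached
        }
        if (in_array($link, $crawled, true)) {
            continue; // already collected somewhere in the crawl
        }
        $crawled[] = $link;
        getLinks4($link, $limit, $crawled, $fetch);
    }

    return $crawled;
}

// Usage: the list must start as a real array, not ''.
// $crawled = array();
// getLinks4('http://localhost/my_search/search/page_2.php', 50, $crawled);
```

Because the loop variable is named $link, it no longer overwrites the $url parameter, which the original foreach did.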
MadTechie Posted October 28, 2008

Here's an example of how I would do it:

```php
<?php
#$fc = @file_get_contents( urldecode( $url ) );
#if( !$fc ) return false;

//example
$HTML = '
<a href="http://www.phpfreaks.com/1">test1</a>
<a href=\'http://www.phpfreaks.com/1\'>test1</a>
<a href="http://www.phpfreaks.com/1">test1</a>
<a href="http://www.phpfreaks.com/1">test1</a>
<a href=\'http://www.phpfreaks.com/2\'>test2</a>
<a href=http://www.phpfreaks.com/3>test3</a>
<a href="http://www.phpfreaks.com/4">test4</a>
<a href="http://www.phpfreaks.com/5">test5</a>
<a href="http://www.phpfreaks.com/6">test6</a>
<a href="http://www.phpfreaks.com/7">test7</a>
<a href="http://www.phpfreaks.com/8">test8</a>
';

//Match URLs: group 1 is the optional quote (", ' or nothing),
//group 2 is the URL, and \1 requires the same closing quote
//(an empty group 1 lets unquoted values like test3 match too)
preg_match_all('/\s+href\s*=\s*(["\']?)([^"\'\s>]+)\1/i', $HTML, $result, PREG_PATTERN_ORDER);

//Remove Dups
$result = array_unique($result[2]);

//output
echo "<pre>";
print_r($result);
?>
```
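If the goal is to avoid a separate array_unique() pass, as the original poster asked, one option (a sketch, not what MadTechie posted) is to deduplicate while collecting, using each URL as an array key: isset() on a key is a constant-time lookup, whereas in_array() scans the whole list on every call. The helper name uniqueLinks() is hypothetical, and the pattern shown accepts double-quoted, single-quoted and unquoted href values.

```php
<?php
// Hypothetical helper: returns the href values found in $html, without
// duplicates, in the order they first appear.
function uniqueLinks($html)
{
    preg_match_all('/\s+href\s*=\s*(["\']?)([^"\'\s>]+)\1/i', $html, $m);

    $seen = array();   // URL => true, used only for fast lookups
    $unique = array(); // URLs in first-seen order
    foreach ($m[2] as $url) {
        if (!isset($seen[$url])) {
            $seen[$url] = true;
            $unique[] = $url;
        }
    }
    return $unique;
}
```

Run on the sample $HTML above, this should return the eight URLs http://www.phpfreaks.com/1 through http://www.phpfreaks.com/8, in order. The same isset($seen[$link]) test could replace in_array() inside a recursive crawler, provided the lookup array is shared by reference.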