shivam0101 Posted October 28, 2008

<?php
function getLinks4($url, $limit, $crawled)
{
    $fc = @file_get_contents( urldecode( $url ) );
    if( !$fc ) return false;

    preg_match_all( '/\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+/ims', $fc, $links );
    if ( $links )
    {
        foreach ( $links[1] as $url )
        {
            $limit--;
            if( ! $limit ) return;

            if(!in_array($url, $crawled))
            {
                $crawled[] = $url;
                print_r($crawled);
                echo '<br/>';
                echo '<br/>';
                echo '<br/>';
                print_r($url);
                echo '<br/>';
                echo "END OF THE CRAWL <br>";
                getLinks4( $url, $limit, $crawled);
            }
            else
            {
                echo "Allready exists <br/>";
            }
        }
    }
}

$indexed_url = getLinks4('http://localhost/my_search/search/page_2.php', 50, '');
?>

I am trying to collect all the links, but the resulting array is not unique. That is, the check if(!in_array($url, $crawled)) is failing. I could use array_unique, but I do not want to use another function to filter the result.

Thanks

Link to comment: https://forums.phpfreaks.com/topic/130401-looping-problem/
MadTechie Posted October 28, 2008

And the problem is?
shivam0101 Posted October 28, 2008 (Author)

I am trying to get all the links and their relevant contents and store them in a database. The problem is that I am getting the same URLs several times.
MadTechie Posted October 28, 2008

Here's an example of how I would do it:

<?php
#$fc = @file_get_contents( urldecode( $url ) );
#if( !$fc ) return false;

//example
$HTML = '
<a href="http://www.phpfreaks.com/1">test1</a>
<a href=\'http://www.phpfreaks.com/1\'>test1</a>
<a href="http://www.phpfreaks.com/1">test1</a>
<a href="http://www.phpfreaks.com/1">test1</a>
<a href=\'http://www.phpfreaks.com/2\'>test2</a>
<a href=http://www.phpfreaks.com/3>test3</a>
<a href="http://www.phpfreaks.com/4">test4</a>
<a href="http://www.phpfreaks.com/5">test5</a>
<a href="http://www.phpfreaks.com/6">test6</a>
<a href="http://www.phpfreaks.com/7">test7</a>
<a href="http://www.phpfreaks.com/8">test8</a>
';

//Match URLs
preg_match_all('/\s+href\s*=\s*(["\'])?([^\1]*?)\1/si', $HTML, $result, PREG_PATTERN_ORDER);

//Remove Dups
$result = array_unique($result[2]);

//output
echo "<pre>";
print_r($result);
?>
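If filtering with array_unique afterwards is not wanted, the in_array check from the first post can probably be made to work on its own. PHP passes arrays and integers by value unless the parameter is declared with &, so in the original function each recursive call gets its own copy of $crawled and $limit, and the first call passes '' rather than an array. The following is only a rough sketch of that idea, assuming PHP 7 or later. The names extractLinks(), crawl() and the $fetch callable are made up for this illustration and are not from the thread. The regex is a different pattern from the ones posted above, and here the limit counts pages visited rather than links seen.

```php
<?php
// Sketch only: extractLinks() and crawl() are hypothetical helper names.

/**
 * Return every href value found in $html, duplicates included.
 * Handles double-quoted, single-quoted and unquoted attribute values.
 */
function extractLinks(string $html): array
{
    $pattern = '/\shref\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        // Only one of the three alternatives captured a value;
        // unmatched groups are either '' or missing from $match.
        $href = $match[1] ?? '';
        if ($href === '') { $href = $match[2] ?? ''; }
        if ($href === '') { $href = $match[3] ?? ''; }
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

/**
 * Depth-first crawl starting at $url.
 * $remaining and $crawled are taken by reference (&), so every recursive
 * call shares the same counter and the same list of visited URLs.
 * $fetch is any callable that returns the page body, or false on failure.
 */
function crawl(string $url, int &$remaining, array &$crawled, callable $fetch): void
{
    // Stop when the page budget is used up or the URL was already visited.
    if ($remaining <= 0 || in_array($url, $crawled, true)) {
        return;
    }
    $crawled[] = $url;
    $remaining--;

    $html = $fetch($url);
    if ($html === false) {
        return;
    }
    foreach (extractLinks($html) as $link) {
        crawl($link, $remaining, $crawled, $fetch);
    }
}

// Possible usage with the URL from the first post (left commented out so
// that nothing is fetched when this file is merely included):
// $crawled   = [];
// $remaining = 50;
// crawl('http://localhost/my_search/search/page_2.php', $remaining, $crawled,
//       function ($u) { return @file_get_contents($u); });
// print_r($crawled);
```

Relative links are passed on unchanged in this sketch, so a real crawler would still need to resolve them against the page URL before fetching.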