MaxMouseDLL Posted August 19, 2009

Say I have a foreach loop which depends on $arr... if I add more elements to $arr while still inside the foreach loop, will foreach process the newly created elements, or will it stop at whatever the upper boundary of $arr was before entering the loop? If it doesn't process elements added inside the foreach, what would be a workaround?
rhodesa Posted August 19, 2009

Off the top of my head, I'm not sure whether foreach will process them or not... but (if it's a numerically indexed array) you can always do:

for ($n = 0; $n < count($arr); $n++) {
    // count($arr) is re-evaluated on every pass, so elements appended inside the loop get processed too
}
mikesta707 Posted August 19, 2009

Nope, it will stop at the boundary the array had before the foreach loop started. So the following will only output 1-10:

$arr = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
foreach ($arr as $ass) {
    $arr[] = $ass + 1; // appended elements are never reached by this foreach
    echo $ass;
}

Output: 12345678910

However, the following will show the full array:

$arr = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
foreach ($arr as $ass) {
    $arr[] = $ass + 1;
}
foreach ($arr as $ass) {
    echo $ass . "<br />";
}

Output: 1 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 11

rhodesa's example, on the other hand, does keep going as you add more and more to the array. But this:

for ($n = 0; $n < count($arr); $n++) {
    $arr[] = $n + 1; // the array grows on every pass
    echo $arr[$n] . "<br />";
}

results in an endless loop, so make sure you don't add something to the array on every pass.
MaxMouseDLL Posted August 19, 2009 (author)

My code is a PHP web spider, so it may or may not add an element (or more than one element) per loop. E.g.:

http://yoursite.com may produce 300 extra elements
http://yoursite.com/map.php may produce (say) 10 elements
http://yoursite.com/about.php may produce none

Everything is unknown in advance... How do I avoid an infinite loop, yet keep processing an ever-changing array upper boundary until completion? I hate recursion, lol!
mikesta707 Posted August 19, 2009

Recursion is awesome, don't be a hater =P But how exactly does your script work? I've never used a web spider before, so I'd probably have to see the inner code to give any advice. Depending on how the loop works, it might prevent an infinite loop by itself. You could also set a cap on how many times it can loop (i.e. if $i > say 5000, break;).
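As a rough illustration of that cap idea (the $links array and the 5000 limit are just placeholders, not anything from the actual spider):

$maxPasses = 5000; // hard safety cap
for ($i = 0; $i < count($links); $i++) {
    if ($i >= $maxPasses) {
        break; // stop even if the array is still growing
    }
    // ... process $links[$i], possibly appending new links ...
}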
MaxMouseDLL Posted August 19, 2009 (author)

It's function based: I pass it a URL (e.g. http://www.something.com) and it returns an array containing all the links on that page; each of those links will also need to be spidered for other links. So what I need to do is pass it a link, get back all the links contained within that page, and then begin "dynamic looping": if it returns index.html, hello.html and whatever.html, it needs to spider those three for links, then whatever links it finds within those, and so on. The number of links returned is arbitrary and unknown, hence the loop needs to pay attention to the ever-growing array rather than just the upper boundary of the array when it began executing. If you would like to see the code I can provide it.

I'm not a hater, I've just never got on with recursion... I tend to try to visualise the whole thing, which is pretty much impossible and ends up bogging my brain down.
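For what it's worth, a minimal non-recursive sketch of that pattern could look like the following. get_links() is a hypothetical stand-in for the poster's function that returns an array of links for a URL; it is not part of the actual code.

$queue   = array('http://www.something.com'); // starting URL
$visited = array();                            // URLs already spidered

for ($i = 0; $i < count($queue); $i++) {       // count() is re-checked every pass
    $url = $queue[$i];
    if (isset($visited[$url])) {
        continue; // skip duplicates so repeated links can't loop forever
    }
    $visited[$url] = true;

    foreach (get_links($url) as $link) {       // get_links() is assumed, not real
        if (!isset($visited[$link])) {
            $queue[] = $link;                  // the worklist grows as links are found
        }
    }
}

Because the for loop re-evaluates count($queue) on each pass, it keeps running until no new links turn up, and the $visited check is what stops it from running forever on pages that link back to each other.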
thebadbad Posted August 19, 2009

Well, recursion would be the obvious solution to that. But you would have to set a limit, telling the script how deep it should go.
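A quick sketch of what that depth-limited recursion might look like (again, get_links() is just an assumed stand-in for the link-extraction function, and the limit of 3 is arbitrary):

function spider($url, $depth, $maxDepth, array &$found) {
    if ($depth > $maxDepth) {
        return; // the depth limit is what guarantees the recursion stops
    }
    $found[] = $url;

    foreach (get_links($url) as $link) { // get_links() is hypothetical
        spider($link, $depth + 1, $maxDepth, $found);
    }
}

$found = array();
spider('http://www.something.com', 0, 3, $found);

Note this version can visit the same page more than once; duplicates would have to be filtered out of $found afterwards.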
kratsg Posted August 19, 2009

thebadbad wrote: Well, recursion would be the obvious solution to that. But you would have to set a limit, telling the script how deep it should go.

Not just recursion; you also have to check for repeated links. Say you spider a website that uses a menu for its links: each page will have that same menu of links, which means that without going deep at all, you're effectively looping infinitely.
MaxMouseDLL Posted August 19, 2009 (author)

It should go as deep as possible (i.e. index all links), because I'm going to lock it to the domain, and duplicates will be removed. I think I'm going to have to sit and stare at this one for a while. The idea is to run the script as a cron job, or at least password-protect it so I can execute it whenever I see fit, and generate a sitemap.xml from the output; one will be generated daily. In the output the domain will be a filler (e.g. <domain>), and I'll use another script to change (str_replace) that filler to whatever domain I see fit, so that wherever my site code is deployed a sitemap is readily available.
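A rough sketch of that last step, assuming the crawler has already produced an array of paths using the <domain> placeholder (the array contents, file names and example domain below are made up):

// Build a sitemap template that uses <domain> as a placeholder for the host.
$paths = array('<domain>/index.html', '<domain>/about.php'); // from the crawl
$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($paths as $path) {
    $xml .= '  <url><loc>' . $path . '</loc></url>' . "\n";
}
$xml .= '</urlset>' . "\n";
file_put_contents('sitemap.template.xml', $xml);

// At deploy time, swap the placeholder for the real domain.
$sitemap = str_replace('<domain>', 'http://www.example.com',
    file_get_contents('sitemap.template.xml'));
file_put_contents('sitemap.xml', $sitemap);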
thebadbad Posted August 19, 2009

kratsg wrote: Not just recursion, but you also have to check for repeated links. [...]

Would be a fine feature, yes, but not strictly necessary: the script should still stop at e.g. step 10, and you could then just remove repeated links from the array afterwards. A thing you will need, however, is a function to convert relative URLs to absolute URLs:

<?php
//http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/
function relative2absolute($absolute, $relative) {
    $p = @parse_url($relative);
    if (!$p) {
        //$relative is a seriously malformed URL
        return false;
    }
    if (isset($p["scheme"])) {
        return $relative; // already an absolute URL
    }

    $parts = parse_url($absolute);

    if (substr($relative, 0, 1) == '/') {
        // relative to the document root
        $cparts = explode("/", $relative);
        array_shift($cparts);
    } else {
        // relative to the current path
        if (isset($parts['path'])) {
            $aparts = explode('/', $parts['path']);
            array_pop($aparts);
            $aparts = array_filter($aparts);
        } else {
            $aparts = array();
        }
        $rparts = explode("/", $relative);
        $cparts = array_merge($aparts, $rparts);

        // resolve "." and ".." segments
        foreach ($cparts as $i => $part) {
            if ($part == '.') {
                unset($cparts[$i]);
            } else if ($part == '..') {
                unset($cparts[$i]);
                unset($cparts[$i - 1]);
            }
        }
    }

    $path = implode("/", $cparts);

    $url = '';
    if ($parts['scheme']) {
        $url = "$parts[scheme]://";
    }
    if (isset($parts['user'])) {
        $url .= $parts['user'];
        if (isset($parts['pass'])) {
            $url .= ":" . $parts['pass'];
        }
        $url .= "@";
    }
    if (isset($parts['host'])) {
        $url .= $parts['host'] . "/";
    }
    $url .= $path;

    return $url;
}
?>

And to be 100% sure, you should also check every page for a base tag, and use its value as the first parameter to the above function if found.
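A quick sketch of that base-tag check (just one way to do it; $html, $pageUrl and $relativeLink are assumed variables, not part of the posted code):

// If the page declares a <base href="...">, resolve relative links against it
// instead of the page's own URL.
$base = $pageUrl; // fall back to the page's address
if (preg_match('~<base[^>]+href=["\']([^"\']+)["\']~i', $html, $m)) {
    $base = $m[1];
}
$absoluteLink = relative2absolute($base, $relativeLink);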
kratsg Posted August 21, 2009

I wonder if you'll include subdomains as well, or restrict it to either http://www.example.com/ or http://example.com/ (the two are quite different and may be a pain in the long run when it comes to getting absolute URLs). Perhaps simply spider using relative URLs, then rewind up the array and make everything absolute, i.e.:

Site root (public_html)
- domain: contains an array of files and folders
-- each folder contains an array of the files and folders included in it

So, as you rewind back to the top level of the array, you build up the relative path...