If I understand this correctly, arrays in PHP are hash maps? If that is the case, the worst-case time it takes to find something in an array is O(log(n)), right? Shouldn't there then be cases where foreach could be faster for looping through an array? Imagine the case of a huge array of integers, maybe some thousands of them. Would accessing each entry in the array with foreach take the same time as a normal for loop? If not, what if it is a huge array filled with arrays of integers, and you use more than one or two of the values in each inner array as you loop through the giant one? I understand there is quite a big advantage to using a for loop because you have more control over exactly which key you use against the array.

 

Thoughts?

Edited by MMDE
https://forums.phpfreaks.com/topic/270818-foreach-vs-for-to-loop-through-array/

I doubt there is any difference worth worrying about between a for and a foreach loop. Probably the only real difference you might notice is that in a typical usage case, foreach() creates a copy of the source array, but even that would probably not cause much of a speed difference until you get into pretty massive array sizes.

 

Use whichever one fits your needs best. If you just want to loop over every item in an array, foreach() is generally the best. If you need to loop through just a portion of an array, or need to keep a count as you loop (e.g. when creating a numbered list output), a for loop may be better.
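To make the advice above concrete, here is a minimal sketch (with made-up values) showing both styles side by side: foreach for "visit everything", and for when you want explicit index control. The count() call is hoisted out of the condition, a point that comes up later in this thread.

```php
<?php
// Hypothetical data, just for illustration.
$numbers = [10, 20, 30, 40];

// foreach: simplest when you want every element.
$sumForeach = 0;
foreach ($numbers as $n) {
    $sumForeach += $n;
}

// for: handy when you need the index, or only part of the array.
$sumFor = 0;
$count = count($numbers); // hoisted out of the loop condition
for ($i = 0; $i < $count; $i++) {
    $sumFor += $numbers[$i];
}
```

Both loops do the same amount of work here; the choice is about readability, not speed.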

This has actually been benchmarked quite a few times, and the results have shown minuscule differences every single time. The conclusion is that it's not normally worth worrying about, and if you do need to worry about it, you really should profile your application properly first; chances are, you have a lot more to gain by spending the time on other parts of the code.

Write the code that makes the most sense from a maintainability standpoint, as long as you keep it sane (no queries in loops, no huge numbers of inner loops, or similar stuff). If you suspect you can/need to gain some better performance, profile the code so you know exactly what's taking time. It's often the stuff you didn't think about at all that takes the most time, after all. ;)

How could foreach really be faster? It too is looping through each element in your array, and this is essentially the slowest part of it all. There is no way to avoid that if you want to search an unordered data structure for a given element (unless you can determine the existence by doing a key lookup, of course, in which case the efficiency would be O(1) rather than O(n)).
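The key-lookup point in the paragraph above can be illustrated with a small sketch (hypothetical data): checking for a key with isset() is an average-case O(1) hash lookup, while searching the values requires walking the array.

```php
<?php
// Hypothetical hash-map-style array.
$haystack = ['apple' => 1, 'banana' => 2, 'cherry' => 3];

// O(1) on average: existence check via a key lookup.
$hasBanana = isset($haystack['banana']);

// O(n): scanning the values, element by element.
$found = false;
foreach ($haystack as $value) {
    if ($value === 2) {
        $found = true;
        break;
    }
}
```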

 

If you are looping through an array, for instance, then the worst case scenario is O(n) and not O(log n). This is because in the worst case, the element you are looking for is the last element, which would cause you to iterate N times - not the logarithm of N. I wrote an article about this some months ago.

 

Using a binary search algorithm (also sometimes referred to as divide & conquer), you can achieve O(log n) efficiency provided that your data set is sorted. I was writing an article about this some time ago, but I have yet to finish it.
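Since the article mentioned above is unfinished, here is a minimal binary search sketch of my own for a sorted array of integers: it halves the search range each step, which is where the O(log n) bound comes from.

```php
<?php
// Binary search over a sorted int array.
// Returns the index of $needle, or -1 if absent. O(log n) comparisons.
function binarySearch(array $sorted, int $needle): int
{
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $needle) {
            return $mid;
        } elseif ($sorted[$mid] < $needle) {
            $lo = $mid + 1; // needle is in the upper half
        } else {
            $hi = $mid - 1; // needle is in the lower half
        }
    }
    return -1;
}
```

Remember the precondition: the input must already be sorted, or the halving logic discards the wrong half.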

 

If you are searching for an item in your array, then foreach will always be O(n) unless you break the loop when you find the item; in that case, O(n) becomes the worst case rather than the guaranteed cost. My personal preference is to use foreach when I know I always have to loop through all of the items, and a while loop when I have another loop condition, e.g. a search algorithm. Of course you could use a good old for loop too if you prefer. I find it prettier to use a while loop over a foreach with a return statement, because then you don't have to analyze the loop body to determine the loop condition; in a while loop, it's all in the loop header. To me that makes it easier to read and doesn't give me a false indication of the loop condition when I read "foreach...".
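The two search styles described above can be sketched like this (hypothetical data); the foreach relies on a break in the body, while the while loop states both stop conditions up front in its header.

```php
<?php
// Hypothetical data and target, just for illustration.
$items = [4, 8, 15, 16, 23, 42];
$target = 16;

// foreach with an early break: worst case O(n).
$foundIndex = -1;
foreach ($items as $i => $item) {
    if ($item === $target) {
        $foundIndex = $i;
        break; // stop condition is buried in the body
    }
}

// while loop: both stop conditions live in the loop header.
$i = 0;
$n = count($items);
while ($i < $n && $items[$i] !== $target) {
    $i++;
}
$foundIndexWhile = ($i < $n) ? $i : -1;
```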

The reason that one is (slightly) faster than the other is that they don't perform the exact same machine code instructions. Even if you construct a while loop with reset() and each(), it won't do exactly the same as the foreach() loop under the hood. The effect is the same, make no mistake about that, but since you're calling additional PHP functions the PHP parser has to do more work.

 

When it comes to your article, I'm afraid it's a bit inaccurate. In the foreach() explanation you refer to the variable being passed by reference, even though you haven't passed it by reference at all. It also seems that this (apparent) confusion has led you to mix the warning from the PHP manual page into this, at least from what I could understand. It's a bit unclear what you meant, so I might be mistaken on that bit.

The two alternatives you've posted are also woefully inefficient in comparison to the first, since you're calling count() for every iteration. O(n) isn't always the same as O(n) when you're comparing two different functions: the growth in complexity might be the same for both, but which one is faster depends on how many instructions per iteration each is running (not to mention the size of n).
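The count()-per-iteration problem mentioned above looks like this in practice (hypothetical data): both loops are O(n), but the first re-evaluates count() on every pass while the second hoists the length out of the condition.

```php
<?php
// Hypothetical data.
$data = range(1, 1000);

// Inefficient: count() is re-evaluated on every single iteration.
$sumSlow = 0;
for ($i = 0; $i < count($data); $i++) {
    $sumSlow += $data[$i];
}

// Better: hoist the length out of the loop condition.
$sumFast = 0;
$len = count($data);
for ($i = 0; $i < $len; $i++) {
    $sumFast += $data[$i];
}
```

Same result, same O(n) growth; the second version simply does less work per iteration.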

 

So not only is foreach() faster than the alternatives (by a very small margin), but in the right circumstances an O(n) operation can even be faster than an O(log n) operation. All the notation tells you, after all, is how the performance is influenced by an increase in the size of the dataset to be processed.

 

PS: For a search algorithm, array_flip() and isset() can actually be a lot faster than the alternatives, especially with big arrays.
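A minimal sketch of that array_flip()/isset() trick (with made-up values): flip the values into keys once, an O(n) step, and every membership test after that is an average-case O(1) hash lookup instead of a linear scan.

```php
<?php
// Hypothetical whitelist of values to test membership against.
$allowed = ['red', 'green', 'blue'];

// One-time O(n) flip: values become keys.
$lookup = array_flip($allowed); // ['red' => 0, 'green' => 1, 'blue' => 2]

// Each check is now an O(1) hash lookup instead of an in_array() scan.
$isAllowed = isset($lookup['green']);
$isMissing = isset($lookup['purple']);
```

The flip only pays off when you test membership many times against the same array; for a single lookup, in_array() is fine.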

Edited by Christian F.

The reason that one is (slightly) faster than the other is that they don't perform the exact same machine code instructions. Even if you construct a while loop with reset() and each(), it won't do exactly the same as the foreach() loop under the hood. The effect is the same, make no mistake about that, but since you're calling additional PHP functions the PHP parser has to do more work.

 

Yes, you are correct that the instructions are different and therefore the efficiency is slightly different. I was thinking in bigger terms, because optimizations like these are not necessary for 99.9% of PHP programmers.

 

When it comes to your article, I'm afraid it's a bit inaccurate. In the foreach() explanation you refer to the variable being passed by reference, even though you haven't passed it by reference at all.

 

If I understand you correctly, then I just assumed that the array existed for the sake of simplicity, but I guess it is true that it should have been added as a parameter for the completeness of the example.

 

The two alternatives you've posted are also woefully inefficient in comparison to the first, since you're calling count() for every iteration.

 

I guess this is true. It's a habit I have carried over from other languages (e.g. C#, where the length is a property). I think compiled languages are clever enough to optimize this at compile time, which is why the habit stuck. I will correct it, however.

 

O(n) isn't always the same as O(n) when you're comparing two different functions: the growth in complexity might be the same for both, but which one is faster depends on how many instructions per iteration each is running (not to mention the size of n).

 

That is true. My focus was on the growth rates because that is the most important thing, and for most developers it is enough. Unless you are running a really large-scale system, these small optimizations are not worth the effort, as you will probably have more success looking at other optimizations first. It is true that two algorithms that both have O(n) complexity can differ in efficiency, but not by much compared to the difference between O(n) and O(log n), for instance.

 

So not only is foreach() faster than the alternatives (by a very small margin), but in the right circumstances an O(n) operation can even be faster than an O(log n) operation. All the notation tells you, after all, is how the performance is influenced by an increase in the size of the dataset to be processed.

 

Again, you are right. For small data sets, O(n) is often more efficient than O(log n) because even the worst case is still cheap, and an O(log n) algorithm tends to execute more instructions per step, so each iteration requires more resources. However, it is also important to consider that data sets tend to grow; what contains 10 elements today may contain 5000 in a few years and may eventually become a bottleneck in the system. So even if O(n) may be slightly more efficient now, I'd still tend to go with O(log n) to be safe, unless I know the growth in the data will be very small, or perhaps unlikely altogether. The difference is probably very minor anyway.

 

Thank you for the feedback.
