
multidimensional associative array stalls in nested foreach loops


geno11x11


I may have gotten carried away trying to make my project ultra efficient...

  • Originally, I wrote my application to repeatedly grab the same data set from MySQL with each PHP module. The data is used in a variety of ways, so I get the full MySQL data record and then chop and process it within the PHP module rather than modifying the query over and over again.
  • I thought it would be great to get the data once, stuff it all into a multidimensional array, and transfer the array from module to module to increase speed (RAM is faster than drive access, right?).
  • As I extracted full data sets with RecursiveIteratorIterator(), I realized that once I obtained the needed data subset I could break the loop because the balance of the data was unneeded -- so I switched to nested foreach loops to grab data from the array and added a break when the data subset was extracted.
  • So now I am on the third version of the same code, and I am finding that var_dump($array) has a lengthy pause after about 150 records. When I display the data using echo or print_r, the nested foreach loops stop at about the point of the var_dump pause.

The array size of my current query is 102KB.

 

Questions:

 

  • I like the idea of cutting the processing when the data search has done its job before the end of record, but am I really saving time or resources?
  • Is it really a better idea to create a large array and pass it between modules or is grabbing the data from mySQL (perhaps cached?) a better choice?
  • What is causing the pause in var_dump? Am I running out of RAM, perhaps going to virtual ram on my computer? Is there a way to assign memory to the array?
  • Why are the nested foreach commands stopping output prematurely?
  • In terms of efficiency balanced by practicality, what is the best approach?

 

Code - This version displays the full record without breaks:

foreach ($rows as $key => $row) {
    foreach ($row as $something => $else) {
        foreach ($else as $k => $v) {
            echo "row(", $key, ") field[", $something, "] ", $k, ": ", $v, "<br/>";
        }
    }
}
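
For reference, here is a rough sketch of the early-exit version I mentioned above (the field name 'order_items' is just a placeholder, not my real schema):

$subset = null;
foreach ($rows as $key => $row) {
    foreach ($row as $field => $values) {
        // 'order_items' is a placeholder field name for this sketch
        if ($field === 'order_items') {
            $subset = $values;  // keep only the slice this module needs
            break 2;            // stop both loops; the rest of the data is unneeded
        }
    }
}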

1. I like the idea of cutting the processing when the data search has done its job before the end of record, but am I really saving time or resources?

If you have 1,000,000 records and cut it after 10 then yes, you'll be saving time and potentially some resources - I couldn't comment on how much you'd save though. That said, you're not guaranteed to save time or resources, as the data you're looking for could be at the end of the array. Therefore your efficiency is poor; see question 5.

 

2. Is it really a better idea to create a large array and pass it between modules or is grabbing the data from mySQL (perhaps cached?) a better choice?

Passing a mass data structure containing all your data definitely is not more efficient than retrieving the data you need, when you need it. Assuming you're using an object-oriented style, you should fetch your data and create an object to contain the data, then pass the object around. This object should contain appropriate methods with the appropriate visibility to ensure the attributes remain the property of said object and are not manipulated externally - unless this should be allowed. MySQL will cache the queries by default if I'm not mistaken, thereby optimising its efficiency. Furthermore, you can cache pages yourself where possible, even if it's only sections of pages, although that begins to get complicated.
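
A bare-bones sketch of the kind of object I mean (the class name and fields are invented purely for illustration):

class UserProfile
{
    private $id;
    private $name;

    public function __construct($id, $name)
    {
        $this->id   = $id;
        $this->name = $name;
    }

    // Read-only accessors keep the attributes the property of this object.
    public function getId()
    {
        return $this->id;
    }

    public function getName()
    {
        return $this->name;
    }
}

Whatever module receives a UserProfile can only read the data through these methods; nothing can quietly rewrite the underlying array behind your back.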

 

3. What is causing the pause in var_dump? Am I running out of RAM, perhaps going to virtual ram on my computer? Is there a way to assign memory to the array?

Computers don't "switch to virtual RAM" if they run out of physical RAM [1][2]. There is no way to assign memory in PHP as you would in Java, for example (not to my knowledge anyway, someone can verify); PHP assigns memory at run-time. You may have hit the memory limit defined in your php.ini file; the setting is "memory_limit". If this is removed, the operating system should enforce a memory limit per process, but if you reach that you seriously need to rethink your application architecture. Hitting the PHP limit raises questions in itself.
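
If you want to check how close you are to that limit, a few debug lines like these (just a diagnostic sketch) will show you:

// Compare the script's current and peak usage against the configured limit.
echo 'memory_limit: ', ini_get('memory_limit'), '<br/>';
echo 'current usage: ', memory_get_usage(true), ' bytes<br/>';
echo 'peak usage: ', memory_get_peak_usage(true), ' bytes<br/>';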

 

4. Why are the nested foreach commands stopping output prematurely?

Not entirely sure. You would need to break everything down and go through it sequentially debugging each line.

 

5. In terms of efficiency balanced by practicality, what is the best approach?

Your current efficiency described using Big O notation is O(N^3), which is pretty horrendous, although not the worst. What this means is, if you submit data of size 1,000 to your algorithm, the time taken to process the data will be on the order of 1,000^3 = 1,000,000,000 steps. Similarly, if you have an input of 10,000, the time taken will be on the order of 10,000^3 = 1,000,000,000,000 steps. As you can see, multiplying your input by 10 gives a massive difference in the time taken for the algorithm to complete (x1,000).

 

To improve the efficiency you need to break your algorithm down and use the appropriate data structures to manage the data, e.g. Stack, Queue, LinkedList. 
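
PHP's SPL already ships with some of these structures if you want to experiment - a trivial demonstration only, not a fix for your specific loops:

// SplQueue is FIFO, SplStack is LIFO; both are built into the SPL.
$queue = new SplQueue();
$queue->enqueue('first row');
$queue->enqueue('second row');
echo $queue->dequeue(), '<br/>'; // prints "first row"

$stack = new SplStack();
$stack->push('first row');
$stack->push('second row');
echo $stack->pop(), '<br/>';     // prints "second row"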

 

Hopefully that answers your questions to some degree, and if something I've written is wrong I hope someone corrects me! Sorry if I went off on a tangent or failed to answer your questions.

 

 

[1] TechTarget. "Virtual memory." http://searchstorage.techtarget.com/definition/virtual-memory

[2] Microsoft. "What is virtual memory?" http://windows.microsoft.com/en-gb/windows-vista/what-is-virtual-memory


cpd: Yes, I am using OOP. Please expand your answer to question 2. I am confused by two statements:

Passing a mass data structure containing all your data definitely is not more efficient than retrieving the data you need, when you need it.
Assuming you're using an object-oriented style, you should fetch your data and create an object to contain the data, then pass the object around.

As I read them, these statements seem to contradict. But perhaps your reference to "the mass data structure" means the array. If so, it implies there is something else (the object you referred to) available to hold mass data and I am unaware of it. A reference or code example would be very helpful.


I was concerned when writing it that you'd pick up on the two statements, as they appear contradictory. From what you've said, you seem to be grabbing a load of data and shoving it all into a data structure (array), then passing it around even though the receiving object may not need the majority of the data. It would be more efficient to get the data you want as and when you need it with an additional call, rather than requesting superfluous data. As an example, you can think of building a car (CarTypeA). Each part of a car has a blueprint (CarTypeALeftDoor, CarTypeBLeftDoor...). You shouldn't select the blueprints for every single car part for all car types, put them in an array, and pass that to a car builder which extracts the blueprints for CarTypeA only. You should only select the data you need at that point in time: the blueprints for CarTypeA.

 

When I say "pass the object around" I mean request the required data, encapsulate it within an object and pass the object to any other object that requires it. For example if you have a user you can request their data and wrap it in a user object and pass the user object to whatever other object needs it. You wouldn't get all users on a hunch that you may need the data later. By breaking everything down and separating your logic you could potentially improve your efficiency as your loops could be broken down. 

 

E.g.

foreach($foo as $bar) {
   foreach($bar as $f) {
      // Code omitted
   }
}
 
foreach($foo as $bar) {
   // Code omitted
}

The first loop has an efficiency of O(N^2) and the second has O(N) which is far more efficient.


cpd: Ok, I've got the picture.

 

I will start by modifying my original query and cutting out the unnecessary fields, although that will mean an additional query down the road; I wonder how it will net out. I expect there are algorithms that break out the efficiency of every processing step, but who has time for that? It might be a terrific add-on for an editor though - something of an efficiency grading or processing score at the end of a module.

 

Without a definitive answer to the pause/stall question, the second step will be to eliminate the large array and revert to pulling data from the database. With prepared statements and bound parameters substituted for the needed fields, processing efficiency is said to be high; I was impressed by the speed at which large volumes of data were processed.
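
Something along these lines is what I have in mind (table and column names are placeholders; I'm assuming PDO here):

// Prepare once, bind the parameter, and fetch only the fields this module needs.
$stmt = $pdo->prepare('SELECT field_a, field_b FROM records WHERE category = :cat');
$stmt->bindValue(':cat', $category);
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['field_a'], ': ', $row['field_b'], '<br/>';
}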

 

Finally, I'll dump the nested loops and return to RecursiveIteratorIterator(). Based on the efficiency equations you provided, it appears that less is more -- I am guessing that the canned functions are not just easier to utilize, but are optimized for efficiency as well.
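
For anyone following along, the iterator version I'm going back to looks roughly like this (flattening the same multidimensional array into its leaf keys and values):

// RecursiveIteratorIterator walks every leaf of the nested array.
$iterator = new RecursiveIteratorIterator(new RecursiveArrayIterator($rows));

foreach ($iterator as $key => $value) {
    echo $key, ': ', $value, '<br/>';
}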

 

Thanks for your time and advice!


Whenever you find yourself doing a lot of work for any one programming task, it is a sign that you are doing something wrong. Either your data definition isn't suitable for the task and needs to be reworked, or you are layering on code that doesn't have anything to do with the goal you are trying to achieve.

 

This forum regularly helps people reduce and simplify code and data by identifying the underlying problems with that code and data, but that requires having specific details of what that code and data is and how it is being used. Posting an example of your data (the input) and what result(s) you are trying to get from that data (the output) would allow someone to help specifically with what you are doing and why your code might be having a problem after x number of iterations.

 

BTW - what does this thread have to do with Regex, the forum section it is posted in?


... but that requires having specific details of what that code and data is and how it is being used. Posting an example of your data (the input) and what result(s) you are trying to get from that data (the output) would allow someone to help specifically with what you are doing and why your code might be having a problem after x number of iterations.

 

I thought I already made a post explaining this. Mac_gyver has hit the nail on the head! To really determine how you should be going about your problem we need to see specific code examples.

