Jump to content

Loop -> Loop -> Loop, Performance issues.


Alex_

Recommended Posts

Hey.

 

So the issue I'm having is consecutive loops on semi-large arrays, over and over. Consider this array:

$firstArray = array(
'row1' => array(
'dates' => array(
'2014-01-01' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-02' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-03' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-04' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-05' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-06' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-01-07' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
)
),
'row2' => array(
'dates' => array(
'2014-02-01' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-02' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-03' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-04' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-05' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-06' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-07' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-08' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
'2014-02-09' => array('key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', 'key4' => 'value4', 'key5' => 'value5', 'key6' => 'value6', 'key7' => 'value7', 'key8' => 'value8', 'key9' => 'value9', 'key10' => 'value10'),
)
)
);

Originally the data comes from ~2-3 database tables, of course. But to ilustrate the point, this is how the main array looks like. This array usually contains anywhere between 10-50 rows, each row containing at least 10 dates, with 10 key/values each.

 

And after setting up all the data, it needs to be processed. Currently this is how a friend of mine did it..

$placeDataHere = array();
foreach($firstArray as $key => $dates) {
foreach($dates as $date => $values) {
foreach($values as $key => $value) {
$placeDataHere['DV_' . $date]['SM_' . $key] = 'KS_' . $value;
//Followed by another ~50-70 lines of processing the 3 loop's data..
...
...
....
....
....
....
....
....
}
}
}

Obviously this isn't good practise, but we can't seem to figure out a better way of doing it, since both the data and the loops are horribly nested.

 

This loop and setup of $firstArray is run anywhere between 10-20 times/request, due to amount of users we wish to process. So, the result is that this code can take up to over 2-3 minutes to complete, which isn't really optimal performance.

 

In short my question is, are there any better methods of handling this with the data setup we currently have?

Link to comment
Share on other sites

You probably need to look into using SQL JOINs on the data in the DB.

 

Oh yeah I forgot to mention, sorry.. The data comes from two different databases.

 

Basically one Database is the User's own database (Well, postgres schema), while the second Database is a union DB for all Users, where sensitive data is held. So unfortunately union statements aren't an option either, considering it's two different connections.

Link to comment
Share on other sites

Is hard to toss out good advice without knowing the entire workings of your database design.

 

Did you know you can directly access the data without triple looping

foreach($firstArray['row1']['dates'] as $key => $value) {
//whatever want to do with it
}
Edited by QuickOldCar
Link to comment
Share on other sites

It's hard to help you without 'real' data or at least, really knowing what you want to do.

 

So far, I understand that:

  • You have 2 different databases
  • You have 1 table about a user in DB1?
  • And a second table about all the users in DB2?
  • Than, something gives you an array in an array in an array...

The parts that are missing to help you:

  • What's the structure of the tables in DB1 and DB2 that you're trying to query?
  • What is the information you're trying to get from the DB?
  • What's the actual code that produce this array you posted in the first message?
  • And what's the format in which you're trying to display the data? (show an example of a desired result)? Or what are you trying to do with the data in the arrays? What's your 'process' you're talking about?

If you can answer those questions, it'll be way easier to help you :)

Link to comment
Share on other sites

 

Is hard to toss out good advice without knowing the entire workings of your database design.

 

Did you know you can directly access the data without triple looping

foreach($firstArray['row1']['dates'] as $key => $value) {
//whatever want to do with it
}

 

Yeah that's something to try i suppose. I've been reading up a bit on the different loop options and I already had a general idea behind the lot of them, and I'm aware foreach copies an array prior to issuing the loop, which may be something to think over, considering the array can be quite large.

 

My thoughts have been striking over to PHP's extract method, considering it's a local function (won't overwrite anything outside the bounds of the function), with data only obtained from the database, meaning they're going to be safe variables. The only doubts I have is if the extract method will be faster than a loop or not.

 

 

@mogo: Yeah I know it's difficult without the real data, the #1 post is just an example of how the multidimensional array "can" look like. Although there's a lot more fields in the 'real deal'. I never display the data for the user, but it's used to calculate values for another table (each user = new row), and there can be info on up to 31 days/user. Adding in all the code here probably won't be too helpful, there's a lot of stuff that I can't really explain in a post like this.. :) In short, it's calculating Paycheck data that is later generated into PDF's.

 

@Barand: Unfortunately they are not. The Second union DB is on a seperate server, because it's very sensitive.

 

 

Any critisism or thoughts about using php's extract for this? Downsides, upsides, etc. Tested the speed of the function earlier today, and the peak time in seconds to complete was 9 seconds. That's insane, in my eyes.

Edited by Alex_
Link to comment
Share on other sites

extracting an array to individual variables would just create a mess of individual variables, that if you are then planning on using variable-variables to reference, will take three times longer than referencing the data in an array.

 

are you sure that the time taken is in the processing of the data or is it in the query(ies)? have you profiled the code to pin down where the majority of the time is taken? for all we know you are running queries inside of loops on data that is lacking needed indexes or that you are repeatedly opening/closing a database connection.

 

the example data tells us little about the problem. your actual complete code is what is important in terms of helping with a performance problem. the only thing that is apparent from the hinted at code - $placeDataHere['DV_' . $date]['SM_' . $key] = 'KS_' . $value; is that by adding text prefixes to these three variables that ALL the processing is slowed down by the extra code need to reference the data using the synthesized keys. in fact, why are you even storing data it in the $placeDataHere array?  just use the $date, $key, and $value directly in the code.

 

since you must retrieve the data from two different database servers, the typical things that would help with the speed would be -

1) only retrieve the data you need (for all we know you are retrieving far more data than gets used.)

 

2) do as much processing in the sql queries as possible (for all we know the actual processing of values involves data from only one of the servers and could be handled in the query and returned relative to just a key value to be used by the php code to combine it with the data from the second server.)

 

3) since the processing is likely on a per-user basis, the data in the arrays should be organized per-user (i'm guessing the $key is user related) so that related data is grouped (see item #4.)

 

4) eliminate actual loops by using array_map()/array_walk() with call-back functions to do the processing (requires that the data be organized in a way the aids processing.)

 

short-answer: you must first find where the time is being taken up at in the code, then address what that portion of the code is doing.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.