
I've been doing some memory-usage benchmarking for a very resource-sensitive project (one I probably shouldn't be using PHP for, but oh well). I came across something peculiar, and I'm trying to figure out why I get the results I do.

 

Here is my test file:

<?php
$string_parts = 4;
$max = 10000;

// Test 1: build each inner array with array_pad()
$arr = array();
$i = 0;
$start = memory_get_usage();
for(;$i<$max;$i++){
    $arr[] = array_pad(array(), 4, "A");
}
$end = memory_get_usage();
printf("%d iterations, %d bytes in array\n", $max, $end-$start);

// Test 2: build each inner array from an array literal
$arr2 = array();
$i = 0;
$start2 = memory_get_usage();
for(;$i<$max;$i++){
    $arr2[] = array("A", "A", "A", "A");
}
$end2 = memory_get_usage();
printf("%d iterations, %d bytes in array (%.2f%%)\n", $max, $end2-$start2, ($end2-$start2)/($end-$start)*100);
?>

 

Output on my system:

10000 iterations, 4145696 bytes in array
10000 iterations, 5585744 bytes in array (134.74%)

 

A 35% increase in memory for two arrays with exactly the same contents seems really weird. Any information as to why, or pointers to useful functions or resources, would be appreciated.

As expected, nothing changes. Even if I free up the memory from the first cycle, I'm taking the measurements both before and after, so the value is always relative. It should also be noted that if I swap the two tests, the second one comes in at ~75% of the first one's memory usage (expected, as 1/1.3474 ≈ 0.74). So I'm fairly sure that I'm benchmarking correctly.
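One way to keep each measurement isolated, in the spirit of the before/after approach above, is to wrap each test in a function and measure only what its result leaves allocated. This is a sketch, not code from the thread: `measure_build()` is a hypothetical helper name, it assumes PHP 7 or later (for the return type), and the absolute byte counts, and even which test comes out larger, can differ between PHP versions because PHP 7 reworked how arrays and strings are stored.

```php
<?php
// Hypothetical helper (not from the thread): runs $build and returns how many
// bytes its result still occupies. The result is held in $result while the
// second reading is taken, so it cannot be freed before we measure.
function measure_build(callable $build): int
{
    $start  = memory_get_usage();
    $result = $build();
    $bytes  = memory_get_usage() - $start;
    unset($result);
    return $bytes;
}

$max = 10000;

// Test 1: each inner array built with array_pad().
$padded = measure_build(function () use ($max) {
    $arr = array();
    for ($i = 0; $i < $max; $i++) {
        $arr[] = array_pad(array(), 4, "A");
    }
    return $arr;
});

// Test 2: each inner array written out as a literal.
$literal = measure_build(function () use ($max) {
    $arr = array();
    for ($i = 0; $i < $max; $i++) {
        $arr[] = array("A", "A", "A", "A");
    }
    return $arr;
});

printf("%d iterations, %d bytes with array_pad()\n", $max, $padded);
printf("%d iterations, %d bytes with literals (%.2f%%)\n", $max, $literal, $literal / $padded * 100);
```

Because each test builds and returns its own array, swapping the order of the two calls should not change either number.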

Even if you separate the code into two files, you will get similar results.

 

If you use array_fill() you will get a similar value to the array_pad().
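Whichever way the arrays are built, they hold the same values, so any difference has to come from how PHP stores them internally rather than from the data itself. A quick sanity check (a sketch, not from the thread):

```php
<?php
// Three ways of building the same four-element array used in this thread.
$padded  = array_pad(array(), 4, "A");
$filled  = array_fill(0, 4, "A");
$literal = array("A", "A", "A", "A");

// === compares keys, values, types and order; all three arrays match.
var_dump($padded === $filled);   // bool(true)
var_dump($filled === $literal);  // bool(true)
```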

 

My best guess is that explicitly listing the entries of an array results in a data structure that consumes more memory, which gave me the following idea:

 

PHP could be using some short-circuit optimization where it uses a reference to the result of array_pad() (which is a constant/fixed value). I just confirmed this with the following code: the memory usage goes way down when the filled array $ar is defined outside the loop and assigned inside it.

 

<?php
$string_parts = 4;
$max = 10000;
$arr = array();
$i = 0;
$start = memory_get_usage();
// Build the padded array once, outside the loop, and assign it on every iteration
$ar = array_pad(array(), 4, "A");
for(;$i<$max;$i++){
    //$arr[] = array_pad(array(), 4, "A");
    $arr[] = $ar;
}
$end = memory_get_usage();
printf("%d iterations, %d bytes in array\n", $max, $end-$start);


$arr2 = array();
$i = 0;
$start2 = memory_get_usage();
for(;$i<$max;$i++){
    $arr2[] = array("A", "A", "A", "A");
    //$arr2[] = array('A', 'A', 'A', 'A');
    //$arr2[] = array_fill(0,4,'A');
}
$end2 = memory_get_usage();
printf("%d iterations, %d bytes in array (%.2f%%)\n", $max, $end2-$start2, ($end2-$start2)/($end-$start)*100);
?>

That certainly makes sense. To further experiment, I discarded the second test and replaced it with this:

foreach($arr as $i=>$a){
    foreach($a as $j=>$b){
        $arr[$i][$j] = "B";
    }
}

So basically, this writes over every "A" with a "B". After this, I saw an increase of 34.75%, which is exactly the same increase I saw with the original second test.

 

So I guess the bottom line is don't use array_pad() when doing memory benchmarks. Thanks for your help.

LOL, values in variables/arrays aren't what you think they are unless you explicitly assign them...

 

I suspected that if you altered the values, the memory would go up as expected, because the reference to the common static/fixed value would be replaced with the actual unique data.
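What is being described here is PHP's copy-on-write behaviour: assigning an array to another variable or array element does not copy it, and a private copy is only made when one of the holders writes to it. The sketch below (not from the thread; `$template`, `$shared` and the other names are illustrative) stores 10,000 handles on one template array and then writes to each entry, which should make memory climb as every entry gets its own copy. Exact byte counts depend on the PHP version.

```php
<?php
$max = 10000;
$template = array_fill(0, 4, "A");

// Phase 1: every entry of $shared points at the same underlying array.
$start = memory_get_usage();
$shared = array();
for ($i = 0; $i < $max; $i++) {
    $shared[] = $template;
}
$sharedBytes = memory_get_usage() - $start;

// Phase 2: writing to an entry forces PHP to give that entry its own copy.
$start = memory_get_usage();
for ($i = 0; $i < $max; $i++) {
    $shared[$i][0] = "B";
}
$separatedBytes = memory_get_usage() - $start;

printf("shared: %d bytes, extra after writes: %d bytes\n", $sharedBytes, $separatedBytes);

// The template itself was never modified; only the written entries changed.
var_dump($template[0]);   // string(1) "A"
var_dump($shared[0][0]);  // string(1) "B"
```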

Okay, hold on. I unmarked it solved, since I realized I jumped to a conclusion in my last post.

 

Yes, changing all the strings from "A" to "B" did make it go up 35%, in line with the second test, but changing all the values from "A" to "B" in the second test makes that go up a further 35%, which still leaves it 35% higher even though the data is exactly the same. The numbers matched by coincidence (or not), and I didn't think to keep testing.

 

The part I don't understand about your original post is:

Php could be using some short-circuit optimization where it is using a reference to the result of the array_pad() output (which is a constant/fixed value), which I just confirmed with the following code (the memory usage goes way down by defining the $ar filled array outside the loop and assigning it inside the loop) -

 

I'm not seeing how you confirmed your suspicion that it was storing references, since in your test case, it was actually storing references, and the memory usage dropped to almost 1/10th of the original.

Try this:

<?php
$string_parts = 4;
$max = 10000;
$arr = array();
$i = 0;
$start = memory_get_usage();
$ar = array_pad(array(), 4, "A");
for(;$i<$max;$i++){
    //$arr[] = array_pad(array(), 4, "A");
    $arr[] = $ar;
}
$end = memory_get_usage();
printf("%d iterations, %d bytes in array\n", $max, $end-$start);


$arr2 = array();
$i = 0;
$ar = array("A", "A", "A", "A");
$start2 = memory_get_usage();
for(;$i<$max;$i++){
    $arr2[] = $ar;
    //$arr2[] = array('A', 'A', 'A', 'A');
    //$arr2[] = array_fill(0,4,'A');
}
$end2 = memory_get_usage();
printf("%d iterations, %d bytes in array (%.2f%%)\n", $max, $end2-$start2, ($end2-$start2)/($end-$start)*100);
?>

I moved both $ar's outside of the memory testing range and changed the second one to $ar2 (so that it wouldn't cause a drop in memory from that being overwritten), and this is what I got as output:

10000 iterations, 545608 bytes in array
10000 iterations, 545608 bytes in array (100.00%)

 

This makes sense, since in both cases it's just storing a simple reference until the value gets modified. But this gave me the idea of taking the loop out and trying just $ar and $ar2, and I think I finally got it.

 

<?php
$start = memory_get_usage();
$ar =  array_pad(array(), 4, "A");
$end = memory_get_usage();
printf("%d bytes in array\n", $end-$start);


$start2 = memory_get_usage();
$ar2 = array("A");
$ar2[] = $ar2[0];
$ar2[] = $ar2[0];
$ar2[] = $ar2[0];
$end2 = memory_get_usage();
printf("%d bytes in array (%.2f%%)\n", $end2-$start2, ($end2-$start2)/($end-$start)*100);
?>

 

Output:

496 bytes in array
496 bytes in array (100.00%)

 

I tried re-implementing this inside the loop, and the two results came out within 40 bytes of each other, which I will accept as close enough. So what this means is that when you call array_pad(), it makes the first element unique, and all the other elements in the padding just point to that first unique value (shared copy-on-write values, not PHP & references).
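`debug_zval_dump()` prints PHP's internal reference counts, which offers a more direct way to inspect this sharing than comparing memory totals. This is a sketch, not from the thread, and no output is shown because the format differs between PHP versions. Note also that on PHP 7 and later short string literals such as "A" are interned, so every slot points at the same string whichever way the array is built, and the counts may no longer reveal a difference.

```php
<?php
// Compare the internal reference counts of the two constructions.
$padded  = array_pad(array(), 4, "A");
$literal = array("A", "A", "A", "A");

debug_zval_dump($padded);
debug_zval_dump($literal);
```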
