
Speeding up an array search?


plautzer


Hi,

 

I have a multidimensional array with about 100k rows that I want to check for certain values. I implemented the search with a for-loop, which isn't performing as well as I'd like. Do you have any suggestions on how I can speed up the search?

 

  
for ($i = 1; $i < 100000; ++$i) {
   
   	$a_1 = 1000+$i;
   	$a_2 = 2000+$i;   	
   	$b_1 = 500+$i;
   	$b_2 = 700+$i;
   
   
   	for ($j = 1; $j < 100000; ++$j) {
   		
   		if ($arr[i]['val1'] > $a_1 and
   		$arr[i]['val1'] < $a_2 and
   		$arr[i]['val2'] > $b_1 and
   		$arr[i]['val2'] < $b_2) $x++;
   		
   	}
}

 

Thx in advance.

 

greetz

Plautzer


Yeah... I'm creating an array table in the first loop, which I then search through every time in the second loop.

 

Another approach was to load the data into a MySQL database and run a search query, which took even longer (0.1 s per query).

I don't need persistent data, so I thought storing it in an array would be better.

 

I can't think of another way of storing the data for a quicker search afterwards... do you have any suggestions?


Hm... well, I assume you are generating the content for a page and making some modifications to it? I would suggest turning to JavaScript for those things; otherwise there is no way around it, that is a huge load on the server however you turn it...

 

I am using XML and jQuery for that, moving the load to the client ::)


I may not be an expert, but this is not C, C++, or C#. Each of your array members is not a small value, not just a few bits... so I would consider taking another approach; what you are doing is like loading a whole table into an array.

 

You are suggesting that C++ handles this differently. Does another language like C++ perform better with large datasets than PHP? If so, can you tell me why, and how big a difference we are talking about?

Link to comment
Share on other sites


 

Ooookk... other "normal" languages like C use values that are by nature smaller and easier to handle.

 

For instance, the array you are making is something like this:

 

The array is a set of data fields connected by pointers that form the array structure. Most compiled languages use data types that fit the value: in C or C++ you have to specify the type of each piece of data, like int or char, so an integer takes only a handful of bytes and a char a single byte.

 

But PHP is not a compiled language, it is a scripting language designed for ease of use... so explicit data types were taken out, which means every value is stored in a container big enough to hold any type of data... and that costs both memory and performance.

 

Your array in C would be comparatively tiny... here in PHP it can easily run into megabytes!
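Whatever the exact figures, the overhead is easy to measure from PHP itself. A minimal sketch (the exact numbers depend heavily on the PHP version; PHP 7 shrank the per-value overhead considerably):

```php
<?php
// Roughly measure how much memory a 100k-row array of 4-value rows takes.
$before = memory_get_usage();

$arr = [];
for ($i = 0; $i < 100000; $i++) {
    $arr[$i] = ['val1' => 2334.42, 'val2' => 1234.23, 'val3' => 500, 'val4' => 1000];
}

$after = memory_get_usage();
printf("~%.1f MB for 100k rows\n", ($after - $before) / 1048576);
```

For comparison, 100,000 rows of four C doubles would be about 3.2 MB (8 bytes × 4 × 100,000).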

 

 

 

 


For this inner loop:

for ($j = 1; $j < 100000; ++$j) {
    if ($arr['val1'] > $a_1 and
        $arr['val1'] < $a_2 and
        $arr['val2'] > $b_1 and
        $arr['val2'] < $b_2) $x++;
}

why don't you just do

$x = $x + 100000; // since you don't use $j, the if statement gives the same result on every one of the (strictly, 99,999) iterations
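To spell that out: because the quoted condition never references $j, it is loop-invariant and can be evaluated once. A minimal sketch with made-up example values:

```php
<?php
// Hypothetical example values standing in for the real data.
$arr = ['val1' => 1500.0, 'val2' => 600.0];
$a_1 = 1000; $a_2 = 2000;
$b_1 = 500;  $b_2 = 700;

$x = 0;
$iterations = 99999; // $j runs from 1 to 99999

// The condition is loop-invariant, so test it once instead of 99,999 times.
if ($arr['val1'] > $a_1 && $arr['val1'] < $a_2 &&
    $arr['val2'] > $b_1 && $arr['val2'] < $b_2) {
    $x += $iterations;
}
echo $x; // prints 99999 for these example values
```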


Define "slow" in the context of your problem. Also, your code in the first post is looping 9,999,800,001 times which probably is not what you really want to be doing. Could you more precisely describe what you are trying to do?


Hi,

 

To be more specific: within the first loop the script calculates a set of numbers, which are saved to an array.

Once the first loop reaches a loop count of 10,000, the script starts searching through that same array for datasets with similar entries.

 

So basically I am computing a simple statistic over the "historic" data for every dataset:

 

for ($i = 1; $i < 100000; ++$i) {

    # Calculating numbers, e.g. ($a and $b vary between 100 and 10000; $c and $d only vary a little)
    $a = 2334.42;
    $b = 1234.23;
    $c = 500;
    $d = 1000;

    # Filling the array
    $arr[$i]['val1'] = $a;
    $arr[$i]['val2'] = $b;
    $arr[$i]['val3'] = $c;
    $arr[$i]['val4'] = $d;

    # Defining a "search area"
    $a_1 = $a * 0.95;
    $a_2 = $a * 1.05;
    $b_1 = $b * 0.95;
    $b_2 = $b * 1.05;

    $x = 0;
    if ($i > 10000) {
        for ($j = 1; $j < $i; ++$j) {

            # Check if the dataset matches the criteria
            if ($arr[$j]['val1'] > $a_1 and
                $arr[$j]['val1'] < $a_2 and
                $arr[$j]['val2'] > $b_1 and
                $arr[$j]['val2'] < $b_2 and
                $arr[$j]['val3'] == $c and
                $arr[$j]['val4'] != $d)

                $x++; # Count if it matches

        }

        // Do something with the result of the 2nd loop
    }
}

 

The whole script takes about 1.5 hours... which I consider extremely slow. I tried saving the data into a temporary MySQL table and using queries instead of the second loop, but that took even longer (about 2.5 h).

 

I am therefore looking for a much faster solution to this problem. I have only done this with PHP, and I'm not really sure whether I can make the script any faster or whether another language like C++ would speed it up.

 

Array functions like array_search and in_array won't help me, since I'm looking within a range ($a_1 to $a_2) rather than for an exact value.
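One way around that limitation, assuming the rows can be sorted once up front: order the array by 'val1', then binary-search for the start of the ($a_1, $a_2) window and scan only the rows inside it, checking 'val2' as you go. A sketch with hypothetical data (the arrow function needs PHP 7.4+):

```php
<?php
// Sketch: sort rows by 'val1' once, then binary-search the range
// ($a_1, $a_2) instead of scanning all rows. Only the candidates in
// that slice still need the 'val2' (and val3/val4) checks.
function lowerBound(array $sorted, float $value): int {
    $lo = 0; $hi = count($sorted);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid]['val1'] <= $value) $lo = $mid + 1;
        else $hi = $mid;
    }
    return $lo; // first index with val1 > $value
}

// Hypothetical example data.
$rows = [
    ['val1' => 900.0,  'val2' => 450.0],
    ['val1' => 1100.0, 'val2' => 550.0],
    ['val1' => 1500.0, 'val2' => 650.0],
    ['val1' => 2500.0, 'val2' => 800.0],
];
usort($rows, fn($a, $b) => $a['val1'] <=> $b['val1']);

$a_1 = 1000.0; $a_2 = 2000.0;
$b_1 = 500.0;  $b_2 = 700.0;

$x = 0;
for ($k = lowerBound($rows, $a_1); $k < count($rows) && $rows[$k]['val1'] < $a_2; $k++) {
    if ($rows[$k]['val2'] > $b_1 && $rows[$k]['val2'] < $b_2) $x++;
}
echo $x; // prints 2 for this data
```

The sort costs O(n log n) once; each lookup then touches only the rows in the 'val1' window instead of all 100k.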

 

I am open to any suggestions on how I could compute these statistics faster.

 

Greetz

Plautzer



Wow... that is a nasty CPU heater... what in the name of god do you need such a monster for????

 

 

The only way I can think of to "slightly" speed up your script (like bringing it down to an hour of execution time) is to force the values generated in the loop to become integers. I'm not sure of the best way to do that in PHP, though; I know of the (int) cast you can put before a value, and that is all I can think of...
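The cast mentioned above does exist in PHP:

```php
<?php
$a = 2334.42;
$i = (int) $a;  // explicit cast, truncates toward zero
var_dump($i);   // int(2334)
```

Worth noting, though, that since PHP 7 an integer and a float occupy the same size of internal value container, so on its own the cast is unlikely to save much memory.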

 

But let's be reasonable: the very nature of your calculation involves many cycles that each have to WAIT for the previous one to finish, which HAS to result in a huge amount of computation time. I'm not sure where you get your data from, but I'd bet it has something to do with statistics. Since you don't give us details, I'm assuming you store the data in a DB... I would advise you to read up on writing queries directly in MySQL; you can program stored routines in MySQL if you know how. Sure, it is difficult, but it can be done, and I think your execution time would then drop to about 30 minutes, not less...

 

CONCLUSION: computers are fast, but even light needs TIME to travel between two points; you are expecting too much.

 

I assume some manager wants some statistics. Such things can't really be done on a transactional (OLTP) setup; they belong on a data WAREHOUSE structure.

 

This was an assumption and nothing more XD


In my case, slicing on 'val3' would only narrow the array down a little, but if I could categorise 'val1' or 'val2' I could bring the candidate set down to about 10,000 rows or fewer... the problem is that I can't determine the right size for the categories manually. Is there a way to determine the borders or the sizes of the categories automatically/dynamically?
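One simple way to pick category borders automatically is to take quantiles of the sorted 'val1' column, so each bucket holds roughly the same number of rows regardless of how the values are distributed. A sketch (the function name and bucket count are just illustrative):

```php
<?php
// Sketch: derive bucket boundaries automatically from the data itself
// by splitting the sorted values into equal-count quantiles.
function quantileEdges(array $values, int $buckets): array {
    sort($values);
    $n = count($values);
    $edges = [];
    for ($q = 1; $q < $buckets; $q++) {
        $edges[] = $values[intdiv($q * $n, $buckets)];
    }
    return $edges; // $buckets - 1 cut points
}

$values = range(1, 100);         // stand-in for the val1 column
$edges  = quantileEdges($values, 4);
print_r($edges);                 // [26, 51, 76] for this data
```

Each row is then assigned to the bucket its 'val1' falls into, and a search only has to examine the bucket(s) overlapping ($a_1, $a_2).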

 

@sangoku: I also looked into stored procedures and wrote some myself. The flow of the program was:

- Get the data from MySQL
- Manipulate the data with PHP
- Write that data back to MySQL
- Calculate with stored procedures and save to a table
- Retrieve the result from MySQL again
- Do something with the result

 

Accessing and writing to the DB in particular is pretty time-consuming. And I really don't need the data to be persistent.

 

Alternatively, I started running some simple tests with C++. I wrote a small looping script with a simple calculation inside. The result was that C++ is about 35 times faster than the exact same PHP script; that could get the runtime down to about 15-20 minutes.
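For anyone wanting to reproduce that comparison on the PHP side, timing a loop is straightforward with microtime() (the loop body here is just a stand-in calculation):

```php
<?php
// Micro-benchmark sketch: time a simple calculation loop in PHP.
$start = microtime(true);

$sum = 0.0;
for ($i = 1; $i < 1000000; ++$i) {
    $sum += $i * 1.0001; // stand-in for the real per-row calculation
}

$elapsed = microtime(true) - $start;
printf("1M iterations in %.3f s\n", $elapsed);
```

Running the equivalent loop in another language and comparing the two elapsed times gives the kind of ratio quoted above.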

 

And even further if I could categorize the array.

 

Greetz

Plautzer

 

