plautzer

  1. In my case, slicing on 'val3' would narrow the array down only a bit, but if I could categorize 'val1' or 'val2' I would be able to bring the array down to about 10,000 rows or less. The problem is that I can't determine the right size for the categories manually. Is there a script which can determine the borders or the size of the categories automatically/dynamically?

     @sangoku: I also looked into stored procedures and wrote some myself. The process of the program was:

     1. Get the data from MySQL
     2. Manipulate the data with PHP
     3. Write that data back to MySQL
     4. Calculate with stored procedures and save the result to a table
     5. Retrieve the result from MySQL again
     6. Do something with the result

     Especially accessing and writing to the DB is pretty time-consuming, and I really don't need the data to be persistent.

     Alternatively, I started to run some simple tests with C++. I wrote a small looping script with a simple calculation inside. The result was that C++ is about 35 times faster than the exact same PHP script. That could get the script's runtime down to about 15-20 minutes, and even lower if I were able to categorize the array.

     Greetz
     Plautzer
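     One way to pick category borders automatically is equal-frequency (quantile) binning: sort the values once and cut at equal-count positions, so every bin holds roughly the same number of rows regardless of how the values are distributed. A minimal sketch of the idea; the function name and the bin count of 20 are my own assumptions:

     <?php
     // Return $numBins - 1 border values that split $values into
     // equal-frequency bins (quantile cuts).
     function quantileBorders(array $values, $numBins) {
         sort($values);                        // ascending, reindexed 0..n-1
         $n = count($values);
         $borders = array();
         for ($k = 1; $k < $numBins; $k++) {
             $pos = (int) floor($k * $n / $numBins);
             $borders[] = $values[$pos];       // cut point between bin $k-1 and bin $k
         }
         return $borders;
     }

     // Example: derive 20 categories for 'val1' from the data itself.
     $val1 = array();
     foreach ($arr as $row) {
         $val1[] = $row['val1'];
     }
     $borders = quantileBorders($val1, 20);
     ?>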
  2. Hi, to be more specific: within the first loop the script calculates a set of numbers which are saved to an array. Once the first loop passes a loop count of 10,000, the script searches through the same array for datasets with similar entries. So I am basically computing a simple statistic over the "historic" data for every dataset:

     for ($i = 1; $i < 100000; ++$i) {
         // Calculating numbers, e.g. ($a and $b vary between 100 and 10000;
         // $c and $d only vary a little):
         $a = 2334.42;
         $b = 1234.23;
         $c = 500;
         $d = 1000;

         // Filling the array
         $arr[$i]['val1'] = $a;
         $arr[$i]['val2'] = $b;
         $arr[$i]['val3'] = $c;
         $arr[$i]['val4'] = $d;

         // Defining a "search area"
         $a_1 = $a * 0.95;
         $a_2 = $a * 1.05;
         $b_1 = $b * 0.95;
         $b_2 = $b * 1.05;

         $x = 0;
         if ($i > 10000) {
             for ($j = 1; $j < $i; ++$j) {
                 // Check if the dataset matches the criteria
                 if ($arr[$j]['val1'] > $a_1 and $arr[$j]['val1'] < $a_2
                     and $arr[$j]['val2'] > $b_1 and $arr[$j]['val2'] < $b_2
                     and $arr[$j]['val3'] == $c and $arr[$j]['val4'] != $d)
                     $x++;   // Count if it matches
             }
             // Do something with the result of the 2nd loop
         }
     }

     The whole script takes about 1.5 hours, which I consider extremely slow. I tried saving the data into a temporary MySQL table and using queries instead of the second loop, but that took even longer (about 2.5 h). I am therefore looking for a much faster solution to this problem. I have done this only with PHP, and I'm not really sure if I can make the script any faster, or if another language like C++ would speed it up. Array functions like array_search and in_array won't help me, since I'm searching within a range ($a_1 to $a_2). I am open to any suggestions on how I could compute the statistics faster.

     Greetz
     Plautzer
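     One way to cut the inner loop down without hand-tuned categories: the 'val3' condition is an exact match, so each row can be filed into a bucket keyed by its val3 value as it is produced, and only the matching bucket needs to be scanned. A minimal sketch assuming val3 takes discrete values (it is compared with equality in the loop); the variable names are mine:

     <?php
     $byVal3 = array();   // val3 value => rows seen so far with that value

     for ($i = 1; $i < 100000; ++$i) {
         // ... compute $a, $b, $c, $d as before ...

         $x = 0;
         if ($i > 10000 && isset($byVal3[$c])) {
             // Only rows whose val3 equals $c can possibly match.
             foreach ($byVal3[$c] as $old) {
                 if ($old['val1'] > $a * 0.95 && $old['val1'] < $a * 1.05
                     && $old['val2'] > $b * 0.95 && $old['val2'] < $b * 1.05
                     && $old['val4'] != $d)
                     $x++;
             }
         }
         // File the new row into its bucket for later iterations.
         $byVal3[$c][] = array('val1' => $a, 'val2' => $b,
                               'val3' => $c, 'val4' => $d);
     }
     ?>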
  3. You mention that C++ handles it differently. Does another language like C++ perform better with large datasets than PHP? If so, can you tell me why, and how big a difference we are talking about?
  4. Well, the question still stands: I have about 100k datasets which I need to search for certain values. My solution with an array is very slow, so I'm looking for faster alternatives to do the calculation on my computer.
  5. The script isn't for page content; I am doing some calculations with it.
  6. Yes, I'm creating an array table in the first loop, which I then search every time in the second loop. Another approach was to load the data into a MySQL database and run a search query, which took even longer (about 0.1 s per query). I don't need persistent data, so I thought storing it in an array would be better. I can't think of another way of storing the data for a quicker search afterwards. Do you have any suggestions?
  7. Hi, I have a multidimensional array with about 100k rows which I want to check for certain values. I implemented the search with a for-loop, which isn't performing as well as I'd like. Do you have any suggestions on how I can speed up the search?

     for ($i = 1; $i < 100000; ++$i) {
         $a_1 = 1000 + $i;
         $a_2 = 2000 + $i;
         $b_1 = 500 + $i;
         $b_2 = 700 + $i;
         $x = 0;
         for ($j = 1; $j < 100000; ++$j) {
             if ($arr[$j]['val1'] > $a_1 and $arr[$j]['val1'] < $a_2
                 and $arr[$j]['val2'] > $b_1 and $arr[$j]['val2'] < $b_2)
                 $x++;
         }
     }

     Thx in advance.
     Greetz
     Plautzer
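     Since both conditions are range checks, a classic speed-up is to sort the rows by 'val1' once (outside the outer loop), binary-search for the start of the $a_1..$a_2 window, and scan only the rows inside it. A minimal sketch under that assumption; the helper names are mine:

     <?php
     // Comparison callback for usort.
     function cmpVal1($p, $q) {
         if ($p['val1'] == $q['val1']) return 0;
         return ($p['val1'] < $q['val1']) ? -1 : 1;
     }
     usort($arr, 'cmpVal1');   // sort once: O(n log n)

     // First index whose val1 is greater than $low (binary search).
     function lowerBound(array $rows, $low) {
         $lo = 0;
         $hi = count($rows);
         while ($lo < $hi) {
             $mid = ($lo + $hi) >> 1;
             if ($rows[$mid]['val1'] <= $low) $lo = $mid + 1;
             else $hi = $mid;
         }
         return $lo;
     }

     // Per query: scan only the val1 window instead of all 100k rows.
     $x = 0;
     $n = count($arr);
     for ($k = lowerBound($arr, $a_1); $k < $n && $arr[$k]['val1'] < $a_2; $k++) {
         if ($arr[$k]['val2'] > $b_1 && $arr[$k]['val2'] < $b_2)
             $x++;   // val1 is inside the window by construction
     }
     ?>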
  8. Hi, I am writing a paper on the effects of external data manipulation in the database layer on the application layer. In other words: does a direct change in the database affect the (object) data consistency in the application, and if so, how?

     A possible scenario I can think of: if an object is stored on the heap for a longer time, and another application changes the persistent data of that object in the database, then the data on the heap becomes inconsistent with the data in the database, which means the application working with the object works with the wrong data.

     I am trying to understand how memory management/allocation and the (general) lifetime of objects/data on the heap or in a cache work in PHP and other languages, in order to find an answer to the question above. Since I am new to this topic, I want to ask if you can give me a hint where to look, or what might be interesting to look at. Additionally I have the following questions:

     1. How long is the lifetime of an object/application? For PHP, is it the lifetime of the script, or the time an application is available online?
     2. How long does an object/variable live? To the end of the script / end of the application?
     3. If an object lives longer than a script, can it be addressed by another script?

     (I know some things about PHP, and IMHO an object is created for the duration of a script, to get the data from and to the database. Am I right?)

     I am grateful for any advice on the matter.
     Plautzer
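     To make the staleness scenario concrete: one common guard is to store a version counter (or last-modified timestamp) with each row and re-check it before trusting a long-lived in-memory object. The table and column names below are illustrative assumptions, not from the post:

     <?php
     // $pdo is an open PDO connection; $account was loaded earlier
     // and has been sitting on the heap since then.
     function isStale(PDO $pdo, array $account) {
         $stmt = $pdo->prepare('SELECT version FROM accounts WHERE id = ?');
         $stmt->execute(array($account['id']));
         $current = $stmt->fetchColumn();
         // Another process bumped the version => our copy is outdated.
         return $current != $account['version'];
     }
     ?>

     (In plain PHP the lifetime question largely answers itself: all objects are torn down at the end of each request, so stale heap data only becomes an issue with long-running processes or external caches.)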
  9. I think detecting a spelling error (a missing letter, switched letters, ...) should be a pretty general problem, shouldn't it? I don't want to reinvent the wheel there. I'm thinking of comparing each letter and its position within the word, and if it matches, say, over 95%, it's likely to be the same word (e.g. spelling e instead of è). In that case "Denial" and "Daniel" wouldn't be detected as the same word. Do you know how Google does that? As for the database design: I have no choice but to combine these tables through matching names, either manually or automatically through an algorithm, and I want to do as much automatically as possible.
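     PHP ships with levenshtein(), which counts the single-letter edits (insertions, deletions, substitutions) between two strings; dividing by the word length gives essentially the position-based similarity described above. A purely phonetic check such as metaphone() would likely map "Denial" and "Daniel" to the same code, so edit distance is the safer primary test here. A minimal sketch; the 0.8 threshold is an assumption to tune on real data:

     <?php
     // Treat two names as "probably the same" if they are close in
     // edit distance relative to their length.
     function namesMatch($a, $b) {
         $a = strtolower(trim($a));
         $b = strtolower(trim($b));
         $dist = levenshtein($a, $b);
         $maxLen = max(strlen($a), strlen($b));
         return 1 - $dist / $maxLen >= 0.8;
     }

     var_dump(namesMatch('Denial', 'Daniel'));   // false: 2 edits / 6 letters = 0.67
     var_dump(namesMatch('Mueller', 'Muller'));  // true: 1 edit / 7 letters = 0.86
     ?>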
  10. Hi, I want to compare two tables A and B containing similar names. If a name from B doesn't exist in table A, then I want to insert it into A. Before I do that, I want to make sure that it really is a new name, and here comes the tricky part: the names can differ in many ways (misspellings, middle names present in only one table, ...), so a simple LIKE "%name%" won't do it. Are there more sophisticated approaches to determine whether two names are the same? I was thinking of the Google feature ("Did you mean: ...") for misspelled words. Are there scripts that would help me accomplish that?

      Greetz
      Plautzer
  11. Thank you! First of all it works well, but second, the performance got a little worse (from 70 s to 89 s in my script). This is my EXPLAIN:

      id  select_type  table       type  possible_keys  key   key_len  ref   rows  Extra
      1   PRIMARY      <derived2>  ALL   NULL           NULL  NULL     NULL  482   Using temporary; Using filesort
      2   DERIVED      table       ref   pid_cdate            4              459

      I set an index on (pid, cdate). Do you know how I get rid of the filesort and the temporary?
  12. One thought I had was to combine queries 2 and 3 with something like this:

      SELECT sum(flag) FROM table
      WHERE pid = 1000
      GROUP BY "cdate >= '2008-12-12' and cdate < '2008-12-12'"

      I know it's not correct as written, but is it possible to divide a table by the date and sum/group the two parts individually?
  13. Hi, I have 4 queries (3 SELECT, 1 UPDATE) that run over the same dataset, and I wonder if there is a way to combine (some of) them to gain better performance. I need four things.

      First (1), I need the latest row within a certain time frame:

      SELECT last_update, value FROM table
      WHERE pid = 1000 AND cdate >= '2008-12-12'
      ORDER BY cdate DESC

      Second (2), I need to count a flag within that same time frame:

      SELECT sum(flag) FROM table
      WHERE pid = 1000 AND cdate >= '2008-12-12'

      Third (3), I want to count the same flag outside the time frame:

      SELECT sum(flag) FROM table
      WHERE pid = 1000 AND cdate < '2008-12-12'

      And last (4), I want to update the flag outside the time frame:

      UPDATE table SET flag = 1
      WHERE pid = 1000 AND cdate < '2008-12-12' AND flag IS NULL

      I have wrapped my head around these queries for quite some time, but I can't find a way to improve the performance further. Do you see a way to optimize them, maybe by combining one or the other? Or is it better to run them separately?

      Greetz
      Plautzer
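      Queries (2) and (3) differ only in the direction of the date comparison, so they can be folded into a single scan with conditional aggregation; this is also the clean form of the GROUP-BY-a-boolean idea from the previous item. A sketch using the table and column names from the post; the PDO framing is my own assumption:

      <?php
      // SUM(IF(cond, flag, 0)) totals each side of the date boundary
      // in one pass over the pid = 1000 rows; NULL flags add nothing.
      $sql = "SELECT
                  SUM(IF(cdate >= '2008-12-12', flag, 0)) AS flags_in_frame,
                  SUM(IF(cdate <  '2008-12-12', flag, 0)) AS flags_before
              FROM table
              WHERE pid = 1000";
      $row = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
      ?>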
  14. I just had the chance to try it out; it works well on temp tables too. Thx!
  15. Hi, I tried it on my table and it works like a charm. But will it also work on temporary tables? I heard that a self join isn't possible there.
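      For reference: MySQL cannot refer to a TEMPORARY table more than once in the same query (it fails with "Can't reopen table"), which is why a direct self join doesn't work there. The usual workaround is to create a second temporary copy and join against it. A sketch; the table and column names are made up for illustration:

      <?php
      // Duplicate the temp table so the join sees two distinct names.
      $pdo->exec("CREATE TEMPORARY TABLE tmp_copy AS SELECT * FROM tmp_data");
      $rows = $pdo->query(
          "SELECT a.id, b.id AS other_id
           FROM tmp_data AS a
           JOIN tmp_copy AS b ON a.parent_id = b.id"
      )->fetchAll(PDO::FETCH_ASSOC);
      ?>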