plautzer Posted October 19, 2009

Hi,

I have the following MySQL query, which takes about 0.1 sec to execute, and I'm wondering if there is a way to optimize it. All I want to do is match each data set in the table against all the other data sets in the table that satisfy the same parameters (i.e. the WHERE clause). The table has about 200,000 data sets right now, so looping the query over all of them takes over 20,000 seconds. I am using MyISAM (B-tree) indexes on par1 & par2 (combined), on par3, and on the date.

$sql = "
SELECT
    Count(id) as sum,
    Sum(If(type = 1, 1, 0)) as type1,
    Sum(If(type = 2, 1, 0)) as type2,
    Sum(If(type = 3, 1, 0)) as type3,
    Sum(If(type = 4, 1, 0)) as type4,
    Sum(If(type = 5, 1, 0)) as type5
FROM table1
WHERE (par1 != $par1 and par2 != $par2)
  and (par1 != $par2 and par2 != $par1)
  and par3 = $par3
  and par4 between 800 and 1000
  and par5 between 300 and 400
  and date < '$date'
  and date > DATE_SUB('$date', INTERVAL 5 YEAR)";

The values for the ranges of par4 and par5 vary every time. Can you give me some advice on how I can speed up the query?

Greetz,
plautzer
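(A quick aside on indexing: of the predicates above, only par3 = ... and the date range are index-friendly; the != pairs and the ever-changing par4/par5 ranges generally are not. A composite index matching that equality-then-range shape is sketched below; the index name is made up, and that it helps here is an assumption, not something confirmed in the thread.)

-- Hypothetical composite index: equality column first, range column second.
-- MySQL can seek to par3 = $par3 and range-scan the dates within it;
-- the remaining predicates are then filtered row by row.
CREATE INDEX idx_par3_date ON table1 (par3, date);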
plautzer Posted October 20, 2009

Hi,

I realized that I didn't follow the rules... sorry about that! Here's my situation: I have a view created from 4 tables:

select rm.id, rm.m_id, m.p1_id, m.p2_id,
       p1.p_st AS p1_st, p2.p_st as p2_st,
       m.date, m.type, s.c_id
from table1 rm
left join table2 m on (rm.m_id = m.id)
left join table3 p1 on (rm.p_id = p1.id)
left join table3 p2 on (rm.p_id = p2.id)
left join table4 s on (m.s_id = s.id);

The view currently holds about 250,000 data sets. Now I would like to run the following query:

SELECT
    Count(id) as sum,
    Sum(If(type = 1, 1, 0)) as type1,
    Sum(If(type = 2, 1, 0)) as type2,
    Sum(If(type = 3, 1, 0)) as type3,
    Sum(If(type = 4, 1, 0)) as type4,
    Sum(If(type = 5, 1, 0)) as type5
From table1
WHERE (p1_id != 258 and p2_id != 248)
  and (p1_id != 248 and p2_id != 258)
  and c_id != 43
  and p1_st between 800 and 1200
  and p2_st between 700 and 2300
  and date < '2008-12-18'
  and date > DATE_SUB('2008-12-18', INTERVAL 5 YEAR)

Profiling the query in phpMyAdmin:

Status                 Duration (s)
starting               0.000083
Opening tables         0.000379
System lock            0.000008
Table lock             0.000008
init                   0.000005
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000054
optimizing             0.000022
statistics             0.000240
preparing              0.000020
executing              0.000004
Sending data           3.403687
end                    0.000009
query end              0.000005
freeing items          0.000067
logging slow query     0.000004
cleaning up            0.000008

Profiling the EXPLAIN in phpMyAdmin:

Status                 Duration (s)
starting               0.000084
Opening tables         0.000349
System lock            0.000007
Table lock             0.000016
init                   0.000006
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000003
checking permissions   0.000054
optimizing             0.000027
statistics             0.000231
preparing              0.000019
executing              0.000026
end                    0.000007
query end              0.000003
freeing items          0.000028
logging slow query     0.000003
cleaning up            0.000006

The rest of the EXPLAIN output (phpMyAdmin 3.2.0.1 / MySQL 5.1.37), 5 rows:

id:            1, 1, 1, 1, 1
select_type:   SIMPLE for all five
table:         rm, m, s, p1, p2
type:          ALL, eq_ref, eq_ref, eq_ref, eq_ref
possible_keys: p1_st,p2_st,m_id / PRIMARY,season_id,date,p1_id / PRIMARY,c_id / PRIMARY,p_st / PRIMARY,p_st
key:           NULL, PRIMARY, PRIMARY, PRIMARY, PRIMARY
key_len:       NULL, 4, 4, 4, 4
ref:           NULL, new_old.rm.m_id, new_old.m.s_id, new_old.rm.p_id, new_old.rm.p_id
rows:          250000, 1, 1, 1, 1
Extra:         Using where (on four of the five rows)

The problem is that it takes far too long to execute the query when I am looping it about 200,000 times. Can you give me a hint how I might improve it?
Here's my mysql.ini:

[The post included the server's complete SHOW VARIABLES output (MySQL 5.1.37 under XAMPP Lite on Win32) as a two-column name/value table. The table did not survive flattening to plain text; the settings the thread goes on to discuss (key_buffer = 500M, sort_buffer_size, and so on) are quoted again in a later post.]
kickstart Posted October 20, 2009

Hi

Do you have to execute it that many times, or can you use GROUP BY and execute it once? Probably comes down to the values of $par1 and $par2.

All the best

Keith
plautzer Posted October 20, 2009

Do you have to execute it that many times, or can you use GROUP BY and execute it once? Probably comes down to the values of $par1 and $par2.

Right... all the parameters change with every loop, therefore I guess it's pretty difficult to do it in one go. In your experience, how fast can I possibly process such a query - under 0.001s?
Mchl Posted October 20, 2009

Even if you cut down the query execution time, sending data to and from mysql 200k times will still take a considerable amount of time, and there's nothing you can do about it. Perhaps tell us what this query is doing, so that we might think about how to eliminate the need to loop it.
kickstart Posted October 20, 2009

Hi

Bit nasty then, unless the parameters come from another table and you could do a JOIN of some kind.

The only other thing to do would be to use a COUNT to draw out the counts of each type, and to then manually sum up the count of IDs when looping through the returned records. This might be more efficient than running IF statements over numerous rows.

All the best

Keith
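(A minimal sketch of that idea against the query from the earlier post; whether it actually beats the SUM(IF(...)) variant is an assumption to be benchmarked, and note that types with no matching rows simply won't appear in the result.)

-- One row per type; summing cnt in PHP gives the overall total.
SELECT type, COUNT(id) AS cnt
FROM table1
WHERE (p1_id != 258 AND p2_id != 248)
  AND (p1_id != 248 AND p2_id != 258)
  AND c_id != 43
  AND p1_st BETWEEN 800 AND 1200
  AND p2_st BETWEEN 700 AND 2300
  AND date < '2008-12-18'
  AND date > DATE_SUB('2008-12-18', INTERVAL 5 YEAR)
GROUP BY type;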
Mchl Posted October 20, 2009

Just run this code

$mysqli = new mysqli('localhost', 'root', '', 'test');

$time1 = microtime(true);
for ($i = 1; $i <= 200000; $i++) {
    $result = $mysqli->query('SELECT 1');
    $row = $result->fetch_assoc();
}
$time2 = microtime(true);

echo "Elapsed time: ".($time2 - $time1)."s";

on my PC. One can hardly imagine a simpler query, and it still took over 25 seconds. When I changed the query to 'SELECT * FROM t1', where t1 has 5 rows in two columns, the elapsed time exceeded 40 seconds.
plautzer Posted October 20, 2009

Perhaps tell us what this query is doing, so that we might think about how to eliminate the need to loop it.

I want to crossmatch all data sets within the table (or view) and gather some statistics. It works like this: whenever I add a new data set with various parameters, I want to count all the similar data sets within the table that:

- lie within a 5 year period
- have a similar p1_st and a different id (p1_id, p2_id, c_id)

This query takes me hours to crossmatch them all... therefore I thought of doing it with arrays (http://www.phpfreaks.com/forums/index.php/topic,273430.0.html) if the query optimization isn't working.
kickstart Posted October 20, 2009

Hi

I think I would need more to go on (such as where the various parms come from), but I suspect you could do this using a cross join of your view against itself. I.e., base it on something like this (probably a few typos, but hopefully it gives you the right idea):

SELECT *
FROM table1 a
CROSS JOIN table1 b
WHERE b.p1_st between 800 and 1200
  and b.p2_st between 700 and 2300
  and a.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
  and b.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
  AND a.p1_id != b.p1_id
  AND a.p2_id != b.p2_id
  AND a.id != b.id

All the best

Keith
plautzer Posted October 20, 2009

Looks good... I could store the parameters in the db as well. I have 2 questions though:

- Can I put the count just in the select statement?
- Why did you put the date in twice?

I thought of something like this:

select id, sum, type1, ...
from (
    SELECT
        Count(id) as sum,
        Sum(If(type = 1, 1, 0)) as type1,
        Sum(If(type = 2, 1, 0)) as type2,
        Sum(If(type = 3, 1, 0)) as type3,
        Sum(If(type = 4, 1, 0)) as type4,
        Sum(If(type = 5, 1, 0)) as type5
    From table1
    WHERE (p1_id != this->p1_id and p2_id != this->p2_id)
      and (p1_id != this->p2_id and p2_id != this->p1_id)
      and c_id != this->c_id
      and p1_st between this->p1_st1 and this->p1_st2
      and p2_st between this->p2_st1 and this->p2_st2
      and date < this->date
      and date > DATE_SUB(this->date, INTERVAL 5 YEAR)
)

"this->" is supposed to be the parameters from the outer select.
kickstart Posted October 20, 2009

Hi

You could put the counts in something like this:

SELECT a.id,
    Count(b.id) as sum,
    Sum(If(b.type = 1, 1, 0)) as type1,
    Sum(If(b.type = 2, 1, 0)) as type2,
    Sum(If(b.type = 3, 1, 0)) as type3,
    Sum(If(b.type = 4, 1, 0)) as type4,
    Sum(If(b.type = 5, 1, 0)) as type5
FROM table1 a
CROSS JOIN table1 b
WHERE b.p1_st between 800 and 1200
  and b.p2_st between 700 and 2300
  and a.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
  and b.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
  AND a.p1_id != b.p1_id
  AND a.p2_id != b.p2_id
  AND a.id != b.id
GROUP BY a.id

The reason I put the date in twice is that I have assumed you pseudo-coded it as though you have a column on the table called date and you are only interested in those records within 5 years; and as the table is joined against itself, you need to restrict to records within 5 years on both copies of it.

All the best

Keith
plautzer Posted October 20, 2009

I tried the query and made the following observations:

- results are only shown if there is at least one hit (or count)
- it doesn't work well with the view (takes an hour with 10,000 data sets)
- if I put all parameters in one table:
  - it works extremely fast with 50,000 data sets (about 8 seconds)
  - and somehow really slowly with 100,000 data sets (over an hour)

Do you have a clue why this might be? Here are some parameters I changed in my my.ini:

key_buffer = 500M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 128M

[myisamchk]
key_buffer = 40M
sort_buffer_size = 40M
read_buffer = 40M
write_buffer = 40M

greetz,
Plautzer
kickstart Posted October 20, 2009

Hi

The issue of getting nothing unless there is at least one hit could probably be solved by removing the line saying AND a.id != b.id. That way you always get at least one match, which is the row itself, and every count will include it (shouldn't be a problem as long as you keep this in mind when using the results).

With 50k rows the initial join generates 2,500,000,000 records. With 100k rows it generates 10,000,000,000. It ramps up rapidly.

You could possibly exclude some records from the "b" part of the statement prior to the join (i.e., based both on the date and on the checks for p1_st and p2_st). I take it from your comments that you have dropped the view and just put the full selects into this code? If so, that should be easy enough.

Even at an hour for 100k records it appears to be far faster than your initial loop.

All the best

Keith
plautzer Posted October 21, 2009

Hi,

Even at an hour for 100k records it appears to be far faster than your initial loop.

It's faster than calling the loop from the php script. What I don't get is why a mysql query would be faster than looping through an array. I could narrow the inner loop down to as few as 5,000 iterations instead of 100k. That would be 500 million records in an array loop compared to 10 billion records in the query, so the array approach should be 20 times faster?!

Is there a way to calculate the time an array loop or a mysql query would need to process? I want to know the minimal time usage... so that I don't go nuts optimizing.

You could possibly exclude some records from the "b" part of the statement prior to the join (i.e., based both on the date and on the checks for p1_st and p2_st). I take it from your comments that you have dropped the view and just put the full selects into this code? If so, that should be easy enough.

Yes, I replaced the view with a table. I'll try taking the IDs and the date into the join (ON) condition... but how does that speed up the query?

Does it matter which engine I use, MyISAM or InnoDB?

Thx, I appreciate the help!

Plautzer
kickstart Posted October 21, 2009

Hi

You are dealing with massive sets of data, and SQL is optimised for dealing with them, while php is a comparatively slow interpreted language. If you keep passing different pieces of SQL to the database, it has to interpret each one, rather than interpreting just once. On top of that, you could be experiencing delays from network traffic if your db is on one server and your script on another.

Excluding rows before the JOIN might help, as it will reduce the size of the intermediate table that is generated. I.e., if you have 100k records and join the table to itself you get 10,000,000,000 rows. If you can exclude half the records from the copy you are joining to, you reduce that to 5,000,000,000.

Not sure on the db engine.

All the best

Keith
plautzer Posted October 21, 2009

Excluding rows before the JOIN might help, as it will reduce the size of the intermediate table that is generated.

I am not sure what you mean by excluding. Do I exclude rows before a join when I put some extra conditions in the join condition (i.e. join on aid = sid and number > 1000 and ...)?

Until now I thought there was no difference between putting additional conditions in the join condition and putting them in the where clause.

Greetz
Plautzer
kickstart Posted October 21, 2009

Hi

Normally it is best not to, but when you are doing a cross join you are generating an intermediate table with a number of rows that is the square of the original number. If you can reduce the rows before joining the table against itself, it should make a large difference.

All the best

Keith
plautzer Posted October 21, 2009

To be clear, this means that I have to reduce the rows of table b for each record in table a by finding the records in b that lie within the date range (in my case that would reduce table b to 1,000-20,000 records, depending on the date in table a):

SELECT a.id,
    Count(b.id) as sum,
    Sum(If(b.type = 1, 1, 0)) as type1,
    Sum(If(b.type = 2, 1, 0)) as type2,
    Sum(If(b.type = 3, 1, 0)) as type3,
    Sum(If(b.type = 4, 1, 0)) as type4,
    Sum(If(b.type = 5, 1, 0)) as type5
FROM table1 a
CROSS JOIN table1 b
    on (b.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18')
WHERE b.p1_st between 800 and 1200
  and b.p2_st between 700 and 2300
  AND a.p1_id != b.p1_id
  AND a.p2_id != b.p2_id
  AND a.id != b.id
GROUP BY a.id

If I got that right... why not put all the conditions in the join?

SELECT a.id,
    Count(b.id) as sum,
    Sum(If(b.type = 1, 1, 0)) as type1,
    Sum(If(b.type = 2, 1, 0)) as type2,
    Sum(If(b.type = 3, 1, 0)) as type3,
    Sum(If(b.type = 4, 1, 0)) as type4,
    Sum(If(b.type = 5, 1, 0)) as type5
FROM table1 a
CROSS JOIN table1 b
    on (b.date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
        and b.p1_st between 800 and 1200
        and b.p2_st between 700 and 2300
        AND a.p1_id != b.p1_id
        AND a.p2_id != b.p2_id
        AND a.id != b.id)
GROUP BY a.id
Mchl Posted October 21, 2009

MyISAM and InnoDB are comparably fast these days.
kickstart Posted October 21, 2009

Hi

To be honest I have never used an ON clause in a CROSS JOIN (it doesn't appear logical given what cross joins are meant for, but it may well work). If it works, then it would appear to be an easy way of doing what I was suggesting. If not, then something like this:

SELECT a.id,
    Count(b.id) as sum,
    Sum(If(b.type = 1, 1, 0)) as type1,
    Sum(If(b.type = 2, 1, 0)) as type2,
    Sum(If(b.type = 3, 1, 0)) as type3,
    Sum(If(b.type = 4, 1, 0)) as type4,
    Sum(If(b.type = 5, 1, 0)) as type5
FROM table1 a
CROSS JOIN (
    SELECT *
    FROM table1
    WHERE date BETWEEN DATE_SUB('2008-12-18', INTERVAL 5 YEAR) AND '2008-12-18'
      and p1_st between 800 and 1200
      and p2_st between 700 and 2300
) b
WHERE a.p1_id != b.p1_id
  AND a.p2_id != b.p2_id
  AND a.id != b.id
GROUP BY a.id

All the best

Keith
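(An editorial aside, not from the thread: how fast the derived table's pre-filter runs depends on table1 being indexed for it. A hedged sketch, with a made-up index name, assuming the date range is the most selective of the three predicates:)

-- Hypothetical index: lets the derived table's date BETWEEN ... filter use a
-- range scan instead of reading all 250k rows. p1_st and p2_st are still
-- checked row by row, since a B-tree scan can only use one range column.
CREATE INDEX idx_date ON table1 (date);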
plautzer Posted October 21, 2009

Hi,

I tried different versions of the query, but it doesn't work out too well. I also tried a left join, which got me the 0-count records as well, but it takes about 4 minutes for 1,000,000 records and over 15 min for 64,000,000. So over 100 billion would take hours.

I thought of another solution where I categorize p1_st into quadrants... i.e. I have x quadrants with different ranges like 0-500, 501-1000, 1001-... That would help narrow the search down to a certain quadrant, with only a percentage of the 250k records to look at. Does that make sense for a database query? If so... do you have a clue how I could realize that?

greetz
Plautzer
kickstart Posted October 21, 2009

Hi

It would help to cut it down into quadrants (you could just assign each quadrant a number and check that they are equal in the ON clause), but if you are trying to identify records that are close to each other then I'm not sure it would help. Whatever you do with 100 billion records will take ages. Do you really need your app to process this number of records?

All the best

Keith
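(A minimal sketch of the quadrant idea under stated assumptions: the bucket column, its name, and the fixed width of 500 are all hypothetical. Note Keith's caveat in code terms: two rows on opposite sides of a bucket boundary will never pair up, which matters if "similar p1_st" is meant as a sliding range rather than a fixed band.)

-- Hypothetical: precompute a bucket number per row (0-499 -> 0, 500-999 -> 1, ...)
ALTER TABLE table1 ADD COLUMN p1_bucket INT;
UPDATE table1 SET p1_bucket = FLOOR(p1_st / 500);
CREATE INDEX idx_bucket ON table1 (p1_bucket);

-- The self-join then only pairs rows within the same bucket, shrinking the
-- intermediate table by roughly the number of buckets.
SELECT a.id, COUNT(b.id) AS cnt
FROM table1 a
JOIN table1 b ON b.p1_bucket = a.p1_bucket AND b.id != a.id
GROUP BY a.id;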
plautzer Posted October 22, 2009

Well, it's statistical counting, therefore I have to process all of them. Some thoughts: it would be a lot easier if I could relate to the previous record and just add to the count and type, or if I had some kind of formula I could calculate the result from... but this seems to be impossible, since the ranges of p1_st, p2_st and the date change every time. And even if I fix the ranges of p1_st and p2_st, there is still the date, which cannot be fixed. Or do you see other possibilities?

Another thought: would stored procedures/functions help me out here? I could run the statistics on insert of a record and do the count right away. The advantage I see is that the database doesn't always have to go through the whole db, because its number of records increases slowly, one insert at a time.

Greetz,
Plautzer
Mchl Posted October 22, 2009

You can use triggers to perform an action on every insert.
kickstart Posted October 22, 2009

Mchl's suggestion sounds best. You could insert / update a stats table on each insert, and then just read that table when you require the stats.

All the best

Keith
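(A hedged sketch of that combination, assuming the running stats are just per-type counts; the stats table, trigger name, and schema are made up for illustration and would need to reflect whatever parameter ranges the real statistics slice on. MySQL 5.1, as used in this thread, supports triggers.)

-- Hypothetical running-totals table, one row per type.
CREATE TABLE type_stats (
    type INT PRIMARY KEY,
    cnt  INT NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER trg_table1_insert
AFTER INSERT ON table1
FOR EACH ROW
BEGIN
    -- Bump the counter for the inserted row's type.
    INSERT INTO type_stats (type, cnt) VALUES (NEW.type, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;
END//
DELIMITER ;

Reading the totals then becomes a primary-key lookup instead of a 250k-row scan.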