plautzer Posted October 19, 2009 Share Posted October 19, 2009 Hi, Im running a script where am I cross matching about 200 000 data sets with each other. Each data set consists of 8 parameters and I want to count all datasets which have similar or the same parameters for each data set. Right now, I am doing the matching via a MySql query which im calling about 200 000 times. The problem is that using a query is extremely expensive… it takes up to 2 hours until the script is done. So I am wondering if there is a better method to cross match data sets and if some of could help me find a better solution. While researching I found out that arrays may be a faster alternative to queries. And so far, I identified 3 possible ways for cross matching: 1. nested foreach () loops foreach($array as ar1) foreach($array as ar2) if ($ar1[0] == $ar1[0])…. 2. Using an Array_map with Callback function, so that i would have only one "hand coded" loop foreach($array as arr) if ($arr[0] == $parameter)…. 3. Array walk where i could save one "hand coded" loop as well. Theoretically would be the best/fastest way to go about it? Can Anyone tell me what technically the difference between those 3 ways is? And which one is the better approach or if there other alternatives to them? I am thankful for any advice that helps me reduce execution time! Greetz plautzer Link to comment https://forums.phpfreaks.com/topic/178209-best-way-to-cross-matching-large-datasets/ Share on other sites More sharing options...
JAY6390 Posted October 19, 2009 Share Posted October 19, 2009 Can you give some examples of "dummy" data and what you want to do with it. It could be that you just need to amend your current queries to drastically reduce the query count Link to comment https://forums.phpfreaks.com/topic/178209-best-way-to-cross-matching-large-datasets/#findComment-939581 Share on other sites More sharing options...
plautzer Posted October 19, 2009 Author Share Posted October 19, 2009 Here is an example of a query with 6 parameter. In this case par1,2,3 should unequal to the parameters of the current data set. par4, par5 and are supposed to be in a range which varies per data set. In the end I do simple statisitical count. $sql =" SELECT Count(id) as sum, Sum(If(type = 1 ,1,0)) as type1, Sum(If(type = 2 ,1,0)) as type2, Sum(If(type = 3 ,1,0)) as type3, Sum(If(type = 4 ,1,0)) as type4, Sum(If(type = 5 ,1,0)) as type5 From table1 WHERE (par1 != $id1 and par2 != $id2) and (par1 != $id2 and par3 != $id1) and par2 = $par3 and par4 between 800 and 1000 and par5 between 400 and 300 and date < '$date' and date > DATE_SUB('$date', INTERVAL 5 YEAR)"; The query takes up to 0,1 seconds. Link to comment https://forums.phpfreaks.com/topic/178209-best-way-to-cross-matching-large-datasets/#findComment-939591 Share on other sites More sharing options...
plautzer Posted October 19, 2009 Author Share Posted October 19, 2009 quick question on the side... how long does a query usually take? Is it possible to get under 0.001 or even 0.0001 seconds? Link to comment https://forums.phpfreaks.com/topic/178209-best-way-to-cross-matching-large-datasets/#findComment-939666 Share on other sites More sharing options...
cags Posted October 19, 2009 Share Posted October 19, 2009 Depends how complex the query is, I just ran a quick test on "SELECT * FROM table" ran in a loop 50,000 times, which took 0.364953 secs to succeed. Which is roughly 0.00000729906 per query (assuming theres no caching going on). Edit: Perhaps start a sub-question in the MySQL board asking how to optimise that query. Link to comment https://forums.phpfreaks.com/topic/178209-best-way-to-cross-matching-large-datasets/#findComment-939679 Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.