drbigfresh Posted March 12, 2008

I have a large database, 5.5 million+ rows, and I am trying to select a few thousand random records every 10 minutes or so. I was having a TON of performance issues, so I added a field to the database called randomnum and populated it with a random number, so now I do the select and order it by randomnum. Although it is still a performance boost, my queries are taking upwards of 30 seconds. Does anyone have any suggestions for optimizing these types of queries? (And yes, the field is indexed.)

When I run the query with an EXPLAIN I get:

id  select_type  table         type   possible_keys  key     key_len  ref   rows     Extra
1   SIMPLE       wedding_user  range  Active         Active  1        NULL  3443380  Using where; Using filesort

If I remove the "where active='Y'", the query gets a nice speed bump.
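For illustration, the pre-computed random-number approach described above might look roughly like this (a sketch only; the table and column names are pieced together from the thread, and the exact schema is an assumption):

-- one-time setup: add, populate, and index a random-number column (hypothetical index name)
alter table wedding_user add column randomnum float;
update wedding_user set randomnum = rand();
create index idx_randomnum on wedding_user (randomnum);

-- then pull a few thousand "random" rows by ordering on the pre-computed column
select id, fname, lname
from wedding_user
where active = 'Y'
order by randomnum
limit 5000;

Note that MySQL will generally use only one index per table in a query like this, so once the Active index is chosen for the WHERE, the randomnum index can't also be used to satisfy the ORDER BY, which is consistent with the "Using filesort" in the EXPLAIN above.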
effigy Posted March 12, 2008

Have you tried this?
drbigfresh Posted March 12, 2008

I did actually see that, and it is working nicely for me. I think the real performance hit I am taking is in the where clause and the subquery. My query is like this:

select id, fname, lname, address1, terms
from site_user
where id not in (select userid from send_log where campaignid=4)
and active='Y'
limit 5000

If I remove [where id not in (select userid from send_log where campaignid=4) and active='Y'] it flies along pretty nicely. What can I do to make this part run better? Someone suggested trying a union...
fenway Posted March 12, 2008

Can I see the EXPLAIN for that query?
drbigfresh Posted March 12, 2008

id  select_type         table      type            possible_keys      key     key_len  ref    rows     Extra
1   PRIMARY             site_user  ref             Active             Active  1        const  3443380  Using where
2   DEPENDENT SUBQUERY  send_log   index_subquery  campaignid,userid  userid  5        func   4        Using where
fenway Posted March 12, 2008

That's what I thought. Even though that subquery really shouldn't be dependent on the outer one (since the where clause checks an inner-table field, correct?), that NOT IN means it has to run the subquery for every row, AFAIK. That's probably why it's so slow. You'd be better off with a LEFT JOIN.
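As a rough sketch of that suggestion (using the table and column names from the query above; anything beyond those is an assumption about the schema), the NOT IN could be rewritten as a LEFT JOIN anti-join:

-- keep only site_user rows with no matching send_log row for campaign 4
select u.id, u.fname, u.lname, u.address1, u.terms
from site_user u
left join send_log s on s.userid = u.id and s.campaignid = 4
where s.userid is null
and u.active = 'Y'
limit 5000

The campaignid = 4 condition belongs in the ON clause rather than the WHERE, so that users with no send_log rows at all are still kept; the EXPLAIN above suggests send_log already has a userid index the join lookup can use.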
drbigfresh Posted March 12, 2008

You're correct, the where is checking an inner-table field on site_user. If I remove the where altogether and just have the not in, would a left join still see a performance boost?
fenway Posted March 13, 2008

NOT IN() is just plain bad... no index usage. But with a left join, you won't need either.