teerock Posted February 6, 2007 Share Posted February 6, 2007 Hi fellow scripters, I'm a newbie to php and MySQL and have an application that requires me to display a quote of the day in spanish (plucked randomly from a MySQL table) and then translate it to English when the user clicks on it. Any help would be much appreciated. Cheers, Teerock Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 $sql = "select * from table where <your condition> .... order by rand() limit 1"; The trick lies at ..order by rand()... It will return a random row. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 $sql = "select * from table where <your condition> .... order by rand() limit 1"; The trick lies at ..order by rand()... It will return a random row. Unfortunately, if this table is of any appreciable size, this will be very, very slow. Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 yes fenway, good thinking. If table size is the case, then I would generate a random number between 1 and the size of table: $randomID = rand(1,sizeoftable); //you have to query table to get this sizeoftable then: $sql = "select * from table where <your condition> .... order by rand() limit 1,$randomID"; Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 If table size is the case, then I would generate a random number between 1 and the size of table: $randomID = rand(1,sizeoftable); //you have to query table to get this sizeoftable then: $sql = "select * from table where <your condition> .... order by rand() limit 1,$randomID"; That's not always going to return one row either, because you don't know how many results will be returned. Also, you'd want the offset first, then 1. And even that's horrible for any $randomID much larger than 100 -- if it were 10,000, you'd be throwing away 10,000 rows! I should clarify -- it's not the table size that matters, but rather the number of records that match the WHERE clause -- even if you have 10 millions rows, but your where clause returns ~20 rows all the time, order by rand() will be just fine. This is the best performance-wise: SELECT * FROM yourTable WHERE id >= ( SELECT FLOOR( MAX(id) * RAND() ) FROM yourTable ) LIMIT 1; But with a few caveats: 1) Obviously, you can pull out this subquery for <v4.1; but it results in a constant, so performance is very good. 2) It's unlikely, but possible, that no rows will be returned, dependent on the distribution of values in the id field (less random = bad) 3) Or worse, that the some rows will never be returned; if you had ids 1,2,3,10000,10001,etc...., it's highly unlikely that you'll ever get the first few rows back. But in general, it works fairly well on "normal"-looking tables. Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 You might want to look at this fenway. I've tested them out and results are unexpected. table city contains over 2 mil records of world's city, idn is the primary index field. select * from city limit 100000, 1; 1 row in set (0.06 sec) select * from city limit 1450000, 1; 1 row in set (0.98 sec) the subquery to get the random id: mysql> SELECT FLOOR(MAX(idn) * RAND()) FROM city; +--------------------------+ | FLOOR(MAX(idn) * RAND()) | +--------------------------+ | 2151558 | +--------------------------+ 1 row in set (2.70 sec) the full query: SELECT * FROM city WHERE idn >= (SELECT FLOOR( MAX(`idn`) * RAND()) FROM city) LIMIT 1; I had to shut down mysql because it simply taking too long. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 Well, the first two make sense... for larger values of offset, it should be "slower". But I don't understand the one with rand()... unless it's not using the index... could you post the EXPLAIN? I know I've used this before, and now I'm very curious. Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 mysql> select max(idn) from city; +----------+ | max(idn) | +----------+ | 2649291 | +----------+ 1 row in set (0.00 sec) mysql> select rand(); +------------------+ | rand() | +------------------+ | 0.37278621957897 | +------------------+ 1 row in set (0.01 sec) mysql> select max(idn) * rand() from city; +-------------------+ | max(idn) * rand() | +-------------------+ | 571916.39771917 | +-------------------+ 1 row in set (2.80 sec) mysql> select rand() * max(idn) from city; +-------------------+ | rand() * max(idn) | +-------------------+ | 417164.5915375 | +-------------------+ 1 row in set (2.78 sec) mysql> select floor(max(idn) * rand()) from city; +--------------------------+ | floor(max(idn) * rand()) | +--------------------------+ | 2546015 | +--------------------------+ 1 row in set (2.78 sec) mysql> select city_name from city where idn>=2000000 limit 1; +-----------+ | city_name | +-----------+ | mozzhukho | +-----------+ 1 row in set (0.00 sec) select city_name from city where idn >= (SELECT FLOOR( MAX(`idn`) * RAND()) FROM city) limit 1; still waiting for result as I am posting this. I can't explain these, but you see the result. Strangely, mysql taking too long multiply rand() * max(idn) I tried reverse the numbers but nothing changed. I am gonna wait for the last query to be executed and post the result (if any) here. my system: window xp sp2, dual core 266, 1gb ram. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 Run this query: explain select floor(max(idn) * rand()) from city Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 Run this query: explain select floor(max(idn) * rand()) from city <row> <field name="id">1</field> <field name="select_type">SIMPLE</field> <field name="table">city</field> <field name="type">index</field> <possible_keys/> <field name="key">PRIMARY</field> <field name="key_len">779</field> <ref/> <field name="rows">2649291</field> <field name="Extra">Using index</field> </row> I exported as xml to make it easier to read. Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 I'm clueless of what happened, this is not problem of multiplying 2 floats, nor problem with long * float mysql> select (352.5235262352 * 2.9352836523); +---------------------------------+ | (352.5235262352 * 2.9352836523) | +---------------------------------+ | 1034.75654360933272482096 | +---------------------------------+ 1 row in set (0.00 sec) mysql> select rand() * rand(); +------------------+ | rand() * rand() | +------------------+ | 0.17015589508702 | +------------------+ 1 row in set (0.00 sec) mysql> select (235232 * 0.352398273); +------------------------+ | (235232 * 0.352398273) | +------------------------+ | 82895.350554336 | +------------------------+ 1 row in set (0.01 sec) Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 As am I... it's clearly using the index for MAX(), and the rest is just math... I don't see how this can be slow at all, but it doesn't seem to scale at all. Can you try the equivalent query on smaller tables (e.g. 10 records, 100 records, 1K records, etc)? Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 hi fenway, this ipgeo table has about 75K records and here are the results: mysql> SELECT FLOOR(MAX(idn) * RAND()) FROM ipgeo; +--------------------------+ | FLOOR(MAX(idn) * RAND()) | +--------------------------+ | 70871 | +--------------------------+ 1 row in set (0.05 sec) mysql> SELECT * FROM ipgeo WHERE idn >= (SELECT FLOOR( MAX(`idn`) * RAND()) FROM ipgeo) LIMIT 1; +-----+----------+--------------+------------+------------+--------------+------ --------+ | idn | ipfrom | ipto | begin_num | end_num | country_code | count ry_name | +-----+----------+--------------+------------+------------+--------------+------ --------+ | 521 | 61.0.0.0 | 61.3.255.255 | 1023410176 | 1023672319 | IN | India | +-----+----------+--------------+------------+------------+--------------+------ --------+ 1 row in set (23.66 sec) Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 Interesting... well, 0.05 is better, but not great... and the EXPLAIN for that 2nd query is using the index? I'm throughly stumped. Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 here is another result on smaller table (about 3K) mysql> SELECT * FROM classified_email_list WHERE idn >= (SELECT FLOOR( MAX(`idn` ) * RAND()) FROM classified_email_list) LIMIT 1; 1 row in set (0.13 sec) Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 explain SELECT * FROM ipgeo WHERE idn >= (SELECT FLOOR( MAX(`idn`) * RAND()) FROM ipgeo) LIMIT 1; <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE ROOT SYSTEM "test2.dtd"> <ROOT> <row> <field name="id">1</field> <field name="select_type">PRIMARY</field> <field name="table">ipgeo</field> <field name="type">ALL</field> <possible_keys/> <key/> <key_len/> <ref/> <field name="rows">73258</field> <field name="Extra">Using where</field> </row> <row> <field name="id">2</field> <field name="select_type">UNCACHEABLE SUBQUERY</field> <field name="table">ipgeo</field> <field name="type">index</field> <possible_keys/> <field name="key">PRIMARY</field> <field name="key_len">12</field> <ref/> <field name="rows">73258</field> <field name="Extra">Using index</field> </row> </ROOT> Quote Link to comment Share on other sites More sharing options...
shoz Posted February 6, 2007 Share Posted February 6, 2007 I'm think this is the approach you're referring to fenway. I bookmarked it seeing how it's some trouble to remember it after a while. The page is down but google's cache doesn't disappoint. http://www.google.com/search?hl=en&q=cache%3Ahttp%3A%2F%2Fjan.kneschke.de%2Fprojects%2Fmysql%2Forder-by-rand%2F&btnG=Google+Search The quote that should be noted at one point in the discussion that addresses the problem hvle is experiencing is the SELECT in the WHERE clause is executed for every row the outer SELECT is fetching. The solution follows of course. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 6, 2007 Share Posted February 6, 2007 Actually after looking at fenway's query more closely I don't think that it should be executed for every column. The query in the article that prompts the quote doesn't use LIMIT 1. Although I do think the query should be quick I also think that it needs an ORDER BY to be random . In case the site and cache are both unavailable this is the query. SELECT * FROM tablename AS t1 INNER JOIN ( SELECT ROUND(RAND() * (SELECT MAX(id) FROM tablename)) AS id ) AS t2 ON t1.id >= t2.id ORDER BY t1.id ASC LIMIT 1; EDIT: I should note that I've changed the syntax of the query slightly. In the event that it matters the original is SELECT name FROM random AS r1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2 WHERE r1.id >= r2.id ORDER BY r1.id ASC LIMIT 1; Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 SELECT name FROM random AS r1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2 WHERE r1.id >= r2.id ORDER BY r1.id ASC LIMIT 1; I tested this query and it run very well (0.00s). Quote Link to comment Share on other sites More sharing options...
hvle Posted February 6, 2007 Share Posted February 6, 2007 mysql> SELECT r1.idn -> FROM city AS r1 JOIN -> (SELECT ROUND(RAND() * -> (SELECT MAX(idn) -> FROM city)) AS idn) -> AS r2 -> WHERE r1.idn >= r2.idn -> ORDER BY r1.idn ASC -> LIMIT 1; +---------+ | idn | +---------+ | 2091969 | +---------+ 1 row in set (0.00 sec) Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 Sorry, my bad... I was doing this from memory, and yes, shoz, I do remember jan's article now. Of course it won't use the index if there's an expression like FLOOR()... I haven't slept enough recently ;-) In fact, the EXPLAIN does explicitly state "uncacheable subquery" - a subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query. I could have sworn that I'd used this before: SELECT * FROM city WHERE idn >= FLOOR( ( SELECT MAX( idn ) FROM city ) * RAND() ) LIMIT 1 But to my surprise, the EXPLAIN suggests that the outer query can't use an index, even if it's forced to. edit: actually, it's because there's no explicit order by clause (c/o shoz, see below). I'm sure that this has something to do with shoz's comment about any subqueries in the where clause being evaluated for each row... but I'm shocked since the optimizer "optimizes" the subquery away ("Select tables optimized away"... it's a constant), and doesn't even think it's correlated, that it's any different from a normal index/range lookup. Very strange indeed... time to do some more research! Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 On second thought, I remember that the explain doesn't take the limit clause into account, so even though I'm getting all N rows as being evaluated, it's obviously not true... but that still doesn't explain (no pun intended) the lack of index usage... hmm. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 Interesting... ELECT * FROM city WHERE idn > FLOOR ( ( SELECT MAX(`idn`) FROM city ) ) LIMIT 1 Does produce the desired EXPLAIN output: id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY city range PRIMARY PRIMARY 8 NULL 1 Using where 2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away But, of course, RAND() is missing... I guess I should have figured this out (again, the tired) -- it's an indeterminate function, that blows away things like caching, much like NOW(). I'm still very surprised -- I don't like "having" to use a JOIN for this, because it just adds complexity in this case. Don't know how I'm going to figure this one out, though.... stay tuned. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 6, 2007 Share Posted February 6, 2007 The last query does produce quick results for me. The explain you posted suggested that it would as well. I've added an ORDER BY because it should make it more random. For eg if MYSQL always looks at id 20 before id 10 and 5, then a floor(..) result of 2 or 15 will always give 20 instead of the 5 and 10 it should. SELECT * FROM tablename WHERE id >= (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM tablename))) ORDER BY id ASC LIMIT 1 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------+-------+---------------+---------+---------+------+--------+------------------------------+ | 1 | PRIMARY | tablename | index | NULL | PRIMARY | 4 | NULL | 149991 | Using where | | 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | Quote Link to comment Share on other sites More sharing options...
fenway Posted February 6, 2007 Share Posted February 6, 2007 The last query does produce quick results for me. The explain you posted suggested that it would as well. Which query? My last one -- with the explain -- didn't have RAND()! Otherwise, it doesn't seem to scale very well for me. I've added an ORDER BY because it should make it more random. For eg if MYSQL always looks at id 20 before id 10 and 5, then a floor(..) result of 2 or 15 will always give 20 instead of the 5 and 10 it should. Of course... in fact, it's required, or the "random"-ness is meaningless. Edit: actually, it's more than that.. without the ORDER BY, the PK index isn't used, which explains why I was getting a table scan... but not why MySQL isn't smart enough to do it on its own. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.