Jump to content

question about speed (arrays and databases)


tomdelonge

Recommended Posts

let's say i have a huge database of entries (about 1000 to 10000 entries). i need to randomly select 50 rows and display them. what's the best way (i'm talking using the least computer power and being the fastest. should i put each row into an array and then use the php shuffle() function? or should i come up with random numbers first and then find that row in each entry. or what way would you do it?

 

so basically, i need 50 randomly selected rows of a database, what's the best way to do it? (it seems to me that an array of that size would take a while to shuffle; maybe i'm wrong) just let me know how you would accomplish this. thanks!

Link to comment
Share on other sites

I would pull 50 random rows from the database, I wouldn't even consider having an array of 1000 to 10000 entries just for 50 random.  Even if you wanted more than 50 random, the database random function is going to be much faster than php.

Link to comment
Share on other sites

One function you can try using is mysql_data_seek() http://us3.php.net/mysql_data_seek

This function will allow you to choose which row of a query you'll read from next

 

For example:

<?php
$connection = mysql_connect('localhost','username','password');
$query = 'SELECT * FROM mytable';
$resource = mysql_query($query,$connection);

mysql_data_seek($resource,23);
$twenty_fourth_row_data = mysql_fetch_array($resource);

mysql_data_seek($resource,0);
$first_row_data = mysql_fetch_array($resource);
// row numbers start at 0!

// etc.
?> 

 

So you see, using mysql_data_seek allows you to specify the number row of your query results. 

 

 

Now, to solve your problem, what you can do is figure out a way to get 25 random distinct integers between 0 and the number of rows in your query.  Then, all you have to do is call mysql_data_seek with that random number and then extract that row's data and use it however.

 

Hope that helps

Link to comment
Share on other sites

I should add some reasons to use my method:

 

1. You ask MySQL to do as little querying as possible (you only need one query).

 

2. You do not put any unused data into memory like you would if you were to dump all of the results into an array.

 

3. You have more control over how you are randomizing your numbers since you are doing through PHP.

Link to comment
Share on other sites

@ibechane

 

sorry but I don't really recommend that one.

 

First, in your method you get all the records in the database which I don't find necessary. Why would I pull all 10000 records if I can pull only 50 records in random directly from the database?

 

Secondly, you do not need to put that one into array since by recursively using mysql_fetch_assoc() or other related functions would suffice since you only want them to be displayed.

 

Lastly, you can do pretty good random with RAND() in MySQL and it is faster.

 

@Lamez

 

I suggest doing it through the query. It is faster and uses less memory.

Link to comment
Share on other sites

I'd really advise against ORDER BY RAND() LIMIT 50 with 10k rows because RAND() isn't very efficient... it creates a temporary table with all of the rows (all 10k) shuffled and then you pull off 50 of them. There are much more efficient ways, it's just unfortunate that they aren't all as simple.

Agreed, but that just depends on the speed that you absolutely must have and how dedicated you are to get it.  It is easy to get around this issue if pulling one random record but it becomes complicated when pulling multiple random records.  I can think of one way but it wouldn't be pretty.

Link to comment
Share on other sites

I just happened to have a table with exactly 10,000 rows, so to give some idea of the time taken:

 

[pre]mysql> SELECT * FROM cells c

    -> ORDER BY RAND() LIMIT 50;

+---------+--------+

| idcells | colour |

+---------+--------+

|    9035 | BF4432 |

|    1450 | 98EBEE |

|    9574 | BCB385 |

|    6475 | 6C6F68 |

|    2508 | EC8466 |

|    958 | F10D0E |

|    5093 | B27687 |

|    4138 | 81207F |

|    8964 | 9E1CB5 |

|    4403 | AE9E51 |

|    6727 | 53201F |

|    2747 | 221FC1 |

|    2952 | F308F0 |

|    8567 | 889882 |

|    6131 | 7769F7 |

|    1559 | 268EC2 |

|    1001 | E84DE6 |

|    238 | 826AA2 |

|    1785 | CC2EB7 |

|    8699 | 8F3D69 |

|    7695 | BF0311 |

|    7750 | 244639 |

|    8158 | D98A29 |

|    1596 | C29EAA |

|    4439 | D88EF9 |

|    675 | 28C20E |

|    1083 | ED8E04 |

|    6716 | 4A8672 |

|    5328 | 993847 |

|    3572 | 50FF10 |

|    6687 | AADFC2 |

|    5176 | D47455 |

|    224 | 57B54D |

|    2538 | 4384DC |

|    363 | 8F84F4 |

|    3660 | B16533 |

|    6936 | 8F786D |

|    3338 | F2A23E |

|    5345 | CC7AC4 |

|    9730 | 3A4E47 |

|    2075 | F381DF |

|    8750 | 9E3843 |

|    8510 | 324079 |

|    1827 | 195FE7 |

|    3522 | 70FCDF |

|    177 | 7EDBDE |

|    6806 | 65C130 |

|    4590 | 0C2C3C |

|    6692 | 3B4026 |

|    1889 | A12B21 |

+---------+--------+

50 rows in set (0.02 sec)

[/pre]

Link to comment
Share on other sites

But as we all know, things aren't always quite so simple in the real world.  I know at my job, I average at least 5 joins per query.  I just built one that joins 11 tables and a switch.  So it works in this case and probably yours, but I can see where it would be useful to find a way to go in with a comma separated list of PKs.

 

and BTW: one of those tables stores 5 years of medical insurance claims so you can imagine how large that table is, of course its on Postgres but, same concept.

Link to comment
Share on other sites

Hey, I'm just going off of what I've always been told/read :) It would be interesting to benchmark order by rand() against many of the other methods I've seen.

 

Method suggested in earlier post

[pre]mysql> SELECT

    -> *

    -> FROM

    -> cells AS t1

    -> INNER JOIN

    -> (

    ->    SELECT

    ->    ROUND(RAND() * (SELECT MAX(idcells) FROM cells)) AS id

    -> ) AS t2

    -> ON

    -> t1.idcells >= t2.id

    -> ORDER BY

    -> t1.idcells ASC

    -> LIMIT 50;

+---------+--------+------+

| idcells | colour | id  |

+---------+--------+------+

|    3810 | B31FF4 | 3810 |

|    3811 | 14750F | 3810 |

|    3812 | 7FFED6 | 3810 |

|    3813 | E874AD | 3810 |

|    3814 | 4EC40E | 3810 |

|    3815 | FAD638 | 3810 |

|    3816 | 023CE6 | 3810 |

|    3817 | 5E8B4C | 3810 |

|    3818 | 75CE5E | 3810 |

|    3819 | 82FB9F | 3810 |

|    3820 | B6B571 | 3810 |

|    3821 | 69483D | 3810 |

|    3822 | 4E464E | 3810 |

|    3823 | F38BCB | 3810 |

|    3824 | 802EA0 | 3810 |

|    3825 | 1215C7 | 3810 |

|    3826 | 84B3C6 | 3810 |

|    3827 | 15AFC5 | 3810 |

|    3828 | CAF11A | 3810 |

|    3829 | E1DAB2 | 3810 |

|    3830 | 3D1E2D | 3810 |

|    3831 | 310F14 | 3810 |

|    3832 | 7CC608 | 3810 |

|    3833 | DCFE47 | 3810 |

|    3834 | 210E6C | 3810 |

|    3835 | 0FD2C1 | 3810 |

|    3836 | FDF412 | 3810 |

|    3837 | 8E6A4E | 3810 |

|    3838 | 598CEA | 3810 |

|    3839 | F6A053 | 3810 |

|    3840 | 36435E | 3810 |

|    3841 | FC850E | 3810 |

|    3842 | 8C1F90 | 3810 |

|    3843 | AEA2D3 | 3810 |

|    3844 | 8EFC97 | 3810 |

|    3845 | B13850 | 3810 |

|    3846 | E3D0C5 | 3810 |

|    3847 | 807FC8 | 3810 |

|    3848 | EEE7E3 | 3810 |

|    3849 | B2EA5B | 3810 |

|    3850 | 062672 | 3810 |

|    3851 | 34603B | 3810 |

|    3852 | BD4BEA | 3810 |

|    3853 | 8A83F7 | 3810 |

|    3854 | 1C2AFE | 3810 |

|    3855 | 11EDB3 | 3810 |

|    3856 | E3F0D5 | 3810 |

|    3857 | 3F6D6D | 3810 |

|    3858 | CD6351 | 3810 |

|    3859 | DF4D39 | 3810 |

+---------+--------+------+

50 rows in set (0.05 sec)[/pre]

 

Throw those text books away!

 

EDIT: I think this only works for LIMIT 1

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.