Jump to content

Pulling Random Results from a Table, using Factors.


Recommended Posts

SELECT *
FROM table1 AS t1
INNER JOIN ( SELECT ROUND(RAND() * (SELECT MAX(row1) FROM table1)) AS row1) AS tt1 ON t1.row1 >= tt1.row1
LEFT JOIN table2  AS t2 ON t2.row1 = t1.row2
LEFT JOIN table3  AS t3 ON t3.row1 = t2.row2
LEFT JOIN table4  AS t4 ON t4.row1 = t2.row3
LEFT JOIN table5  AS t5 ON t5.row1 = t2.row4
WHERE t1.expiration >= todaysdate 
ORDER BY t1.row1 ASC LIMIT 1

 

A while back I was talking about running a query in a Cron Job to pull information out of a database and post it into a FLAT FILE to use in order to limit the amount of queries I had.  While that would have been useful, I came to the conclusion that I wouldn't be able to display enough results in a day to make it worth while.  If I had more results in the table than how many times the cron job went off per day, a good portion of my results would never be displayed.

 

That would have been a pretty big problem, so I decided to head away from the FLAT FILE ordeal and came up with a query like the above.

 

This query works beautifully, and the use of RAND() in the INNER JOIN, suggested by Fenway in his Sticky,  scales nicely and with some tests it seems to bring records out more evenly.  Of course this could have been a fluke, so don't take my word on that.

 

Now the problem comes with Expiration Dates, as not all rows would be active to pull a result from.  Thus presenting the possibility that an empty result to be pulled from the database.

 

Running a test with 7 rows; Rows 2, 4, and 6 having active expiration dates and Rows 1, 3, 5, and 7 being inactive as they have expired, if the RAND() and all that ends up equaling 7.. then there will be an empty result drawn.

 

So if Row 1 is active for 30 days, and Rows 2 - 30 are only active for a single day.  I would be pulling more empty results than being able to pull the 1 active result from the table.

 

Table 2, 3, 4, and 5 pull other information about a user which happens to be drawn.

 

I've been debating possible methods of eliminating the RAND() all together and pulling results by other methods, but I can't really figure out a way to pull random records using an expiration factor.  Let alone something that seems to be optimized for the long haul when there are a large number of records in the table.

 

Anyone got a suggestion on how to pull only random and active rows?  This one has me a bit stumped.

Okay the information that I would be pulling by using this query has the potential to expire, basically I wouldn't need to pull that information.

 

If row 1 doesn't expire for 30 days, and I have 30 rows after that which has expired.  The use of rand() is possibly going to bring up empty results as Row 1 requires rand() to equal 0 or 1, how ever it works.

 

Even if row 24 has expired, it's still in the database so it has a potential of allowing rand() to equal 24.  Even if Row 1 is the only valid row currently active.

 

Running tests, I have hit empty results using that query.  Empty results are not much of a problem every so often, but more often than pulling the only active row out of a 40 row database...  that could be a problem for this script.  Does that clarify things?  If not I can get even more specific, kind of on a time limit at this moment.

Ah, yes, I see what you're saying.  That's always going to be true -- rand() can't possibly guess.

 

You have a few options:

 

1) Just increase the number of rows you return randomly, if you can easily the "miss rate".  This works well if you don't need "exactly" N rows back.

 

2) Do (1) iteratively, since maybe you will get back the right number sometimes -- can't imagine that a few extra passes wouldn't sort this out.

 

3) This may be very bad performance-wise, but if your "expired" query runs quickly, you can always use those UIDs directly -- but that's quite ugly.

I suppose I could do it backwards!

 

SELECT *
FROM table1 AS t1
INNER JOIN ( SELECT ROUND(RAND() * (SELECT MAX(row1) FROM table1)) AS row1) AS tt1 ON t1.row1 <= tt1.row1
LEFT JOIN table2  AS t2 ON t2.row1 = t1.row2
LEFT JOIN table3  AS t3 ON t3.row1 = t2.row2
LEFT JOIN table4  AS t4 ON t4.row1 = t2.row3
LEFT JOIN table5  AS t5 ON t5.row1 = t2.row4
WHERE t1.expiration >= todaysdate 
ORDER BY t1.row1 DESC LIMIT 1

 

Just put in a false factor for ID number 1...  If I remove the expired entries, is there a way to reorganize the ID fields of the table?

Like where it would be:  1, 4, 7, 8, 73

Organize it to 1, 2, 3, 4, 5 ??

 

I believe that would be the easiest method, especially as the ID isn't needed for anything else that I am aware of.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.