Jump to content

Recommended Posts

Hi,

I have a search engine on my PHP application and I want to store the top 50 searches.

 

Now, If I store all query's then the table will get huge. If I limit the amount of query's to 50 and then count the recurring searches then any new query's that weren't in the original 50 won't be stored.

 

Any ideas on how I might solve this problem?

 

Thank you in advance.

 

autonumber

Search

Count

1

wine

5

2

eggs

1

3

bacon

7

Insert all of the search words into the table, but add a column for LastSearch date.  Then run a cron job daily or weekly (or whatever) that deletes any entries that have a low count and the LastSearch date is over 1 week or 1 month or whatever old.  This way, you are only tossing out words that have grown "stale".

 

Another option would be to create a second table to "archive" these deleted words to.  Then when you get ready to delete, you can check to see if the word is in the archive and if it is, add the archive count to the live count before deciding what to delete.  If you delete a word, insert or replace it in the archive table with the updated count.  You can have a cron job scan this table periodically to see if any words have a high enough count to be moved back into the live table.

 

 

Now, If I store all query's then the table will get huge.

 

I don't see how that's going to be a problem.

 

Insert all of the search words into the table, but add a column for LastSearch date.  Then run a cron job daily or weekly (or whatever) that deletes any entries that have a low count and the LastSearch date is over 1 week or 1 month or whatever old.

 

Is not a good idea, because the first 50 will remain the first 50 as all others would-be popular keywords are always reset while the first 50 will keep incrementing.

Insert all of the search words into the table, but add a column for LastSearch date.  Then run a cron job daily or weekly (or whatever) that deletes any entries that have a low count and the LastSearch date is over 1 week or 1 month or whatever old.  This way, you are only tossing out words that have grown "stale".

 

Thanks, I think this may be a viable solution.

 

I don't see how that's going to be a problem.

 

I don't know much about how much space or processing power mysql needs to store or query data. But if I store every query;

 

alpha = 26

numeric = 10

Total = 36 (not including special chars and converting all to lowercase)

 

36 to the power of 36 = 1.06387359 × 10^56

 

That's one massive table, does it really not matter that it could possibly get that big?

 

Euhm that's not how you calculate row size, something similar to this would be more correct (although I'm no expert in this):

 

id + search + count = int + varchar + int = 4 bytes + 255 bytes + 4 bytes = 263 bytes (assuming you use varchar(255))

 

263 bytes * 1M records = 263 000 000 = 263 M

263 bytes * 1G records = 263 000 000 000 = 263 G

 

If your website is popular you may reach 1M records (which I doubt you'll ever get to).

 

A more realistic number would be:

 

id + search + count = mediumint + varchar(32) + mediumint = 3 bytes + 32 bytes + 3 bytes = 40 bytes

 

40 bytes * 1M = 40M

40 bytes * 1G = 40G

Sorry, I was calculating the amount of entries, should have made that clearer.

 

I think I will just let the table grow and see how it goes, if it becomes unmanageable then I will just delete it and start fresh and then look for a solution.

 

Thank you very much for your thoughts and insight.

ignace:

that's not how you calculate row size,
Looks like you were calculating disk space, not number of entries. And you did not include any overhead for the indexes; of course, you assumed all of the keywords are going to be 255 characters long (varchar only uses the space needed for the contents plus a little overhead for the length of the content), so I guess that makes up for it.

 

razta: You are storing each keyword only once, right?  I mean if I search for "wine" you INSERT it into the database with a count of 1.  When the next guy searches for "wine", you UPDATE that entry and increment the count to 2.  You are not doing a second INSERT, right? 

 

You should probably just let the table grow until it becomes a problem.  But if you develop the archive process before the problem comes up, you will be ahead of the game.

 

One thing to make note of, that auto_increment key is not going to do you much good at all.  And you need to have the keyword indexed so you can locate it and update it.  You might want to consider making the keyword your Primary Key and throw away the auto_number.  Also, since your will be querying for the top 50, you might consider putting an index on the count column so that query will be more

 

of course, you assumed all of the keywords are going to be 255 characters long

 

I wanted to show the worst-case scenario. And I indeed did not include indexes and stuff which would account for another 20%? I just wanted to point out that his current calculation was incorrect. Of course mine wasn't either correct as you pointed out but closer to the truth.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.