Jump to content

Mergeing/Overwriteing/Archiving Tables.


riscit

Recommended Posts

:D Howdy...

 

ok.. I have a database which is the back end of an online store.

 

In this database are 2 tables (among others). One is a record of all customers/visitors (one row for each cookie handed out) and the other is a record of all referrals (via banner ads or search engines, etc - 1 row for each referral).

 

The referrals table is used only for generating statistics to judge ad performance, etc, and therefore is really just an archive. The customers table is used to track how many orders a customer has placed, page views, last visit, first visit, where they came from, etc, and the customer id is used for keeping track of items in their current order (to use an overused cliche - their \"cart\" *shudders*). So some of that stuff is used/referred to regularly (visitors that stop by on a regular basis) but a vast majority of it, is just people who saw the site, looked at few pages, and never came back again. Those rows are unnecessary overhead.

 

What I want to do, is copy/overwrite each of these tables to a seperate database, and then clean them out on a regular basis. This way we keep the old records around for running statistics, etc, but they aren\'t bloating the database that gets used 100\'s of time a minute. The customers table especially is slowing things down a lot.

 

The issue that needs to be dealt with is finding a good way to copy from database.table to database2.table while overwriting the changed rows in database2.table, adding any new rows that wern\'t there before, and leaving the rest alone. After that, database.table will be pruned to eliminate useless rows that are now archived in database2.table.

 

The next day, this will all happen again. database.table will have some of the rows it had the day before (they weren\'t all pruned), some of those will have been modified, and it will have some new rows as well.... and the changes and new rows need to be added to database2.table...

 

I hope I explained that well.. not sure I did. I suck at explaining these things.

 

Any insight on the best algorithm for this process? This is a relatively tight memory situation (which is the whole reason this needs to be done) so I can\'t just load the whole table in to memory and sort it out row by row. I\'d either have to do it in sub-sections, or (hopefully) mysql offers a good way to do this.

 

 

Possibly useful information below... not looking for a specific solution however - just a guideline for a good algorithm - to be pointed in the right direction.

 

customers.id is incremented from a flat file counter, and will be unique from day to day. referrers.id is incremented via auto-increment, so it may be reset in the course of pruning if the entire table contents is just deleted, which would leave duplicate id\'s from day to day - but that probably won\'t be an issue if the the entire table is wiped out anyway.


mysql> describe customers;

+--------------------------+-------------+------+-----+---------------------+-------+

| Field                    | Type        | Null | Key | Default             | Extra |

+--------------------------+-------------+------+-----+---------------------+-------+

| id                       | int(11)     |      | PRI | 0                   |       |

| primary_billing_address  | int(11)     | YES  |     | NULL                |       |

| primary_shipping_address | int(11)     | YES  |     | NULL                |       |

| referring_url            | blob        | YES  |     | NULL                |       |

| referrer                 | varchar(32) | YES  |     | NULL                |       |

| first_visit              | datetime    |      |     | 0000-00-00 00:00:00 |       |

| last_visit               | datetime    |      | MUL | 0000-00-00 00:00:00 |       |

| page_views               | int(11)     |      |     | 0                   |       |

+--------------------------+-------------+------+-----+---------------------+-------+



mysql> describe referrers;

+---------------+-------------+------+-----+---------------------+----------------+

| Field         | Type        | Null | Key | Default             | Extra          |

+---------------+-------------+------+-----+---------------------+----------------+

| id            | int(11)     |      | PRI | NULL                | auto_increment |

| timestamp     | datetime    |      |     | 0000-00-00 00:00:00 |                |

| referrer_name | varchar(32) |      | MUL |                     |                |

| ip            | varchar(15) |      |     |                     |                |

| cookie        | int(11)     | YES  |     | NULL                |                |

| referring_url | blob        | YES  |     | NULL                |                |

| request_url   | blob        | YES  |     | NULL                |                |

+---------------+-------------+------+-----+---------------------+----------------+

Link to comment
Share on other sites

Referrers ------

This is the easy one. I would archive everything from this table into another on a daily baisis. Make sure that you are running all of your stats out of the archive table so you don\'t degrade the \"production\" performance.

 

Customers --------

I would pick a field like last_visit and prune everyone that hasn\'t visited in XX days. Move each of these records into an archive table. If you use this method you should just to a delete from the archive table for each record that exists (and is moving) from the live table. This will keep you from getting duplicate records in the archive.

Link to comment
Share on other sites

Customers --------

I would pick a field like last_visit and prune everyone that hasn\'t visited in XX days.  Move each of these records into an archive table.  If you use this method you should just to a delete from the archive table for each record that exists (and is moving) from the live table.  This will keep you from getting duplicate records in the archive.

What about the visitors that come on a regular basis? Their data changes daily, and should be updated with each archive.

Link to comment
Share on other sites

I\'m not following what you mean? If the user visits then their record would not get archived. Basically the live table would have all of your \"fresh\" records and the archive would have everything else. If you needed to do some sort of reporting on all of the customers you could just UNION the two tables.

Link to comment
Share on other sites

If you needed to do some sort of reporting on all of the customers you could just UNION the two tables.

Good call... our version of MySQL (3.23.46) doesn\'t support that however. I\'m looking in to upgrading it now.....

 

Thanks.

 

If I can\'t upgrade it (it\'s on an old Verio VPS) I\'ll be back...

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.