too much data?

speedy33417 · April 12, 2008

I need to build a website that deals with a bunch of data added each day into a MySQl database. About 1,500 a day, or 7,500 a week, which is about 390,000 a year.

I'm afraid that with time the website will become extremely slow.

Is there anything that I can do to avoid this slow down or is it not that much data?

AndyB · April 12, 2008

390,000 what in a year?

speedy33417 · April 12, 2008

There will be a table with clients, users, invoices and a few more.

There are 60 employees writing about 25 invoices each day.

The invoices table will have about 30-40 fileds and about 390,000 new rows (60 employees x 25 invoices x 5 days a week x 52 weeks a year) will be added each year.

How quick will my query be if I want to pull all invoices on a specific day for a specific employee if there are more than a million rows (two-three years from now)?

Should I be concerned?

Barand · April 12, 2008

If you are frequently querying by invoice date, put an index on the column.

I reqularly work with tables with 3 to 4 million records (rising to 12m over chistmas period)

What is the structure of the invoice table?

speedy33417 · April 12, 2008

The structure is not final, but something like this.

+------------+----------------+---------------+--------------+-------------+-------------+---------------+-----------------+----------------+----------+-----------------+----------------+-------------
| invoice_id | invoice_number | invoice_total | invoice_date | employee_id | stop_number | product1_sold | product1_credit | product1_price | product2 | product2_credit | product2_price | (8 more products...)
+------------+----------------+---------------+--------------+-------------+-------------+---------------+-----------------+----------------+----------+-----------------+----------------+------------

What does putting an index do? I've never done that before.

Barand · April 12, 2008

I thought there seemed to be too many fields. You data needs normalizing.

[pre]

invoice inv_item product

=================== =================== ==========

invoice_number (PK) --+ item_id +--- product_id

invoice_date +---< invoice_number{FK} | product_name

employee_id product_id >---+ product_price

stop_number product_credit

product_price

[/pre]

You invoice number should be unique, so use it as primary key.

Put the invoice items in a separate table, 1 per row, with the invoice and prod ids as foreign keys.

There doesn't seem to be any customer reference in the invoice data (or is that the stop_number)

INDEXES work as they do with a book. You look up a word, see what page/s iys on and go straight to it, so you don't need to read through the whole book to find the word.

speedy33417 · April 12, 2008

Thanks for your help Barand!

The invoice number is unique but it comes from a pre-printed invoice book and I can't use auto-increment. I think unique makes sense. What happens if a user enters an invoice number that's been used before? I don't want them to get an error message, I'd much rather validate the entry.

I did forget the customer_id, but I added it. This is what I have

customer               invoice                      inv_item                     product
===============        ===================          ==================           ==========
customer_id     --+    invoice_number (PK)  --+     item_id                +---  product_id
customer_name     |    invoice_date           +---< invoice_number{FK}     |     product_name
customer_address  +--< employee_id                  product_id         >---+     product_price
                       customer_id                  product_credit               product_price
                       stop_number

What does the foreign key does? Does it speed up the queries?

dsmythe · April 12, 2008

In my opinion, normalizing your data will definitely speed up queries depending on the range of data you need from each invoices' row, but keep in mind it will also increase the complexity of the SQL you're going to have to write.

You original question was what will happen when you get a lot of data. Primarily this is going to be influenced by the server and resources that you're giving to MySQL.

A dual core server w/ 4 GB of RAM strictly dedicated to your database and optimized properly could probably handle 40 Million records +

Depending on your hardware schema, My suggestion would be to archive the old invoices after n number of years, depending on how well your server can handle it. If your server can handle 3 years of invoices, then start archiving a year at a time after 3 years.

Another option would be to use MySQL's partitioning capabilities (in v5.1) to automatically divvy up the invoices by date as you insert the data. This makes the archiving process easier and also provides some performance gain when you're dealing with a lot of records.

-Dan-

speedy33417 · April 12, 2008

Great help. Thanks a lot!

I'm just not used to working with databases with that much data and it freaked me out a bit.

So, again what does the primary key/foreign key do? Since my invoice numbers are going to be all over the place I think I should have another field called invoice_id.

About archiving. What does that mean? Like moving old data out of the table to reduce the amount of rows and speed up the process?

dsmythe · April 12, 2008

The primary -> foreign key relationship is only really useful when using the InnoDB storage engine. It establishes a relationship between the records who's keys match, and allows for error checking and maintenance.

For instance, if you have a table full of countries, and a table full of states - you could have a primary -> foreign key relationship between the two. This could prevent you from adding states without countries, or enable you to delete a country along with all corresponding states.

As long as your invoice numbers are going to be unique all the time and they're only digits (no spaces or hyphens) I don't see a benefit in having an extra column simply to have automatically incrementing identifiers. Besides, it won't have any correlation to the data being stored anyway. It would also be simpler to use your invoice number as the unique identifier, since, if you split the database into separate tables, it will make sense in all the tables to look it up by invoice number.

The main definition of a primary key is a number (or string of characters, although MySQL performs better with numbers) that will always be unique, and never be deleted or re-used. Certainly an invoice number would be a good choice for a primary key. You would use it to reference the rows in the database because of the uniqueness. Primary keys are also indexed, which helps MySQL keep track of where in the database the record is, which speeds up retrieval. Instead of having to scan every row to generate your results, MySQL will know exactly where to look and what rows to scan to get your results if you reference the key in the query.

Archiving would be like boxing up old data. You need/want to keep it, but you don't necessarily need it to be accessible all the time. If you had been running this database for 4 years, you would want to take all the invoices from 2004 and put them in their own table away from the main database of current records. This would leave them still available to query should you need to search back in time real far (and you would expect your query to take longer as well), but would keep them out of the current jumble to keep current activities speedy. The MyISAM storage engine offers table compression, where you can have the data optimized and compressed to save disk space, but still available to SELECT from. The downside of compressing tables would be that you can't add or modify new data. This is why you would do this with 4 year old records, for instance, because that data will no longer change.

-Dan-

Barand · April 12, 2008

Even without InnoDB, the foreign key concept is useful. InnoDB allows you to enforce them so you cannot have, say, an invoice for custid 123 if there is no record in the customer table with that id. It also won't allow you to delete the customer if there are existing invoice records for the customer.

The foreign keys "relate" the records of one table with those in another.

eg customer_id in the invoice table relates that invoice with the customer with that id. Without them you wouldn't which invoices belong to which customer, or which item to which invoice.

When you query the tables, those PK--FK links will be the JOINS that you use.

dsmythe · April 12, 2008

My point was that it doesn't exhibit the benefits of being a Foreign key. MyISAM will completely ignore it:

mysql> create table `t1` (

-> `id` int,

-> `name` varchar(10) ) ENGINE = MyISAM

-> ;

Query OK, 0 rows affected (0.04 sec)

mysql> create table `t2` (

-> `id` int,

-> `last` varchar(10) ) ENGING = MyISAM

-> ;

Query OK, 0 rows affected (0.04 sec)

mysql> ALTER TABLE t1 ADD FOREIGN KEY(id) REFERENCES `t2`(id);

Query OK, 0 rows affected (0.01 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table `t1`\G

*************************** 1. row ***************************

Table: t1

Create Table: CREATE TABLE `t1` (

`id` int(11) DEFAULT NULL,

`name` varchar(10) DEFAULT NULL,

KEY `id` (`id`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

As you can see, it simply creates a normal Key and does nothing to the referenced table:

mysql> show create table `t2`\G

*************************** 1. row ***************************

Table: t2

Create Table: CREATE TABLE `t2` (

`id` int(11) DEFAULT NULL,

`last` varchar(10) DEFAULT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

-Dan-

Barand · April 12, 2008

Whether you enforce the referential integrity using InnoDB, or not with MyIsam, the key of one table placed in another as a basis for relational joins is still, by definition, a FOREIGN KEY

dsmythe · April 12, 2008

I beg to differ:

"

Foreign keys definitions are subject to the following conditions:

*

Both tables must be InnoDB tables and they must not be TEMPORARY tables.

*

Corresponding columns in the foreign key and the referenced key must have similar internal data types inside InnoDB so that they can be compared without a type conversion. The size and sign of integer types must be the same. The length of string types need not be the same. For non-binary (character) string columns, the character set and collation must be the same.

*

In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist.

*

In the referenced table, there must be an index where the referenced columns are listed as the first columns in the same order.

*

Index prefixes on foreign key columns are not supported. One consequence of this is that BLOB and TEXT columns cannot be included in a foreign key, because indexes on those columns must always include a prefix length.

*

If the CONSTRAINT symbol clause is given, the symbol value must be unique in the database. If the clause is not given, InnoDB creates the name automatically.

"

http://dev.mysql.com/doc/refman/5.0/en/innodb-foreign-key-constraints.html

I do agree, however, that it is a good idea to create KEY's on the columns used in relational joins between 2 MyISAM tables, however these are not Foreign keys, neither by definition nor by function. They simply index the two columns which in turn optimizes joins as normal KEY's would. MyISAM simply does not know what a Foreign key is.

Barand · April 12, 2008

What you are arguing about is the physical implementation of a foreign key constraint.

Before you stuck your oar in I was talking only about the logical data model and entity relationships.

So, pat on the back, you've read the manual. So have I. I already stated that you need innodb to enforce them, but they still exist logically.

Sign In

too much data?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Archived

Important Information