
I realized that it's much easier and more proper to have multiple tables rather than one table. I'm guessing it's much easier to have a column for each different value than combining them all in one column. My question is: what if I am designing a page that has fixed info and I will always need to pull that info together? For example, I have a column called T-Shirt Colors under the table CLOTHES. I would then store the value "Red - Blue - Green - Yellow - Orange - Pink - Dark Red" etc., and let's say that column value will be a very long string since it will contain over 1,000 colors. I know for SURE that these are the only colors my store will sell, so I don't have to worry about adding or deleting colors... Would it be more efficient on the overhead and server speed to do it this way and then explode the colors into a list, or is it better to create 1,000 columns, one for each color?

 

Also, if it is more efficient to explode the values (even though it's harder to maintain), approximately how many times more efficient is it? 1x faster, 2x faster, 3x faster, etc., or maybe too little to even notice? Thanks in advance.

(Keep in mind that these 1,000 colors are always going to be used together, since they will always be pulled and used together to give the customer the color options to select from.)

 

Whether you need to add/remove colors or not, the optimal solution is to use a separate table for all the color options, then link the available colors with the appropriate products using a third table.

 

Table products

- ProductID

- Name

- Description

- Etc.

 

Table colors

- ColorID

- Name

 

Table product_colors

- ProductID

- ColorID

 

 

So in your products table, you'd have the details on each product. The colors table would just be a simple listing of all available colors, one row per color. The product_colors table has one row per product/color combination. So if you had a shirt, and offered it in Red, Yellow, Black, Blue and White, you'd have:

products:
ProductID | Name  | Description
-----------------------------------
        1 | Shirt | A basic shirt


colors:
ColorID | Name
-----------------------
      1 | Red
      2 | Yellow
      3 | Black
      4 | Blue
      5 | White


product_colors:
ProductID | ColorID
-----------------------
        1 | 1
        1 | 2
        1 | 3
        1 | 4
        1 | 5
If you want to run on the assumption that every product will always be available in every color, you could skip the third table, but having it gives you the option of limiting the color selection for individual products.
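A sketch of that three-table layout as MySQL DDL (the column types, sizes, and the InnoDB/foreign-key choices here are my assumptions, not part of the post above):

```sql
CREATE TABLE products (
    ProductID   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Name        VARCHAR(100) NOT NULL,
    Description TEXT,
    PRIMARY KEY (ProductID)
) ENGINE=InnoDB;

CREATE TABLE colors (
    ColorID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Name    VARCHAR(50)  NOT NULL,
    PRIMARY KEY (ColorID)
) ENGINE=InnoDB;

-- One row per (product, color) pairing; the composite
-- primary key prevents duplicate pairings.
CREATE TABLE product_colors (
    ProductID INT UNSIGNED NOT NULL,
    ColorID   INT UNSIGNED NOT NULL,
    PRIMARY KEY (ProductID, ColorID),
    FOREIGN KEY (ProductID) REFERENCES products (ProductID),
    FOREIGN KEY (ColorID)   REFERENCES colors (ColorID)
) ENGINE=InnoDB;
```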

@Kicken... If I were to set it up your way and count the columns and rows created for the setup based on your template:

 

Database Design A:

 

Table products
- ProductID
- Name


Table colors
- ColorID
- Name

Table product_colors
- ProductID
- ColorID

 

the number of tables, columns, and rows I would get based on 1,000 colors for a t-shirt would be:

 

Number of Tables = 3     

Number of Columns = 6

Number of Rows = 2,001   -----> 1 (from table products), 1,000 (from table colors), and 1,000 (from table product_colors)

 

If I were to set up a table with three columns and just store all the colors together in one long string in the Colors column (exploding it into a list when needed), the number of rows and columns I would get is:

 

Database Design B:

 

Table shirt_colors

- ProductID

- Name

- Colors



Number Of Tables = 1

Number Of Columns = 3

Number Of Rows = 1
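For reference, a sketch of what Design B would look like as an actual MySQL table (the column types are assumptions; the Colors column has to be large enough to hold 1,000 delimited color names):

```sql
-- One row per product; all colors packed into a single delimited string.
CREATE TABLE shirt_colors (
    ProductID INT UNSIGNED NOT NULL,
    Name      VARCHAR(100) NOT NULL,
    Colors    TEXT NOT NULL,  -- e.g. 'Red - Blue - Green - Yellow - ...'
    PRIMARY KEY (ProductID)
) ENGINE=InnoDB;
```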

 

 

Can you explain to me how Database Design Option A is more optimal (overhead and performance wise) than putting the colors into one long string value and then exploding it (besides the fact that it's easier to read and maintain)?

 

                                      Database Design Option A: would result in 3 tables, 6 columns, and 2,001 rows being created.

                                      Database Design Option B: would result in 1 table, 3 columns, and 1 row being created.

 

I'm having a hard time understanding how Database Design Option A would be the more optimal choice from an overhead and performance perspective. I do realize that a column with a value of over 1,000 words can be very long, but I'm thinking MySQL has no problem pulling up long strings, since it is very popular among article and forum sites.

 

Thanks in advance

Edited by learningphpisfun

As mentioned, google "database normalization" to get a solid understanding of how databases are designed to work and what they are optimized for.

 

Think about this one scenario though as a quick way to see how A is more optimal and efficient:

 

So you have a page to display all your products, and filters on the side for users to narrow down the list to things they are interested in. Filters may be, for example, Type, Price range, Color, Size, etc. Here are three things you may want to do to make things user friendly:

1) Only show filters which would return a result (i.e. if they filter Shirts and there are no black shirts, don't include black in the color filter).

2) Show a count next to each filter indicating how many products match that filter.

3) And of course, actually apply the filter if they select it.

Now, using method B, here's what you'd have to do for each one:

1) SELECT all the products matching the type/size/etc., run through each row, explode the colors of every item, and merge them into an array of colors.

2) Same as above, but for each color also increment a counter for the number of items available in that color.

3) Similar to the above, but as you go through the rows, explode the colors and check them against the selected filter. Each one that matches gets added to the final result; the rest get dropped.
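As a side note, even pushing the color check from PHP into SQL doesn't rescue method B: a delimited string only supports substring matching, which can't use an index and can match the wrong rows. A sketch against the shirt_colors table described earlier (hypothetical, assuming the " - " delimiter):

```sql
-- Forces a full table scan: a leading-wildcard LIKE cannot use an index.
SELECT ProductID, Name
FROM shirt_colors
WHERE Colors LIKE '%Blue%';  -- also falsely matches 'Dark Blue', 'Blue-Green', etc.
```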

With method A, here's how easy things become (each of these queries gives you exactly the results you need, with no additional processing required on the application end):

1)

SELECT DISTINCT c.Name 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters, like p.Type='t-shirt', p.Size='L' or whatever */
2)

SELECT c.Name, COUNT(*) as totalProducts 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters */ 
GROUP BY c.Name
3)

SELECT p.ProductId, p.Type, p.Size, GROUP_CONCAT(c.Name) as colorList /*, etc */ 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters, including c.ColorId IN (1,2,3) to limit colors */ 
GROUP BY p.ProductId
Now, of course, with only 2-3 products, either method will appear to run just as well as the other in practice. It's when you get big that the differences become apparent. So let's fast-forward to the point where you have, say, 5,000 products. Let's say the user only wants to see Large t-shirts in the colors Blue and Forest-green.

Let's assume these are the totals for each of those filters applied individually:

# t-shirts: 4700

# in large: 4700

# in blue: 4000

# in forest green: 3500

# in both blue and forest green: 3450

 

Let's assume, for the sake of being able to count rows, that every shirt is available in some random selection of 500 colors out of the 1,000 total colors.

 

Method B Total rows:

4700 (one per product)

 

Method A Total rows:

4700 (products) + 2,350,000 (product_colors) + 1000 (colors) = 2,355,700

 

So using Method B, in order to filter by the colors, you first have to:

1) Select 4,700 rows (the number of large shirts), then for each row, explode the colors, loop over the color list (500 iterations), and loop over the selected filters (2 iterations) to find matches, pushing the matching results into an array. Given the numbers above, 1,250 of the original 4,700 rows returned get dropped; however long it took to process those rows is wasted time.

2) Similar to the above

3) Similar to the above

If you look at that, there is a lot of looping going on when processing the results from the DB:

4,700 iterations (DB results) + 500 × 4,700 iterations (colors for each result) + 500 × 2 iterations (filter check) = 2,355,700 iterations

Hey, look at that, you're doing just as much work as method A, just in your app rather than in MySQL.

 

 

Using method A (assuming proper indexing has been done):

1) The database will use the index that exists on product_colors to first find only the products which match blue and forest green. This immediately narrows the 2,355,700 rows down to only 3,450. After that, the t-shirt/large filters would apply, but they don't change the row count in this situation. On the PHP end, no further processing is required; just read the results.

2) Similar to the above.

3) Similar to the above.

So, using method A your app only needs to loop 3450 times in order to gather up your results. Using method B, you have to loop 2,355,700 times to achieve the same thing.
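The "proper indexing" assumed here doesn't need to be anything exotic. A sketch of what it could look like (the index name, and the assumption that product_colors was created without any keys, are mine):

```sql
-- The composite primary key serves product -> colors lookups;
-- the second index serves the reverse direction, color -> products,
-- which is what the filter queries above need.
ALTER TABLE product_colors
    ADD PRIMARY KEY (ProductID, ColorID),
    ADD INDEX idx_color_product (ColorID, ProductID);
```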

 

Being a relational database, MySQL has been highly optimized to do such filtering/looping in a very efficient manner. As I mentioned above, by using an index MySQL can near-instantaneously narrow the result set down to the final 3,450 rows right off the bat.

 

Your app, on the other hand, has likely not been optimized at all (nor can it be optimized to anywhere near the same level as MySQL has been). In addition, if you're using something like PHP, then you have the additional overhead of it being an interpreted language.

Edited by kicken