Jump to content


Photo

Column with a long string of over 1,000 words?

normalization mysql

  • Please log in to reply
4 replies to this topic

#1 learningphpisfun

learningphpisfun

    Newbie

  • Members
  • Pip
  • 6 posts

Posted 14 July 2013 - 12:00 AM

 

I realized that it's much more easier and protocol to have multiples tables rather than one table. I'm guessing it's much more easier to have a column for different values than combining it in one column. My question is what if I am designing a page that has a fixed info and I will always need to pull those info together. For example I have a column called T-Shirt Colors under the table CLOTHES. I would then store the value "Red - Blue - Green - Yellow - Orange - Pink - Dark Red " etc. and let's say that column value will have a very long string since it will contain over 1,000 colors. I know for SURE that these are the only colors my store will sell so I don't have to worry about adding or deleting the colors... Would it be more efficient on the overhead and server speed to do it this way and then explode the colors into a list...Or is it better to create 1000 columns for each color...

 

Also if it is more efficient to explode the values (even though it's harder to maintain) on the overhead and server speed, approximately how many more times is it it more efficient.. 1X faster, 2X faster, 3X faster, etc. or maybe too little to be even notice? Thanks in advance

(keep in mind that these 1,000 colors are always going to be used together... Since they will always be pulled and used together to give the customer the color options to select from...)

 



#2 kicken

kicken

    Wiser? Not exactly.

  • Gurus
  • 2,576 posts
  • LocationBonita, FL

Posted 14 July 2013 - 12:35 AM

Whether you need to add/remove colors or not, the optimal solution is to use a separate table for all the color options, then link the available colors with the appropriate products using a third table.

Table products
- ProductID
- Name
- Description
- Etc.

Table colors
- ColorID
- Name

Table product_colors
- ProductID
- ColorID


So in your products table, you'd have the details on each product. The colors table would just be a simple listing of all available colors, one row per color. The product_colors table has one row per product/color combination. So if you had a shirt, and offered it in Red, Yellow, Black, Blue and White, you'd have:
products:
ProductID | Name  | Description
-----------------------------------
        1 | Shirt | A basic shirt


colors:
ColorID | Name
-----------------------
      1 | Red
      2 | Yellow
      3 | Black
      4 | Blue
      5 | White


product_colors:
ProductId | ColorID
-----------------------
        1 | 1
        1 | 2
        1 | 3
        1 | 4
        1 | 5
If you want to run on the assumption that every product will always be available in ever color, you could skip the third table, but having it gives you the option of limiting the color selection for individual products.
Recycle your old CD's, don't trash them!
Did I help you out?  Feeling generous? I accept tips via Paypal or Bitcoin @ 14mDxaob8Jgdg52scDbvf3uaeR61tB2yC7

#3 learningphpisfun

learningphpisfun

    Newbie

  • Members
  • Pip
  • 6 posts

Posted 14 July 2013 - 07:17 AM

@Kicken... If I was to set it up your way and the count the columns and rows created for the setup based on your template:

 

Database Design A:

 

Table products
- ProductID :
- Name: Shirt


Table colors
- ColorID:
- Name:

Table product_colors
- ProductID"
- ColorID

 

the numbers of tables and rows I would get based on 1,000 colors for a t-shirt would be:

 

Number of Tables = 3     

Number of Columns = 6

Number of Rows = 2,002   -----> 2 ( from table  products), 1,000 (from table colors),  and 1,000 (from table product colors)

 

if I was to setup a table with three columns and just store all the colors together in a long string for the column colors (and explode it into a list when needed).... the amount of rows and columns I would get is:

 

Database Design B:

 

Table shirt_colors

Product ID:

Name:

Colors:

 

Number Of Tables = 1

Number Of Columns = 3

Number Of Rows = 3

 

 

Can you explain to me how Database Design Option A is more optimal (overhead and perfomance wise) than setting the colors into one long string value and then exploding it (besides the fact that it's easier to read and maintain)?

 

                                      Database Design Option A:  would result in 3 tables, 6 columns, and 2020 rows being created.

                                      Database Design Option B : would result in 1 tables, 3 columns, and 3 rows being created.

 

I'm having a hard time understanding how Databased Design Option  A would be the more optimal choice from an overhead and performance perspective. I do realize that a column with a value of over 1,000 words can be very long, but I'm thinking it has no problem pulling up long strings since mysql is a very popular among article and forum sites.

 

Thanks in advance


Edited by learningphpisfun, 14 July 2013 - 07:31 AM.


#4 trq

trq

    Advanced Member

  • Administrators
  • 30,732 posts
  • LocationSydney, Australia.

Posted 14 July 2013 - 07:32 AM

There are numerous articles on the benefits of database normalization. Google the subject.

What your trying to do works against the entire relational model that relational database systems like MySQL are designed around and optimized for.

http://thorpesystems.com | http://proemframework.org | http://github.com/trq

SmtpCatcher - A very simple mock sendmail useful for testing PHP mail scripts.
OPM - My Linux package manager.


#5 kicken

kicken

    Wiser? Not exactly.

  • Gurus
  • 2,576 posts
  • LocationBonita, FL

Posted 14 July 2013 - 12:36 PM

As mentioned, google about data normalization to get a solid understanding of how databases are designed to work, and what they are optimized for.

Think about this one scenario though as a quick way to see how A is more optimal and efficient:

So you have a page to display all your products, and filters on the side for users to narrow down the list to things they are interested in. Filters may be for example Type, Price range, Color, Size, etc. Here are two things you may want to do to make things user friendly:


1) Only show filters which would return a result (ie if they filter Shirts and there are no black shirt, don't include black in the color filter)
2) Show a count next to each filter indicating how many match that filter.
3) And of course actually apply the filter if the select it.


Now, using method B, here's what you'd have to do for each one


1) SELECT all the products matching the type/size/etc, run through each row, explode the colors of every item, and merge them into an array of colors.
2) Same as above, but for each color also increment a counter for the number of items that color is available.
3) Either similar as above, but as you go through the rows, explode the colors and check them against the selected filter. Each one that matches gets added to the final result, the rest get dropped.


With method A, here's how easy things become (each of these queries gives you exactly the results you need, with no additional processing required on the application end):
1)
SELECT DISTINCT c.Name 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters, like p.Type='t-shirt', p.Size='L' or whatever */
2)
SELECT c.Name, COUNT(*) as totalProducts 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters */ 
GROUP BY c.Name
3)
SELECT p.ProductId, p.Type, p.Size, GROUP_CONCAT(c.Name) as colorList /*, etc */ 
FROM products p 
INNER JOIN product_colors pc ON pc.ProductId=p.ProductId 
INNER JOIN colors c ON c.ColorId=pc.ColorId 
WHERE /* The filters, including c.ColorId IN (1,2,3) to limit colors */ 
GROUP BY p.ProductId
Now, of course, with only 2-3 products, either method will appear to run just as well as the other in pratice. It's when you get big that the differences become apparent. So lets fast-forward to the point where you have say, 5,000 products. Let's say the user only wants to see Large t-shirts in colors Blue and Forest-green.
Lets assume these are the totals for each of those filters applied individually:
# t-shirts: 4700
# in large: 4700
# in blue: 4000
# in forest green: 3500
# in both blue and forest green: 3450

Lets assume for the sake of being able to count rows, every shirt is available in some random selection of 500 colors out of 1000 total colors)

Method B Total rows:
4700 (one per product)

Method A Total rows:
4700 (products) + 2,350,000 (product_colors) + 1000 (colors) = 2,355,700

So using Method B, in order to filter by the colors you have to first


1) Select 4700 rows (# large shirts) then foreach row, explode the colors, loop over the color list (500 iterations), loop over the selected filters (2 iterations) to find matches, and then push the matching results into an array. Given the numbers above, of the original 4700 rows returned 1200 of them get dropped. How ever long it took to process those rows, is wasted time.
2) Similar to the above
3) Similar to the above


If you look at that, there is a lot of looping going on when processing the results from the DB:
4700 iteration (db results) + ((500 * 4700 iterations (colors for each result)) + (500*2 iterations (filter check)) = 2,355,700 iterations
Hey, look at that, you're doing just as much work as method A, just in your app rather than in MySQL.


Using method A (assuming proper indexing has been done)


1) The database will use the index that exists on product_colors to first find only the products which match blue and forest green. This immediatly narrows down the 2,355,700 rows to only 3450. After that the t-shirt/large filters would apply but they don't change the row count in this situation. On the PHP end, no further processing is require, just read the results
2) Similar to the above.
3) Similar to the above.


So, using method A your app only needs to loop 3450 times in order to gather up your results. Using method B, you have to loop 2,355,700 times to achieve the same thing.

Being a relation database, mysql has been highly optimized to do such filtering/loop in very efficient manner. As I mentioned above, by using an index mysql can near instantaniously narrow the result set down to the final 3450 rows right off the bat.

Your App on the other had likely has not been optimized at all (nor can it be optimized to anywhere the same level as Mysql has been). In addition, if your using something like PHP then you have additional overhead of it being an interpreted language

Edited by kicken, 14 July 2013 - 12:39 PM.

Recycle your old CD's, don't trash them!
Did I help you out?  Feeling generous? I accept tips via Paypal or Bitcoin @ 14mDxaob8Jgdg52scDbvf3uaeR61tB2yC7




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users

Cheap Linux VPS from $5
SSD Storage, 30 day Guarantee
1 TB of BW, 100% Network Uptime

AlphaBit.com