Zane Posted April 17, 2013 Share Posted April 17, 2013 For the past few years, I have been designing a database & web site for a client in quite a mediocre style IMO. The site design is something that can always be changed, modified, and more or less completely overhauled. My question is in the database design. In this current semester I have been taking a Database Processing course. Luckily, with what SQL I have learned from messing with PHP/MySQL I have made over the years, I have a firm *enough* grasp on SQL to pass this course. What shocked me in the beginning though, and even more for the rest of the class I would presume, was the early introduction of normalization. In the first few weeks we did not even touch a computer and the professor tried his damndest .. with English as his second language, to teach us the various Normal Forms. Normalization I have heard of, but normal forms? That is an entirely new level of SQL understanding I have yet to achieve and needless to say... when the professor testing the class on what we had learned... we all did so horrible that he only counted the first question... of 5 questions. We have stepped away from Normal Forms since then and have focused primarily on SQL Statements which is great for me, but the idea of normal forms still sits in the back of my head as something that I should know. Since I am by no means an expert Database Administrator, I have been approaching my clients site with a very novice approach. One table per product type. Every product type has several similar attributes like colorname, stylename, price, etcetera... Yet every product type also has way more distinct attributes, like thickness, edge profiles, cleaning methods, installation methods, etcetera. I have googled around and came across something called the EAV Model. When I first saw it I was stricken with awe.. and thought to myself that THIS was the answer to all of my problems............ a few google searches later, I hear nothing but horrible things about this model when it comes to SEO.. or in-site searches in general. It is supposedly a nightmare of a "Normal Form" to use for something whose rows will be populated exponentially. 40 new products one day, 22 the next, 100 another day... and so on. So my question, which I almost feel deserves to be in Application Design... is What is the ideal method of normalization for a database that contains several product types which have both similar and dissimilar attributes? Obviously a table for every product type is ridiculous and inefficient, because the owner constantly has a new attribute he wants to add causing me to have to hack and tweak and duct tape it into the system. I really need some advice if I want to do this for a living or be able to claim that I know what I am doing when it comes to e-commerce website development. Luckily, I convinced the client to go with Authorize.net so I do not have to create a shopping cart, but that is another story completely. Quote Link to comment Share on other sites More sharing options...
requinix Posted April 18, 2013 Share Posted April 18, 2013 Calling something "normalized" is like calling someone's actions "illegal". By just the terms themselves you can get a decent idea of what the person means to say, but the reality is there's varying degrees of both. Most everybody knows about normalization where you use a primary key ID (or other single, small piece of unique information) as a key for data across various tables. But there's also stuff about not using multiple tables for a single entity, or not including data in a table that doesn't belong there. Typically (and strictly speaking) normalization comes in different levels: you normalize for one criteria, then for a couple more to reach the next level, then a couple more for the next. Likewise illegality can range from simple shoplifting to armed robbery to outright murder. (Interestingly enough, the analogy doesn't stop there. There are accepted exceptions to normalization, typically for caching or ease of access, like how murder in the name of self defense is sometimes excusable: you should still take a second look to make sure things are the way you think they are, but once confirmed maybe you can let things slide. But don't worry about that quite yet.) EAV is powerful but complex. You can represent just about anything applicable using it but at the cost of opaqueness and difficulty. If you know how it works you may be tempted to use it everywhere, and that's one way knowledge in general backfires, but if you can evaluate potential solutions to a problem in increasing order of sophistication (which often goes hand-in-hand with complexity) then you may be safe. - It's heavily normalized. That's kinda good as normalization is generally good, but kinda bad since you get lots of data and lots of tables and lots of overhead. - It shouldn't have anything to do with SEO. SEO focuses on your presentation of the data, not on the actual representation of the data in your database. Apples and oranges. - It does mean lots more rows, but these rows are generally small - often simply ID+name or ID+name+value - and unique across the table. And there's generally a point or three where additions to your "enum" and "lookup" tables plateau and you only deal with the entities that reference them. - Indexing that data means larger indexes in MySQL but searching is fast: for example, rather than MySQL scanning an entire column in a source table for matching values, it can scan the entire column in the much smaller attribute table and only then join up the source table. I would suggest looking more into it. It seems like you're describing a situation where different types of products have different attributes, and as long as that list of types is subject to change then EAV may be worth it. On the other hand if you have a handful of different products but that's not really going to change - at least not without significant work in the rest of the site - then EAV may be overkill and not worth the time. Also note that converting to and from EAV is fairly painless, should you decide one way now and change your mind later. Quote Link to comment Share on other sites More sharing options...
Zane Posted April 18, 2013 Author Share Posted April 18, 2013 EAV is powerful but complex. You can represent just about anything applicable using it but at the cost of opaqueness and difficulty. If you know how it works you may be tempted to use it everywhere - It shouldn't have anything to do with SEO. SEO focuses on your presentation of the data, not on the actual representation of the data in your database. Apples and oranges. - It does mean lots more rows, but these rows are generally small - often simply ID+name or ID+name+value - and unique across the table. And there's generally a point or three where additions to your "enum" and "lookup" tables plateau and you only deal with the entities that reference them. - Indexing that data means larger indexes in MySQL but searching is fast: for example, rather than MySQL scanning an entire column in a source table for matching values, it can scan the entire column in the much smaller attribute table and only then join up the source table. I would suggest looking more into it. It seems like you're describing a situation where different types of products have different attributes, and as long as that list of types is subject to change then EAV may be worth it. On the other hand if you have a handful of different products but that's not really going to change - at least not without significant work in the rest of the site - then EAV may be overkill and not worth the time. Also note that converting to and from EAV is fairly painless, should you decide one way now and change your mind later. I should not have said SEO, I meant more along the lines of searching the site. At the moment it is VERY difficult to even think of implementing a site search for things like - hickory hardwood - 5 inch maple - and so on. ....with one table per product type. My client has just now told me that he wants to add an additional product type for rugs, whose attributes are completely different from something like laminate floors. Rugs have all shapes and sizes and the client wants to add all of these shapes and sizes in one sitting rather than add them one at a time. My worry about EAV is that I am not sure how I would query it for search purposes, much less display purposes. Quote Link to comment Share on other sites More sharing options...
Zane Posted April 18, 2013 Author Share Posted April 18, 2013 ...And now I am being told to add yet another product type.. That's two completely new product types within the last hour. It's a pain in the ass to continue the way I am doing. I'm gonna have to check out this EAV bull and grab it by the horns I guess. Quote Link to comment Share on other sites More sharing options...
Barand Posted April 18, 2013 Share Posted April 18, 2013 I was looking at this problem a couple of weeks ago to come up with a data model Quote Link to comment Share on other sites More sharing options...
Zane Posted April 18, 2013 Author Share Posted April 18, 2013 This is nearly identical to what I just drew down on this piece of paper the only difference is that I will not have sub-categories.... though I suppose it couldn't hurt to have them available to function just in case... Maybe I'm just crazy but to me it seems more appropriate to name the prod_attributes table to allproducts.. since it would in fact have all the products Ok, so given that schema,... what would be the typical SQL Statement to grab a certain product and all of its attributes. I would imagine it to look something like this SELECT * FROM allproducts a INNER JOIN attributes b ON b.attid = a.attid WHERE a.productid = 123 Quote Link to comment Share on other sites More sharing options...
Barand Posted April 18, 2013 Share Posted April 18, 2013 you would probably want a join to the product table too to pull in the product name and any other product specific data items. And I wouldn't use SELECT *, especially with joins as you get duplicated matching values Quote Link to comment Share on other sites More sharing options...
Zane Posted April 18, 2013 Author Share Posted April 18, 2013 product specific data items. That is exactly the problem I am running into... there are no product specific data per se... For instance.. some floor types are priced by the square foot, some by square yard... some by pallets... etcetera.. Even the name of the product is different... When the salesmen in the warehouse/store are selling to physical in your face customers they refer to products by a combined primary key... like, "We have some 5 in Hickory Gunstock over here" ..... "Nah, thanks, I was looking for more of a 3 and a quarter inch Maple Cinnamon".. .... where the beginning is the width, the second is the specie of wood, and the third is a colorname. So I can't exactly put the price or "name" of the items in the product table as you illustrated. It seems as though EVERYTHING would be an attribute.. Quote Link to comment Share on other sites More sharing options...
Barand Posted April 19, 2013 Share Posted April 19, 2013 so you have CATEGORY id | name ---|------ 1 | rifle ATTRIBUTES catid | attid | attname ------|-------|------------- 1 | 1 | stock width 1 | 2 | stock material 1 | 3 | stock colour PRODUCT catid | prodid | prodname ------|--------|------------- 1 | 1 | gun 1 1 | 2 | gun 2 1 | 3 | gun 3 ALLPRODUCTS prodid | attid | value -------|-------|------------- 1 | 1 | 5 1 | 2 | hickory 1 | 3 | gunstock 2 | 1 | 3.25 2 | 2 | maple 2 | 3 | cinnamon Quote Link to comment Share on other sites More sharing options...
Barand Posted April 19, 2013 Share Posted April 19, 2013 I was testing the feasibility of the model and tried a product comparison script $sql = "SELECT prodid, prodname FROM product WHERE catid = 4"; $res = $db->query($sql); // mysqli $empty = array(); $output = "<tr><td> </td>"; while (list($id,$prod) = $res->fetch_row()) { $empty[$id] = ' '; $output .= "<th>$prod</th>"; } $output .= "</tr>\n"; $query = "SELECT p.prodid, a.attname, pa.prodvalue FROM product p LEFT JOIN product_attribute pa ON p.prodid = pa.prodid AND CURDATE() BETWEEN pa.datefrom AND pa.dateuntil LEFT JOIN attribute a USING (attid) WHERE p.catid = 4 ORDER BY a.catid, a.attname "; $res = $db->query($query); $currAtt = ''; $values = $empty; while (list($pid, $att, $val) = $res->fetch_row()) { if ($currAtt != $att) { if ($currAtt) { $output .= "<tr><td class='att'>$currAtt</td><td>" . join('</td><td>', $values) . "</td></tr>\n"; } $currAtt = $att; $values = $empty; } $values[$pid] = $val; } $output .= "<tr><td class='att'>$currAtt</td><td>" . join('</td><td>', $values) . "</td></tr>\n"; ?> <html> <head> <style type="text/css"> th { background-color: #369; color: white; font-family: sans-serif; font-size: 9pt; } td { background-color: #EEE; font-family: sans-serif; font-size: 9pt; text-align: center; } td.att { background-color: #CCC; text-align: left; } </style> </head> <body> <table border='0' cellpadding="4"> <?php echo $output ?> </table> </body> </html> Result and dump attached product.zip Quote Link to comment Share on other sites More sharing options...
Zane Posted April 22, 2013 Author Share Posted April 22, 2013 Wow! This will help me soo much! Many thanks for these stepping stones, Barand. Quote Link to comment Share on other sites More sharing options...
Zane Posted May 21, 2013 Author Share Posted May 21, 2013 (edited) Now I'm stumped, I want to grab just two attributes for a particular product... Using this query, SELECT p.prodid, p.footage, p.stylename, p.stylenumber, p.colorname, p.colornumber, p.width, pa.value as grade, pa.prodattid, locs.city as location FROM product p LEFT JOIN product_attribute pa ON p.prodid = pa.prodid LEFT JOIN attribute a USING (attid) I can retrieve most of the fields I want, but I can only retrieve ONE attribute. For instance, if I want both quality and grade... I would assume I could just add it to the WHERE clause WHERE p.catid = 4 AND ( a.attname = 'grade' OR a.attname = 'quality' ) But when I do that I get two rows per product.. So I tried to group by p.prodid, which left me wondering from where that second attribute was accessible. I even tried to JOIN a second time on the product attributes table but I could not get the results I was looking for. Surely I am doing something wrong, but I just can't put my finger on it.. Edited May 21, 2013 by Zane Quote Link to comment Share on other sites More sharing options...
Zane Posted May 22, 2013 Author Share Posted May 22, 2013 I tried a different approach, but I'm afraid it barely works...and barely is an understatement. SELECT p.prodid, p.footage, p.stylename, p.stylenumber, p.colorname, p.colornumber, p.width, pa.prodattid, locs.city as location, IF(a.attname = 'grade', pa.value, NULL) as grade, IF(a.attname = 'quality', pa.value, NULL) as quality FROM products p LEFT JOIN product_atts pa ON p.prodid = pa.prodid LEFT JOIN attributes a USING (attid) LEFT JOIN locs ON locs.code = p.location WHERE p.catid = $category GROUP BY p.prodid I know that it "barely" works because I know for a fact that products with a catid of 1 all have a grade and a quality. The results I get from this are NULL values for all grades and NULL values for all qualities. Quote Link to comment Share on other sites More sharing options...
Zane Posted May 22, 2013 Author Share Posted May 22, 2013 I just decided to give in an add columns for grade and quality to the products table. Quote Link to comment Share on other sites More sharing options...
Barand Posted May 22, 2013 Share Posted May 22, 2013 A few years ago I work on a Micro$oft Sharepoint database which used a similar model for attribute storage. To get the returned rows with the attributes all in the same row as the item I had to use a series of subqueries eg SELECT prodid, prodname, attribs.* FROM product p INNER JOIN ( SELECT DISTINCT a5.prodid, a1.`price`, a4.`stock material`, a5.`stock color` FROM (SELECT prodid, value as `price` FROM product_attribute WHERE attid = 1 AND CURDATE() BETWEEN datefrom AND dateuntil ) as a1 INNER JOIN (SELECT prodid, value as `stock material` FROM product_attribute WHERE attid = 4 AND CURDATE() BETWEEN datefrom AND dateuntil ) as a4 USING (prodid) INNER JOIN (SELECT prodid, value as `stock color` FROM product_attribute WHERE attid = 5 AND CURDATE() BETWEEN datefrom AND dateuntil ) as a5 USING (prodid) ) as attribs USING (prodid) WHERE p.catid = 4; +--------+----------+--------+----------------+-------------+ | prodid | prodname | price | stock material | stock color | +--------+----------+--------+----------------+-------------+ | 1 | Rifle 1 | 469.99 | Hickory | - | | 2 | Rifle 2 | 399.99 | Maple | Cinnamon | +--------+----------+--------+----------------+-------------+ To make my life easier I wrote a function which would generate the required subqueries from an array of the required attributes. That's what I have being doing again when I saw your question $db = new mysqli(HOST, USERNAME, PASSWORD, 'product' ); $atts = array (1,2,3,4,5); $subquery = attribute_subquery($db, $atts); $sql = "SELECT prodname, attribs.* FROM product p INNER JOIN ( $subquery ) as attribs USING (prodid) WHERE p.catid = 4 ORDER BY prodname"; // // function to create subqueries // function attribute_subquery ($db, $atts) { $ids = join(',', $atts); $sql = "SELECT attid, attname FROM attribute WHERE attid IN ($ids)"; $res = $db->query($sql); $txt = ''; $attnames = array(); $joins = array(); $i = 0; while (list($id, $name) = $res->fetch_row()) { $attnames[] = "a{$id}.`$name`"; $joins[] = "\n(SELECT prodid, value as `$name` FROM product_attribute WHERE attid = $id AND CURDATE() BETWEEN datefrom AND dateuntil ) as a$id " . ($i>0 ? "USING (prodid)" : ''); $lastid = $id; ++$i; } $attnames[] = "a{$lastid}.prodid"; $subquery = "SELECT " . join(',', $attnames) . "\nFROM " . join("\nINNER JOIN\n", $joins); return $subquery; } Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.