Jump to content

The EAV Model or something better?


Zane

Recommended Posts

For the past few years, I have been designing a database & web site for a client in quite a mediocre style IMO.  The site design is something that can always be changed, modified, and more or less completely overhauled.  My question is in the database design.

 

 

In this current semester I have been taking a Database Processing course.  Luckily, with what SQL I have learned from messing with PHP/MySQL I have made over the years, I have a firm *enough* grasp on SQL to pass this course.  What shocked me in the beginning though, and even more for the rest of the class I would presume, was the early introduction of normalization.  In the first few weeks we did not even touch a computer and the professor tried his damndest .. with English as his second language, to teach us the various Normal Forms.  Normalization I have heard of, but normal forms?  That is an entirely new level of SQL understanding I have yet to achieve and needless to say... when the professor testing the class on what we had learned... we all did so horrible that he only counted the first question... of 5 questions.

 

We have stepped away from Normal Forms since then and have focused primarily on SQL Statements which is great for me, but the idea of normal forms still sits in the back of my head as something that I should know.

 

 

Since I am by no means an expert Database Administrator, I have been approaching my clients site with a very novice approach.  One table per product type.  Every product type has several similar attributes like colorname, stylename, price, etcetera... Yet every product type also has way more distinct attributes, like thickness, edge profiles, cleaning methods, installation methods, etcetera.  

 

I have googled around and came across something called the EAV Model.  When I first saw it I was stricken with awe.. and thought to myself that THIS was the answer to all of my problems............ a few google searches later, I hear nothing but horrible things about this model when it comes to SEO.. or in-site searches in general.  It is supposedly a nightmare of a "Normal Form" to use for something whose rows will be populated exponentially.  40 new products one day, 22 the next, 100 another day... and so on.

 

So my question, which I almost feel deserves to be in Application Design... is

What is the ideal method of normalization for a database that contains several product types which have both similar and dissimilar attributes?  Obviously a table for every product type is ridiculous and inefficient, because the owner constantly has a new attribute he wants to add causing me to have to hack and tweak and duct tape it into the system.

 

I really need some advice if I want to do this for a living or be able to claim that I know what I am doing when it comes to e-commerce website development.  Luckily, I convinced the client to go with Authorize.net so I do not have to create a shopping cart, but that is another story completely.

Link to comment
Share on other sites

Calling something "normalized" is like calling someone's actions "illegal". By just the terms themselves you can get a decent idea of what the person means to say, but the reality is there's varying degrees of both. Most everybody knows about normalization where you use a primary key ID (or other single, small piece of unique information) as a key for data across various tables. But there's also stuff about not using multiple tables for a single entity, or not including data in a table that doesn't belong there. Typically (and strictly speaking) normalization comes in different levels: you normalize for one criteria, then for a couple more to reach the next level, then a couple more for the next. Likewise illegality can range from simple shoplifting to armed robbery to outright murder.

(Interestingly enough, the analogy doesn't stop there. There are accepted exceptions to normalization, typically for caching or ease of access, like how murder in the name of self defense is sometimes excusable: you should still take a second look to make sure things are the way you think they are, but once confirmed maybe you can let things slide. But don't worry about that quite yet.)

 

EAV is powerful but complex. You can represent just about anything applicable using it but at the cost of opaqueness and difficulty. If you know how it works you may be tempted to use it everywhere, and that's one way knowledge in general backfires, but if you can evaluate potential solutions to a problem in increasing order of sophistication (which often goes hand-in-hand with complexity) then you may be safe.

 

- It's heavily normalized. That's kinda good as normalization is generally good, but kinda bad since you get lots of data and lots of tables and lots of overhead.

- It shouldn't have anything to do with SEO. SEO focuses on your presentation of the data, not on the actual representation of the data in your database. Apples and oranges.

- It does mean lots more rows, but these rows are generally small - often simply ID+name or ID+name+value - and unique across the table. And there's generally a point or three where additions to your "enum" and "lookup" tables plateau and you only deal with the entities that reference them.

- Indexing that data means larger indexes in MySQL but searching is fast: for example, rather than MySQL scanning an entire column in a source table for matching values, it can scan the entire column in the much smaller attribute table and only then join up the source table.

 

I would suggest looking more into it. It seems like you're describing a situation where different types of products have different attributes, and as long as that list of types is subject to change then EAV may be worth it. On the other hand if you have a handful of different products but that's not really going to change - at least not without significant work in the rest of the site - then EAV may be overkill and not worth the time. Also note that converting to and from EAV is fairly painless, should you decide one way now and change your mind later.

Link to comment
Share on other sites

EAV is powerful but complex. You can represent just about anything applicable using it but at the cost of opaqueness and difficulty. If you know how it works you may be tempted to use it everywhere

 

- It shouldn't have anything to do with SEO. SEO focuses on your presentation of the data, not on the actual representation of the data in your database. Apples and oranges.

- It does mean lots more rows, but these rows are generally small - often simply ID+name or ID+name+value - and unique across the table. And there's generally a point or three where additions to your "enum" and "lookup" tables plateau and you only deal with the entities that reference them.

- Indexing that data means larger indexes in MySQL but searching is fast: for example, rather than MySQL scanning an entire column in a source table for matching values, it can scan the entire column in the much smaller attribute table and only then join up the source table.

 

I would suggest looking more into it. It seems like you're describing a situation where different types of products have different attributes, and as long as that list of types is subject to change then EAV may be worth it. On the other hand if you have a handful of different products but that's not really going to change - at least not without significant work in the rest of the site - then EAV may be overkill and not worth the time. Also note that converting to and from EAV is fairly painless, should you decide one way now and change your mind later.

 

I should not have said SEO, I meant more along the lines of searching the site.  At the moment it is VERY difficult to even think of implementing a site search for things like

- hickory hardwood

- 5 inch maple

- and so on.

 

....with one table per product type.  My client has just now told me that he wants to add an additional product type for rugs, whose attributes are completely different from something like laminate floors.  Rugs have all shapes and sizes and the client wants to add all of these shapes and sizes in one sitting rather than add them one at a time.

 

My worry about EAV is that I am not sure how I would query it for search purposes, much less display purposes.

Link to comment
Share on other sites

...And now I am being told to add yet another product type..  That's two completely new product types within the last hour.  It's a pain in the ass to continue the way I am doing.  I'm gonna have to check out this EAV bull and grab it by the horns I guess.

Link to comment
Share on other sites

This is nearly identical to what I just drew down on this piece of paper the only difference is that I will not have sub-categories.... though I suppose it couldn't hurt to have them available to function just in case...

 

Maybe I'm just crazy but to me it seems more appropriate to name the prod_attributes table to allproducts..  since it would in fact have all the products

 

Ok, so given that schema,... what would be the typical SQL Statement to grab a certain product and all of its attributes.  I would imagine it to look something like this

SELECT * FROM allproducts a
INNER JOIN attributes b ON b.attid = a.attid
WHERE a.productid = 123
Link to comment
Share on other sites

product specific data items.

That is exactly the problem I am running into... there are no product specific data per se... For instance.. some floor types are priced by the square foot, some by square yard... some by pallets... etcetera..

 

Even the name of the product is different... 

 

When the salesmen in the warehouse/store are selling to physical in your face customers they refer to products by a combined primary key...

like, "We have some 5 in Hickory Gunstock over here" ..... "Nah, thanks, I was looking for more of a 3 and a quarter inch Maple Cinnamon"..

.... where the beginning is the width, the second is the specie of wood, and the third is a colorname.  So I can't exactly put the price or "name" of the items in the product table as you illustrated.  It seems as though EVERYTHING would be an attribute..

Link to comment
Share on other sites

so you have

CATEGORY
id | name
---|------
 1 | rifle
 
ATTRIBUTES
catid | attid | attname
------|-------|-------------
  1   |  1    | stock width
  1   |  2    | stock material
  1   |  3    | stock colour
  
PRODUCT
catid | prodid | prodname
------|--------|-------------
  1   |   1    |  gun 1
  1   |   2    |  gun 2
  1   |   3    |  gun 3
  
ALLPRODUCTS
prodid | attid | value
-------|-------|-------------
  1    |   1   |   5
  1    |   2   | hickory
  1    |   3   | gunstock
  2    |   1   |  3.25
  2    |   2   | maple
  2    |   3   | cinnamon
Link to comment
Share on other sites

I was testing the feasibility of the model and tried a product comparison script

$sql = "SELECT prodid, prodname
        FROM product
        WHERE catid = 4";
$res = $db->query($sql); // mysqli
$empty = array();
$output = "<tr><td> </td>";
while (list($id,$prod) = $res->fetch_row()) {
    $empty[$id] = ' ';
    $output .= "<th>$prod</th>";
}
$output .= "</tr>\n";

$query = "SELECT p.prodid, a.attname, pa.prodvalue
    FROM product p
        LEFT JOIN product_attribute pa
            ON p.prodid = pa.prodid
            AND CURDATE() BETWEEN pa.datefrom AND pa.dateuntil
        LEFT JOIN attribute a USING (attid)
    WHERE p.catid = 4
    ORDER BY a.catid, a.attname
    ";

$res = $db->query($query);
$currAtt = '';
$values = $empty;
while (list($pid, $att, $val) = $res->fetch_row()) {
    if ($currAtt != $att) {
        if ($currAtt) {
            $output .= "<tr><td class='att'>$currAtt</td><td>" .
                join('</td><td>', $values) . "</td></tr>\n";
        }
        $currAtt = $att;
        $values = $empty;
    }
    $values[$pid] = $val;
}
$output .= "<tr><td class='att'>$currAtt</td><td>" .
                join('</td><td>', $values) . "</td></tr>\n";

?>
<html>
<head>
<style type="text/css">
th {
    background-color: #369;
    color: white;
    font-family: sans-serif;
    font-size: 9pt;
}

td {
    background-color: #EEE;
    font-family: sans-serif;
    font-size: 9pt;
    text-align: center;
}

td.att {
    background-color: #CCC;
    text-align: left;
}
</style>
</head>
<body>
<table border='0' cellpadding="4">
    <?php echo $output ?>
</table>
</body>
</html>

Result and dump attached

post-3105-0-02200500-1366414144_thumb.png

product.zip

Link to comment
Share on other sites

  • 5 weeks later...

Now I'm stumped, I want to grab just two attributes for a particular product...

 

Using this query,

SELECT p.prodid, p.footage, p.stylename, p.stylenumber, p.colorname, p.colornumber, p.width, pa.value as grade, pa.prodattid, locs.city as location
FROM product p
LEFT JOIN product_attribute pa
ON p.prodid = pa.prodid
LEFT JOIN attribute a USING (attid)

I can retrieve most of the fields I want, but I can only retrieve ONE attribute.  

For instance, if I want both quality and grade... I would assume I could just add it to the WHERE clause

WHERE p.catid = 4 AND ( a.attname = 'grade' OR a.attname = 'quality' )

But when I do that I get two rows per product..  So I tried to group by p.prodid, which left me wondering from where that second attribute was accessible.  I even tried to JOIN a second time on the product attributes table but I could not get the results I was looking for.  Surely I am doing something wrong, but I just can't put my finger on it..

Edited by Zane
Link to comment
Share on other sites

I tried a different approach, but I'm afraid it barely works...and barely is an understatement.

 

SELECT p.prodid, p.footage, p.stylename, p.stylenumber, p.colorname, p.colornumber, p.width, pa.prodattid, locs.city as location,
IF(a.attname = 'grade', pa.value, NULL) as grade, IF(a.attname = 'quality', pa.value, NULL) as quality
    FROM products p
        LEFT JOIN product_atts pa
            ON p.prodid = pa.prodid
        LEFT JOIN attributes a USING (attid)
LEFT JOIN locs ON locs.code = p.location
WHERE p.catid = $category
GROUP BY p.prodid

I know that it "barely" works because I know for a fact that products with a catid of 1 all have a grade and a quality.  The results I get from this are NULL values for all grades and NULL values for all qualities.

Link to comment
Share on other sites

A few years ago I work on a Micro$oft Sharepoint database which used a similar model for attribute storage. To get the returned rows with the attributes all in the same row as the item I had to use a series of subqueries eg

SELECT prodid, prodname, attribs.*
        FROM product p
        INNER JOIN (
            SELECT DISTINCT a5.prodid, a1.`price`, a4.`stock material`, a5.`stock color`
            FROM 
                (SELECT prodid, value as `price`
                FROM product_attribute
                WHERE attid = 1
                AND CURDATE() BETWEEN datefrom AND dateuntil
                ) as a1
                INNER JOIN
                (SELECT prodid, value as `stock material`
                FROM product_attribute
                WHERE attid = 4
                AND CURDATE() BETWEEN datefrom AND dateuntil
                ) as a4 USING (prodid)
                INNER JOIN
                (SELECT prodid, value as `stock color`
                FROM product_attribute
                WHERE attid = 5
                AND CURDATE() BETWEEN datefrom AND dateuntil
                ) as a5 USING (prodid)
            
        ) as attribs USING (prodid)
        WHERE p.catid = 4;
        

+--------+----------+--------+----------------+-------------+
| prodid | prodname | price  | stock material | stock color |
+--------+----------+--------+----------------+-------------+
|      1 | Rifle 1  | 469.99 | Hickory        | -           |
|      2 | Rifle 2  | 399.99 | Maple          | Cinnamon    |
+--------+----------+--------+----------------+-------------+

To make my life easier I wrote a function which would generate the required subqueries from an array of the required attributes. That's what I have being doing again when I saw your question

$db = new mysqli(HOST, USERNAME, PASSWORD, 'product' );

$atts = array (1,2,3,4,5);
$subquery = attribute_subquery($db, $atts);
$sql = "SELECT prodname, attribs.*
        FROM product p
        INNER JOIN (
            $subquery
        ) as attribs USING (prodid)
        WHERE p.catid = 4
        ORDER BY prodname";
        

//
// function to create subqueries
//

function attribute_subquery ($db, $atts)
{
    $ids = join(',', $atts);
    $sql = "SELECT attid, attname
            FROM attribute
            WHERE attid IN ($ids)";
    $res = $db->query($sql);
    $txt = '';
    $attnames = array();
    $joins = array();
    $i = 0;
    while (list($id, $name) = $res->fetch_row()) {
        $attnames[] = "a{$id}.`$name`";
        $joins[] = "\n(SELECT prodid, value as `$name`
                FROM product_attribute
                WHERE attid = $id
                AND CURDATE() BETWEEN datefrom AND dateuntil
                ) as a$id " . ($i>0 ? "USING (prodid)" : '');
        $lastid = $id;
        ++$i;
    }
    $attnames[] = "a{$lastid}.prodid";
    $subquery = "SELECT " . join(',', $attnames)
        . "\nFROM " . join("\nINNER JOIN\n", $joins);
    return $subquery;
}

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.