Jump to content

Any tips on how to handle this riddle?


wrybread

Recommended Posts

I'm working on a form processed by PHP where multiple categories are saved to a MySQL table. Currently I have one table column for each category. For example I have a column in my table for "is_category1", and a column for "is_category2", and one for "is_category3", etc. (Those aren't the real column names, that's just for simplicity).

The problem is that we keep adding new categories, and I'd rather not add a new column every time we do. So I was thinking I could store the categories as a single string in a column called "categories" and have a character for each category. For example if the string is "100" it would mean is_category1=true, is_category2=false and is_category3=false. 

That all works very well.

The problem is that I don't see a good way to perform searches. For example if I want to find every item where is_category2==true, I don't know of a SQL query to do that without iterating through each item in the table and processing the "categories" string. And there are over 50,000 items in the table, so that would be unacceptably slow.

Does anyone happen to have any advice about how to handle this? Maybe there's a SQL search method that can process the Xth item of a string?

Or a better way to store the "categories" item than a string?

Or am I just doomed to have a table column for every category?

Thanks for any help.

Link to comment
Share on other sites

If these 'categories' are simply different values from a possible choice of values, then those fields you are creating are wrong.  Each of those fields represent the same thing so they should not be separate fields.

Create a 'Categories' table and create a row that has the primary key of your entity along with a category value.  An entity will have multiple categories that he/she belongs to so you will have multiple rows in the Categories table.  Any query you do you simply write to select the key from the first table and join it to the Categories table to produce a result set that may have multiple rows for each selected entity.

That is database normalizaton.   

Another thing - that categories table should probably use a code to represent the category with a 3rd table 'Category_name' that connects that code to a literal description for the category.  I think that is the second normal of 'normalization' - to avoid repeating a string in every record when a simple (shorter) code will do.

Link to comment
Share on other sites

What would be the disadvantage of having a column in the main table named "categories" and having a string value for all the categories an element belong to?

For example, the value could be "blue yellow purple orange" etc.

I guess the disadvantage is that it's slower to find all elements that are blue? 

Link to comment
Share on other sites

On 12/4/2021 at 1:25 PM, wrybread said:

What would be the disadvantage of having a column in the main table named "categories" and having a string value for all the categories an element belong to?

For example, the value could be "blue yellow purple orange" etc.

I guess the disadvantage is that it's slower to find all elements that are blue? 

 

Well yes, it violates relational database design rules, which are referred to as the rules of normalization or normal forms.  This is in violation of the most basic rule, meaning your table can not be in 1st normal form, by doing what you are trying to do.

See this article.

Quote

Each table cell should contain a single value.

Your instinct to put multiple values (categories) in a single column/row combination, tells you that you are going down the wrong path.  Furthermore, if your table has category1, category2, category3 etc, then that is a repeating group which is also incorrect.

There are many issues here: 

  • You can't add a new category position without changing the table structure & all associated queries
  • Queries are ugly and inefficient because you will need to index every category column and have a query that has code like "if category1 = 3 or category2 = 3 or category3 =3" etc.

What is the right structure?  You need to understand the relationship between the entities (thing, category).  Start with the relationship from "thing".  What have you told us?  "A thing can have many categories".  

So the relationship of thing -> category is "one thing to many categories".

Now look at the relationship from category to thing.  What do we know?  

"One category can define many things."  So the relationship from category -> thing is "one category to many things".

This tells you that the relationship between thing and category is actually Many to Many.    Think about it for a minute, using simple examples.  

 

Thing1  (category1, category2, category5)

category5 (thing1, thing20, thing1000)

 

Currently you don't have a table for category.  You need to make one.

category
--------
id  smallint unsigned primary key AUTO_INCREMENT
category varchar(80)


This table should be loaded with all your 1-n categories.

Create a table to resolve the many to many relationship between thing and category.  Typically people name a table like this "thing_category".  It only requires 2 values:

thing_category
--------------
thing_id 
category_id

 

You can give this table its own unique primary autoincrement key if you want or define the primary key to be thing_id,category_id.  It is important either way, that you have a key that guarantees uniqueness for a thing_id,category_id value.  

Hopefully can now see how you would use this table.   If I want to set thing1 to have categories 3,7 and 9, then I only need to insert  rows into thing_category  of (1, 3), (1, 7), (1,9).

To get the categories back out, you join the tables together.  Your queries are simple when you need to query for a particular category -- just inner join thing to thing_category and specify WHERE category_id = 7, or whatever you need.  If you want a few categories, the query can be WHERE category_id IN (3,7) etc.

This also makes your system configurable and data driven, as new categories can be added to the category table at any time, and you can start to make use of them without having to change code, since nothing will be hard wired in.

 

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.