wrybread Posted December 4, 2021 Share Posted December 4, 2021 I'm working on a form processed by PHP where multiple categories are saved to a MySQL table. Currently I have one table column for each category. For example I have a column in my table for "is_category1", and a column for "is_category2", and one for "is_category3", etc. (Those aren't the real column names, that's just for simplicity). The problem is that we keep adding new categories, and I'd rather not add a new column every time we do. So I was thinking I could store the categories as a single string in a column called "categories" and have a character for each category. For example if the string is "100" it would mean is_category1=true, is_category2=false and is_category3=false. That all works very well. The problem is that I don't see a good way to perform searches. For example if I want to find every item where is_category2==true, I don't know of a SQL query to do that without iterating through each item in the table and processing the "categories" string. And there are over 50,000 items in the table, so that would be unacceptably slow. Does anyone happen to have any advice about how to handle this? Maybe there's a SQL search method that can process the Xth item of a string? Or a better way to store the "categories" item than a string? Or am I just doomed to have a table column for every category? Thanks for any help. Quote Link to comment Share on other sites More sharing options...
kicken Posted December 4, 2021 Share Posted December 4, 2021 You need to read up on database normalization. What you need to do is have a table where each category is a row, not a column and then join that to your other tables. 1 Quote Link to comment Share on other sites More sharing options...
ginerjm Posted December 4, 2021 Share Posted December 4, 2021 If these 'categories' are simply different values from a possible choice of values, then those fields you are creating are wrong. Each of those fields represent the same thing so they should not be separate fields. Create a 'Categories' table and create a row that has the primary key of your entity along with a category value. An entity will have multiple categories that he/she belongs to so you will have multiple rows in the Categories table. Any query you do you simply write to select the key from the first table and join it to the Categories table to produce a result set that may have multiple rows for each selected entity. That is database normalizaton. Another thing - that categories table should probably use a code to represent the category with a 3rd table 'Category_name' that connects that code to a literal description for the category. I think that is the second normal of 'normalization' - to avoid repeating a string in every record when a simple (shorter) code will do. Quote Link to comment Share on other sites More sharing options...
wrybread Posted December 4, 2021 Author Share Posted December 4, 2021 What would be the disadvantage of having a column in the main table named "categories" and having a string value for all the categories an element belong to? For example, the value could be "blue yellow purple orange" etc. I guess the disadvantage is that it's slower to find all elements that are blue? Quote Link to comment Share on other sites More sharing options...
gizmola Posted December 4, 2021 Share Posted December 4, 2021 On 12/4/2021 at 1:25 PM, wrybread said: What would be the disadvantage of having a column in the main table named "categories" and having a string value for all the categories an element belong to? For example, the value could be "blue yellow purple orange" etc. I guess the disadvantage is that it's slower to find all elements that are blue? Well yes, it violates relational database design rules, which are referred to as the rules of normalization or normal forms. This is in violation of the most basic rule, meaning your table can not be in 1st normal form, by doing what you are trying to do. See this article. Quote Each table cell should contain a single value. Your instinct to put multiple values (categories) in a single column/row combination, tells you that you are going down the wrong path. Furthermore, if your table has category1, category2, category3 etc, then that is a repeating group which is also incorrect. There are many issues here: You can't add a new category position without changing the table structure & all associated queries Queries are ugly and inefficient because you will need to index every category column and have a query that has code like "if category1 = 3 or category2 = 3 or category3 =3" etc. What is the right structure? You need to understand the relationship between the entities (thing, category). Start with the relationship from "thing". What have you told us? "A thing can have many categories". So the relationship of thing -> category is "one thing to many categories". Now look at the relationship from category to thing. What do we know? "One category can define many things." So the relationship from category -> thing is "one category to many things". This tells you that the relationship between thing and category is actually Many to Many. Think about it for a minute, using simple examples. Thing1 (category1, category2, category5) category5 (thing1, thing20, thing1000) Currently you don't have a table for category. You need to make one. category -------- id smallint unsigned primary key AUTO_INCREMENT category varchar(80) This table should be loaded with all your 1-n categories. Create a table to resolve the many to many relationship between thing and category. Typically people name a table like this "thing_category". It only requires 2 values: thing_category -------------- thing_id category_id You can give this table its own unique primary autoincrement key if you want or define the primary key to be thing_id,category_id. It is important either way, that you have a key that guarantees uniqueness for a thing_id,category_id value. Hopefully can now see how you would use this table. If I want to set thing1 to have categories 3,7 and 9, then I only need to insert rows into thing_category of (1, 3), (1, 7), (1,9). To get the categories back out, you join the tables together. Your queries are simple when you need to query for a particular category -- just inner join thing to thing_category and specify WHERE category_id = 7, or whatever you need. If you want a few categories, the query can be WHERE category_id IN (3,7) etc. This also makes your system configurable and data driven, as new categories can be added to the category table at any time, and you can start to make use of them without having to change code, since nothing will be hard wired in. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.