Jump to content

Is a single JSON MySQL retrieval faster than individual rows?


DeX
Go to solution Solved by Psycho,

Recommended Posts

I have a quoting system which calculates the quantity and other attributes of 360 different building products every time a quote is generated in order to calculate a final cost. When the quote is performed, I'm saving each individual product as a row in a database table with each attribute (length, width, quantity, price) as columns in that table. Right now I'm just about to cross over 3,000,000 records in the table and it takes quite a while to perform queries each time a quote retrieval is requested by a user. I'm wondering if there would be any performance increase if I were to start storing all products in the table in a single JSON string taking up only one row and then parsing the string once it is retrieved.

Link to comment
Share on other sites

A blob of data (like JSON) vs. relational data shouldn't be a question of performance. It's about use cases.

 

3M rows isn't actually that large. Well, relative to what other people and companies do with it. What's the structure of the table and what kinds of queries are you running on it that are taking so long?

Link to comment
Share on other sites

It's a simple table with an ID, a quote_id (to match up to the quote), a product_id (to match up to the product) and then each other column is an attribute like length, width, height, quantity and price.

When loading a purchase order to view or edit, I need to pull out each material of a certain type (lumber, metal, screws) and display the relevant materials on that purchase order with their attributes. The attributes are also building specific because the lumber and metal lengths / quantities will change per building.

Link to comment
Share on other sites

Actually I was thinking something more... precise. What does a SHOW CREATE TABLE whatever output, and what is the actual SQL for those queries?

 

While I'm here, take each of those queries and execute them with an "EXPLAIN" in front. So EXPLAIN SELECT.... That'll output a table which explains how the query executes and can indicate where potential problems are with (though it's not easy to read if you're unfamiliar with it).

Link to comment
Share on other sites

To add some context to why requinix is asking for this information: There are things that can/should be done to configure a database schema and/or queries to be efficient. But HOW you accomplish that directly depends on how the data is to be used.

 

With respect to the database schema, you will typically want to index any fields that are used for JOINing tables, used in frequent filter conditions, or used in similar contexts.

 

Also, in the actual queries it is possible to write two different queries to return the same results where one is exponentially faster than the other.

 

It is impossible to explain all the ins and outs of how to work with a database efficiently in a forum post. There are whole books written on the subject. So, we need to see the actual details of your DB schema and queries to provide specific advice.

Link to comment
Share on other sites


| match_quote_product | CREATE TABLE `match_quote_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`quote_id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`width` varchar(255) NOT NULL,
`length` varchar(255) NOT NULL,
`height` varchar(255) NOT NULL,
`location` varchar(100) NOT NULL,
`quantity` varchar(255) NOT NULL,
`locking_handle_side` varchar(100) NOT NULL,
`power` varchar(100) NOT NULL,
`insulated` tinyint(4) NOT NULL,
`customer_supplied` tinyint(4) NOT NULL,
`storey` varchar(100) NOT NULL,
`window_inserts` int(11) NOT NULL,
`board_feet` double NOT NULL,
`price_per_k` double NOT NULL,
`unit_price` double NOT NULL,
`total_price_calculation_id` int(11) NOT NULL,
`purchase_order_id` int(11) NOT NULL,
`colour_type_id` int(11) NOT NULL,
`total_price` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3248453 DEFAULT CHARSET=latin1 |


mysql> explain select product_id, match_quote_product.price_per_k, match_quote_product.board_feet, match_quote_product.location, match_quote_product.power, match_quote_product.locking_handle_side, match_quote_product.quantity, width, length, height, colours.name, match_quote_product.unit_price, total_price, match_quote_product.total_price_calculation_id from building_products, match_quote_product, match_quote_colour, colours where building_products.id = product_id && match_quote_product.colour_type_id = match_quote_colour.colour_type_id && match_quote_colour.colour_id = colours.id && match_quote_product.quote_id = 50000 && match_quote_product.quote_id = match_quote_colour.quote_id && building_products.purchase_order_id = 5 order by purchase_order_order asc;
+----+-------------+---------------------+------------+--------+---------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------------+------------+--------+---------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------------+
| 1 | SIMPLE | match_quote_product | NULL | ALL | NULL | NULL | NULL | NULL | 2846461 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | building_products | NULL | eq_ref | PRIMARY | PRIMARY | 4 | integrity_development.match_quote_product.product_id | 1 | 10.00 | Using where |
| 1 | SIMPLE | match_quote_colour | NULL | ALL | NULL | NULL | NULL | NULL | 573276 | 1.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | colours | NULL | eq_ref | PRIMARY | PRIMARY | 4 | integrity_development.match_quote_colour.colour_id | 1 | 100.00 | NULL |
+----+-------------+---------------------+------------+--------+---------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------------+
4 rows in set, 1 warning (0.03 sec)

Link to comment
Share on other sites

Your problem is your query. You need to join the tables. That is the whole reason it is so slow. The WHERE clause is killing it.

 

This kind of thing should be done in a JOIN

WHERE    building_products.id = product_id

Also, name is a reserved word. Change it to something like color_name.

 

Unless height, width, length, and quantity are something other than numbers, you shouldn't be using varchar for the column type.

Edited by benanamen
Link to comment
Share on other sites

  • Solution

Also, all the fields used to JOIN the tables should be indexed. Pretty much anything with "id" int he field name in your case.

 

I was trying to rewrite the query, but without knowing the schema on all the tables it was getting confusing. I *think* your DB design is flawed. The "match_quote_product" appears to be the table to list out the products included in a quote. So, why is there a field for purchase_order_id? Are there different purchase orders for different items int he same quote? There should be a general "purchase_order" table that contains the header data (e.g. date, customer, PO, etc.) Then the details table should include all the items in the quote that reference back to the main record.

 

There also seem to be some circular references that could be a performance issue as well.

 

But, here is my attempt at rewriting the query. But, it may not work. I wouodl have to really understand all the relationships to be sure which would take more time than I am willing to invest. And even then, I would probably have to make changes to the schema as well.

 

SELECT mqp.product_id
       mqp.price_per_k, mqp.board_feet, mqp.location, mqp.power,
       mqp.locking_handle_side, mqp.unit_price, mqp.total_price_calculation_id, mqp.quantity,
       width, length, height, colours.name, total_price
 
FROM match_quote_product mqp
JOIN building_products bp
  ON mqp.product_id = bp.id
JOIN match_quote_colour mqc
  ON mqp.colour_type_id = mqc.colour_type_id
     AND mqp.quote_id = mqc.quote_id
JOIN colours c
  ON mqc.colour_id = c.id
 
WHERE match_quote_product.quote_id = 50000
  AND building_products.purchase_order_id = 5
 
ORDER BY purchase_order_order ASC
Link to comment
Share on other sites

  • 4 weeks later...

I'm back from vacation and hoping to get this resolved now, thanks for being patient while I was away.

 

Your problem is your query. You need to join the tables. That is the whole reason it is so slow. The WHERE clause is killing it.

 

This kind of thing should be done in a JOIN

WHERE    building_products.id = product_id

Also, name is a reserved word. Change it to something like color_name.

 

Unless height, width, length, and quantity are something other than numbers, you shouldn't be using varchar for the column type.

 

Height, width, length and quantity are attributes of doors and windows being added to the building. Because the use is able to continuously add as many doors and windows as they like, I store each set of attributes in a JSON string, that's why they're varchar. For example, if the user enters a 3x4 window and a 3x5 window, the heights column will get the following data:

["4","5"]

and width will get:

["#","3"]

I may be doing this wrong but it's the best way I've found so far.

 

 

Also, all the fields used to JOIN the tables should be indexed. Pretty much anything with "id" int he field name in your case.

 

I was trying to rewrite the query, but without knowing the schema on all the tables it was getting confusing. I *think* your DB design is flawed. The "match_quote_product" appears to be the table to list out the products included in a quote. So, why is there a field for purchase_order_id? Are there different purchase orders for different items int he same quote? There should be a general "purchase_order" table that contains the header data (e.g. date, customer, PO, etc.) Then the details table should include all the items in the quote that reference back to the main record.

 

There also seem to be some circular references that could be a performance issue as well.

 

But, here is my attempt at rewriting the query. But, it may not work. I wouodl have to really understand all the relationships to be sure which would take more time than I am willing to invest. And even then, I would probably have to make changes to the schema as well.

SELECT mqp.product_id
       mqp.price_per_k, mqp.board_feet, mqp.location, mqp.power,
       mqp.locking_handle_side, mqp.unit_price, mqp.total_price_calculation_id, mqp.quantity,
       width, length, height, colours.name, total_price
 
FROM match_quote_product mqp
JOIN building_products bp
  ON mqp.product_id = bp.id
JOIN match_quote_colour mqc
  ON mqp.colour_type_id = mqc.colour_type_id
     AND mqp.quote_id = mqc.quote_id
JOIN colours c
  ON mqc.colour_id = c.id
 
WHERE match_quote_product.quote_id = 50000
  AND building_products.purchase_order_id = 5
 
ORDER BY purchase_order_order ASC

 

The reason I have a purchase_order_id for each product is because the administrators control which products show up on which purchase orders in the system. I need to store which purchase order the item was on when the quote was made in case it changes later on. I want the item to be put on its original purchase order, now the one it was changed to, if that makes sense. I'm already aware there is a better way for this, I plan on storing all edits in the main purchase order table and then just referencing back to the purchase order that item appeared on at the time period the quote was done. It's on the to do list.

 

I very much appreciate you supplying me with the new SQL, thanks a lot, however it didn't quite work. I made some edits to yours and got the following:

SELECT mqp.product_id,
mqp.price_per_k, mqp.board_feet, mqp.location, mqp.power,
mqp.locking_handle_side, mqp.unit_price, mqp.total_price_calculation_id, mqp.quantity,
width, length, height, c.name, total_price

FROM match_quote_product mqp
JOIN building_products bp
ON mqp.product_id = bp.id
JOIN match_quote_colour mqc
ON mqp.colour_type_id = mqc.colour_type_id
AND mqp.quote_id = mqc.quote_id
JOIN colours c
ON mqc.colour_id = c.id

WHERE mqp.quote_id = 50000
AND bp.purchase_order_id = 5

ORDER BY purchase_order_order ASC 

I had to:

- add a comma at the end of line 1

- change building_products to bp in the where clause

- change colours to c in the select statement

- change match_quote_product to mqp in the where clause

 

The query now works but returns 0 rows, I'm not sure why. I have to go through it and try to understand what it's doing but something must not match another column in the JOIN.

Edited by DeX
Link to comment
Share on other sites

 

Also, all the fields used to JOIN the tables should be indexed. Pretty much anything with "id" int he field name in your case.

 

 

Oh wow, that was it! I didn't know what a table index was so I did some research yesterday and applied indexes to 2 of the tables based on the quote_id. Loading the quote details dropped from 104 seconds to 4 seconds and loading a quote to edit dropped from 28 seconds to 0 seconds. This is amazing, thanks a lot! I haven't tested inserting the records, I hope that hasn't increased much as a result.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.