Jump to content

Recursive SQL Function / Drill down in relational MySQL Tables


a.stilliard

Recommended Posts

 

Hello All,

 

I have an new issue regarding drilling down in a sql statement and wondered if anyone could offer me some help/advice on the matter.

 

I have built an app that lists categories on our ecommerce sites, it just lists them in order of top level ones and underneath all the related (sub)categories until you get to the bottom level where the products are. Very simple. Heres the problem, we have now introduced make model and year filters onto the sites and now the lists must only display those categories which have products in them at the lowest (sub)category level which relate to the filters selected.

 

So drawing up the select on the products inside is easy, i have the sql all written for that, but its the drilling down from top level categories whose sub categories may have further sub categories (essentially multiple other layers of sub catgeories, probably up to 5 or 6 levels) that will eventually drill down to products relating to the selected filters.

 

I thought for this the best way would be to do this similar to my php function that loops the categories originally, which is a simple recursize function.  I've now writen a similar thing in SQL, unfortunately i did not realise you cannot have recursize functions in SQL, or at least in my version as it throws the following error "Recursive stored functions and triggers are not allowed."

 

I'm now attempting to create a procedure to do the same and then a function that would call that procedure so i can use it in the select satement. The end result would be a function i could attach onto the where statements of my current app, which would just say if the current category (by id) contains products or sub cats with eventual  products that relate to the search criteria. Although for now im just trying to do the part without the filter, and add in the filtering part after. 

 

Here is the sql function i wrote which demonstrates what im trying to acheive. Ive removed the filter parts because they are simple but very long and not really to do with the current issue, although they are the original cause.

 

If you have any advise on this at all or if you can see im heading in the wrong direction with this please let me know.

Also this "Recursive stored functions and triggers are not allowed.", is this relating only to older versions of mysql? im currently using 5.0.9, do you think its worth upgrading?

 

DROP FUNCTION IF EXISTS `relatedSubs`;
DELIMITER $$
CREATE FUNCTION relatedSubs(catid INT(20))
RETURNS BOOLEAN 
BEGIN 
  
  DECLARE subProductsFound BOOLEAN;
  DECLARE subCatsFound BOOLEAN;
  
  SET subProductsFound = (
  	SELECT COUNT(id) 
    FROM products p 
    WHERE  p.category   = catid 
    LIMIT 0, 1
  ); 
  IF ( subProductsFound>0 ) 
  	THEN RETURN 1;
  END IF;
  
  SET subCatsFound = (
  	SELECT COUNT(id) 
    FROM categories c 
    WHERE c.relatedid=catid
    AND relatedSubs(c.id) 
    LIMIT 0, 1
  ); 
  IF ( subCatsFound>0 ) 
  	THEN RETURN 1;
  END IF;
  
  RETURN 0;
  
END$$
DELIMITER ;

# STATEMENT HERE 
SELECT * FROM categories WHERE relatedSubs(id) AND relatedid=0;

 

Link to comment
Share on other sites

Just incase this helps anyone reconise my issue, i have added the following comment to the other forum where this has been posted.

 

 

The table structure that will come into use for this function is just the primary key id and a relation to other categories via relatedid, relatedid will be 0 if top level.

 

Ive taken out the filters to make the statement much more simple, the filters are not so easy to show examples of, but if you were to think of this as only showing categories where products exist at a lower level it makes more sense. Sorry for mentioning the filters before, thats just complicated the matter i think.

 

The end result needs to just be able to drill down through related categories, where there are any products at the lowest category level.

 

So the most basic table i can re-create with this would be...

 

CREATE TABLE `categories` (
  `id` int(20) NOT NULL auto_increment,
  `category_name` varchar(100) NOT NULL,
  `relatedid` int(20) NOT NULL,
  PRIMARY KEY  (`id`) 
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

CREATE TABLE `products` (
  `id` int(20) NOT NULL auto_increment,
  `product_name` varchar(100) NOT NULL,
  `category` int(20) NOT NULL,
  PRIMARY KEY  (`id`) 
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

 

For example, there could be a few top level categories ( categories with no related id ) each with a couple of (sub)categories relating to them, and again for several levels until one of those categories has products.

 

So I need to build my function to be able to drill from top-level categories downwards where there are products in lower level related categories and to return a true or false value if that condition exists.

 

Link to comment
Share on other sites

 

Im aware my explainations maybe causing more confussion than help, maybe this makes sense,

 

I need a function or proceedure to feed in the ID of a category to perform something like the following, but over an unlimited number of levels of categories.

 

SELECT * 
FROM categories c1
WHERE EXISTS(
  SELECT null 
  FROM products p 
  WHERE p.category = c1.id 
)
OR EXISTS(
  SELECT null
  FROM categories c2
  WHERE c2.relatedid = c1.id
  AND EXISTS(
    SELECT null 
    FROM products p
    WHERE p.category   = c2.id
  )
)

 

This SQL here was what i had before the function i built but i would need to literally have the EXISTS statement repeating for each level and i will be adding many filter params to this later so having it repeat many times will cause problems later on.

 

Hope this extra bit sparks some more help. 

Thanks in advance for anything of use.

 

Link to comment
Share on other sites

  • 3 weeks later...

@fenway

 

Because its a perfectly logical pattern, its just the mysql "adjacency list model" or parent-child model.

When we started this application we had no intention for this facility and therefor chose not to go down the more complex mysql "nested set model".

 

I have since posting this added support for using the nested set model into my table, using left right and level columns, and i have been able to get the script work.

Unfortunately its now very slow, here is the SQL i have, if anyone knows how i can speed this up please let me know as I'm new to using this model.

 

 

  SELECT parent.id, parent.category_name, parent.relatedid, parent.sort_order,

  IF((

    SELECT node.id

    FROM categories node

    WHERE node.lft BETWEEN parent.lft AND parent.rgt

    AND EXISTS (

        SELECT null FROM products p

        WHERE p.category = node.id

    )

    LIMIT 0, 1

  ) > 0, 1, 0) AS productsFound

  FROM categories parent

  HAVING productsFound > 0

  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id

 

 

Link to comment
Share on other sites

With that query i posted? yep.

Baring in mind though there's about 1000 categories and 30,000 products, it takes from 45 seconds to several minutes to load testing in phpmyadmin, i need it to run in around a second or 2 at most.

I have indexes set-up on the fields referenced, and i have tried to sort of optimize my query, but maybe i have missed something crucial.

 

Link to comment
Share on other sites

 

Fenway, sorry my reply is late, i am close to cracking the query now.

The only bit slowing it down now is the part where i have 5 different category fields on the products table to check against.

When im running it against the first category field, it works well and fast, then as i add the others (currently via UNION'S) it drasticly increases the speed !

Each product will definetly relate a category by the "category" field, but may also may relate with the "category_2" field up to _5.

 

The following 2 querys just list top level categories.

 

FAST QUERY, but only one category:

EXPLAIN   
  SELECT distinct parent.id, parent.category_name, parent.relatedid, parent.sort_order, 
  IF((
    SELECT 1
    FROM categories sub
    WHERE sub.status != 'deleted' 
    AND sub.status != 'draft'
    AND sub.relatedid = parent.id
    LIMIT 0, 1
  ) > 0, 1, 0) AS subsFound 
  FROM categories parent
  INNER JOIN (
    SELECT node.lft 
    FROM categories node 
    INNER JOIN products p ON p.category = node.id
    WHERE 1 
    AND node.status != 'deleted' 
    AND node.status != 'draft' 
    AND p.status != 'deleted' 
    AND p.status != 'draft' 
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt 
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id 

 

New SLOW SQL, but 5 related categorys to each product :

EXPLAIN   
  SELECT distinct parent.id, parent.category_name, parent.relatedid, parent.sort_order, 
  IF((
    SELECT 1
    FROM categories sub
    WHERE sub.status != 'deleted' 
    AND sub.status != 'draft'
    AND sub.relatedid = parent.id
    LIMIT 0, 1
  ) > 0, 1, 0) AS subsFound 
  FROM categories parent
  INNER JOIN (
    SELECT node.lft 
    FROM categories node 
    INNER JOIN (
    	SELECT category AS cat_id, status, id FROM products
        UNION DISTINCT 
    	SELECT category_2 AS cat_id, status, id FROM products
        UNION DISTINCT 
    	SELECT category_3 AS cat_id, status, id FROM products
        UNION DISTINCT 
    	SELECT category_4 AS cat_id, status, id FROM products
        UNION DISTINCT 
    	SELECT category_5 AS cat_id, status, id FROM products
    ) p ON p.cat_id = node.id
    WHERE 1 
    AND node.status != 'deleted' 
    AND node.status != 'draft' 
    AND p.status != 'deleted' 
    AND p.status != 'draft' 
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt 
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id 

 

EXPLAIN Output:

 

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 258 Using temporary; Using filesort
1 PRIMARY parent ALL NULL NULL NULL NULL 879 Using where
3 DERIVED <derived4> ALL NULL NULL NULL NULL 49488 Using where; Using temporary; Using filesort
3 DERIVED node eq_ref PRIMARY PRIMARY 4 p.cat_id 1 Using where
4 DERIVED products ALL NULL NULL NULL NULL 24743  
5 UNION products ALL NULL NULL NULL NULL 24743  
6 UNION products ALL NULL NULL NULL NULL 24743  
7 UNION products ALL NULL NULL NULL NULL 24743  
8 UNION products ALL NULL NULL NULL NULL 24743  
NULL UNION RESULT <union4,5,6,7,8> ALL NULL NULL NULL NULL NULL  
2 DEPENDENT SUBQUERY sub ALL NULL NULL NULL NULL 879 Using where

 

 

Any ideas of how to crack this last part.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.