fastsol Posted March 14, 2021 Share Posted March 14, 2021 So I have a Customers table with normal fields and a standard ID primary key field. I am trying to find a way through a relationship method to get a collection of repeat customers. Sounds simple and can easily be done with query like so public function repeatCustomer() { return $this->where('id', '<>', $this->id) ->where(function($query){ $query->where('emai', $this->email) ->orWhere('phone', $this->phone); })->get(); } But that can't be loaded or eager loaded as a relationship. I don't have nor do I really want to have a parent_id type field in the table as there are already 15k records that aren't directly related now. As you can see in the query above I'm simply needing to find related customers by same email or phone. I have tried the hasMany relationship but no matter what relationship method I've tried I can't get the query to ignore the fact that it wants to find the Customer record by the primary key of the current $this model. Like this select * from `customers` where `customers`.`id` = 1887 and `customers`.`id` is not null and (`email` = blah.com or `phone` = 123-456-7890 It needs to not use this part of the query to work and still return a relationship. `customers`.`id` = 1887 Can this even be done with relationships? If not then I don't think I can do this at all since it can't be eager loaded and will cause a N+1 problem on several pages of the site. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 14, 2021 Share Posted March 14, 2021 By "repeat customers" do you mean customers who have placed more then one order, or do you mean "duplicate" customers with more than 1 customer record? If it's the latter SELECT eamil , phone , GROUP_CONCAT(id SEPARATOR ', ') as duplicates FROM customers GROUP BY email, phone HAVING COUNT(*) > 0; Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 14, 2021 Author Share Posted March 14, 2021 Duplicate customers. The same person may have made multiple purchases under the same email or phone number in the Customers table. I need to do this with an Eloquent relationship though. Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 14, 2021 Author Share Posted March 14, 2021 This relationship query gets me the correct results but it doesn't when using with() or load() on the model for eager loading. return $this->hasMany($this, 'id')->where('customers.id', '<>', $this->id) ->orWhere(function($query){ $query->where('customers.id', '<>', $this->id) ->where(function($query){ $query->where('email', $this->email) ->orWhere('phone', $this->phone); }); }); So if I try the code below it only returns the original Customer in the repeatCustomer relation instead of all of them. $customer = Customer::with('repeatCustomer')->find(1887); But if I call the relationship AFTER and not via the with() or load() it returns the correct results. $customer = Customer::find(1887); $customer->repeatCustomer; Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 14, 2021 Author Share Posted March 14, 2021 Or could this maybe be done with a pivot table or a view table and then ran through a eloquent relationship? Quote Link to comment Share on other sites More sharing options...
Barand Posted March 14, 2021 Share Posted March 14, 2021 OK, call me old fashioned, but why don't you just write a query instead of struggling to find the right incantations to get the db classes to do it for you? Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 15, 2021 Author Share Posted March 15, 2021 2 hours ago, Barand said: OK, call me old fashioned, but why don't you just write a query instead of struggling to find the right incantations to get the db classes to do it for you? Because on several pages if I don't do it with eager loading it will cause an N+1 query problem. Plus I don't need or want to load this info on every page as there will also be many times I don't need the extra info. Maybe I'm splitting hairs on not needing all the info all the time. I'll consider it if there is no other way to do it. Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 15, 2021 Author Share Posted March 15, 2021 (edited) I'm getting much much closer. I ended up using a View table to build the query and then a hasOne eloquent relationship which is working as expected at the moment. @Barand I used your query above in the view with some additions. The only thing that isn't working with the query is lets say a customer is listed more than once in the db with the same email, then they are listed in your query results BUT if a customer is listed more than once with different emails but same phone numbers then they are not listed in your query results. I need the customer to be listed in either of those cases. Here is what I currently have as the View query. select `customers`.`email` AS `email` ,`customers`.`phone` AS `phone` ,group_concat(`customers`.`id` separator ',') AS `duplicates` from (`customers` join `quotes` on((`quotes`.`customer_id` = `customers`.`id`))) where (`quotes`.`purchased` = 1) group by `customers`.`email`,`customers`.`phone` having (count(0) > 1) One other thing I noticed is that querying the view takes much longer than any other query being performed on the page, upwards of 72ms which isn't terrible but considering that all the other queries are no more than 1.4ms it's a huge difference. I'm going to guess that it's because the data needs to be populated in the view first before it can be returned since it's not a static table of data. Does that sound right? Or is the view query itself that is causing the lag? Edited March 15, 2021 by fastsol Quote Link to comment Share on other sites More sharing options...
Barand Posted March 15, 2021 Share Posted March 15, 2021 perhaps this, then SELECT a.id , a.email , a.phone FROM customers a JOIN customers b ON ((a.email = b.email) OR (a.phone = b.phone)) AND a.id <> b.id ORDER BY a.id; Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 15, 2021 Author Share Posted March 15, 2021 5 minutes ago, Barand said: perhaps this, then SELECT a.id , a.email , a.phone FROM customers a JOIN customers b ON ((a.email = b.email) OR (a.phone = b.phone)) AND a.id <> b.id ORDER BY a.id; If that's supposed to be the entire query for the view, then it doesn't work. It gives me a ton of results that are all identical with the same id and email and no phone number. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 15, 2021 Share Posted March 15, 2021 I see emails and phone numbers customers +----+------+------------+-------+ | id | name | email | phone | +----+------+------------+-------+ | 1 | aaa | aaa@x.com | 123 | | 2 | bbbb | bbbb@x.com | 124 | | 3 | ccc | bbbb@x.com | 124 | | 4 | ddd | ddd@x.com | 123 | | 5 | eee | eee@x.com | 125 | | 6 | fff | bbb@x.com | 126 | | 7 | ggg | ggg@x.com | 127 | | 8 | hhh | hhh@x.com | 128 | | 9 | iii | iii@x.com | 125 | | 10 | jjj | jjj@x.com | 129 | | 11 | kkk | kkk@x.com | 130 | | 12 | lll | lll@x.com | 127 | | 13 | mmm | mmm@x.com | 131 | | 14 | nnn | nnn@x.com | 132 | | 15 | ooo | aaa@x.com | 123 | | 16 | ppp | ppp@x.com | 133 | +----+------+------------+-------+ query SELECT DISTINCT a.id , a.email , a.phone FROM customers a JOIN customers b ON ((a.email = b.email) OR (a.phone = b.phone)) AND a.id <> b.id ORDER BY a.id; results +----+------------+-------+ | id | email | phone | +----+------------+-------+ | 1 | aaa@x.com | 123 | | 2 | bbbb@x.com | 124 | | 3 | bbbb@x.com | 124 | | 4 | ddd@x.com | 123 | | 5 | eee@x.com | 125 | | 7 | ggg@x.com | 127 | | 9 | iii@x.com | 125 | | 12 | lll@x.com | 127 | | 15 | aaa@x.com | 123 | +----+------------+-------+ Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 15, 2021 Author Share Posted March 15, 2021 @BarandThe problem with your recent query seems to be stemming from the fact that early records don't always have a phone number. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 15, 2021 Share Posted March 15, 2021 That strikes me as a problem with the data, not the query. Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 15, 2021 Author Share Posted March 15, 2021 (edited) So here is the query that I finally got to work in a way that I can handle. The only thing I see as a potential problem is that the query is extremely slow, at least for the initial saving of the view. It seems to perform well enough when querying for a single record from the view. SELECT `a`.`id` AS `customer_id` FROM `customers` `a` WHERE ( SELECT `b`.`id` FROM ( `customers` `b` JOIN `quotes` ON ( ( `b`.`id` = `quotes`.`customer_id` ) ) ) WHERE ( ( (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`) ) AND(`a`.`id` <> `b`.`id`) AND(`quotes`.`purchased` = 1) ) GROUP BY `a`.`email` ) If I run this query in the phpmyadmin outside the view just as a regular query it takes about 75 seconds to load as it scans the full 10k+ records. I have indexes on every column that is used in that query within its respective table. I know views can be labor intensive so maybe there isn't anything that can be done. Also I filled in all the records that were missing phone numbers so that it would stop tripping on that. Those records are pretty old and aren't really a concern with their accuracy anymore. Edited March 15, 2021 by fastsol Quote Link to comment Share on other sites More sharing options...
Barand Posted March 15, 2021 Share Posted March 15, 2021 75 seconds seems excessively long! If you want to email me a dump of those two tables I'll see if can can speed it up for you. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 16, 2021 Share Posted March 16, 2021 I have added quotes to my data and queries. This is my test data.. My query mysql> SELECT DISTINCT -> a.id -> FROM customers a -> JOIN -> customers b ON ((a.email = b.email) OR (a.phone = b.phone)) -> AND a.id <> b.id -> JOIN -> quotes qa ON qa.customer_id = a.id AND qa.purchased = 1 -> JOIN -> quotes qb ON qb.customer_id = b.id AND qb.purchased = 1 -> ORDER BY a.id; +----+ | id | +----+ | 1 | | 2 | | 3 | | 4 | | 5 | | 9 | +----+ Your query (on the same data) gives mysql> SELECT -> `a`.`id` AS `customer_id` -> FROM -> `customers` `a` -> WHERE -> ( -> SELECT -> `b`.`id` -> FROM -> ( -> `customers` `b` -> JOIN `quotes` ON -> ( -> ( -> `b`.`id` = `quotes`.`customer_id` -> ) -> ) -> ) -> WHERE -> ( -> ( -> (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`) -> ) AND(`a`.`id` <> `b`.`id`) AND(`quotes`.`purchased` = 1) -> ) -> GROUP BY -> `a`.`email` -> ); +-------------+ | customer_id | +-------------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 9 | | 12 | | 15 | +-------------+ Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 16, 2021 Author Share Posted March 16, 2021 (edited) Nice example! Seems accurate enough in table layout to determine what I need figured out. The results you list for my query are the correct ones based on your table data. Customers 12 and 15 should be listed because the email or phone for those customers is the same as another customer that has a purchased of 1. Basically what I'm doing here is as new customers get added, it needs to check for previous customers and see if the email or phone is the same AND if any of the previous customers matching have also purchased. If that condition is true, then it returns the new customers id so that in the Eloquent relationship I simply need to see if a record in the view exists under the same customer_id that the model is for. This way I can easily determine if this customer is a repeat because they actually have purchased something and are not just another record in the db with similar data but still has never purchased. Querying the relationship will allow me to do an easy check like this: // The relationship will either return the database record or NULL // Obviously if it's NULL then the if statment fails and we know they aren't a repeat. if($customer->repeatCustomer){ // Do something } I ran your new query (minus the join for qa since that gives the wrong data back) and it's drastically slower when trying to view the records in phpmyadmin. My query can return a standard list of 25 in .45 seconds and when trying the same thing with yours is 12.9 seconds. Also just pulling a single record by a customer id that I know to be in the view returns in .0009 seconds but yours still takes 12.4 seconds to get the same single record. Here are the EXPLAINs for the 2 queries when gathering 25 rows, my query is the first image. Hopefully this information helps. Edited March 16, 2021 by fastsol Quote Link to comment Share on other sites More sharing options...
Barand Posted March 16, 2021 Share Posted March 16, 2021 TEST 2 10,000 customers. IDs 890 and 8000 have same phone number. 890 has purchased = 1 mysql> select count(*) from customers; +----------+ | count(*) | +----------+ | 10000 | +----------+ Your query mysql> SELECT -> `a`.`id` AS `customer_id` -> FROM -> `customers` `a` -> WHERE -> ( -> SELECT -> `b`.`id` -> FROM -> `customers` `b` -> JOIN `quotes` ON `b`.`id` = `quotes`.`customer_id` -> WHERE -> ( (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`) ) -> AND(`a`.`id` <> `b`.`id`) -> AND (`quotes`.`purchased` = 1) -> GROUP BY -> `a`.`email` -> ); +-------------+ | customer_id | +-------------+ | 8000 | +-------------+ 1 row in set (0.13 sec) My query mysql> SELECT DISTINCT -> a.id -> FROM customers a -> JOIN -> customers b ON ((a.email = b.email) OR (a.phone = b.phone)) -> AND a.id <> b.id -> JOIN -> quotes qb ON qb.customer_id = b.id AND qb.purchased = 1 -> ORDER BY a.id; +------+ | id | +------+ | 8000 | +------+ 1 row in set (0.02 sec) 6.5x faster !? Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 16, 2021 Author Share Posted March 16, 2021 Clearly we have something different between our table setups for this. I'll try to get you a version of my tables where the personal data is altered for privacy. Then hopefully you can find the cause behind the lag on my end. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 16, 2021 Share Posted March 16, 2021 Your query has "dependent subqueries" (to be avoided) which are ineffecient compared to joins (which mine uses) which is why I did my own 10000 record test. I couldn't believe yours was faster. Adding a dupe email on records 1 and 10,000 (2 rows returned) gave your time as 0.156 sec and mine as 0.015 (10x diff). Quote Link to comment Share on other sites More sharing options...
Barand Posted March 16, 2021 Share Posted March 16, 2021 (edited) The killer with my query with your data is the join condition ((a.email = b.email) OR (a.phone = b.phone)) Using the OR ... mysql> SELECT DISTINCT -> a.id -> FROM customers a -> JOIN ( -> SELECT c.id -> , c.email -> , c.phone -> FROM customers c JOIN quotes q ON q.customer_id = c.id AND q.purchased = 1 -> ) b ON ((a.email = b.email) OR (a.phone = b.phone)) -> AND a.id <> b.id -> ORDER BY a.id; +-------+ | id | +-------+ | 9 | | 13 | | 21 | | 36 | | 38 | | 41 | . . . | 11878 | | 11879 | | 11887 | | 11893 | | 11900 | | 11903 | +-------+ 2987 rows in set (1 min 29.25 sec) !!! Using a UNION to pull those matching on email then those matching on phone ... mysql> SELECT -> a1.id -> FROM customers a1 -> JOIN -> customers b1 ON (a1.email = b1.email) AND a1.id <> b1.id -> JOIN -> quotes q1 ON q1.customer_id = b1.id AND q1.purchased = 1 -> UNION -> SELECT -> a.id -> FROM customers a -> JOIN -> customers b ON (a.phone = b.phone) AND a.id <> b.id -> JOIN -> quotes q ON q.customer_id = b.id AND q.purchased = 1 -> ORDER BY id; +-------+ | id | +-------+ | 9 | | 13 | | 21 | | 36 | | 38 | | 41 | . . . | 11878 | | 11879 | | 11887 | | 11893 | | 11900 | | 11903 | +-------+ 2987 rows in set (1.33 sec) Edited March 16, 2021 by Barand Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 17, 2021 Author Share Posted March 17, 2021 I tried your new query and yes it is much faster than mine when gathering a large number of records at a time, even when paging through the results it really shines. BUT it's still much slower than mine when grabbing a single record. Mine is still .0015 ish and your new one is .15 ish. Since I'll be doing both single and multi record queries this new query might be the way to go. With multi record pulls mine goes up in time exponentially where yours stays consistent at that .12-.15. I'd still like to get the single record pull down in time but it's liveable. I did email you my sql dumps, did you get them? Quote Link to comment Share on other sites More sharing options...
Barand Posted March 17, 2021 Share Posted March 17, 2021 (edited) Have you considered defining email and phone columns as UNIQUE then trapping dupe key errors on insert? That should be far better than searching for potential dupes before inserting. (You would have to cleanse your data of current duplicates first, however) 6 hours ago, fastsol said: I did email you my sql dumps, did you get them? Yes thanks. The ~3000 duplicates in my last post are from your data. Perhaps, for a single search, you could mysql> SELECT id -> FROM customers -> WHERE email = 'johnrademacher44@x.com' -> UNION -> SELECT id -> FROM customers -> WHERE phone = '612-772-4311'; +-------+ | id | +-------+ | 11902 | +-------+ 1 row in set (0.00 sec) Edited March 17, 2021 by Barand Quote Link to comment Share on other sites More sharing options...
fastsol Posted March 19, 2021 Author Share Posted March 19, 2021 @Barand Thank you so much for the help. I've decided to use this version of your query with slight modifications that haven't impacted the performance from what I have tested. SELECT `a1`.`id` AS `id` FROM ( ( `customers` `a1` JOIN `customers` `b1` ON ( ( (`a1`.`email` = `b1`.`email`) AND(`a1`.`id` <> `b1`.`id`) AND( `a1`.`email` <> 'blah@example.com' ) AND( `b1`.`email` <> 'blah@example.com' ) ) ) ) JOIN `quotes` `q1` ON ( ( (`q1`.`customer_id` = `b1`.`id`) AND(`q1`.`purchased` = 1) ) ) ) UNION SELECT `a`.`id` AS `id` FROM ( ( `customers` `a` JOIN `customers` `b` ON ( ( (`a`.`phone` = `b`.`phone`) AND(`a`.`id` <> `b`.`id`) ) ) ) JOIN `quotes` `q` ON ( ( (`q`.`customer_id` = `b`.`id`) AND(`q`.`purchased` = 1) ) ) ) ORDER BY `id` The performance of that query isn't super for a single record pull but it's not terrible. It performs well enough for multi record pulls though. It doesn't seem like it can get any better, so I'm fine with it, have to be or the rest of the stuff won't work lol. One other thing that I wonder could be added to the query is 2 columns like `by_email` and `by_phone` where the values would be 1 or 0 depending on if the result came from finding it in the email or phone query OR if it found it in both queries then both those new columns would be 1. At this point I have no idea if this can even be done without performance issues. If you can help again that would be awesome. Quote Link to comment Share on other sites More sharing options...
Barand Posted March 19, 2021 Share Posted March 19, 2021 (edited) Try this. It doesn't appear to impact performance. Unmodified : 2987 rows in set (1.53 sec) Modified : 2987 rows in set (1.48 sec) I had to use GROUP BY to get a single row with 1234 1 1 otherwise, if both matched, it gave 1234 1 0 1234 0 1 Query SELECT id , SUM(match_email) as by_email , SUM(match_phone) as by_phone FROM ( SELECT `a1`.`id`, 1 as match_email, 0 as match_phone FROM ( ( `customers` `a1` JOIN `customers` `b1` ON ( ( (`a1`.`email` = `b1`.`email`) AND(`a1`.`id` <> `b1`.`id`) AND( `a1`.`email` <> 'blah@example.com' ) AND( `b1`.`email` <> 'blah@example.com' ) ) ) ) JOIN `quotes` `q1` ON ( ( (`q1`.`customer_id` = `b1`.`id`) AND(`q1`.`purchased` = 1) ) ) ) UNION SELECT `a`.`id`, 0 as match_email, 1 as match_phone FROM ( ( `customers` `a` JOIN `customers` `b` ON ( ( (`a`.`phone` = `b`.`phone`) AND(`a`.`id` <> `b`.`id`) ) ) ) JOIN `quotes` `q` ON ( ( (`q`.`customer_id` = `b`.`id`) AND(`q`.`purchased` = 1) ) ) ) ) all_matches GROUP BY `id` Edited March 19, 2021 by Barand 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.