Jump to content

Recommended Posts

So I have a Customers table with normal fields and a standard ID primary key field.  I am trying to find a way through a relationship method to get a collection of repeat customers.  Sounds simple and can easily be done with query like so

public function repeatCustomer()
{
	return $this->where('id', '<>', $this->id)
				->where(function($query){
					$query->where('emai', $this->email)
					->orWhere('phone', $this->phone);
				})->get();
}

But that can't be loaded or eager loaded as a relationship.  I don't have nor do I really want to have a parent_id type field in the table as there are already  15k records that aren't directly related now.  As you can see in the query above I'm simply needing to find related customers by same email or phone.

I have tried the hasMany relationship but no matter what relationship method I've tried I can't get the query to ignore the fact that it wants to find the Customer record by the primary key of the current $this model.  Like this

select * from `customers` where `customers`.`id` = 1887 and `customers`.`id` is not null and (`email` = blah.com or `phone` = 123-456-7890

It needs to not use this part of the query to work and still return a relationship.

`customers`.`id` = 1887

Can this even be done with relationships?  If not then I don't think I can do this at all since it can't be eager loaded and will cause a N+1 problem on several pages of the site.

Link to comment
https://forums.phpfreaks.com/topic/312293-eloquent-relationship-question/
Share on other sites

By "repeat customers" do you mean

  • customers who have placed more then one order, or
  • do you mean "duplicate" customers with more than 1 customer record?

If it's the latter

SELECT eamil
     , phone
     , GROUP_CONCAT(id SEPARATOR ', ') as duplicates
FROM customers
GROUP BY email, phone
HAVING COUNT(*) > 0;

 

This relationship query gets me the correct results but it doesn't when using with() or load() on the model for eager loading.

return $this->hasMany($this, 'id')->where('customers.id', '<>', $this->id)
                    ->orWhere(function($query){
                        $query->where('customers.id', '<>', $this->id)
                              ->where(function($query){
                            $query->where('email', $this->email)
                                  ->orWhere('phone', $this->phone);
                    });
        });

So if I try the code below it only returns the original Customer in the repeatCustomer relation instead of all of them.

$customer = Customer::with('repeatCustomer')->find(1887);

But if I call the relationship AFTER and not via the with() or load() it returns the correct results.

$customer = Customer::find(1887);

$customer->repeatCustomer;

 

2 hours ago, Barand said:

OK, call me old fashioned, but why don't you just write a query instead of struggling to find the right incantations to get the db classes to do it for you?

Because on several pages if I don't do it with eager loading it will cause an N+1 query problem. Plus I don't need or want to load this info on every page as there will also be many times I don't need the extra info.  Maybe I'm splitting hairs on not needing all the info all the time.  I'll consider it if there is no other way to do it.

I'm getting much much closer.  I ended up using a View table to build the query and then a hasOne eloquent relationship which is working as expected at the moment. 

@Barand

 I used your query above in the view with some additions.  The only thing that isn't working with the query is lets say a customer is listed more than once in the db with the same email, then they are listed in your query results BUT if a customer is listed more than once with different emails but same phone numbers then they are not listed in your query results.  I need the customer to be listed in either of those cases.

Here is what I currently have as the View query.

select `customers`.`email` AS `email`
,`customers`.`phone` AS `phone`
,group_concat(`customers`.`id` separator ',') AS `duplicates` 
from (`customers` 
      join `quotes` on((`quotes`.`customer_id` = `customers`.`id`))) 
      where (`quotes`.`purchased` = 1) 
      group by `customers`.`email`,`customers`.`phone` 
      having (count(0) > 1)

One other thing I noticed is that querying the view takes much longer than any other query being performed on the page, upwards of 72ms which isn't terrible but considering that all the other queries are no more than 1.4ms it's a huge difference.  I'm going to guess that it's because the data needs to be populated in the view first before it can be returned since it's not a static table of data.  Does that sound right?  Or is the view query itself that is causing the lag?

Edited by fastsol
5 minutes ago, Barand said:

perhaps this, then


SELECT 
       a.id
     , a.email
     , a.phone
FROM customers a 
     JOIN
     customers b ON ((a.email = b.email) OR (a.phone = b.phone))
                 AND a.id <> b.id
ORDER BY a.id;

 

If that's supposed to be the entire query for the view, then it doesn't work.  It gives me a ton of results that are all identical with the same id and email and no phone number.

I see emails and phone numbers

customers

+----+------+------------+-------+
| id | name | email      | phone |
+----+------+------------+-------+
|  1 | aaa  | aaa@x.com  | 123   |
|  2 | bbbb | bbbb@x.com | 124   |
|  3 | ccc  | bbbb@x.com | 124   |
|  4 | ddd  | ddd@x.com  | 123   |
|  5 | eee  | eee@x.com  | 125   |
|  6 | fff  | bbb@x.com  | 126   |
|  7 | ggg  | ggg@x.com  | 127   |
|  8 | hhh  | hhh@x.com  | 128   |
|  9 | iii  | iii@x.com  | 125   |
| 10 | jjj  | jjj@x.com  | 129   |
| 11 | kkk  | kkk@x.com  | 130   |
| 12 | lll  | lll@x.com  | 127   |
| 13 | mmm  | mmm@x.com  | 131   |
| 14 | nnn  | nnn@x.com  | 132   |
| 15 | ooo  | aaa@x.com  | 123   |
| 16 | ppp  | ppp@x.com  | 133   |
+----+------+------------+-------+

query

SELECT DISTINCT
       a.id
     , a.email
     , a.phone
FROM customers a 
     JOIN
     customers b ON ((a.email = b.email) OR (a.phone = b.phone))
                 AND a.id <> b.id
ORDER BY a.id;

results

+----+------------+-------+
| id | email      | phone |
+----+------------+-------+
|  1 | aaa@x.com  | 123   |
|  2 | bbbb@x.com | 124   |
|  3 | bbbb@x.com | 124   |
|  4 | ddd@x.com  | 123   |
|  5 | eee@x.com  | 125   |
|  7 | ggg@x.com  | 127   |
|  9 | iii@x.com  | 125   |
| 12 | lll@x.com  | 127   |
| 15 | aaa@x.com  | 123   |
+----+------------+-------+

 

So here is the query that I finally got to work in a way that I can handle.  The only thing I see as a potential problem is that the query is extremely slow, at least for the initial saving of the view.  It seems to perform well enough when querying for a single record from the view.

SELECT
    `a`.`id` AS `customer_id`
FROM
    `customers` `a`
WHERE
    (
    SELECT
        `b`.`id`
    FROM
        (
            `customers` `b`
        JOIN `quotes` ON
            (
                (
                    `b`.`id` = `quotes`.`customer_id`
                )
            )
        )
    WHERE
        (
            (
                (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`)
            ) AND(`a`.`id` <> `b`.`id`) AND(`quotes`.`purchased` = 1)
        )
    GROUP BY
        `a`.`email`
)

If I run this query in the phpmyadmin outside the view just as a regular query it takes about 75 seconds to load as it scans the full 10k+ records.  I have indexes on every column that is used in that query within its respective table.  I know views can be labor intensive so maybe there isn't anything that can be done.

Also I filled in all the records that were missing phone numbers so that it would stop tripping on that.  Those records are pretty old and aren't really a concern with their accuracy anymore.

Edited by fastsol

I have added quotes to my data and queries. This is my test data..

image.png.29837a72eb500406e02a739168324c94.png

 

My query

mysql> SELECT DISTINCT
    ->        a.id
    -> FROM customers a
    ->      JOIN
    ->      customers b ON ((a.email = b.email) OR (a.phone = b.phone))
    ->                  AND a.id <> b.id
    ->       JOIN
    ->       quotes qa ON qa.customer_id = a.id AND qa.purchased = 1
    ->       JOIN
    ->       quotes qb ON qb.customer_id = b.id AND qb.purchased = 1
    -> ORDER BY a.id;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  9 |
+----+

Your query (on the same data) gives

mysql> SELECT
    ->     `a`.`id` AS `customer_id`
    -> FROM
    ->     `customers` `a`
    -> WHERE
    ->     (
    ->     SELECT
    ->         `b`.`id`
    ->     FROM
    ->         (
    ->             `customers` `b`
    ->         JOIN `quotes` ON
    ->             (
    ->                 (
    ->                     `b`.`id` = `quotes`.`customer_id`
    ->                 )
    ->             )
    ->         )
    ->     WHERE
    ->         (
    ->             (
    ->                 (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`)
    ->             ) AND(`a`.`id` <> `b`.`id`) AND(`quotes`.`purchased` = 1)
    ->         )
    ->     GROUP BY
    ->         `a`.`email`
    -> );
+-------------+
| customer_id |
+-------------+
|           1 |
|           2 |
|           3 |
|           4 |
|           5 |
|           9 |
|          12 |
|          15 |
+-------------+

 

Nice example!  Seems accurate enough in table layout to determine what I need figured out.

The results you list for my query are the correct ones based on your table data.  Customers 12 and 15 should be listed because the email or phone for those customers is the same as another customer that has a purchased of 1. 

Basically what I'm doing here is as new customers get added, it needs to check for previous customers and see if the email or phone is the same AND if any of the previous customers matching have also purchased.  If that condition is true, then it returns the new customers id so that in the Eloquent relationship I simply need to see if a record in the view exists under the same customer_id that the model is for.  This way I can easily determine if this customer is a repeat because they actually have purchased something and are not just another record in the db with similar data but still has never purchased.  

Querying the relationship will allow me to do an easy check like this:

// The relationship will either return the database record or NULL
// Obviously if it's NULL then the if statment fails and we know they aren't a repeat.
if($customer->repeatCustomer){
	// Do something
}

I ran your new query (minus the join for qa since that gives the wrong data back) and it's drastically slower when trying to view the records in phpmyadmin.  My query can return a standard list of 25 in .45 seconds and when trying the same thing with yours is 12.9 seconds. Also just pulling a single record by a customer id that I know to be in the view returns in .0009 seconds but yours still takes 12.4 seconds to get the same single record.  Here are the EXPLAINs for the 2 queries when gathering 25 rows, my query is the first image.  Hopefully this information helps.

explain_mine.jpg

explain_yours.jpg

Edited by fastsol

TEST 2

10,000 customers. IDs 890 and 8000 have same phone number. 890 has purchased = 1

mysql> select count(*) from customers;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+

Your query

mysql> SELECT
    ->     `a`.`id` AS `customer_id`
    -> FROM
    ->     `customers` `a`
    -> WHERE
    ->     (
    ->     SELECT
    ->         `b`.`id`
    ->     FROM
    ->         `customers` `b`
    ->         JOIN `quotes` ON `b`.`id` = `quotes`.`customer_id`
    ->     WHERE
    ->             (  (`a`.`email` = `b`.`email`) OR(`a`.`phone` = `b`.`phone`) )
    ->             AND(`a`.`id` <> `b`.`id`)
    ->             AND (`quotes`.`purchased` = 1)
    ->     GROUP BY
    ->         `a`.`email`
    ->     );
+-------------+
| customer_id |
+-------------+
|        8000 |
+-------------+
1 row in set (0.13 sec)

My query

mysql> SELECT DISTINCT
    ->        a.id
    -> FROM customers a
    ->      JOIN
    ->      customers b ON ((a.email = b.email) OR (a.phone = b.phone))
    ->                  AND a.id <> b.id
    ->      JOIN
    ->      quotes qb ON qb.customer_id = b.id AND qb.purchased = 1
    -> ORDER BY a.id;
+------+
| id   |
+------+
| 8000 |
+------+
1 row in set (0.02 sec)

6.5x faster !?

Your query has "dependent subqueries" (to be avoided) which are ineffecient compared to joins (which mine uses) which is why I did my own 10000 record test. I couldn't believe yours was faster. Adding a dupe email on records 1 and 10,000 (2 rows returned) gave your time as 0.156 sec and mine as 0.015 (10x diff).

The killer with my query with your data is the join condition

((a.email = b.email) OR (a.phone = b.phone))

Using the OR ...

mysql> SELECT DISTINCT
    ->        a.id
    -> FROM customers a
    ->      JOIN (
    ->             SELECT c.id
    ->                  , c.email
    ->                  , c.phone
    ->             FROM customers c JOIN quotes q ON q.customer_id = c.id AND q.purchased = 1
    ->           ) b ON ((a.email = b.email) OR (a.phone = b.phone))
    ->                 AND a.id <> b.id
    -> ORDER BY a.id;
+-------+
| id    |
+-------+
|     9 |
|    13 |
|    21 |
|    36 |
|    38 |
|    41 |
    .
    .
    .
| 11878 |
| 11879 |
| 11887 |
| 11893 |
| 11900 |
| 11903 |
+-------+
2987 rows in set (1 min 29.25 sec)                !!!

 

Using a UNION to pull those matching on email then those matching on phone ...

mysql> SELECT
    ->        a1.id
    -> FROM customers a1
    ->      JOIN
    ->      customers b1 ON (a1.email = b1.email) AND a1.id <> b1.id
    ->      JOIN
    ->      quotes q1 ON q1.customer_id = b1.id AND q1.purchased = 1
    -> UNION
    -> SELECT
    ->        a.id
    -> FROM customers a
    ->      JOIN
    ->      customers b ON (a.phone = b.phone) AND a.id <> b.id
    ->      JOIN
    ->      quotes q ON q.customer_id = b.id AND q.purchased = 1
    -> ORDER BY id;
+-------+
| id    |
+-------+
|     9 |
|    13 |
|    21 |
|    36 |
|    38 |
|    41 |
    .
    .
    .
| 11878 |
| 11879 |
| 11887 |
| 11893 |
| 11900 |
| 11903 |
+-------+
2987 rows in set (1.33 sec)

 

Edited by Barand

I tried your new query and yes it is much faster than mine when gathering a large number of records at a time, even when paging through the results it really shines.  BUT it's still much slower than mine when grabbing a single record.  Mine is still .0015 ish and your new one is .15 ish.  Since I'll be doing both single and multi record queries this new query might be the way to go. 

With multi record pulls mine goes up in time exponentially where yours stays consistent at that .12-.15.  I'd still like to get the single record pull down in time but it's liveable.

I did email you my sql dumps, did you get them?

Have you considered defining email and phone columns as UNIQUE then trapping dupe key errors on insert? That should be far better than searching for potential dupes before inserting.

(You would have to cleanse your data of current duplicates first, however)

 

6 hours ago, fastsol said:

I did email you my sql dumps, did you get them?

Yes thanks. The ~3000 duplicates in my last post are from your data.

 

Perhaps, for a single search, you could

mysql> SELECT id
    -> FROM customers
    -> WHERE email = 'johnrademacher44@x.com'
    -> UNION
    -> SELECT id
    -> FROM customers
    -> WHERE phone = '612-772-4311';
+-------+
| id    |
+-------+
| 11902 |
+-------+
1 row in set (0.00 sec)

 

Edited by Barand

@Barand Thank you so much for the help.  I've decided to use this version of your query with slight modifications that haven't impacted the performance from what I have tested.

SELECT
    `a1`.`id` AS `id`
FROM
    (
        (
            `customers` `a1`
        JOIN `customers` `b1`
        ON
            (
                (
                    (`a1`.`email` = `b1`.`email`) AND(`a1`.`id` <> `b1`.`id`) 
                  AND(
                        `a1`.`email` <> 'blah@example.com'
                    ) AND(
                        `b1`.`email` <> 'blah@example.com'
                    )
                )
            )
        )
    JOIN `quotes` `q1`
    ON
        (
            (
                (`q1`.`customer_id` = `b1`.`id`) AND(`q1`.`purchased` = 1)
            )
        )
    )
UNION
SELECT
    `a`.`id` AS `id`
FROM
    (
        (
            `customers` `a`
        JOIN `customers` `b`
        ON
            (
                (
                    (`a`.`phone` = `b`.`phone`) AND(`a`.`id` <> `b`.`id`)
                )
            )
        )
    JOIN `quotes` `q`
    ON
        (
            (
                (`q`.`customer_id` = `b`.`id`) AND(`q`.`purchased` = 1)
            )
        )
    )
ORDER BY
    `id`

The performance of that query isn't super for a single record pull but it's not terrible.  It performs well enough for multi record pulls though.  It doesn't seem like it can get any better, so I'm fine with it, have to be or the rest of the stuff won't work lol.

One other thing that I wonder could be added to the query is 2 columns like `by_email` and `by_phone` where the values would be 1 or 0 depending on if the result came from finding it in the email or phone query OR if it found it in both queries then both those new columns would be 1.  At this point I have no idea if this can even be done without performance issues. 

If you can help again that would be awesome.

Try this. It doesn't appear to impact performance.

Unmodified : 2987 rows in set (1.53 sec)
Modified   : 2987 rows in set (1.48 sec)

I had to use GROUP BY to get a single row with 

1234  1  1

otherwise, if both matched, it gave

1234  1  0
1234  0  1

Query

SELECT id
     , SUM(match_email) as by_email
     , SUM(match_phone) as by_phone
FROM (
			SELECT
				`a1`.`id`,
				1 as match_email,
				0 as match_phone
			FROM
				(
					(
						`customers` `a1`
					JOIN `customers` `b1`
					ON
						(
							(
								(`a1`.`email` = `b1`.`email`) AND(`a1`.`id` <> `b1`.`id`) 
							  AND(
									`a1`.`email` <> 'blah@example.com'
								) AND(
									`b1`.`email` <> 'blah@example.com'
								)
							)
						)
					)
				JOIN `quotes` `q1`
				ON
					(
						(
							(`q1`.`customer_id` = `b1`.`id`) AND(`q1`.`purchased` = 1)
						)
					)
				)
			UNION
			SELECT
				`a`.`id`,
				0 as match_email,
				1 as match_phone
			FROM
				(
					(
						`customers` `a`
					JOIN `customers` `b`
					ON
						(
							(
								(`a`.`phone` = `b`.`phone`) AND(`a`.`id` <> `b`.`id`)
							)
						)
					)
				JOIN `quotes` `q`
				ON
					(
						(
							(`q`.`customer_id` = `b`.`id`) AND(`q`.`purchased` = 1)
						)
					)
				)
	 ) all_matches
GROUP BY
    `id`

 

Edited by Barand
  • Great Answer 1
This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.