Jump to content

Multiple LEFT JOINS not returning expected results


jalinidy

Recommended Posts

Hello,

 

I am trying to join 4 tables using the query below (tables also described below).  Table "ptoadp" contains employees current time off balances; I want the vac, per and hol columns from that table always.  I then join "ptoreq" which contains any requests for time off and sum the totals for each request for vac, per and hol (returning zero if no balances for requests are in the table).  Next I want to join the "holidays" table, which will tell me what holidays the employee has yet to take (in the example below they are MLK Day, Test A and Test B).  Finally, if any of the holidays have been used for a request, I want to know the id for that request.

 



SELECT
a.vac,
a.per,
a.hol,
IF(sum(r.vac) IS NULL, 0.00, sum(r.vac)) as vac_req,
IF(sum(r.per) IS NULL, 0.00, sum(r.per)) as per_req,
IF(sum(r.hol) IS NULL, 0.00, sum(r.hol)) as hol_req,
h.day,
hm.ptoreq_id,
hm.id as hol_map_id

FROM ptoadp as a

LEFT JOIN holidays h ON h.hol_bal <= a.hol AND h.hol_bal >= a.hol_cur AND h.expdate >= curdate()
LEFT JOIN ptoreq r ON a.ee_id = r.ee_id AND r.declined_date IS NULL AND r.enddate >= curdate()
LEFT JOIN ptoreq_holiday_map hm ON h.id = hm.holiday_id AND r.ptoreq_id = hm.ptoreq_id

WHERE a.ee_id = 1

GROUP BY h.day
ORDER BY h.date;



Here's what each table looks like:

 



PTOADP

+-------+-------+-------+-------+------+-------+---------+------------+---------+
| ee_id | years | vac | sick | per | hol | hol_cur | asofdate | ee_type |
+-------+-------+-------+-------+------+-------+---------+------------+---------+
| 1 | 9 | 23.00 | 19.00 | 8.00 | 72.00 | 56.00 | 2014-01-19 | 2 |
+-------+-------+-------+-------+------+-------+---------+------------+---------+


PTOREQ

+-----------+-------+------------+------------+------+------+------+---------------------+
| ptoreq_id | ee_id | startdate | enddate | vac | per | hol | reqdate |
+-----------+-------+------------+------------+------+------+------+---------------------+
| 1 | 1 | 2014-01-31 | 2014-01-31 | 0.00 | 0.00 | 8.00 | 2014-01-29 16:19:55 |
| 2 | 1 | 2014-02-03 | 2014-02-03 | 0.00 | 0.00 | 8.00 | 2014-01-29 16:20:29 |
+-----------+-------+------------+------------+------+------+------+---------------------+

HOLIDAYS

+----+---------+------------+------------------+------------+-----------+
| id | hol_bal | date | day | expdate | year_paid |
+----+---------+------------+------------------+------------+-----------+
| 45 | 56.00 | 2014-01-20 | MLK Day | 2014-03-17 | 2014 |
| 57 | 72.00 | 2014-01-10 | Test A | 2014-03-07 | 2014 |
| 58 | 64.00 | 2014-01-11 | Test B | 2014-03-08 | 2014 |
+----+---------+------------+------------------+------------+-----------+

PTOREQ_HOLIDAY_MAP

+----+-----------+------------+
| id | ptoreq_id | holiday_id |
+----+-----------+------------+
| 1 | 1 | 45 |
| 2 | 2 | 57 |
+----+-----------+------------+


 

And finally, here are the actual results of the query and what I'm expecting (i.e. *would like*) to see:

 

 



ACTUAL RESULT

+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| vac | per | hol | vac_req | per_req | hol_req | day | ptoreq_id | hol_map_id |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | Test A | NULL | NULL |
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | Test B | NULL | NULL |
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | MLK Day | 1 | 1 |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+


RESULT THAT I WANT

+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| vac | per | hol | vac_req | per_req | hol_req | day | ptoreq_id | hol_map_id |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | Test A | 2 | 2 |
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | Test B | NULL | NULL |
| 23.00 | 8.00 | 72.00 | 0.00 | 0.00 | 16.00 | MLK Day | 1 | 1 |
+-------+------+-------+------


Holiday Test A should be linked to ptoreq_id 2 and hol_map_id 2 based on the values in "ptoreq_holiday_map" but instead the value is NULL.

 

I'm still rough around the edges with JOINS and have tried a few different scenarios (using RIGHT joins for example) but to no avail.  Rather than just chucking any more ideas at the problem, I thought I'd turn here for any advice.

 

I'm using MYSQL v. 5.1.49 and PHP 5.3.15.

 

Please let me know if I've been unclear or left out any pertinent information.

 

Thank you in advance.

 

Best,

Adam

 

 

 

Link to comment
Share on other sites

Calculate the SUM()s in a subquery

SELECT 
a.vac, 
a.per, 
a.hol, 
IFNULL(tots.vac, 0.00) as vac_req, 
IFNULL(tots.per, 0.00) as per_req, 
IFNULL(tots.hol, 0.00) as hol_req, 
h.day, 
hm.ptoreq_id, 
hm.id as hol_map_id 

FROM ptoadp as a
JOIN (
    SELECT ee_id, SUM(vac) as vac, SUM(per) as per, SUM(hol) as hol
    FROM ptoreq
    GROUP BY ee_id
    ) tots USING (ee_id)
JOIN holidays h ON h.hol_bal BETWEEN a.hol_cur AND a.hol AND h.expdate >= curdate() 
LEFT JOIN ptoreq_holiday_map hm ON h.id = hm.holiday_id
LEFT JOIN ptoreq r ON a.ee_id = r.ee_id 
AND r.ptoreq_id = hm.ptoreq_id  
AND r.declined_date IS NULL AND r.enddate >= curdate() 

WHERE a.ee_id = 1 

ORDER BY h.date;

Gives

+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| vac   | per  | hol   | vac_req | per_req | hol_req | day     | ptoreq_id | hol_map_id |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | Test A  |         2 |          2 |
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | Test B  |      NULL |       NULL |
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | MLK Day |         1 |          1 |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+
Link to comment
Share on other sites

Hi Barand,

 

Thanks very much for your response (including the added tips of using things like IFNULL and BETWEEN in the query - very nice).

 

Yes, your solution returns the expected result, but only if the employee has time off requested (i.e. values in the ptoreq table).  If the employee has not requested time off the query returns an empty set, so it's not providing me with the balances from ptoadp for that employee, which I want to see regardless of whether the employee has requested time off or not.  (Additionally, I want to see any used holidays regardless of whether the employee has any time off requests)

 

I've played around a bit with your query this morning (trying to move some of the other joins into the subquery, e.g.) but to no avail.

 

Here's an example of an added row in table ptoadp


+-------+-------+-------+-------+-------+-------+---------+------------+---------+
| ee_id | years | vac   | sick  | per   | hol   | hol_cur | asofdate   | ee_type |
+-------+-------+-------+-------+-------+-------+---------+------------+---------+
|     1 |     9 | 23.00 | 19.00 |  8.00 | 72.00 |   56.00 | 2014-01-19 | 2       |
|    81 |    10 | 46.00 | 38.00 | 16.00 | 72.00 |   56.00 | 2014-01-19 | 6       |
+-------+-------+-------+-------+-------+-------+---------+------------+---------+

 

So changing the WHERE clause in the query to 

WHERE a.ee_id = 81

returns an empty set.

 

Also, if a different employee adds a time off request for an unused holiday, the query is including that holiday in the results.

 

For example, ee._id 82 adds a couple of time off requests

PTOREQ

+-----------+-------+------------+------------+------+------+------+---------------------+
| ptoreq_id | ee_id | startdate  | enddate    | vac  | per  | hol  | reqdate             |
+-----------+-------+------------+------------+------+------+------+---------------------+
|         1 |     1 | 2014-01-31 | 2014-01-31 | 0.00 | 0.00 | 8.00 | 2014-01-29 16:19:55 |
|         2 |     1 | 2014-02-03 | 2014-02-03 | 0.00 | 0.00 | 8.00 | 2014-01-29 16:20:29 |
|         3 |    82 | 2014-03-01 | 2014-03-02 | 0.00 | 8.00 | 0.00 | 2014-01-29 16:20:29 |
|         4 |    82 | 2014-02-07 | 2014-02-07 | 0.00 | 0.00 | 8.00 | 2014-01-30 10:44:51 |
+-----------+-------+------------+------------+------+------+------+---------------------+ 

Request 4 being for a holiday, which also gets added to the ptoreq_holiday_map:

PTOREQ_HOLIDAY_MAP


+----+-----------+------------+
| id | ptoreq_id | holiday_id |
+----+-----------+------------+
|  1 |         1 |         45 |
|  2 |         2 |         57 |
|  3 |         4 |         45 |
+----+-----------+------------+

the query includes the holidays for all employees, not just the one selected in the WHERE clause


QUERY RESULT

+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| vac   | per  | hol   | vac_req | per_req | hol_req | day     | ptoreq_id | hol_map_id |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | Test A  |         2 |          2 |
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | Test B  |      NULL |       NULL |
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | MLK Day |         1 |          1 |
| 23.00 | 8.00 | 72.00 |    0.00 |    0.00 |   16.00 | MLK Day |         4 |          3 |
+-------+------+-------+---------+---------+---------+---------+-----------+------------+



For ee_id 1, e.g., the last result above should not be included in the result set.

 

If you have  (or anyone else has) any further advice, it will be much appreciated.

 

I hope my intentions are clear (clearer) - and I apologize for any ambiguity.  I could always break this into two queries (one that returns balances from ptoadp and ptoreq (if any) and another that returns any unused holidays and the request they are associated with (if any) but thought if I could get it all in one go, all the better.

 

Thanks again,

Adam

Link to comment
Share on other sites

Hi Folks,

 

I've got things working now.  Basically I started over, building one query to get unused holidays (if any) and time off requests associated with the holiday(s) (if any) and another query to get the available time off balances and balances for time off requests (if any).  Then I joined those two queries to get this:

SELECT 
    distinct a.ee_id, 
    bal.vac, 
    bal.per, 
    bal.hol, 
    bal.vac_req, 
    bal.per_req, 
    bal.hol_req, 
    h.day, 
    map.hol_map_id, 
    map.ptoreq_id 

FROM ptoadp a 

LEFT JOIN globe_holidays h ON h.hol_bal BETWEEN a.hol_cur AND a.hol AND h.expdate >= CURDATE() 

LEFT JOIN (
    SELECT
        r.ee_id, 
	hm.id as hol_map_id, 
	hm.ptoreq_id, 
	hm.holiday_id as hol_id

	FROM ptoreq r 
	
	LEFT JOIN ptoreq_holiday_map hm USING (ptoreq_id)
) map ON h.id = map.hol_id AND a.ee_id = map.ee_id 

JOIN (
    SELECT 
        a.ee_id, 
	a.vac, 
	a.per, 
	a.hol, 
	DATE_FORMAT(a.asofdate,'%m/%d/%Y') as asofdate, 
	a.years, 
	a.ee_type, 
	IFNULL(tots.vac, 0.00) as vac_req, 
	IFNULL(tots.per, 0.00) as per_req, 
	IFNULL(tots.hol, 0.00) as hol_req 
	
	FROM ptoadp as a 
	
	LEFT JOIN (
	    SELECT 
                ee_id, 
                SUM(vac) as vac, 
                SUM(per) as per, 
                SUM(hol) as hol 
		
            FROM ptoreq 
            GROUP BY ee_id
        ) tots ON a.ee_id = tots.ee_id 
	
        LEFT JOIN ptoreq r ON a.ee_id = r.ee_id AND r.declined_date IS NULL AND r.enddate >= CURDATE() 
	

) bal on a.ee_id = bal.ee_id 

WHERE a.ee_id = 82;

(There are a few more columns in the subqueries than earlier, as I needed to add things once I got it working.)

 

This query returns the results I was expecting (data (i.e., the requests)) have been changed since the earlier post)


+-------+--------+-------+-------+---------+---------+---------+---------+------------+-----------+
| ee_id | vac    | per   | hol   | vac_req | per_req | hol_req | day     | hol_map_id | ptoreq_id |
+-------+--------+-------+-------+---------+---------+---------+---------+------------+-----------+
|    82 | 156.08 | 16.00 | 72.00 |    0.00 |    0.00 |   24.00 | MLK Day |          3 |         4 |
|    82 | 156.08 | 16.00 | 72.00 |    0.00 |    0.00 |   24.00 | Test A  |          1 |         2 |
|    82 | 156.08 | 16.00 | 72.00 |    0.00 |    0.00 |   24.00 | Test B  |          4 |         5 |
+-------+--------+-------+-------+---------+---------+---------+---------+------------+-----------+

Thanks again for everyone's help.  And, if you notice any glaring dangers in the query I ended up with, please feel free to speak up.

 

Best,

A.

Link to comment
Share on other sites

Barand, thanks again for your reply.

 

The column hol_bal isn't derived data actually, but I can see (now) where it is a badly named column.

 

I'll try to explain my concept (which may have faults other than the derived data):

 

The company offers 13 paid holidays in 2014.  Due to an employee's schedule s/he may have to work on a given holiday and is then allowed to take that holiday off at another time within 8 weeks of the original holiday date.  

 

In the holidays table the hol_bal column is the number of available holiday hours an employee should have (at minimum) as his/her holiday balance (column "hol") in the ptoadp table (i.e., the available time off balances table) at the time of the holiday.

 

For example, here's the data from table ptoadp for ee_id 1

PTOADP - Available balances

+-------+-------+-------+------+-------+-------+---------+------------+---------+
| ee_id | years | vac   | sick | per   | hol   | hol_cur | asofdate   | ee_type |
+-------+-------+-------+------+-------+-------+---------+------------+---------+
|     1 |     9 | 56.32 | 3.19 | 16.00 | 72.00 |   56.00 | 2014-01-19 | 2       |
+-------+-------+-------+------+-------+-------+---------+------------+---------+

And the holidays table:

HOLIDAYS

+----+---------+------------+------------------+------------+-----------+
| id | hol_bal | date       | day              | expdate    | year_paid |
+----+---------+------------+------------------+------------+-----------+
| 53 |  104.00 | 2013-12-24 | Christmas Eve    | 2014-02-18 | 2014      |
| 52 |   96.00 | 2013-12-25 | Christmas Day    | 2014-02-19 | 2014      |
| 43 |   88.00 | 2013-12-31 | New Years Eve    | 2014-02-25 | 2014      |
| 44 |   80.00 | 2014-01-01 | New Years Day    | 2014-02-26 | 2014      |
| 57 |   72.00 | 2014-01-10 | Test A           | 2014-03-07 | 2014      |
| 58 |   64.00 | 2014-01-11 | Test B           | 2014-03-08 | 2014      |
| 45 |   56.00 | 2014-01-20 | MLK Day          | 2014-03-17 | 2014      |
| 46 |   48.00 | 2014-02-17 | Presidents Day   | 2014-04-14 | 2014      |
| 47 |   40.00 | 2014-05-26 | Memorial Day     | 2014-07-21 | 2014      |
| 48 |   32.00 | 2014-07-04 | Independence Day | 2014-08-29 | 2014      |
| 49 |   24.00 | 2014-09-01 | Labor Day        | 2014-10-27 | 2014      |
| 50 |   16.00 | 2014-11-11 | Veterans Day     | 2015-01-06 | 2014      |
| 54 |    8.00 | 2014-11-27 | Thanksgiving Day | 2015-01-22 | 2014      |
+----+---------+------------+------------------+------------+-----------+

Because the employee has a holiday balance in ptoadp of 72, I know that s/he has not used holidays Test A and Test B, so I want to include those.  I also include MLK Day because it is effective in the current pay period and the employee may need to schedule it as of day off in the current pay period on a day other than the actual holiday (taking the Friday off at the end of the week, e.g., instead of the Monday).

 

I can see now where the "hol_cur" column in ptoadp is certainly redundant; I can just as easily use the "asofdate" column (which is the date of the last pay period), so changing this:

LEFT JOIN holidays h ON h.hol_bal BETWEEN a.hol_cur AND a.hol AND h.expdate >= CURDATE() 

to this  (altering the asofdate from the end date of the last pay period to the end of the current (bi-weekly) pay period.):

LEFT JOIN holidays h ON h.hol_bal <= a.hol AND h.date <= DATE_ADD(a.asofdate, INTERVAL 14 DAY) AND h.expdate >= CURDATE() 

I'll also try to think of a better label for the hol_bal column.

 

Does this make sense?  Does it make the data model any more reasonable?  I'll continue to examine it, but at this point I'm not sure how else to structure this data.

 

As always, any / all help / suggestions are welcome.

 

I've removed the "solved" label to this post while discussion is still happening.

 

Thanks,

Adam

Link to comment
Share on other sites

Would it not be possible to store the holiday_id in the request record. You would then easily know by matching that TestA and TestB had not been taken yet without using the hours taken - ie match on dates and ids rather than the hours (incidentally the hol_bal is derived as a cumulative hours remaining through the year) and is the map table required?

Link to comment
Share on other sites

The reasons for the holiday_map is that an employee can make a request to use two holidays in one request; e.g., Take Feb 3 and 4 off using holidays Test A and Test B.

 

With the current set up that gets stored like so:

PTOREQ

+-----------+-------+------------+------------+------+------+-------+---------------------+
| ptoreq_id | ee_id | startdate  | enddate    | vac  | per  | hol   | reqdate             |
+-----------+-------+------------+------------+------+------+-------+---------------------+
|         1 |     1 | 2014-02-03 | 2014-02-04 | 0.00 | 0.00 | 16.00 | 2014-01-31 13:01:52 |
+-----------+-------+------------+------------+------+------+-------+---------------------+


PTOREQ_HOLIDAY_MAP


+----+-----------+------------+
| id | ptoreq_id | holiday_id |
+----+-----------+------------+
|  1 |         1 |         57 |
|  2 |         1 |         58 |
+----+-----------+------------+

So the ptoreq record shows the employee using 16 holiday hours.

 

Otherwise, to store the holiday ID in the request table, the table would have to look like this:

PTOREQ WITH STORED HOLIDAY ID

+-----------+-------+------------+------------+------+------+-------+-------+---------------------+
| ptoreq_id | ee_id | startdate  | enddate    | vac  | per  | hol_ID| hol   | reqdate             |
+-----------+-------+------------+------------+------+------+-------+-------+---------------------+
|         1 |     1 | 2014-02-03 | 2014-02-03 | 0.00 | 0.00 |    57 |  8.00 | 2014-01-31 13:01:52 |
+-----------+-------+------------+------------+------+------+-------+-------+---------------------+
|         2 |     1 | 2014-02-04 | 2014-02-04 | 0.00 | 0.00 |    58 |  8.00 | 2014-01-31 13:01:52 |
+-----------+-------+------------+------------+------+------+-------+-------+---------------------+

Or am I missing the better way to store the holiday ID?  Also, for requests that are not using holiday hours (the vast majority of requests will be for vacation, not holiday) I'll then have a bunch of NULL or empty values in that table.  I was under the (perhaps erroneous) impression that that was contraindicated.

 

I'll continue to think about the structure to see if I'm missing a simpler way of storing this data ... Also, while I see your point about hol_bal being derived, my thinking is that the hol_bal is intrinsically tied to the holiday, just as the day and date are.  But perhaps that's muddled thinking.

 

Barand, thanks again ... I appreciate very much your time on this matter.

 

Best,

Adam

Edited by jalinidy
Link to comment
Share on other sites

It seemed to me that when a solution needs several table subqueries to knock the data into a shape suitable for querying then there ought to be another way.

 

I'm in agreement!  I'd like my queries to be as simple as possible ... but I couldn't get it to work any other way.  As with so many things, the hardest part of this query is what in practice will be used the least ... i.e., most requests won't ever get into using stored holidays ... still, need to plan for those that will, eh?

 

If you don't mind, could you expand on the potential pitfalls of keeping the hol_bal column?  Where might that cause me grief in the future?  Not knowing all of the scenarios of my particular situation you may not be able to answer that ... but if you can, I'd much appreciate it.

 

Thanks again.

Link to comment
Share on other sites

You shouldn't store calculated data. Unless the data changes infrequently and used VERY OFTEN and the calculations are derived from a VERY LARGE data set and take 5 minutes to produce.

 

When you store calculations, you need mechanisms to ensure the stored value is in fact the correct value.  

When the data changes you need to update all of these balances columns and that is room for disaster.

 

You should be calculating balances on the fly.   Take max subtract used and you have a balance.  There is no way your data set is going to be large enough to warrant storing derived data.    Make sure to get your indices set up and make sure you're using them properly when joining / querying data. Learn how to read the output of MySQLs explain query feature.

 

Don't store a request off that spans multiple days in one row.    At most the request off should be a single date and time from / to.

 

date               |  time_from |    time_to

------------------------------------

2013-01-01   |  00:00:00   |   23:59:59                   =  24 HOURS the entire day.  You can check if it's a holiday by looking the date up in the holidays table. if you don't need to be exact on from & to, you can just store how many hours as a single column.  

 

You don't have to have an extra table to map the request off record with the holiday record-  The of the request off date should be enough to map it to any holiday.  You don't need an ID column in the holidays table.  The date column is the primary key / ID    :)

Link to comment
Share on other sites

 

 

The ... request off date should be enough to map it to any holiday.

 

Thanks, objnoob, for your reply.  Unfortunately the date of the request is not enough to map to any holiday ... The point is that an employee would be making a request to use a holiday at a later date than the original date of the holiday.  To give an example, the employee is obliged to work on New Years Day, Jan 1st.  So he takes the holiday later in the month, say Jan 15th.  The date of the request (Jan 15) does not map to the holiday date (Jan 1st).

 

Or have I misunderstood what you might have meant?

 

Thanks again.

A.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.