Jump to content

calculate total absents and presents based on user login data


Recommended Posts

Hi, 

I have the following login and logout data of a student :-

Quote

image.png.fdc40d94bdd1f89ada0e0330c8758eda.png

 

 Would it be possible to calculate the absents by subtracting consecutive times of logins or logouts using just Mysql ?  From the example above the  absents between 2019-04-01 and 2019-03-30  should be SELECT DATEDIFF("2019-04-01", "2019-03-30") = 2. So actual absents is 2 -1 = 1 and so on for all the consecutive dates and then their sum. 

I have no clue how to go about this. Gurus please help. 

Thanks !

 

 

Link to post
Share on other sites

Yes.

Make sure your data is ordered by date

Use an @variable to store the record's date

compare the date in record n with the date stored from record n-1

store the cumalative difference in a second @variable

 

[edit] PS as it isn't possible to load your data into a test table from an image, that's all the help I'll give.

Edited by Barand
Link to post
Share on other sites

My version would be to JOIN the table to itself once or twice in order to pair up consecutive records, then DATEDIFF()-1 the end of the first with the start of the second, then SUM the results.

  • Like 1
Link to post
Share on other sites

Thanks Guru Barand and Requinix for the replies. 

@Guru Barand : Please find the data attached as data.  I have tried your method but I am stuck at the comparison part where I need to reference the next date. 

Quote

+-------+---------------------+---------------------+
| RecNo | TimeOfLogin         | TimeOfLogout        |
+-------+---------------------+---------------------+
|     3 | 2019-03-30 17:05:24 | 2019-03-30 17:09:47 |
|     6 | 2019-04-01 15:13:32 | 2019-04-01 15:19:46 |
|     7 | 2019-04-04 23:37:21 | 2019-04-04 23:50:51 |
|     8 | 2019-04-18 15:28:35 | 2019-04-18 15:33:10 |
|     9 | 2019-04-23 16:35:20 | 2019-04-23 16:42:35 |
|    10 | 2019-04-24 12:03:07 | 2019-04-24 12:10:28 |
|    11 | 2019-05-01 08:05:48 | 2019-05-01 08:20:28 |
|    13 | 2019-05-08 18:04:04 | 2019-05-08 18:14:57 |
|    14 | 2019-05-09 08:18:15 | 2019-05-09 08:29:38 |
|    15 | 2019-06-18 12:49:01 | 2019-06-18 13:10:15 |
|    17 | 2019-09-05 17:17:33 | 2019-09-13 15:24:28 |
|    18 | 2019-09-28 07:05:03 | 2019-09-28 08:12:26 |
|    19 | 2019-09-28 12:55:56 | 2019-09-28 13:21:15 |
|    20 | 2019-09-28 16:47:52 | 2019-10-01 16:28:18 |
|    22 | 2019-10-03 13:11:44 | 2019-12-10 17:56:25 |
|    74 | 2020-05-22 12:08:32 | 2020-08-27 17:21:02 |
+-------+---------------------+---------------------+
 

  

 @ Requinix :  HI, I get the idea but I am not sure how to go about it.

 

Thank you.

 

Link to post
Share on other sites

OK, I loaded your data into a test table

INSERT INTO ajoo_login (datein, dateout) VALUES
('2019-03-30 17:05:24', '2019-03-30 17:09:47'),
('2019-04-01 15:13:32', '2019-04-01 15:19:46'),
('2019-04-04 23:37:21', '2019-04-04 23:50:51'),
('2019-04-18 15:28:35', '2019-04-18 15:33:10'),
('2019-04-23 16:35:20', '2019-04-23 16:42:35'),
('2019-04-24 12:03:07', '2019-04-24 12:10:28'),
('2019-05-01 08:05:48', '2019-05-01 08:20:28'),
('2019-05-08 18:04:04', '2019-05-08 18:14:57'),
('2019-05-09 08:18:15', '2019-05-09 08:29:38'),
('2019-06-18 12:49:01', '2019-06-18 13:10:15'),
('2019-09-05 17:17:33', '2019-09-13 15:24:28'),
('2019-09-28 07:05:03', '2019-09-28 08:12:26'),
('2019-09-28 12:55:56', '2019-09-28 13:21:15'),
('2019-09-28 16:47:52', '2019-10-01 16:28:18'),
('2019-10-03 13:11:44', '2019-12-10 17:56:25'),
('2020-05-22 12:08:32', '2020-08-27 17:21:02');

Running the query gives

SELECT SUM(diff) AS tot_absent
FROM (
        SELECT 
              CASE WHEN DATE(datein) > DATE(@prevout) 
                   THEN DATEDIFF(datein, @prevout) - 1 
                   ELSE 0
                   END AS diff
            , datein
            , @prevout := dateout AS dateout             -- store dateout in @prevout
        FROM ajoo_login
             JOIN (SELECT @prevout := NULL) init         -- initialize @prevout
     ) logins;

+------------+
| tot_absent |
+------------+
|        327 |
+------------+

Running just the subquery portion gives

mysql> SELECT
    ->       CASE WHEN DATE(datein) > DATE(@prevout)
    ->            THEN DATEDIFF(datein, @prevout) - 1
    ->            ELSE 0
    ->            END AS diff
    ->     , datein
    ->     , @prevout := dateout AS dateout
    -> FROM ajoo_login
    ->      JOIN (SELECT @prevout := NULL) init;
+------+---------------------+---------------------+
| diff | datein              | dateout             |
+------+---------------------+---------------------+
|    0 | 2019-03-30 17:05:24 | 2019-03-30 17:09:47 |
|    1 | 2019-04-01 15:13:32 | 2019-04-01 15:19:46 |
|    2 | 2019-04-04 23:37:21 | 2019-04-04 23:50:51 |
|   13 | 2019-04-18 15:28:35 | 2019-04-18 15:33:10 |
|    4 | 2019-04-23 16:35:20 | 2019-04-23 16:42:35 |
|    0 | 2019-04-24 12:03:07 | 2019-04-24 12:10:28 |
|    6 | 2019-05-01 08:05:48 | 2019-05-01 08:20:28 |
|    6 | 2019-05-08 18:04:04 | 2019-05-08 18:14:57 |
|    0 | 2019-05-09 08:18:15 | 2019-05-09 08:29:38 |
|   39 | 2019-06-18 12:49:01 | 2019-06-18 13:10:15 |
|   78 | 2019-09-05 17:17:33 | 2019-09-13 15:24:28 |
|   14 | 2019-09-28 07:05:03 | 2019-09-28 08:12:26 |
|    0 | 2019-09-28 12:55:56 | 2019-09-28 13:21:15 |
|    0 | 2019-09-28 16:47:52 | 2019-10-01 16:28:18 |
|    1 | 2019-10-03 13:11:44 | 2019-12-10 17:56:25 |
|  163 | 2020-05-22 12:08:32 | 2020-08-27 17:21:02 |
+------+---------------------+---------------------+

 

  • Like 1
  • Great Answer 1
Link to post
Share on other sites

Hi Guru Barand, 

Thank you very much.  I couldn't have figured it out this way since I have never really worked much with @variable in mysql. I am not sure I understand their working very well.

 

However, this below is what I tried and it almost works except for the first and last bit of absents and resembles somewhat what requinix suggested.

SELECT
  RecNo,
  DATEDIFF((
    SELECT t.TimeOfLogin FROM india_sessdata t WHERE t.RecNo > t1.RecNo ORDER BY t.RecNo LIMIT 1),
    MIN(TimeOfLogin)
  ) diff FROM india_sessdata t1
  WHERE t1.StudentLogin = 'mina1111'
GROUP BY RecNo;

which gives the following output

+-------+------+
| RecNo | diff |
+-------+------+
|     3 |  184 |
|     6 |    3 |
|     7 |   14 |
|     8 |    5 |
|     9 |    1 |
|    10 |    7 |
|    11 |    7 |
|    13 |    1 |
|    14 |   40 |
|    15 |   79 |
|    17 |   23 |
|    18 |    0 |
|    19 |    0 |
|    20 |    3 |
|    22 |    0 |
|    74 |   21 |
+-------+------+

 

The first is incorrect because I think I am using a subset of values related to StudentLogin = 'mina1111'. This bit  -- MIN(TimeOfLogin) -- in the query needs to be tweaked to get the correct first value. I tried a few things but none worked. 

The last entry is not really wrong but is incomplete. The SUM( ---) function needs to use only TimeOfLogin or dateIn (as used by you). The differences of all dateIn values needs to be calculated and in the end to complete the absents , for the last row alone,  it needs to calculate (dateOut - dateIn). 

 

I am trying to understand your solution. 

The value of Date(@prevout)  

 CASE WHEN DATE(datein) > DATE(@prevout) 

is initialized further down here 

 JOIN (SELECT @prevout := NULL) init         -- initialize @prevout

Shouldn't it be initailized someplace before it is used at DATE(@prevout) ? Also I cannot really visualize what this join is doing. Some explanation would really help.

Thanks.

Link to post
Share on other sites

Meanwhile, here's an alternative solution to my previous one, this one without the SQL variables.

SELECT SUM(CASE WHEN DATE(datein) > DATE(dateout) 
                   THEN DATEDIFF(datein, dateout) - 1 
                   ELSE 0
                   END ) as tot_absent
FROM (
        SELECT a.dateout
             , MIN(b.datein) as datein
        FROM ajoo_login a
             LEFT JOIN
             ajoo_login b ON a.dateout < b.datein
        GROUP BY a.dateout
     ) logins;
+------------+
| tot_absent |
+------------+
|        327 |
+------------+

 

  • Like 1
  • Thanks 1
Link to post
Share on other sites
20 hours ago, Barand said:

Don't keep us in the dark and still expect help.

Sir that's not my intention at all.  It's just that sometimes there are so may aspects to a problem that I try and request help for the core of it while trying to attempt the changes around it myself. It's only if I get stuck subsequently, like because of what i added, then I ask again. But I will keep it in mind and try and avoid that as far as possible. Thank you.

Edited by ajoo
Link to post
Share on other sites
17 hours ago, Barand said:

Meanwhile, here's an alternative solution to my previous one, this one without the SQL variables.


SELECT SUM(CASE WHEN DATE(datein) > DATE(dateout) 
                   THEN DATEDIFF(datein, dateout) - 1 
                   ELSE 0
                   END ) as tot_absent
FROM (
        SELECT a.dateout
             , MIN(b.datein) as datein
        FROM ajoo_login a
             LEFT JOIN
             ajoo_login b ON a.dateout < b.datein
        GROUP BY a.dateout
     ) logins;
+------------+
| tot_absent |
+------------+
|        327 |
+------------+

 

Thank you Guru Barand. May I request you to clear some doubts that I expressed above in my earlier message. Thank you.🙏 

Link to post
Share on other sites
22 hours ago, ajoo said:

Shouldn't it be initailized someplace before it is used at DATE(@prevout) ? Also I cannot really visualize what this join is doing. Some explanation would really help.

Because of the JOIN, it is initialized before it is used. As an alternative to the join you could have two queries ...

SELECT @prevout := NULL;                                 -- initialize @prevout

SELECT SUM(diff) AS tot_absent
FROM (
        SELECT 
              CASE WHEN DATE(datein) > DATE(@prevout) 
                   THEN DATEDIFF(datein, @prevout) - 1 
                   ELSE 0
                   END AS diff
            , datein
            , @prevout := dateout AS dateout             -- store dateout in @prevout
        FROM ajoo_login
     ) logins;

@vars are just like variables  in any other language - you assign values to them (and use those values) as the query processor loops through each of the records

  • Thanks 1
Link to post
Share on other sites

Introducing your "studentlogin" column to the data ...

+----+---------------------+---------------------+--------------+
| id | datein              | dateout             | studentlogin |
+----+---------------------+---------------------+--------------+
|  1 | 2019-03-30 17:05:24 | 2019-03-30 17:09:47 | mina1111     |
|  2 | 2019-04-01 15:13:32 | 2019-04-01 15:19:46 | abcd1234     |
|  3 | 2019-04-04 23:37:21 | 2019-04-04 23:50:51 | xyz12345     |
|  4 | 2019-04-18 15:28:35 | 2019-04-18 15:33:10 | mina1111     |
|  5 | 2019-04-23 16:35:20 | 2019-04-23 16:42:35 | xyz12345     |
|  6 | 2019-04-24 12:03:07 | 2019-04-24 12:10:28 | abcd1234     |
|  7 | 2019-05-01 08:05:48 | 2019-05-01 08:20:28 | abcd1234     |
|  8 | 2019-05-08 18:04:04 | 2019-05-08 18:14:57 | xyz12345     |
|  9 | 2019-05-09 08:18:15 | 2019-05-09 08:29:38 | mina1111     |
| 10 | 2019-06-18 12:49:01 | 2019-06-18 13:10:15 | xyz12345     |
| 11 | 2019-09-05 17:17:33 | 2019-09-13 15:24:28 | abcd1234     |
| 12 | 2019-09-28 07:05:03 | 2019-09-28 08:12:26 | mina1111     |
| 13 | 2019-09-28 12:55:56 | 2019-09-28 13:21:15 | xyz12345     |
| 14 | 2019-09-28 16:47:52 | 2019-10-01 16:28:18 | abcd1234     |
| 15 | 2019-10-03 13:11:44 | 2019-12-10 17:56:25 | mina1111     |
| 16 | 2020-05-22 12:08:32 | 2020-08-27 17:21:02 | xyz12345     |
+----+---------------------+---------------------+--------------+

then

SELECT studentlogin
     , SUM(CASE WHEN DATE(datein) > DATE(dateout) 
                   THEN DATEDIFF(datein, dateout) - 1 
                   ELSE 0
                   END ) as tot_absent
FROM (
        SELECT a.dateout
             , MIN(b.datein) as datein
             , a.studentlogin
        FROM ajoo_login a
             LEFT JOIN
             ajoo_login b ON a.dateout < b.datein
                          AND a.studentlogin = b.studentlogin
        GROUP BY a.studentlogin, a.dateout
     ) logins
GROUP BY studentlogin;
                                                 
+--------------+------------+
| studentlogin | tot_absent |
+--------------+------------+
| abcd1234     |        168 |
| mina1111     |        183 |
| xyz12345     |        409 |
+--------------+------------+                                                 

 

Link to post
Share on other sites

There are 2 gap difference that both these solutions have overlooked.  A look at row 10 and 11 reveals that there a gap of 8 days between dateout on 10th row and datein on 11th row. which is unaccounted for. So on 11th row  the diff should be 22 and not 14.

Further on the last row there is a gap which could be calculated between dateout on 14th row and datein on 15th row i.e. DATEDIFF( '2020-08-27', '2020-05-22').

+-----+---------------+------------------------+
| Row |  dateout      | datein	        |diff  | 
+-----+---------------+------------------------+ 
|  1  | '2019-03-30'  | '2019-04-01'	|  1   |	
|  2  | '2019-04-01'  | '2019-04-04'	|  2   | 
|  3  | '2019-04-04'  | '2019-04-18'	|  13  | 
|  4  | '2019-04-18'  | '2019-04-23'	|  4   | 
|  5  | '2019-04-23'  | '2019-04-24'	|  0   | 
|  6  | '2019-04-24'  | '2019-05-01'	|  6   | 
|  7  | '2019-05-01'  | '2019-05-08'	|  6   | 
|  8  | '2019-05-08'  | '2019-05-09'	|  0   | 
|  9  | '2019-05-09'  | '2019-06-18'	|  39  | 
|  10 | '2019-06-18'  | '2019-09-05'	|  78  | 
|  11 | '2019-09-13'  | '2019-09-28'	|  14  | 
|  12 | '2019-09-28'  | '2019-10-03'	|  4   | 
|  13 | '2019-10-01'  | '2019-10-03'	|  1   | 
|  14 | '2019-12-10'  | '2020-05-22'	|  163 | 
|  15 | '2020-08-27'  | (null)	0   	|      |
+-----+----------------------------------------+

Thanks ! 

 

Link to post
Share on other sites

Sir, 

I am referring to the examples above. If I may refer to the very first solution you proposed as below: An examination of output table produced on running the subquery in your example shows a total absents of 327 days. However inspecting row 11 showa that a gap of 8 days between the datein and dateout of record 11 is unaccounted for.  This will happen for all cases where the datein and dateout of a given row are different.  This can be rectified by using only the datein of each row and ignoring dateout

Finally the absents of the last row (dateout - datein)-1 are not being accounted for in this and subsequent solutions. I hope I am clear in defining the issue this time ! 😅

Thanks loads !🙏 

 

On 11/28/2020 at 4:11 PM, Barand said:

OK, I loaded your data into a test table


INSERT INTO ajoo_login (datein, dateout) VALUES
('2019-03-30 17:05:24', '2019-03-30 17:09:47'),
('2019-04-01 15:13:32', '2019-04-01 15:19:46'),
('2019-04-04 23:37:21', '2019-04-04 23:50:51'),
('2019-04-18 15:28:35', '2019-04-18 15:33:10'),
('2019-04-23 16:35:20', '2019-04-23 16:42:35'),
('2019-04-24 12:03:07', '2019-04-24 12:10:28'),
('2019-05-01 08:05:48', '2019-05-01 08:20:28'),
('2019-05-08 18:04:04', '2019-05-08 18:14:57'),
('2019-05-09 08:18:15', '2019-05-09 08:29:38'),
('2019-06-18 12:49:01', '2019-06-18 13:10:15'),
('2019-09-05 17:17:33', '2019-09-13 15:24:28'),
('2019-09-28 07:05:03', '2019-09-28 08:12:26'),
('2019-09-28 12:55:56', '2019-09-28 13:21:15'),
('2019-09-28 16:47:52', '2019-10-01 16:28:18'),
('2019-10-03 13:11:44', '2019-12-10 17:56:25'),
('2020-05-22 12:08:32', '2020-08-27 17:21:02');

Running the query gives


SELECT SUM(diff) AS tot_absent
FROM (
        SELECT 
              CASE WHEN DATE(datein) > DATE(@prevout) 
                   THEN DATEDIFF(datein, @prevout) - 1 
                   ELSE 0
                   END AS diff
            , datein
            , @prevout := dateout AS dateout             -- store dateout in @prevout
        FROM ajoo_login
             JOIN (SELECT @prevout := NULL) init         -- initialize @prevout
     ) logins;

+------------+
| tot_absent |
+------------+
|        327 |
+------------+

Running just the subquery portion gives


mysql> SELECT
    ->       CASE WHEN DATE(datein) > DATE(@prevout)
    ->            THEN DATEDIFF(datein, @prevout) - 1
    ->            ELSE 0
    ->            END AS diff
    ->     , datein
    ->     , @prevout := dateout AS dateout
    -> FROM ajoo_login
    ->      JOIN (SELECT @prevout := NULL) init;
+------+---------------------+---------------------+
| diff | datein              | dateout             |
+------+---------------------+---------------------+
|    0 | 2019-03-30 17:05:24 | 2019-03-30 17:09:47 |
|    1 | 2019-04-01 15:13:32 | 2019-04-01 15:19:46 |
|    2 | 2019-04-04 23:37:21 | 2019-04-04 23:50:51 |
|   13 | 2019-04-18 15:28:35 | 2019-04-18 15:33:10 |
|    4 | 2019-04-23 16:35:20 | 2019-04-23 16:42:35 |
|    0 | 2019-04-24 12:03:07 | 2019-04-24 12:10:28 |
|    6 | 2019-05-01 08:05:48 | 2019-05-01 08:20:28 |
|    6 | 2019-05-08 18:04:04 | 2019-05-08 18:14:57 |
|    0 | 2019-05-09 08:18:15 | 2019-05-09 08:29:38 |
|   39 | 2019-06-18 12:49:01 | 2019-06-18 13:10:15 |
|   78 | 2019-09-05 17:17:33 | 2019-09-13 15:24:28 |
|   14 | 2019-09-28 07:05:03 | 2019-09-28 08:12:26 |
|    0 | 2019-09-28 12:55:56 | 2019-09-28 13:21:15 |
|    0 | 2019-09-28 16:47:52 | 2019-10-01 16:28:18 |
|    1 | 2019-10-03 13:11:44 | 2019-12-10 17:56:25 |
|  163 | 2020-05-22 12:08:32 | 2020-08-27 17:21:02 |
+------+---------------------+---------------------+

 

 

Link to post
Share on other sites

The whole purpose of this exercise was to to find the number of days between the date the user logging out and the date of their next login

image.png.3382a47b683a40fb94e51e36380d2820.png

6 hours ago, ajoo said:

However inspecting row 11 showa that a gap of 8 days between the datein and dateout of record 11 is unaccounted for.

If you think that then you are nor understanding your own data or your initial description of the problem was wrong. 8 days is the number of days logged in in row 11, not the the absent time between logout/next login.

As far as I am concerned, the query and results are correct.

Link to post
Share on other sites
16 hours ago, Barand said:

The whole purpose of this exercise was to to find the number of days between the date the user logging out and the date of their next login

image.png.3382a47b683a40fb94e51e36380d2820.png

If you think that then you are nor understanding your own data or your initial description of the problem was wrong. 8 days is the number of days logged in in row 11, not the the absent time between logout/next login.

As far as I am concerned, the query and results are correct.

Hello Sir, 

I am quite sure that you are correct. Possibly my description of the problem was flawed. Though I had remarked once or twice in my replies that we should use just the days between logins to find the correct number of gaps or absents and then for the edge condition, add (dateout - datein) of the last record for that user to the total count.

Sir I am not saying that your solution is wrong anywhere. I was only trying to spell out what is desired by me. 

Thank you so much. 

Link to post
Share on other sites

I am getting 502 as the sum below. The last date is the last logout date.

datein		diff
30-03-2019	0
01-04-2019	1
04-04-2019	2
18-04-2019	13
23-04-2019	4
24-04-2019	0
01-05-2019	6
08-05-2019	6
09-05-2019	0
18-06-2019	39
05-09-2019	78
28-09-2019	22
28-09-2019	0
28-09-2019	0
03-10-2019	4
22-05-2020	231
27-08-2020	96
	total   502

 

Link to post
Share on other sites

I've given you a couple of alternative methods - the main change is to to use the previous log in instead of the log out time so it shouldn't be to difficult for you to alter the queries to your requirements. The tricky bit is your "edge condition", but give it a go. (my 503 was because I forgot to subtract the 1 to get the days inbetween when getting the last record's time difference, so I had 97)

I still don't understand why, when you are calculating days between logins, you decide to add in the duration of the login for the last record (15 apples + 1 orange). If anything I would have the thought the last record's time would be from final login to the current date. Just my 0.02 worth

Link to post
Share on other sites

SIr, 

I think I will get the solution from all the solutions provided by you. In case I run into a roadblock. i'll ask again.

11 hours ago, Barand said:

Just my 0.02 worth

is worth 2K Guru Barand. You are absolutely correct . The current date should be used as the last date for the edge condition.  

Thank you.🙏

Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.