KristieF Posted November 22, 2018 Share Posted November 22, 2018 Hello - I'm trying to determine rolling user retention. I have a table called 'user' that contains the userId, and the date they created their account. I have a second table, activity_log, that also has the userId, and the dates they had activity. I'm looking for a count of userId's by month/year using the date their account was created. And then how many of those users were active a month from that date, then two months, then 3 months, etc. query 1 would look like: select count DATE_FORMAT(created, '%m %Y') AS DateCreated, userId from user where DATE(created) >= "2015-06-01" group by 1; Query 2 would then count how many of those users, by month and year, subsequently had a date in the activity_log from the user.created date +1, +2, +3, +4, etc. I've seen examples of this for SQL, by week, but I can't get it working in MySQL by month: WITH by_week AS (SELECT user_id, TD_DATE_TRUNC('week', login_time) AS login_week FROM logins GROUP BY 1, 2), with_first_week AS (SELECT user_id, login_week, FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week FROM by_week), with_week_number AS (SELECT user_id, login_week, first_week, (login_week - first_week) / (24 * 60 * 60 * 7) AS week_number FROM with_first_week) SELECT TD_TIME_FORMAT(first_week, 'yyyy-MM-dd') AS first_week, SUM(CASE WHEN week_number = 0 THEN 1 ELSE 0 END) AS week_0, SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS week_1, SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS week_2, SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS week_3, SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS week_4, SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS week_5, SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS week_6, SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS week_7, SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS week_8, SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS week_9 FROM with_week_number GROUP BY 1 ORDER BY 1 Quote Link to comment Share on other sites More sharing options...
Barand Posted November 22, 2018 Share Posted November 22, 2018 Question: If a user joins in January and is active in February and April, is that 3 months, because still active 3 months after joining? 2 months because active in anly 2 of those following months? The following assumes the former. DATA select * from user; select * from user_activity; +---------+----------+------------+ +------------------+---------+-------------+ | user_id | username | created | | user_activity_id | user_id | date_active | +---------+----------+------------+ +------------------+---------+-------------+ | 1 | Peter | 2017-10-04 | | 1 | 1 | 2017-10-04 | | 2 | Paul | 2018-03-01 | | 2 | 1 | 2017-10-19 | | 3 | Mary | 2018-04-15 | | 3 | 1 | 2017-11-03 | | 4 | Jayne | 2018-05-01 | | 4 | 1 | 2017-11-18 | | 5 | John | 2018-06-14 | | 5 | 2 | 2018-03-01 | | 6 | Archie | 2018-01-05 | | 6 | 2 | 2018-03-16 | | 7 | Barbara | 2018-02-25 | | 7 | 2 | 2018-03-31 | | 8 | Carol | 2018-01-26 | | 8 | 2 | 2018-04-15 | | 9 | Dave | 2018-03-20 | | 9 | 2 | 2018-04-30 | | 10 | Ernie | 2018-04-25 | | 10 | 2 | 2018-05-15 | +---------+----------+------------+ | 11 | 2 | 2018-05-30 | | 12 | 2 | 2018-06-14 | | 13 | 2 | 2018-06-29 | | 14 | 2 | 2018-07-14 | | 15 | 2 | 2018-07-29 | | 16 | 2 | 2018-08-13 | | 17 | 3 | 2018-04-15 | | 18 | 3 | 2018-04-30 | | 19 | 3 | 2018-05-15 | | 20 | 3 | 2018-05-30 | | 21 | 3 | 2018-06-14 | | 22 | 3 | 2018-06-29 | | 23 | 3 | 2018-07-14 | | 24 | 3 | 2018-07-29 | | 25 | 4 | 2018-05-01 | | 26 | 4 | 2018-05-16 | | 27 | 4 | 2018-05-31 | | 28 | 4 | 2018-06-15 | | 29 | 4 | 2018-06-30 | | 30 | 4 | 2018-07-15 | | 31 | 5 | 2018-06-14 | | 32 | 5 | 2018-06-29 | | 33 | 5 | 2018-07-14 | | 34 | 5 | 2018-07-29 | | 35 | 5 | 2018-08-13 | | 36 | 5 | 2018-08-28 | | 37 | 5 | 2018-09-12 | | 38 | 5 | 2018-09-27 | | 39 | 5 | 2018-10-12 | | 40 | 6 | 2018-01-05 | | 41 | 6 | 2018-01-20 | | 42 | 6 | 2018-02-04 | | 43 | 6 | 2018-02-19 | | 44 | 6 | 2018-03-06 | | 45 | 6 | 2018-03-21 | | 46 | 6 | 2018-04-05 | | 47 | 6 | 2018-04-20 | | 48 | 6 | 2018-05-05 | | 49 | 6 | 2018-05-20 | | 50 | 6 | 2018-06-04 | | 51 | 6 | 2018-06-19 | | 52 | 6 | 2018-07-04 | | 53 | 6 | 2018-07-19 | | 54 | 6 | 2018-08-03 | | 55 | 7 | 2018-02-25 | | 56 | 7 | 2018-03-12 | | 57 | 7 | 2018-03-27 | | 58 | 8 | 2018-01-26 | | 59 | 8 | 2018-02-10 | | 60 | 8 | 2018-02-25 | | 61 | 8 | 2018-03-12 | | 62 | 8 | 2018-03-27 | | 63 | 8 | 2018-04-11 | | 64 | 8 | 2018-04-26 | | 65 | 8 | 2018-05-11 | | 66 | 8 | 2018-05-26 | | 67 | 8 | 2018-06-10 | | 84 | 8 | 2018-07-27 | | 68 | 9 | 2018-03-20 | | 69 | 9 | 2018-04-04 | | 70 | 9 | 2018-04-19 | | 71 | 9 | 2018-05-04 | | 72 | 9 | 2018-05-19 | | 73 | 9 | 2018-06-03 | | 74 | 9 | 2018-06-18 | | 75 | 9 | 2018-07-03 | | 76 | 9 | 2018-07-18 | | 77 | 9 | 2018-08-02 | | 78 | 10 | 2018-04-25 | | 79 | 10 | 2018-05-10 | | 80 | 10 | 2018-05-25 | | 81 | 10 | 2018-06-09 | | 82 | 10 | 2018-06-24 | | 83 | 10 | 2018-07-09 | +------------------+---------+-------------+ QUERY SELECT months_active , COUNT(user_id) as num_users FROM ( SELECT user_id , MAX(timestampdiff(MONTH, created, date_active)) as months_active FROM user u INNER JOIN user_activity a USING (user_id) GROUP BY user_id ) as retention GROUP BY months_active; RESULTS +---------------+-----------+ | months_active | num_users | +---------------+-----------+ | 1 | 2 | | 2 | 2 | | 3 | 2 | | 4 | 1 | | 5 | 1 | | 6 | 2 | +---------------+-----------+ Quote Link to comment Share on other sites More sharing options...
KristieF Posted November 23, 2018 Author Share Posted November 23, 2018 Barand, thank you for the reply! This is close, but what I'm looking for is: Of the users created in June, how many of those were active in Month 1 (July)? How many of those users were active in Month 2 (August)? etc. through current date. Of the users created in July, how many were active in their Month 1 (August)? How many of those users were active in their Month 2 (September)? etc. through current date. Etc. through current month of users created. Essentially creating output I can use to populate a waterfall chart: Month Created Month 0 Month 1 Month 2 Month 3 June 2 2 1 1 July 2 1 0 0 August 3 3 2 1 Quote Link to comment Share on other sites More sharing options...
Barand Posted November 23, 2018 Share Posted November 23, 2018 Hmm, looks like I need a lot more test data ... Quote Link to comment Share on other sites More sharing options...
KristieF Posted November 23, 2018 Author Share Posted November 23, 2018 This site looks to have what I need, but it uses SQL, and I don't have enough knowledge to turn it into MySQL language. All I know is that it doesn't work when I copy/paste, substituting my tables called user and activity_log https://www.holistics.io/blog/calculate-cohort-retention-analysis-with-sql/ Quote Link to comment Share on other sites More sharing options...
Barand Posted November 23, 2018 Share Posted November 23, 2018 (edited) How about this? Same table but different (more) data DATA SELECT user_id ,created as date_joined ,GROUP_CONCAT(date_active SEPARATOR ', ') as active_dates FROM user JOIN user_activity using (user_id) GROUP BY user_id +---------+-------------+------------------------------------------------------------------------------------+ | user_id | date_joined | active_dates | +---------+-------------+------------------------------------------------------------------------------------+ | 1 | 2018-05-17 | 2018-05-17, 2018-06-17, 2018-07-17, 2018-08-17, 2018-09-17 | | 2 | 2018-05-24 | 2018-05-24, 2018-06-24, 2018-07-24 | | 3 | 2018-05-09 | 2018-05-09, 2018-06-09 | | 4 | 2018-05-10 | 2018-05-10, 2018-06-10, 2018-07-10, 2018-08-10, 2018-09-10, 2018-10-10, 2018-11-10 | | 5 | 2018-05-10 | 2018-05-10, 2018-06-10 | | 6 | 2018-05-15 | 2018-05-15, 2018-06-15, 2018-07-15, 2018-08-15, 2018-09-15, 2018-10-15, 2018-11-15 | | 7 | 2018-05-03 | 2018-05-03, 2018-06-03, 2018-07-03, 2018-08-03, 2018-09-03, 2018-10-03, 2018-11-03 | | 8 | 2018-05-02 | 2018-05-02, 2018-06-02 | | 9 | 2018-05-27 | 2018-05-27, 2018-06-27, 2018-07-27, 2018-08-27, 2018-09-27, 2018-10-27 | | 10 | 2018-05-23 | 2018-05-23, 2018-06-23, 2018-07-23, 2018-08-23, 2018-09-23, 2018-10-23 | | 11 | 2018-06-04 | 2018-06-04, 2018-07-04, 2018-08-04, 2018-09-04, 2018-10-04, 2018-11-04 | | 12 | 2018-06-17 | 2018-06-17, 2018-07-17, 2018-08-17 | | 13 | 2018-06-14 | 2018-06-14, 2018-07-14, 2018-08-14, 2018-09-14, 2018-10-14, 2018-11-14 | | 14 | 2018-06-26 | 2018-06-26, 2018-07-26 | | 15 | 2018-06-06 | 2018-06-06, 2018-07-06 | | 16 | 2018-06-24 | 2018-06-24, 2018-07-24, 2018-08-24 | | 17 | 2018-06-16 | 2018-06-16, 2018-07-16, 2018-08-16, 2018-09-16, 2018-10-16, 2018-11-16 | | 18 | 2018-06-17 | 2018-06-17, 2018-07-17, 2018-08-17, 2018-09-17, 2018-10-17, 2018-11-17 | | 19 | 2018-06-20 | 2018-06-20, 2018-07-20, 2018-08-20, 2018-09-20 | | 20 | 2018-06-10 | 2018-06-10, 2018-07-10, 2018-08-10, 2018-09-10, 2018-10-10, 2018-11-10 | | 21 | 2018-07-08 | 2018-07-08, 2018-08-08, 2018-09-08 | | 22 | 2018-07-28 | 2018-07-28, 2018-08-28, 2018-09-28, 2018-10-28 | | 23 | 2018-07-12 | 2018-07-12, 2018-08-12, 2018-09-12, 2018-10-12, 2018-11-12 | | 24 | 2018-07-28 | 2018-07-28, 2018-08-28, 2018-09-28, 2018-10-28 | | 25 | 2018-07-06 | 2018-07-06, 2018-08-06 | | 26 | 2018-07-26 | 2018-07-26, 2018-08-26 | | 27 | 2018-07-20 | 2018-07-20, 2018-08-20, 2018-09-20, 2018-10-20, 2018-11-20 | | 28 | 2018-07-09 | 2018-07-09, 2018-08-09 | | 29 | 2018-07-13 | 2018-07-13, 2018-08-13, 2018-09-13, 2018-10-13, 2018-11-13 | | 30 | 2018-07-15 | 2018-07-15, 2018-08-15, 2018-09-15, 2018-10-15, 2018-11-15 | | 31 | 2018-08-12 | 2018-08-12, 2018-09-12, 2018-10-12, 2018-11-12 | | 32 | 2018-08-07 | 2018-08-07, 2018-09-07, 2018-10-07, 2018-11-07 | | 33 | 2018-08-18 | 2018-08-18, 2018-09-18, 2018-10-18, 2018-11-18 | | 34 | 2018-08-15 | 2018-08-15, 2018-09-15, 2018-10-15, 2018-11-15 | | 35 | 2018-08-26 | 2018-08-26, 2018-09-26, 2018-10-26 | | 36 | 2018-08-08 | 2018-08-08, 2018-09-08, 2018-10-08 | | 37 | 2018-08-22 | 2018-08-22, 2018-09-22, 2018-10-22, 2018-11-22 | | 38 | 2018-08-06 | 2018-08-06, 2018-09-06, 2018-10-06, 2018-11-06 | | 39 | 2018-08-01 | 2018-08-01, 2018-09-01 | | 40 | 2018-08-11 | 2018-08-11, 2018-09-11, 2018-10-11 | +---------+-------------+------------------------------------------------------------------------------------+ QUERY SELECT monthname(created) as created , timestampdiff(MONTH, created, date_active) as duration , COUNT(DISTINCT user_id) as user_count FROM user JOIN user_activity USING (user_id) GROUP BY MONTH(created), duration; +--------------+----------+------------+ | created | duration | user_count | +--------------+----------+------------+ | May | 0 | 10 | | May | 1 | 10 | | May | 2 | 7 | | May | 3 | 6 | | May | 4 | 6 | | May | 5 | 5 | | May | 6 | 3 | | June | 0 | 10 | | June | 1 | 10 | | June | 2 | 8 | | June | 3 | 6 | | June | 4 | 5 | | June | 5 | 5 | | July | 0 | 10 | | July | 1 | 10 | | July | 2 | 7 | | July | 3 | 6 | | July | 4 | 4 | | August | 0 | 10 | | August | 1 | 10 | | August | 2 | 9 | | August | 3 | 6 | +--------------+----------+------------+ Using php to run the query and format the output: CODE $res = $db->query("SELECT monthname(created) as created , timestampdiff(MONTH, created, date_active) as duration , COUNT(DISTINCT user_id) as user_count FROM user LEFT JOIN user_activity USING (user_id) WHERE created > CURDATE() - INTERVAL 1 YEAR GROUP BY MONTH(created), duration" ); $data = []; $empty = array_fill_keys(range(0,11), ''); // empty array for each row // // store data in array with same structure as required output // foreach ($res as $rec) { if ( !isset($data[$rec['created']])) { $data[$rec['created']] = $empty; } $data[$rec['created']][$rec['duration']] = $rec['user_count']; } // // generate the table output from the data array // $thead = '<tr><th>Date<br>Created</th>'; foreach (array_keys($empty) as $m) { $thead .= "<th>Month $m</th>"; } $thead .= "</tr>\n"; $tdata = ''; foreach ($data as $month =>$users) { $tdata .= "<tr><td class='month'>$month</td><td>" . join('</td><td>', $users ) . "</td><tr>\n"; } ?> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta name="generator" content="PhpED 18.0 (Build 18044, 64bit)"> <title>Sample</title> <meta name="author" content="Barand"> <meta name="creation-date" content="11/23/2018"> <style type='text/css'> table { font-family: verdana, sans-serif; font-size: 11pt; border-collapse: collapse; } th { background-color: #369; color: white; padding: 5pt; } td { text-align: center; padding: 8px; } td.month { text-align: left; background-color: #C5F8F8; } </style> </head> <body> <table border='1'> <thead><?=$thead?></thead> <tbody><?=$tdata?></tbody> </table> </body> </html> RESULT (I'll leave the pretty gradient colours to you) Edited November 23, 2018 by Barand Correction to query to use COUNT(DISTINCT user_id) 1 Quote Link to comment Share on other sites More sharing options...
KristieF Posted November 24, 2018 Author Share Posted November 24, 2018 Barand, this worked! Many many thanks!!! Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.