Jump to content

Get count of 'current' associated status each month


Psycho

Recommended Posts

I need to build a report that displays the count of values based on a condition over time. I am going to give a watered down version of the problem for clarity.

 

Assume I have two tables: projects and project_status. The project_status table tracks the changes to the status of the projects. I want to build a report on the number of projects that are currently in an 'open' status at the end of each month. The issue is that there could be multiple changes to the status of a project within a month or none at all. I can run a query to get the number of open projects for a single month (Join the projects table on the last status prior to the specific month). But I want to run a query that gets me the count for all months in the report range. Otherwise, I have to create a loop and run the query for each month in the report.

 

Here is some sample date

 

projects:

project_id | name
 1            Project A
 2            Project B
 3            Project C
 4            Project D

project_status:

status_id | project_id | status | change_date
 1           1           'Open'   '2015-01-12 17:07:31'
 2           1           'Closed' '2015-03-16 17:07:42'
 3           2           'Open'   '2015-01-13 17:09:31'
 4           2           'Closed' '2015-02-10 17:09:36'
 5           3           'Open'   '2015-01-13 17:11:39'
 6           4           'Open'   '2015-04-14 17:12:25'

So:

Project A: Opened in January, Closed in March

Project B: Opened in January, Closed in February

Project C: Opened in January, not closed

Project D: Opened in April, not closed

 

If I wanted a report for the first quarter, the data I want returned would be something like this (note: the numbers represent the count at the end of each month):

January:  3
February: 2
March:    1

Below is the query that would give the number of projects that are in an 'open' status at the end of March but, as I said, I want to have a query that gives me the records for each month in the report range rather than running separate queries for each month

SELECT COUNT(*)
FROM projects
 
JOIN (
    SELECT project_id, MAX(change_date) as last_date
    FROM `project_status`
    WHERE change_date < '2015-04-01'
    GROUP BY project_id
    ) AS last_status
  ON projects.project_id = last_status.project_id
 
JOIN project_status
  ON last_status.project_id = project_status.project_id
  AND last_status.last_date = project_status.change_date
 
WHERE project_status.status = 'Open'

Here is an export from MySQL of the sample data for anyone that wishes to help

CREATE TABLE `projects` (
  `project_id` int(11) NOT NULL auto_increment,
  `name` varchar(32) collate latin1_general_ci NOT NULL,
  PRIMARY KEY  (`project_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=5 ;
 
INSERT INTO `projects` VALUES (1, 'Project A');
INSERT INTO `projects` VALUES (2, 'Project B');
INSERT INTO `projects` VALUES (3, 'Project C');
INSERT INTO `projects` VALUES (4, 'Project D');
 
CREATE TABLE `project_status` (
  `status_id` int(11) NOT NULL auto_increment,
  `project_id` int(11) NOT NULL,
  `status` varchar(16) collate latin1_general_ci default NULL,
  `change_date` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`status_id`),
  KEY `project_id` (`project_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=8 ;
 
INSERT INTO `project_status` VALUES (1, 1, 'Open', '2015-01-12 17:07:31');
INSERT INTO `project_status` VALUES (2, 1, 'Closed', '2015-03-16 17:07:42');
INSERT INTO `project_status` VALUES (3, 2, 'Open', '2015-01-13 17:09:31');
INSERT INTO `project_status` VALUES (4, 2, 'Closed', '2015-02-10 17:09:36');
INSERT INTO `project_status` VALUES (5, 3, 'Open', '2015-01-13 17:11:39');
INSERT INTO `project_status` VALUES (6, 4, 'Open', '2015-04-14 17:12:25');
Link to comment
Share on other sites

Not sure if this will be helpful, but it sounds a lot like something I recently had to do. Hopefully it helps. Output was used for line chart. See attachment. This is the query I used for my purpose:

SELECT C.month, sum(slab) as slab, sum(dried_in) as dried_in, sum(drywall) as drywall, sum(frame) as frame, C.year_month_number
FROM (
  (SELECT
              CASE MONTH (l.frame_date)
                WHEN 1 THEN CONCAT('Jan','-',Year (l.frame_date))
                WHEN 2 THEN CONCAT('Feb', '-', Year (l.frame_date))
                WHEN 3 THEN CONCAT('Mar','-',Year (l.frame_date))
                WHEN 4 THEN CONCAT('Apr', '-', Year (l.frame_date))
                WHEN 5 THEN CONCAT('May', '-', Year (l.frame_date))
                WHEN 6 THEN CONCAT('Jun', '-', Year (l.frame_date))
                WHEN 7 THEN CONCAT('Jul', '-', Year (l.frame_date))
                WHEN 8 THEN CONCAT('Aug', '-', Year (l.frame_date))
                WHEN 9 THEN CONCAT('Sep', '-', Year (l.frame_date))
                WHEN 10 THEN CONCAT('Oct', '-', Year (l.frame_date))
                WHEN 11 THEN CONCAT('Nov', '-', Year (l.frame_date))
                WHEN 12 THEN CONCAT('Dec', '-', Year (l.frame_date))
                ELSE 'Unknown'
              END as 'month',
              null as slab,
              null as drywall,
              COUNT(IF(l.frame_date is not null, 1, 0)) AS frame,
              null AS dried_in,
              CONCAT(Year (l.frame_date),LPAD(MONTH (l.frame_date),2,0)) year_month_number
    FROM
              lot as l INNER JOIN lot_type AS lt ON l.lot_type_id = lt.lot_type_id
                       INNER JOIN block as b ON b.block_id=l.block_id
                       INNER JOIN community as c ON c.community_id=b.community_id
    WHERE
              ( c.contract_type_id = 1 OR c.contract_type_id = $contract_type_id ) AND
              l.$active_column=1 AND
              l.lot_type_id <> 1 and
              l.frame_date is not null and
              (frame_date <= CURDATE() and frame_date >= DATE_ADD(DATE_ADD(CURDATE(),INTERVAL -11 month), interval -(day(DATE_ADD(CURDATE(),INTERVAL -11 month)) - 1) day))
    GROUP by MONTH (l.frame_date)
    ORDER BY MONTH (l.frame_date) ASC)

UNION ALL
  (SELECT
              CASE MONTH (l.drywall_date)
                WHEN 1 THEN CONCAT('Jan','-',Year (l.drywall_date))
                WHEN 2 THEN CONCAT('Feb', '-', Year (l.drywall_date))
                WHEN 3 THEN CONCAT('Mar','-',Year (l.drywall_date))
                WHEN 4 THEN CONCAT('Apr', '-', Year (l.drywall_date))
                WHEN 5 THEN CONCAT('May', '-', Year (l.drywall_date))
                WHEN 6 THEN CONCAT('Jun', '-', Year (l.drywall_date))
                WHEN 7 THEN CONCAT('Jul', '-', Year (l.drywall_date))
                WHEN 8 THEN CONCAT('Aug', '-', Year (l.drywall_date))
                WHEN 9 THEN CONCAT('Sep', '-', Year (l.drywall_date))
                WHEN 10 THEN CONCAT('Oct', '-', Year (l.drywall_date))
                WHEN 11 THEN CONCAT('Nov', '-', Year (l.drywall_date))
                WHEN 12 THEN CONCAT('Dec', '-', Year (l.drywall_date))
                ELSE 'Unknown'
              END as 'month',
              null as slab,
              COUNT(IF(l.drywall_date is not null, 1, 0)) AS drywall,
              null as frame,
              null AS dried_in,
              CONCAT(Year (l.drywall_date),LPAD(MONTH (l.drywall_date),2,0)) year_month_number
    FROM
              lot as l INNER JOIN lot_type AS lt ON l.lot_type_id = lt.lot_type_id
                       INNER JOIN block as b ON b.block_id=l.block_id
                       INNER JOIN community as c ON c.community_id=b.community_id
    WHERE
              ( c.contract_type_id = 1 OR c.contract_type_id = $contract_type_id ) AND
              l.$active_column=1 AND
              l.lot_type_id <> 1 and
              l.drywall_date is not null and
              (drywall_date <= CURDATE() and drywall_date >= DATE_ADD(DATE_ADD(CURDATE(),INTERVAL -11 month), interval -(day(DATE_ADD(CURDATE(),INTERVAL -11 month)) - 1) day))
    GROUP by MONTH (l.drywall_date)
    ORDER BY MONTH (l.drywall_date) ASC)

UNION ALL

  (SELECT
              CASE MONTH (l.slab_date)
                WHEN 1 THEN CONCAT('Jan','-',Year (l.slab_date))
                WHEN 2 THEN CONCAT('Feb', '-', Year (l.slab_date))
                WHEN 3 THEN CONCAT('Mar','-',Year (l.slab_date))
                WHEN 4 THEN CONCAT('Apr', '-', Year (l.slab_date))
                WHEN 5 THEN CONCAT('May', '-', Year (l.slab_date))
                WHEN 6 THEN CONCAT('Jun', '-', Year (l.slab_date))
                WHEN 7 THEN CONCAT('Jul', '-', Year (l.slab_date))
                WHEN 8 THEN CONCAT('Aug', '-', Year (l.slab_date))
                WHEN 9 THEN CONCAT('Sep', '-', Year (l.slab_date))
                WHEN 10 THEN CONCAT('Oct', '-', Year (l.slab_date))
                WHEN 11 THEN CONCAT('Nov', '-', Year (l.slab_date))
                WHEN 12 THEN CONCAT('Dec', '-', Year (l.slab_date))
                ELSE 'Unknown'
              END as 'month',
              COUNT(IF(l.slab_date is not null, 1, 0)) AS slab,
              null as drywall,
              null as frame,
              null AS dried_in,
              CONCAT(Year (l.slab_date),LPAD(MONTH (l.slab_date),2,0)) year_month_number
    FROM
              lot as l INNER JOIN lot_type AS lt ON l.lot_type_id = lt.lot_type_id
                       INNER JOIN block as b ON b.block_id=l.block_id
                       INNER JOIN community as c ON c.community_id=b.community_id
    WHERE
              ( c.contract_type_id = 1 OR c.contract_type_id = $contract_type_id ) AND
              l.$active_column=1 AND
              l.lot_type_id <> 1 and
              l.slab_date is not null and
              (slab_date <= CURDATE() and slab_date >= DATE_ADD(DATE_ADD(CURDATE(),INTERVAL -11 month), interval -(day(DATE_ADD(CURDATE(),INTERVAL -11 month)) - 1) day))
    GROUP by MONTH (l.slab_date)
    ORDER BY MONTH (l.slab_date) ASC)

UNION ALL

  (SELECT CASE MONTH (l.dried_in_date)
              WHEN 1 THEN CONCAT('Jan', '-', Year (l.dried_in_date))
              WHEN 2 THEN CONCAT('Feb', '-', Year (l.dried_in_date))
              WHEN 3 THEN CONCAT('Mar', '-', Year (l.dried_in_date))
              WHEN 4 THEN CONCAT('Apr', '-', Year (l.dried_in_date))
              WHEN 5 THEN CONCAT('May', '-', Year (l.dried_in_date))
              WHEN 6 THEN CONCAT('Jun', '-', Year (l.dried_in_date))
              WHEN 7 THEN CONCAT('Jul', '-', Year (l.dried_in_date))
              WHEN 8 THEN CONCAT('Aug', '-', Year (l.dried_in_date))
              WHEN 9 THEN CONCAT('Sep', '-', Year (l.dried_in_date))
              WHEN 10 THEN CONCAT('Oct', '-', Year (l.dried_in_date))
              WHEN 11 THEN CONCAT('Nov', '-', Year (l.dried_in_date))
              WHEN 12 THEN CONCAT('Dec', '-', Year (l.dried_in_date))
            ELSE 'Unknown'
            END as 'month',
            null AS slab,
            null as drywall,
            null as frame,
            Count(IF(l.dried_in_date is not null, 1 , 0)) AS dried_in,
            CONCAT(Year (l.dried_in_date), LPAD(MONTH (l.dried_in_date),2,0)) year_month_number
    FROM
            lot as l INNER JOIN lot_type AS lt ON l.lot_type_id = lt.lot_type_id
                     INNER JOIN block as b ON b.block_id=l.block_id
                     INNER JOIN community as c ON c.community_id=b.community_id
    WHERE
          ( c.contract_type_id = 1 OR c.contract_type_id = $contract_type_id ) AND
            l.$active_column=1 AND
            l.lot_type_id <> 1 and
            dried_in_date is not null and
            (dried_in_date <= CURDATE() and
              dried_in_date >= DATE_ADD(DATE_ADD(CURDATE(),INTERVAL -11 month), interval -(day(DATE_ADD(CURDATE(),INTERVAL -11 month)) - 1) day))
    GROUP by MONTH (l.dried_in_date)
    ORDER BY MONTH (l.dried_in_date) asc)
  ) as C
GROUP BY C.month
ORDER BY C.year_month_number asc

post-179806-0-75077000-1449162727_thumb.jpg

Edited by benanamen
Link to comment
Share on other sites

I *think* I see what you are doing there. You have separate queries to get the counts of different parameters and UNION them together. Each query appears to have logic to get the COUNT() for a particular parameter in each month/year.

 

Maybe I oversimplified my sample data. For example, I could have a project that is 'Open' on Dec. 5, gets changed to a status of 'Closed' on Dec. 5, then gets moved back to 'Open" on Dec. 10. When I get the number of projects that are open at the end of December, I need that project to count only once even though there are two status records in December with a a value of 'Open'.

 

I'll continue to review what you've posted to see if there is some solution I am not seeing.

Link to comment
Share on other sites

try

SELECT mname
, @tot := @tot + opened - closed as open_at_end
FROM (
	SELECT MONTH(change_date) as mth
        , MONTHNAME(change_date) as mname
	, SUM(status='Open') as opened
	, SUM(status='Closed') as closed
	FROM project_status
        WHERE QUARTER(change_date) = 1
	GROUP BY mth
	) totals
JOIN (SELECT @tot:=0) as init
Edited by Barand
Link to comment
Share on other sites

@Barand, I was waiting for you to respond :)

 

What you supplied works perfectly for the specific sample data I provided. But, when I add some more realistic data (data that existed prior to January) it is not giving me the expected values. At the bottom I have provided some updated sample data. That data should provide the following output:

January: 5

Feb: 4

March: 3

 

Using your query above I have made some progress and am very close, but there is one problem I am facing. I created the following query which correctly provides the change in open count each month:

SELECT MONTHNAME(change_date) AS monthname,
 
--    @open_count := @open_count +
              SUM(status='Open') - SUM(status='Closed')
              AS open_at_end
 
FROM project_status
WHERE DATE(change_date) BETWEEN '2015-01-01' AND '2015-03-31'
GROUP BY YEAR(change_date), MONTH(change_date)
ORDER BY YEAR(change_date), MONTH(change_date)

Result:

Jan: 3

Feb: -1

Mar: -1

 

So, the only thing missing is that it needs the starting value at the beginning of January and apply those changes to the running value for each month (which is what that commented line is for). So, I used the query I first provided to calculate and set that starting value and assign it to @open_count and it is being set correctly (2 with the new data). So, when I uncomment that line above I expect that it will return: January will be (2 + 3) 5, Feb will be (5 - 1) 4, March will be (4 - 1) 3. But, instead I am getting 5, 1, 1. Here is the same query, but I have hard coded the initial @open_count to 2 instead of providing the longer query above.

SET @open_count = 2;
SELECT MONTHNAME(change_date) AS monthname,
 
    @open_count := @open_count +
              SUM(status='Open') - SUM(status='Closed')
              AS open_at_end
 
FROM project_status
WHERE DATE(change_date) BETWEEN '2015-01-01' AND '2015-03-31'
GROUP BY YEAR(change_date), MONTH(change_date)
ORDER BY YEAR(change_date), MONTH(change_date)

Here is the new sample data

CREATE TABLE `projects` (
  `project_id` int(11) NOT NULL auto_increment,
  `name` varchar(32) collate latin1_general_ci NOT NULL,
  PRIMARY KEY  (`project_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=9 ;
 
INSERT INTO `projects` VALUES (1, 'Project A');
INSERT INTO `projects` VALUES (2, 'Project B');
INSERT INTO `projects` VALUES (3, 'Project C');
INSERT INTO `projects` VALUES (4, 'Project D');
INSERT INTO `projects` VALUES (5, 'Project E');
INSERT INTO `projects` VALUES (6, 'Project F');
INSERT INTO `projects` VALUES (7, 'Project G');
INSERT INTO `projects` VALUES (8, 'Project H');
 
CREATE TABLE `project_status` (
  `status_id` int(11) NOT NULL auto_increment,
  `project_id` int(11) NOT NULL,
  `status` varchar(16) collate latin1_general_ci default NULL,
  `change_date` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`status_id`),
  KEY `project_id` (`project_id`)
) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=16 ;
 
INSERT INTO `project_status` VALUES (1, 1, 'Open', '2015-01-12 17:07:31');
INSERT INTO `project_status` VALUES (2, 1, 'Closed', '2015-03-16 17:07:42');
INSERT INTO `project_status` VALUES (3, 2, 'Open', '2015-01-13 17:09:31');
INSERT INTO `project_status` VALUES (4, 2, 'Closed', '2015-02-10 17:09:36');
INSERT INTO `project_status` VALUES (5, 3, 'Open', '2015-01-13 17:11:39');
INSERT INTO `project_status` VALUES (6, 4, 'Open', '2015-04-14 17:12:25');
INSERT INTO `project_status` VALUES (8, 5, 'Open', '2015-02-03 12:48:09');
INSERT INTO `project_status` VALUES (9, 5, 'Closed', '2015-02-11 12:48:23');
INSERT INTO `project_status` VALUES (10, 5, 'Open', '2015-02-25 12:48:39');
INSERT INTO `project_status` VALUES (11, 6, 'Open', '2014-12-17 13:46:12');
INSERT INTO `project_status` VALUES (12, 7, 'Open', '2014-11-11 13:46:31');
INSERT INTO `project_status` VALUES (13, 7, 'Closed', '2014-12-16 13:46:48');
INSERT INTO `project_status` VALUES (14, 8, 'Open', '2014-12-17 13:47:01');
INSERT INTO `project_status` VALUES (15, 8, 'Closed', '2015-02-10 13:47:14');
 
Edited by Psycho
Link to comment
Share on other sites

I got it! I made the query a sub-query with the @open_count calculation as part of the outer query (like you had it). Not sure why it wasn't calculating correctly as a single query.

 

I still have more work to do as there are more complicated conditions than what I posted, but I'm pretty comfortable I can handle those. Here is the working query, but if you have any suggestions for improvement I would welcome them

 

 

-- Calculate initial open count
SET @open_count = (
    SELECT COUNT(*)
    FROM projects
 
    JOIN (
        SELECT project_id, MAX(change_date) as last_date
        FROM `project_status`
        WHERE change_date < '2015-01-01'
        GROUP BY project_id
        ) AS last_status
      ON projects.project_id = last_status.project_id
 
    JOIN project_status
      ON last_status.project_id = project_status.project_id
      AND last_status.last_date = project_status.change_date
 
    WHERE project_status.status = 'Open'
    );
 
-- Query the monthly changes in open count and apply to @open_count
SELECT yearNum, monthName, @open_count := @open_count + openChange as open_at_end
 
FROM (
    SELECT MONTH(change_date) as monthNum,
           MONTHNAME(change_date) AS monthName,
           YEAR(change_date) as yearNum,
           SUM(status='Open') - SUM(status='Closed') as openChange
    FROM project_status
    WHERE DATE(change_date) BETWEEN '2015-01-01' AND '2015-03-31'
    GROUP BY YEAR(change_date), MONTH(change_date)
) as periodTotals
 
ORDER BY yearNum, monthNum
Link to comment
Share on other sites

I realised that, given the cumulative nature of the query, it would be dependent on knowing how many projects were open already at the start of the quarter.

 

This would require setting the initial value of @tot to that number.

 

It is much like a bank statement with an opening balance. Taking a pragmatic approach, I would store this balance (the final value in the query results for a quarter) in another table, in my case `projects_open`

CREATE TABLE `projects_open` (
  `year` year(4) NOT NULL DEFAULT '0000',
  `qtr` tinyint(4) NOT NULL DEFAULT '0',
  `bf_bal` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`year`,`qtr`)
) ;

INSERT INTO `projects_open` VALUES (2014,4,2),(2015,1,3);

Then my query becomes

SELECT mname
, @tot := @tot + opened - closed as open_at_end
FROM (
    SELECT MONTH(change_date) as mth
    , MONTHNAME(change_date) as mname
    , SUM(status='Open') as opened
    , SUM(status='Closed') as closed
    FROM project_status
    WHERE YEAR(change_date)=2015 AND QUARTER(change_date) = 1
    GROUP BY mth
    ) totals
JOIN (
    SELECT @tot:=
        (SELECT bf_bal FROM projects_open WHERE year=2014 AND qtr=4)
    ) as init;
Link to comment
Share on other sites

Thanks Barand,

 

Unfortunately, in trying to take out some of the other complexities in order to focus on the specific problem I was facing, I ended up removing an aspect that makes that approach invalid. In my actual data there are hundreds of thousands of data and it was difficult to validate results. So, I created a small test bed of data to test the query using just "Open" and "Closed". My intent was to change it to status IN (list of status ids) once I found a solution.

 

In my actual data there are many different statuses that can represent "Open" or "Closed". A project can change from one Open status to another Open status - the same holds true for Closed statuses. Because of this, getting the 'difference' between "Open" and "Closed" statuses will not work.

 

This will not be used often. At this point, I'm going to use the query I started with that gets me the value for a single point in time and run it in a loop to get the value for all the periods in the report range. As long as the time is somewhat reasonable it will get me to a viable solution for now. I may also consider caching, but the intent is that the report can be run with many different options.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.