June 19, 2018 at 4:23 am
This query has been driving me nuts and need some help please. I have a working example below but, to try to explain, I collect data on what databases have been used or not since the last service start/restart. So, to sum the Days Uptime, I need to sum on a Group based on the SQL Server instance name and the database name. However, the tricky part is that there are some basic rules to identify 'unused' databases in a sequence of periods of uptime and what rows to sum. For example:-
a) If there are n rows for a Server and Database representing periods of uptime and the given database has never been marked as Used on any row, return and sum all the rows in that group
b) If a given database has been used in a previous period of uptime but not marked as used in more recent period periods of uptime, only sum the rows AFTER the last period of uptime where it was used.
c) In any case, if the most recent row for a given Server and Database shows the database has been used, don't sum or return any rows for it.
If you refer to the query I've developed below, it's using an INNER JOIN. I've tried using a LEFT JOIN but that isn't the solution. I can see the issue in the query and that is that the subquery on which the INNER JOIN is made is based on a clause, WHERE Used = 1. This works for groups that have at least one record where Used = 1 but, of course, zero records are return for groups of rows where all of those rows have Used = 0. In these cases, instead of summing all the rows because all are Used = 0, those Databases are not returned/summed at all. So, at this point, I'm guessing there is a smart way to write this query to avoid the problem but I don't know what it is! Would appreciate some assistance with this.
IF EXISTS (SELECT OBJECT_ID('TEMPDB..#tbla')) DROP TABLE #tbla
GO
CREATE TABLE #tbla
(
SQLInstance varchar(16),
DBName varchar(8),
SvcStartDate datetime,
Used bit,
DaysUptime SMALLINT
)
SET NOCOUNT ON
-- DB1 is all Used = 1 so ignore completely
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-15 04:14:00', 1, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-16 02:10:10', 1, 1)
-- DB2 is 0 in every case so all rows should be included (summing the last column)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-14 04:14:25', 0, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-16 02:10:10', 0, 1)
-- DB3 has the middle of the 3 rows where Used = 1 but the most recent row is 1 so ignore/exclude all.
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-16 02:10:10', 1, 1)
-- DB4 shows Used = 0 on the most recent row but Used = 1 on the two preceding rows - therefore only include the most recent row where Used = 0
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-15 04:14:00', 1, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-16 02:10:10', 0, 1)
-- DB5 shows the two most recent rows where Used = 0 so these can be included but the oldest row has Used = 1 so this is to be included
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-16 02:10:10', 0, 1)
-- DB6, include ALL rows as all are consecutive where Used = 0
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-06-01 01:00:00', 0, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-06-21 05:12:11', 0, 1)
-- DB7, The most recent shows Used = 1 so exlude all rows
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-06-01 01:00:00', 0, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-06-21 05:12:11', 1, 1)
-- DB8, The most recent row has Used = 0 so is to be included but the preceding row has Used = 1 so all preceding rows excluded regardless.
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-06-01 01:00:00', 1, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-06-21 05:12:11', 0, 1) --<< only row to be included
-- select the source data
SELECT * FROM #tbla ORDER BY SQLInstance, DBName, SvcStartDate desc
-- select the max SvcStartDate in each group where Used = 1
-- This query doesn't take into account cases where the Used values in a group are ALL equal to zero. For example, SQL1 and DB2
SELECT
a.SQLInstance,
a.DBName,
Max(a.SvcStartDate) as MaxSvcStartDate,
Sum(a.DaysUpTime) as SumDaysUptime
FROM
#tbla a
INNER JOIN (
SELECT
SQLInstance,
DBName,
max(svcStartDate) as MaxsvcStartDate
FROM
#tbla
WHERE
Used = 1
GROUP BY
SQLInstance,
DBName
) b
ON a.SQLInstance = b.SQLInstance AND a.DBName = b.DBName AND a.SvcStartDate > b.MaxsvcStartDate
WHERE
a.Used = 0
GROUP BY
a.SQLInstance,
a.DBName
ORDER BY
SQLInstance,
DBName
-- result set is not correct as required - Inner Join - The 3 rows returns are correct but missing DB2 and DB6 (ie. DB2 and DB6 have all rows Used = 0)
/*
SQLInstance DBName MaxSvcStartDate SumDaysUptime
SQL1 DB4 2018-06-16 02:10:10.000 1
SQL1 DB5 2018-06-16 02:10:10.000 2
SQL2 DB8 2018-06-21 05:12:11.000 1
*/
-- Required result should be
/*
SQLInstance DBName MaxSvcStartDate SumDaysUptime
SQL1 DB2 2018-06-16 02:10:10.000 6
SQL1 DB4 2018-06-16 02:10:10.000 1
SQL1 DB5 2018-06-16 02:10:10.000 2
SQL2 DB6 2018-06-21 05:12:11.000 81
SQL2 DB8 2018-06-21 05:12:11.000 1
*/
June 19, 2018 at 5:10 am
Updating for anyone interested, if I insert a dummy/seed row in the input table set to USED = 1, DaysUpTime = 0, my query returns the correct results. It's a messy workaround and I would rather not use it.
IF EXISTS (SELECT OBJECT_ID('TEMPDB..#tbla')) DROP TABLE #tbla
GO
CREATE TABLE #tbla
(
SQLInstance varchar(16),
DBName varchar(8),
SvcStartDate datetime,
Used bit,
DaysUptime SMALLINT
)
SET NOCOUNT ON
-- DB1 is all Used = 1 so ignore completely
INSERT INTO #tbla VALUES ('SQL1', 'DB1','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-15 04:14:00', 1, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB1','2018-06-16 02:10:10', 1, 1)
-- DB2 is 0 in every case so all rows should be included (summing the last column)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-14 04:14:25', 0, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB2','2018-06-16 02:10:10', 0, 1)
-- DB3 has the middle of the 3 rows where Used = 1 but the most recent row is 1 so ignore/exclude all.
INSERT INTO #tbla VALUES ('SQL1', 'DB3','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB3','2018-06-16 02:10:10', 1, 1)
-- DB4 shows Used = 0 on the most recent row but Used = 1 on the two preceding rows - therefore only include the most recent row where Used = 0
INSERT INTO #tbla VALUES ('SQL1', 'DB4','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-15 04:14:00', 1, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB4','2018-06-16 02:10:10', 0, 1)
-- DB5 shows the two most recent rows where Used = 0 so these can be included but the oldest row has Used = 1 so this is to be included
INSERT INTO #tbla VALUES ('SQL1', 'DB5','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-14 04:14:25', 1, 4)
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-15 04:14:00', 0, 1)
INSERT INTO #tbla VALUES ('SQL1', 'DB5','2018-06-16 02:10:10', 0, 1)
-- DB6, include ALL rows as all are consecutive where Used = 0
INSERT INTO #tbla VALUES ('SQL2', 'DB6','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-06-01 01:00:00', 0, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB6','2018-06-21 05:12:11', 0, 1)
-- DB7, The most recent shows Used = 1 so exlude all rows
INSERT INTO #tbla VALUES ('SQL2', 'DB7','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-06-01 01:00:00', 0, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB7','2018-06-21 05:12:11', 1, 1)
-- DB8, The most recent row has Used = 0 so is to be included but the preceding row has Used = 1 so all preceding rows excluded regardless.
INSERT INTO #tbla VALUES ('SQL2', 'DB8','1753-01-01', 1, 0) --<< Seed row as a workaround, if zero for DaysUptime, doesn't affect correct sum
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-04-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-05-01 01:00:00', 0, 30)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-06-01 01:00:00', 1, 20)
INSERT INTO #tbla VALUES ('SQL2', 'DB8','2018-06-21 05:12:11', 0, 1) --<< only row to be included
-- select the source data
SELECT * FROM #tbla ORDER BY SQLInstance, DBName, SvcStartDate desc
June 19, 2018 at 5:52 am
SELECT a.SQLInstance,
a.DBName,
Max(a.SvcStartDate) as MaxSvcStartDate,
Sum(a.DaysUpTime) as SumDaysUptime
FROM #tbla a
WHERE NOT EXISTS(SELECT *
FROM #tbla b
WHERE b.Used=1
AND b.SvcStartDate > a.SvcStartDate
AND b.SQLInstance=a.SQLInstance
AND b.DBName = a.DBName)
AND a.Used=0
GROUP BY
a.SQLInstance,
a.DBName
ORDER BY
SQLInstance,
DBName
June 19, 2018 at 6:10 am
Jonathan AC Roberts - Tuesday, June 19, 2018 5:52 AMSELECT a.SQLInstance,
a.DBName,
Max(a.SvcStartDate) as MaxSvcStartDate,
Sum(a.DaysUpTime) as SumDaysUptime
FROM #tbla a
WHERE NOT EXISTS(SELECT *
FROM #tbla b
WHERE b.Used=1
AND b.SvcStartDate > a.SvcStartDate
AND b.SQLInstance=a.SQLInstance
AND b.DBName = a.DBName)
AND a.Used=0
GROUP BY
a.SQLInstance,
a.DBName
ORDER BY
SQLInstance,
DBName
Excellent, thanks. Works perfectly. Well done.
June 19, 2018 at 7:11 am
Jonathan AC Roberts - Tuesday, June 19, 2018 5:52 AM
Excellent, thanks. Works perfectly. Well done.
Thanks, It's nice to see a well presented question with the set up of data and an example of what's required.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply