January 28, 2008 at 2:04 pm
I need to be able to summarize the data for a report ordered by StaffName allowing for multiple rows per Staff per Year
Each column needs to be ordered in Ascending date order and can be blank if there was no review performed for this row and year
I can filter out the desired years from actual data by " WHERE Year(ReviewDate) >= @iYear " {@iYear = Year(GetDate())-3}
Note: my actual data is a StaffId Foreign Key - I just wrote this using a varchar field for demonstration purposes
Also note: I'm not concerned with the StaffName showing or not showing per row; the report should handle that
Desired Result Set:
***********************************************************
StaffName 2006_Dates 2007_Dates 2008_Dates
-------------------------------------------------------
Blow, Joe 2006-03-23 2007-03-13 2008-01-24
2006-07-18 2007-06-28
2007-12-21
Doe, Jane 2006-05-16 2007-03-12 2008-01-22
2006-11-19 2007-06-11
2007-12-16
Dunno, I 2006-12-10 2007-07-05
2007-12-18
***********************************************************
I could insert all data into a temp table and loop through each row and add/update a row per Staff based on the date.
However, is there any way I can do this using a SET BASED solution rather than a RBAR TEMP TABLE/CURSOR method?
Thank you for any suggestions (or straight denials that it can actually be done).
Here is a sample data script:
Create Table #tTestData(
StaffName varchar(40),
ReviewDate smalldatetime
)
Insert Into #tTestData(StaffName, ReviewDate)
Select 'Blow, Joe','2008-01-24' Union All
Select 'Doe, Jane','2008-01-22' Union All
Select 'Dunno, I', '2007-07-05' Union All
Select 'Blow, Joe','2007-06-28' Union All
Select 'Doe, Jane','2007-06-11' Union All
Select 'Blow, Joe','2007-03-13' Union All
Select 'Blow, Joe','2007-12-21' Union All
Select 'Doe, Jane','2007-12-16' Union All
Select 'Doe, Jane','2007-03-12' Union All
Select 'Dunno, I', '2007-12-18' Union All
Select 'Blow, Joe','2006-03-23' Union All
Select 'Blow, Joe','2006-07-18' Union All
Select 'Doe, Jane','2006-05-16' Union All
Select 'Doe, Jane','2006-11-19' Union All
Select 'Dunno, I', '2006-12-10'
Select StaffName, ReviewDate
From #tTestData
Order By StaffName, ReviewDate
Drop Table #tTestData
January 28, 2008 at 6:35 pm
Yes, as single SQL statement can be written but I would be very concerned about resource utilization (e.g. the SQL would run like a pig).
Please be sure to read Jeff Moden's article titled "Hidden RBAR: Triangular Joins" at http://www.sqlservercentral.com/articles/T-SQL/61539/
This SQL ranks each staff member's review for a specific year and is a hidden RBAR. With SQL Server 2005/2008, better performance can be obtained using the rank window function.
select#tTestData.StaffName
,#tTestData.ReviewDate
,count(*)as ReviewOrder
from#tTestData
join#tTestDataas PriorReview
on PriorReview.StaffName = #tTestData.StaffName
and PriorReview.ReviewDate <= #tTestData.ReviewDate
whereyear(#tTestData.ReviewDate) = @iyear - 2
andyear(PriorReview.ReviewDate) = @iyear - 2
group by #tTestData.StaffName
,#tTestData.ReviewDate
This SQL determines the maximum number of reviews for a staff member in any of the past three years:
select StaffName
,CASE WHEN Year0Cnt > Year1Cnt and Year1Cnt > Year0Cntthen Year0Cnt
when Year1Cnt > Year0Cnt then Year1Cnt
else Year0Cnt
end as StaffReviewCnt
from(
Select StaffName
, SUM( case when year(ReviewDate) = @iyear - 2 then 1 else 0 end ) as Year2Cnt
, SUM( case when year(ReviewDate) = @iyear - 1 then 1 else 0 end ) as Year1Cnt
, SUM( case when year(ReviewDate) = @iyear - 0 then 1 else 0 end ) as Year0Cnt
From #tTestData
group By StaffName
) as StaffYearlyReviews
This SQL enumerates each of the possible ranks up to four in any given year. A auxiliary numbers table would be a better solution - see http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
select 1 as RowNum union all select 2 union all select 3 union all select 4
Finally, combine all the individual SQL statements:
declare@iYear smallint
set@iyear = 2008
selectStaffReviews.StaffName
,Review2.ReviewDate
,Review1.ReviewDate
,Review0.ReviewDate
from(select StaffName
,CASE WHEN Year0Cnt > Year1Cnt and Year1Cnt > Year0Cntthen Year0Cnt
when Year1Cnt > Year0Cnt then Year1Cnt
else Year0Cnt
end as StaffReviewCnt
from(
Select StaffName
, SUM( case when year(ReviewDate) = @iyear - 2 then 1 else 0 end ) as Year2Cnt
, SUM( case when year(ReviewDate) = @iyear - 1 then 1 else 0 end ) as Year1Cnt
, SUM( case when year(ReviewDate) = @iyear - 0 then 1 else 0 end ) as Year0Cnt
From #tTestData
group By StaffName
) as StaffYearlyReviews
) as StaffReviews
join(select 1 as RowNum union all select 2 union all select 3 union all select 4 )
as ReviewRow
on ReviewRow.RowNum between 1 and StaffReviews.StaffReviewCnt
left outer join
(select#tTestData.StaffName
,#tTestData.ReviewDate
,count(*)as ReviewOrder
from#tTestData
join#tTestDataas PriorReview
on PriorReview.StaffName = #tTestData.StaffName
and PriorReview.ReviewDate <= #tTestData.ReviewDate
whereyear(#tTestData.ReviewDate) = @iyear - 2
andyear(PriorReview.ReviewDate) = @iyear - 2
group by #tTestData.StaffName
,#tTestData.ReviewDate
) as Review2
on Review2.StaffName= StaffReviews.StaffName
and Review2.ReviewOrder = ReviewRow.RowNum
left outer join
(select#tTestData.StaffName
,#tTestData.ReviewDate
,count(*)as ReviewOrder
from#tTestData
join#tTestDataas PriorReview
on PriorReview.StaffName = #tTestData.StaffName
and PriorReview.ReviewDate <= #tTestData.ReviewDate
whereyear(#tTestData.ReviewDate) = @iyear - 1
andyear(PriorReview.ReviewDate) = @iyear - 1
group by #tTestData.StaffName
,#tTestData.ReviewDate
) as Review1
on Review1.StaffName= StaffReviews.StaffName
and Review1.ReviewOrder = ReviewRow.RowNum
left outer join
(select#tTestData.StaffName
,#tTestData.ReviewDate
,count(*)as ReviewOrder
from#tTestData
join#tTestDataas PriorReview
on PriorReview.StaffName = #tTestData.StaffName
and PriorReview.ReviewDate <= #tTestData.ReviewDate
whereyear(#tTestData.ReviewDate) = @iyear - 0
andyear(PriorReview.ReviewDate) = @iyear - 0
group by #tTestData.StaffName
,#tTestData.ReviewDate
) as Review0
on Review0.StaffName= StaffReviews.StaffName
and Review0.ReviewOrder = ReviewRow.RowNum
The result set is
Blow, Joe2006-03-23 00:00:002007-03-13 00:00:002008-01-24 00:00:00
Blow, Joe2006-07-18 00:00:002007-06-28 00:00:00NULL
Blow, JoeNULL2007-12-21 00:00:00NULL
Doe, Jane2006-05-16 00:00:002007-03-12 00:00:002008-01-22 00:00:00
Doe, Jane2006-11-19 00:00:002007-06-11 00:00:00NULL
Doe, JaneNULL2007-12-16 00:00:00NULL
Dunno, I2006-12-10 00:00:002007-07-05 00:00:00NULL
Dunno, INULL2007-12-18 00:00:00NULL
SQL = Scarcely Qualifies as a Language
January 29, 2008 at 8:49 am
Thank you - I thought there had to be a way but couldn't quite get there and wasn't given the time to work it out.
Checking the execution plan I see why you have concerns about it being resource intensive. There are certainly a lot of table scans going on! Works nicely though and well thought out.
I wrote a stored procedure after I posted using a couple of temp tables and single row processing. It's not Set based, but it does work and while there are a lot of queries involved they are all Clustered Index Scans for the most part returning a single row. Might be more maintainable for my replacement if I get hit by the proverbial I.T. bus.
My actual data requirements are for the last 5 years of data per staff member, but thankfully this is a small company so really only talking between 50-60 staff members to report on so it's not like it's a huge server hit. I will give both methods a shot and the one which crosses the finish line first will get the job.
Thank you for taking the time with this it is very much appreciated.
Mike
January 29, 2008 at 9:46 am
I would do it using a user defined function. Some / Many may not consider this a SET based approach.
PLEASE NOTE this example creates a permanat table and user defined function
Create Table tTestData
(
StaffName varchar(40),
ReviewDate smalldatetime
)
Insert Into tTestData(StaffName, ReviewDate)
Select 'Blow, Joe','2008-01-24' Union All
Select 'Doe, Jane','2008-01-22' Union All
Select 'Dunno, I', '2007-07-05' Union All
Select 'Blow, Joe','2007-06-28' Union All
Select 'Doe, Jane','2007-06-11' Union All
Select 'Blow, Joe','2007-03-13' Union All
Select 'Blow, Joe','2007-12-21' Union All
Select 'Doe, Jane','2007-12-16' Union All
Select 'Doe, Jane','2007-03-12' Union All
Select 'Dunno, I', '2007-12-18' Union All
Select 'Blow, Joe','2006-03-23' Union All
Select 'Blow, Joe','2006-07-18' Union All
Select 'Doe, Jane','2006-05-16' Union All
Select 'Doe, Jane','2006-11-19' Union All
Select 'Dunno, I', '2006-12-10'
GO
CREATE FUNCTION dbo.udf_GetDates
(
@pStaffName VARCHAR(40),
@pReviewDate DATETIME
)
RETURNS VARCHAR(2000)
AS
BEGIN
DECLARE @RDates VARCHAR(2000)
SELECT @RDates = COALESCE(@RDates + ', ', '') + CONVERT(VARCHAR, ReviewDate, 101)
FROM
tTestData
WHERE
StaffName = @pStaffName
ANDDATEDIFF(YEAR, ReviewDate, @pReviewDate) = 0
RETURN @RDates
END
GO
SELECTStaffName,
[2006 Reviews] = dbo.udf_GetDates(StaffName, '01/01/2006'),
[2007 Reviews] = dbo.udf_GetDates(StaffName, '01/01/2007'),
[2008 Reviews] = dbo.udf_GetDates(StaffName, '01/01/2008')
FROM
(
Select DISTINCT StaffName
From
tTestData
) A
GO
DROP TABLE tTestData
DROP FUNCTION udf_GetDates
Regards,
gova
January 29, 2008 at 10:28 am
Another interesting approach. Thank you. The data doesn't result in the format I need with this method however. It creates a nice comma-delimited dates per year instead.
Actually though, I wanted to say I like the way you used COALESCE with the @RDates variable for the comma delimiter to handle the first iteration and subsequent appends. I never considered doing it that way. I am going to shamelessly incorporate that into my coding arsenal.
Learn something new every day - right?
Cheers, Mike
January 29, 2008 at 3:05 pm
This returns exactly what you need.
This script uses table Tally which contains sequential integer numbers.
You may create it yourself or search this forum for one of scripts creating it.
[Code]
Create Table tTestData (
StaffName varchar(40),
ReviewDate smalldatetime
)
SET DATEFORMAT YMD
Insert Into tTestData(StaffName, ReviewDate)
Select 'Blow, Joe','2008-01-24' Union All
Select 'Doe, Jane','2008-01-22' Union All
Select 'Dunno, I', '2007-07-05' Union All
Select 'Blow, Joe','2007-06-28' Union All
Select 'Doe, Jane','2007-06-11' Union All
Select 'Blow, Joe','2007-03-13' Union All
Select 'Blow, Joe','2007-12-21' Union All
Select 'Doe, Jane','2007-12-16' Union All
Select 'Doe, Jane','2007-03-12' Union All
Select 'Dunno, I', '2007-12-18' Union All
Select 'Blow, Joe','2006-03-23' Union All
Select 'Blow, Joe','2006-07-18' Union All
Select 'Doe, Jane','2006-05-16' Union All
Select 'Doe, Jane','2006-11-19' Union All
Select 'Dunno, I', '2006-12-10'
GO
CREATE FUNCTION dbo.tReviewsOfYear
(
@ReviewYear SMALLDATETIME
)
RETURNS @reviews TABLE (
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
StaffName VARCHAR(40),
ReviewNo int NULL,
ReviewDate SMALLDATETIME NOT NULL
)
AS
BEGIN
--To make sure we deal with "year only" value
SET @ReviewYear = DATEADD(YY, DATEDIFF(YY, 0, @ReviewYear), 0)
DECLARE @StaffName VARCHAR(200)
DECLARE @Count INT
INSERT INTO @reviews (StaffName, ReviewDate)
SELECT StaffName, ReviewDate
FROM dbo.tTestData
WHERE ReviewDate >= @ReviewYear
ANDReviewDate < DATEADD(YY, 1, @ReviewYear)
ORDER BY StaffName, ReviewDate
-- Sequential update driven by PRIMARY KEY
UPDATE R
SET @Count = ReviewNo = CASE WHEN @StaffName = StaffName THEN @Count + 1 ELSE 1 END,
@StaffName = StaffName
FROM @reviews R
RETURN
END
GO
select CASE WHEN T.N = 1 THEN S.StaffName ELSE '' END,
T.N, Y2006.ReviewDate [2006_Dates], Y2007.ReviewDate [2007_Dates], Y2008.ReviewDate [2008_Dates]
from dbo.tTestData S
INNER JOIN dbo.Tally T ON T.N > 0 AND T.N < 5000
-- If you expect more than 5000 reviews per year per customer increase this number
LEFT HASH JOIN dbo.tReviewsOfYear ('2006-01-01') Y2006 ON S.StaffName = Y2006.StaffName AND T.N = Y2006.ReviewNo
LEFT HASH JOIN dbo.tReviewsOfYear ('2007-01-01') Y2007 ON S.StaffName = Y2007.StaffName AND T.N = Y2007.ReviewNo
LEFT HASH JOIN dbo.tReviewsOfYear ('2008-01-01') Y2008 ON S.StaffName = Y2008.StaffName AND T.N = Y2008.ReviewNo
WHERE Y2006.ReviewNo IS NOT NULL OR Y2007.ReviewNo IS NOT NULL OR Y2008.ReviewNo IS NOT NULL
GROUP BY S.StaffName, T.N, Y2006.ReviewDate , Y2007.ReviewDate , Y2008.ReviewDate
ORDER BY S.StaffName, T.N
[/Code]
_____________
Code for TallyGenerator
January 30, 2008 at 3:07 am
i have a similar problem where setbased solution would speed things up dramatically:
As part of a large report, I have to to calculate various expenses that have incurred.
I achieve this by executing a separate storedproc which updates the temp tables that are created by the Main storedproc.
There are 50 different Expense types with different selection criteria, so Ive created a table to keep all these expenses and there selection criteria.
Ie:
Insert Into fst_Expenses_Template(Descr, Where_SD_GB, Where_TRA)
Select ‘Admin Fees’,’ E3_field = 132 ,’ E1_exp_level1 = 0132’ Union All
Select ‘Broker fees’,’ E3_field = 135 ,’ E1_exp_level1 = 0135’ Union All
Select ‘Legal Fees’,’ E3_field = 139 ,’ E1_exp_level1 = 0139’ Union All
…
To process each expensetype, I repetitively read (in WHILE loop) from this Expenses "template" table and build up a query-string which will Update a separate temp table with the results. (see code below which ive simplified for readability).
Could this be done set-based ?
DECLARE @REP_count TinyInt ,
@intRow TinyInt,
@commonWhere_SD_GB Varchar (60),
@commonWhere_TRA Varchar (60),
@descr Varchar(50),
@sql Varchar(5000)
INSERT INTO #EXPENSE_Tbl
SELECT *
FROM fst_Expenses_Template
SET @REP_count = @@ROWCOUNT
SET @intRow = 1
/*-----------------------------------
Loop for each of the 45 expense types
-------------------------------------*/
WHILE @intRow <= @REP_count
BEGIN
SELECT
@descr = Descr,
@commonWhere_SD_GB = Where_SD_GB,
@commonWhere_TRA = Where_TRA
FROM
fst_Expenses_Template --#EXPENSE_Tbl
WHERE
Num = @intRow
SET @sql='
UPDATE #EXPENSE_Tbl
SET
SD_GB_End =
(
SELECT SUM(gb_field_end)
FROM
fst_tbl_SD
WHERE
' + @commonWhere_SD_GB + '
),
SD_GB_Start =
(
SELECT SUM(gb_field)
FROM
fst_tbl_SD
WHERE
' + @commonWhere_SD_GB + '
),
TRA_G9 =
(
SELECT SUM(g9_income_base)
FROM
fst_tbl_Transactions
WHERE
' + @commonWhere_TRA + '
),
TRA_H5 =
(
SELECT SUM(h5_Income_FX_gl)
FROM
fst_tbl_Transactions
WHERE
' + @commonWhere_TRA + '
),
TRA_H77 =
(
SELECT SUM(h77_gls)
FROM
fst_tbl_Transactions
WHERE
' + @commonWhere_TRA + '
)
WHERE
#EXPENSE_Tbl.Descr = ''' + @descr + '''
'
Exec (@SQL)
SET @intRow= @intRow + 1
END
/*--------------------
Return the results
-------------------- -*/
.
January 30, 2008 at 8:30 am
Ok, Sergiy wins. Lightning fast and easily maintainable if required.
I had to look at that function line a few times ... SET @Count = ReviewNo = CASE ...
to figure out exactly what he did though. And once again I've never considered doing that.
I'm learning just how much I don't know I guess!
Thank you Sergiy
January 30, 2008 at 2:22 pm
You welcome.
This UPDATE has 2 other imprescriptible parts outside of the sattement itself:
[Code]ORDER BY StaffName, ReviewDate[/Code]
which inserts rows in the table in specific order and
[Code]ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,[/Code]
which holds the rows in the order they've been inserted.
If you want to perform the same trick on a static table you need to have specifically crafted index and use WITH INDEX hint.
_____________
Code for TallyGenerator
January 31, 2008 at 7:56 am
Why use UPDATE at all? Or use a function? This is possible to do in a single query.
And still have some options to include or exclude "non-valid" members!
SELECTCASE WHEN u.theIndex = 0 THEN u.StaffName ELSE '' END AS StaffName,
MAX(CASE WHEN u.ReviewDate LIKE '2006%' THEN u.ReviewDate ELSE '' END) AS [2006_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2007%' THEN u.ReviewDate ELSE '' END) AS [2007_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2008%' THEN u.ReviewDate ELSE '' END) AS [2008_Dates]
FROM(
SELECTt1.StaffName,
CONVERT(CHAR(10), t1.ReviewDate, 120) AS ReviewDate,
(
SELECTCOUNT(*)
FROM#TestData AS t2
WHEREt2.StaffName = t1.StaffName
AND YEAR(t2.ReviewDate) = YEAR(t1.ReviewDate)
AND t2.ReviewDate < t1.ReviewDate
AND t2.ReviewDate >= '20060101'
AND t2.ReviewDate < '20090101'
) AS theIndex
FROM#TestData AS t1
/* Remove this comment to only display staff who has dates in valid range
WHEREt1.ReviewDate >= '20060101'
AND t1.ReviewDate < '20090101'
*/
) AS u
GROUP BYu.StaffName,
u.theIndex
ORDER BYu.StaffName,
u.theIndexYou have also the option to change AND t2.ReviewDate < t1.ReviewDate to AND t2.ReviewDate > t1.ReviewDate if you want the dates sorted ascending instead.
N 56°04'39.16"
E 12°55'05.25"
January 31, 2008 at 8:24 am
This is how you do it in SQL Server 2005 with all members includedSELECTCASE WHEN u.RecID = 1 THEN u.StaffName ELSE '' END AS StaffName,
MAX(CASE WHEN u.ReviewDate LIKE '2006%' THEN u.ReviewDate ELSE '' END) AS [2006_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2007%' THEN u.ReviewDate ELSE '' END) AS [2007_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2008%' THEN u.ReviewDate ELSE '' END) AS [2008_Dates]
FROM(
SELECTStaffName,
CASE
WHEN ReviewDate >= '20060101' AND ReviewDate < '20090101' THEN CONVERT(CHAR(10), ReviewDate, 120)
ELSE ''
END AS ReviewDate,
ROW_NUMBER() OVER (PARTITION BY StaffName, YEAR(ReviewDate) ORDER BY ReviewDate) AS RecID
FROM#TestData
) AS u
GROUP BYu.StaffName,
u.RecID
ORDER BYu.StaffName,
u.RecID
N 56°04'39.16"
E 12°55'05.25"
January 31, 2008 at 8:25 am
And this how you do it in SQL Server 2005 with only valid member (member has a reviewdate in wanted range)SELECTCASE WHEN u.RecID = 1 THEN u.StaffName ELSE '' END AS StaffName,
MAX(CASE WHEN u.ReviewDate LIKE '2006%' THEN u.ReviewDate ELSE '' END) AS [2006_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2007%' THEN u.ReviewDate ELSE '' END) AS [2007_Dates],
MAX(CASE WHEN u.ReviewDate LIKE '2008%' THEN u.ReviewDate ELSE '' END) AS [2008_Dates]
FROM(
SELECTStaffName,
CONVERT(CHAR(10), ReviewDate, 120) AS ReviewDate,
ROW_NUMBER() OVER (PARTITION BY StaffName, YEAR(ReviewDate) ORDER BY ReviewDate) AS RecID
FROM#TestData
WHEREReviewDate >= '20060101'
AND ReviewDate < '20090101'
) AS u
GROUP BYu.StaffName,
u.RecID
ORDER BYu.StaffName,
u.RecID
N 56°04'39.16"
E 12°55'05.25"
January 31, 2008 at 8:35 am
Yet another method. Thanks Peter.
The output results aren't exactly what I need (but I think you're editing the post as I type this because I've tried 2 different query posts of yours with different results)
I'm still leaning towards Sergiy's methods here for a couple of reasons:
a) It results in less table scans in the execution plan.
b) I've already modified for my actual needs (5 years), written the report, and put it in production. :o)
What I've really learned here is how many ways something like this can be achieved without resorting to procedural sql code. This is probably the biggest obstacle for a longtime coder like myself who is trying to improve SQL performance. Must-Not-Loop!
Thanks again all, Mike
January 31, 2008 at 8:44 am
What? I have only two table scans. And if you index the #TestData table properly you will get two clustered index scans only!
Peso with 32 reads
|--Compute Scalar
|--Stream Aggregate
|--Compute Scala
|--Sort
|--Nested Loops
|--Compute Scalar
| |--Clustered Index Scan
|--Compute Scalar
|--Stream Aggregate
|--Clustered Index Scan
Sergiy with 4385 reads
|--Compute Scalar
|--Sequence
|--Table-valued function
|--Table-valued function
|--Table-valued function
|--Sort
|--Filter
|--Hash Match
|--Hash Match
| |--Hash Match
| | |--Nested Loops
| | | |--Clustered Index Scan
| | | |--Table Spool
| | | |--Index Seek
| | |--Clustered Index Scan
| |--Clustered Index Scan
|--Clustered Index Scan
|--Clustered Index Insert
|--Compute Scalar
|--Top
|--Clustered Index Scan
|--Clustered Index Update
|--Compute Scalar
|--Top
|--Clustered Index Scan
That TABLE SPOOL looks nasty to me. And 10 clustered index scans?
N 56°04'39.16"
E 12°55'05.25"
January 31, 2008 at 8:51 am
Whoops...Sorry, forgot to index my test data :blush:
Viewing 15 posts - 1 through 15 (of 38 total)
You must be logged in to reply to this topic. Login to reply