a very COMPLEX aggregation query

Question

a very COMPLEX aggregation query

Viewing 15 posts - 16 through 30 (of 60 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 77014 More actions · Answer 1

It looks like the first line is a grand total for Civil.

Part of the problem is that you're using T-SQL as a reporting tool and it's not. You'd be much better off using a true reporting tool like SSRS, Crystal Reports, or even Excel. Sure, you need to do some pre-processing in T-SQL, but you shouldn't try to do everything in T-SQL.

First you need to aggregate based on each case in order to determine the length of time it was in each status. Here is a sample of that:

SELECT Case_ID, Legal_Omn, PPL_Area

,MIN(CASE WHEN Review_Status = 'Proposed' THEN Change_Date END) AS Proposed

,MIN(CASE WHEN Review_Status = 'Active' THEN Change_Date END) AS Active

,MIN(CASE WHEN Review_Status = 'Resolved' THEN Change_Date END) AS Resolved

,MIN(CASE WHEN Review_Status = 'Closed' THEN Change_Date END) AS Closed

FROM #Case_Review

GROUP BY Case_ID, Legal_Omn, PPL_Area

From that, you should be able to easily calculate your length of time in each status.

After that, you should ship it off to your reporting tool to do rest. You are going to have to pivot the results to get your counts based on status, but each of the reporting tools has a feature that will do that easily.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

polkadot SSChampion Points: 13664 More actions · Answer 2

What are the numbers in the first row??

The numbers in the first row are for all Civil law matters. Civial is a parent to WA_Tenant type, WA_Traffic and WA_Consumer.

The number 1 in the >60 days is how many total cases took >60 days to resolve. The number 2 in the 30-60 day category is how many cases took 30-60 days to resolve. There are only 3 cases in this table, so that accounts for all of them.

We are not familiar with your business or your needs and your explanation just doesn't make sense to me.

Business background: a legal firm that handles all types of legal matters. These can be Civil, Criminal, and Family Law categories. But in my table I've only included cases relating to CIVIL category, a sampling of which are WA_Tenant type, WA_Traffic and WA_Consumer cases.

When the firm gets called and a case gets opened under the appropriate CIVIL speciality and we want to see how long it is taking for cases to get resolved in general (all CIVIL cases) and *also* specifically (per CIVIL law type).

Are the number in the grids the count of cases that are in that status? So for example you get the count of cases where the datediff between min and max (Change_Date) is less than 30 days and status = 'Proposed'?

Yes! actually hold on not exactly. The datediff between min and max (Change_Date) is less than 30 days between the FIRST Proposed status and the FIRST Active status. That's the transition. The datetime in those first rows representing the transition must be involved in the calculation. You'll notice time will elapse between the last Proposed status and the first Active status of each case_id (that's why it must be calculated by transition). The second transition is between Active and Resolved/Closed status.

So, for case_id 220 the two dates we're interested for the first transition Proposed->Active are:

2012-04-18 17:05:00.000

2012-04-19 12:20:00.000

for Active ->Resolved transition:

2012-04-20 08:05:00.000

2012-04-27 12:21:00.000

Then for the second column it would be datediff of min and max (Change_Date) where status = 'Active'?

Yes!

Where does 'Resolved' come into this? Need some details before we can go any further.

Yes, all cases are eventually resolved. They will forever be resolved at that point (we hope). They will not be reopened so we are not tracking how long cases sit in Resolved status before being reopened...therefore no representation for this transition in the grid.

--Quote me

polkadot SSChampion Points: 13664 More actions · Answer 3

Sean Lange, thanks for showing me how to apply Lyn's advice. now I know.

Here is the desired result as a #temp table.

create table #Result

(

Col1 varchar(50),

Col2 varchar(50),

Col3 varchar(50),

Col4 varchar(50)

)

insert #Result

select '', '<30 days', '30-60 days', '>60 days' union all

select 'Civil', '', '2', '1' union all

select 'WA_Tenant', '', '', '' union all

select 'Proposed', '', '1', '' union all

select 'Active', '1', '', '' union all

select 'WA_Consumer', '', '', '' union all

select 'Proposed', '', '1', '' union all

select 'Active', '', '', '1'

select * from #result;

--Quote me

polkadot SSChampion Points: 13664 More actions · Answer 4

drew allen, I appreciate your point very much. Also, thanks for that query and illustration of what Excel is capable of and how it simplifies matters.

We have not determined which reporting tool we're going to use. The tsql is desired for the data validation process, ie to determine that the data we need is there and corresponds to other backend ways we are currently using for this output. Plus the boss wants T-sql at this point.

That's why I still need to use T-SQL. Mind looking at it as a "programmer"....ignoring the performance at this point? It *is* a datawarehouse that we're pulling from.

--Quote me

Dwain Camps SSC Guru Points: 86978 More actions · Answer 5

Polkadot,

I'm not getting the same counts as your results set (I think I'm looking at the latest), however my counts match a visual check of the input data.

Regardless, I think the below will give you the structure of what you'll need (even if my counts are wrong because I don't fully understand your business logic).

DECLARE @Case_Review TABLE

(Case_ID int, Legal_Omn varchar(30),PPL_Area varchar(30),Revision int

,Review_Status varchar(30),Change_Date datetime);

insert into @Case_Review

SELECT 220, 'Civil', 'WA_Tenant', 1, 'Proposed', '2012-04-17 12:17'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant', 2, 'Proposed', '2012-04-17 14:05'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',3, 'Proposed', '2012-04-18 17:03'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',4, 'Proposed', '2012-04-18 17:05'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant', 5, 'Active', '2012-04-19 12:20'

UNION ALL SELECT 220,'Civil', 'WA_Tenant', 6, 'Active', '2012-04-19 12:45'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',7, 'Active', '2012-04-20 08:05'

UNION ALL SELECT 220,'Civil', 'WA_Tenant', 8, 'Resolved', '2012-05-27 12:21'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',9, 'Resolved', '2012-05-27 12:45'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',10, 'Resolved', '2012-05-28 12:00'

UNION ALL SELECT 220, 'Civil', 'WA_Tenant',11, 'Resolved', '2012-05-28 12:01'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 1, 'Proposed', '2012-04-17 01:00'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 2, 'Proposed', '2012-05-29 07:00'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 3, 'Active', '2012-05-29 12:00'

UNION ALL SELECT 230,'Civil', 'WA_Traffic', 4, 'Active', '2012-05-29 12:05'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 5, 'Active', '2012-05-29 12:00'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 6, 'Closed', '2012-06-01 13:00'

UNION ALL SELECT 230, 'Civil', 'WA_Traffic', 7, 'Closed', '2012-06-01 13:01'

UNION ALL SELECT 260, 'Civil', 'WA_Consumer', 1, 'Proposed', '2012-04-17 14:00'

UNION ALL SELECT 260, 'Civil', 'WA_Consumer', 2, 'Active', '2012-05-20 14:25'

UNION ALL SELECT 260, 'Civil', 'WA_Consumer',3, 'Active', '2012-07-10 15:00'

UNION ALL SELECT 260, 'Civil', 'WA_Consumer',4, 'Closed', '2012-10-20 15:20';

;WITH UniqueCases AS (

SELECT Case_ID, Legal_Omn, PPL_Area, Review_Status

,CASE WHEN Review_Status = 'Active'

THEN DATEDIFF(day, Change_Date

,ISNULL(

(SELECT TOP 1 Change_Date

FROM @Case_Review c2

WHERE c1.Case_ID = c2.Case_ID and c2.Review_Status IN ('Closed', 'Resolved')

ORDER BY Change_Date)

,0)

)

WHEN Review_Status = 'Proposed'

THEN DATEDIFF(day, Change_Date

, ISNULL(

(SELECT TOP 1 Change_Date

FROM @Case_Review c2

WHERE c1.Case_ID = c2.Case_ID and c2.Review_Status = 'Active'

ORDER BY Change_Date)

,0)

)

ELSE 0 END As Days

,ROW_NUMBER() OVER (PARTITION BY Case_ID, PPL_Area, Review_Status

ORDER BY Case_ID, PPL_Area, Review_Status, Revision DESC) As rk

FROM @Case_Review c1)

,Counts AS (

SELECT Case_ID, Legal_Omn, PPL_Area, Review_Status

,CASE WHEN Days <30 AND Days >= 0 THEN 1 ELSE 0 END As [<30]

,CASE WHEN Days >=30 AND Days <= 60 THEN 1 ELSE 0 END As [30-60]

,CASE WHEN Days >60 THEN 1 ELSE 0 END As [>60]

FROM UniqueCases

WHERE rk = 1 and Review_Status NOT IN ('Closed', 'Resolved'))

SELECT PPL_Area, Review_Status, [<30], [30-60], [>60]

FROM (

SELECT PPL_Area, Review_Status, [<30], [30-60], [>60]

FROM (

SELECT PPL_Area, Review_Status

,SUM([<30]) As [<30], SUM([30-60]) As [30-60], SUM([>60]) AS [>60]

FROM Counts

GROUP BY PPL_Area, Review_Status) x1

UNION ALL

SELECT PPL_Area, Review_Status, [<30], [30-60], [>60]

FROM (

SELECT Legal_Omn AS PPL_Area, NULL AS Review_Status

,SUM([<30]) As [<30], SUM([30-60]) As [30-60], SUM([>60]) AS [>60]

FROM Counts WHERE Review_Status = 'Active'

GROUP BY Legal_Omn) x2) x3

ORDER BY CASE PPL_Area

WHEN 'WA_Consumer' THEN 3

WHEN 'WA_Traffic' THEN 2

WHEN 'WA_Tenant' THEN 1

ELSE 0 END

, CASE Review_Status

WHEN 'Active' THEN 2

ELSE 1 END

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

polkadot SSChampion Points: 13664 More actions · Answer 6

Looking at now. U N BELIEVABLE. Will be studying it.

--Quote me

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

Hehe... you like that huh?

Lucky for you I had a little time on my hands this morning.:-)

Allow me to let you in on why the counts may or may not be off. I had to make an assumption that I could not glean from your description - either you weren't clear or I simply was too lazy to go through it all, can't remember.

Basically, because there are duplicate statuses for a case, when looking up the date of the transition (e.g., from Opened to Active), I made the assumption to use the first transition to calculate the duration in the Open state. Then when I eliminate the duplicates, I keep only the initial record for the Open state. That is, assuming that I implemented this logic right.

I am not sure how that fits with your business rules. Check the two subqueries that select out the date based on status for this logic.

BTW. This problem bears a remarkable resemblence to a Financial report called credit aging. I used this same technique to develop the days buckets (columns) when I had to rewrite that query about 4 months ago. This of course has the wrinkle of duplicate records to filter through, but otherwise more or less the same.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 8

Drew, you said:

After that, you should ship it off to your reporting tool to do rest. You are going to have to pivot the results to get your counts based on status, but each of the reporting tools has a feature that will do that easily.

I've seen quite a few cases on the forum where someone with excellent credentials (such as yourself) makes a statement like this to the effect of "let your reporting tool handle the fancy stuff" while the SQL simply provides a result set the report can use to work its magic.

Being obstinatly contrary, as I typically am, I shall venture to disagree with that statement under certain circumstances.

Consider the following. On your team you have a pretty good T-SQL person, but your Crystal Reports team lacks shall we say excellence. Basic reports are OK for them but any advanced features take a long time to get right. In this case, doesn't it make sense to format the data as closely to the report's output as possible, to make up for the lack of super-Crystal Reports programmers?

Let me convey a true example, a war story if you will. We had a fairly complex report in this system that was running like a real dog (not the Credit Aging report I mentioned earlier, another dog). I took a look at the query for the report and in about 4 hours I 1) improved the query plan cost by >50%, 2) reduced the run time signficantly and 3) reduced the row count in the result set from about 80,000 to about 30.

The report now needed to be modified so that instead of counting across 80,000 records, all it had to do was select correctly from the 30 row result set. That modification to the report has now taken more than 4 weeks (granted part time) but countless more hours than the improvement that I made to the query.

So if I seem contrary sometimes, I generally have my reasons that are rooted in the real world.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

drew.allen SSC Guru Points: 77014 More actions · Answer 9

dwain.c (4/19/2012)
Drew, you said:
After that, you should ship it off to your reporting tool to do rest. You are going to have to pivot the results to get your counts based on status, but each of the reporting tools has a feature that will do that easily.
I've seen quite a few cases on the forum where someone with excellent credentials (such as yourself) makes a statement like this to the effect of "let your reporting tool handle the fancy stuff" while the SQL simply provides a result set the report can use to work its magic.
Being obstinatly contrary, as I typically am, I shall venture to disagree with that statement under certain circumstances.

My statement is based on the assumption that your department is fully staffed with reasonably adequate people. Of course, there will always be situations where you are either understaffed or some of your staff is underqualified for their position.

That being said, a pivot report is hardly a complex report. It's essentially drag and drop if the data is prepared correctly. Anyone hired as a report writer should know how to create a pivot report.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

polkadot SSChampion Points: 13664 More actions · Answer 10

My statement is based on the assumption that your department is fully staffed with reasonably adequate people. Of course, there will always be situations where you are either understaffed or some of your staff is underqualified for their position.
That being said, a pivot report is hardly a complex report. It's essentially drag and drop if the data is prepared correctly. Anyone hired as a report writer should know how to create a pivot report.
Drew

Bingo Drew. I am hardly adequate or qualified;-)!!!! Although, I can easily do a pivot. I am thankful that without asking that I justify my need, someone helped me with the T-SQL.

On the other hand, I appreciate the candid feedback about what is customary division of labor and customary delivery of reporting solutions in a typical business environment. You can have no doubt that this information is very useful to me to.

I am learning from the solution and from the discussion that ensued and I am grateful to BOTH. Professional Hug hug.

--Quote me

polkadot SSChampion Points: 13664 More actions · Answer 11

**correction***I was asking dwain if he'd elaborate the 'certain circumstances'?, but I see that I didn't read his whole post following.

I shall venture to disagree with that statement under certain circumstances.

I may have to work on the performance of this query.

--Quote me

drew.allen SSC Guru Points: 77014 More actions · Answer 12

polkadot (4/19/2012)
drew. will you please elaborate the 'certain circumstances'?
I shall venture to disagree with that statement under certain circumstances.

That was a quote of Dwain.C. He mentions some of the circumstances in his post, but you'd have to ask him if you need more details.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

polkadot SSChampion Points: 13664 More actions · Answer 13

(1) dwain the code does almost exactly what it is supposed to. BUT (edit). WA_Traffic Proposed time should actually be 30-60 days and your code places it at <30 days. Any idea? Otherwise it looks good.

(2)

I understand almost all the code EXCEPT the last two SELECT statements.

In the first, you are grouping by PPL_Area, Review_Status) x1 <----what's the x1????

in the second, you are grouping by Legal_Omn) x2) x3 <----what's that???

in the second you are also ordering by CASE PPL_Area

WHEN 'WA_Consumer' THEN 3 <---------?

WHEN 'WA_Traffic' THEN 2 <----------?

WHEN 'WA_Tenant' THEN 1 <------------?

....

WHEN 'Active' THEN 2 <-------------?

What are these lines doing? Thank you all for helping me!

(3)

Can we modify the CASE statement to handle this issue dynamically? PPL_Areas will grow over time and the code needs to be able to report on all areas, as the table grows.

--Quote me

drew.allen SSC Guru Points: 77014 More actions · Answer 14

I'm getting different results from Dwain for one of the records. He has WA_Traffic in a Proposed Status for <30 days, but I'm getting 30-60 days.

Here is how I would approach the problem. NOTE: I don't have SQL 2008 at work, so I am using the old syntax for the ROLLUP operator. I also didn't prettify the data, because it doesn't need to be pretty for data validation (as the stated goal).

You do need to know how to read this. Anywhere there is a NULL in the data, it represents a subtotal on the non-NULL fields.

SELECT Legal_Omn, PPL_Area, Review_Status

,COUNT(CASE WHEN Days < 31 THEN Case_ID END) AS [<30]

,COUNT(CASE WHEN Days BETWEEN 31 AND 60 THEN Case_ID END) AS [30-60]

,COUNT(CASE WHEN Days > 60 THEN Case_ID END) AS [>60]

FROM CTE

CROSS APPLY (

SELECT 'Proposed' AS Review_Status, Proposed AS Days

UNION ALL

SELECT 'Active', Active

) AS rs

GROUP BY Legal_Omn, PPL_Area, Review_Status

WITH ROLLUP

SET STATISTICS IO, TIME OFF

This query scans the table ONCE, whereas Dwain's code scans the table ELEVEN times.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77014 More actions · Answer 15

dwain.c (4/18/2012)
,ROW_NUMBER() OVER (PARTITION BY Case_ID, PPL_Area, Review_Status ORDER BY Case_ID, PPL_Area, Review_Status, Revision DESC) As rk

I only quoted the relevant section of the code.

By definition, the partitioning fields are the same within a given partition. Ordering by partitioning fields is pointless, because they are necessarily the same.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA