August 5, 2009 at 8:09 pm
aktikt (8/5/2009)
Jeff, thanks for your patronizing compliments... Just kidding.
By the way Aktikt, just to be clear... I never patronize... it's either praise or pork chops. No in between. Heh... and I'm not kidding. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2009 at 9:26 pm
Ok... here's the code that actually does the split by month segment instead of by day. The reason this had to be done is that the 11 row example spawned 3,009 internal rows (1 for each day in the range of each row). Just imagine what would happen if we tried that with a large number of rows. :sick:
This new way "only" spawned 105 rows... 1 for each month in the range of each row. Even so, for a 12 million row problem, that may still be way too much. This may have to be done by year of StartDate instead.
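If you want to verify those counts against the sample data, here's a quick sketch (not part of the solution itself) that sums the internal rows each split method would spawn:
--===== A quick sketch (not part of the solution): count the internal rows
     -- each split method spawns from the sample #Claims table.
 SELECT SUM(DATEDIFF(dd,StartDate,EndDate)+1) AS DayRows,   --3,009 for the 11 row sample
        SUM(DATEDIFF(mm,StartDate,EndDate)+1) AS MonthRows  --105 for the 11 row sample
   FROM #Claims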
As always, the details are in the code...
--===== Outer SELECT does the display conversions and the sorting.
 SELECT YEAR(TotalData.TheMonth)  AS Year,
        MONTH(TotalData.TheMonth) AS Month,
        TotalData.TotalAmount     AS TotalAmount
   FROM (--==== Calculate the total amount for each month
         SELECT DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0) AS TheMonth,
                SUM(
                    c.Amount/(DATEDIFF(dd,c.StartDate,c.EndDate)+1) --Amount divided by the total number of days in the row's range
                  * (DATEDIFF(dd, --===== Multiplied by the number of days in the month segment
                        CASE --===== Month start date or row start date, whichever is latest
                             WHEN c.StartDate > DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)
                             THEN c.StartDate
                             ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)
                        END,
                        CASE --===== Month end date or row end date, whichever is earliest
                             WHEN c.EndDate < DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1
                             THEN c.EndDate
                             ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1
                        END
                     ) + 1) --The +1 makes the day count inclusive of both endpoints
                   ) AS TotalAmount
           FROM #Claims c
          INNER JOIN Tally t ON t.N <= DATEDIFF(mm,c.StartDate,c.EndDate)+1 --Intentional/required triangular join
          GROUP BY DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)
        ) AS TotalData
  ORDER BY TotalData.TheMonth
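As a side note, the query assumes a Tally table already exists. If you need one, here's a minimal sketch for building it; the 11,000 row count is just an assumption, so size it to your date ranges:
--===== A minimal sketch for building a Tally table if you don't already have one.
     -- The row count (11,000) is an assumption; size it to your date ranges.
 SELECT TOP 11000
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
ALTER TABLE dbo.Tally
   ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
;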
________________________________________________________________________________________________________________________
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2009 at 4:05 am
Jeff Moden (8/5/2009)
Nigel will probably beat me to it, though.
Not a chance, I was sleeping while you were working on this.
Nice solution Jeff, I think I just about get it.
I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!
August 6, 2009 at 7:47 am
Jeff, condolences to you and your friend's family.
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009): I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
August 6, 2009 at 7:48 am
Jeff,
I have listed below the results of running your query against existing Claims data containing 35923 records. I have added a WHERE clause to restrict the values to 2006 and 2007: WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly, the 12 million row table (roughly 334 times as many rows at about 2 seconds per run) would take about 11 minutes to run. Besides, it turns out that long claims are rather rare and a claim length of 1 day is very common. Thus, it should perform even better on this type of data.
Thanks for your help. You too Nigel. 🙂
Great work!
aktikt
SQL Server parse and compile time:
CPU time = 19 ms, elapsed time = 19 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Year Month TotalAmount
----------- ----------- ----------------------
2006 1 447093.718447179
2006 2 401313.289622182
2006 3 440011.516648319
2006 4 419537.0008398
2006 5 466245.122821026
2006 6 450941.492471082
2006 7 426732.216494343
2006 8 434883.843448311
2006 9 436754.21254134
2006 10 464639.761466278
2006 11 488523.837852463
2006 12 462196.566039379
2007 1 483823.293872574
2007 2 502495.751410013
2007 3 580603.76534816
2007 4 1040816.52684632
2007 5 1053798.30161442
2007 6 1029035.60165523
2007 7 977611.117479759
2007 8 843939.072468995
2007 9 835551.49081171
2007 10 864416.608813132
2007 11 856591.992495634
2007 12 871064.222941644
(24 row(s) affected)
SQL Server Execution Times:
CPU time = 7296 ms, elapsed time = 1974 ms.
August 6, 2009 at 7:52 am
I think you might need to take a step back from all the fun code and ask a couple of questions.
First (you already know this, but we don't): what kind of 'claims' are you looking at? If they're medical claims, the date you want to allocate to may depend on the type of service. For example, for inpatient hospital claims you probably want to assign everything to the discharge, and therefore would want to use the last date, not split the cost between months. Just to make it fun, most utilization reports (non-cost related) look at inpatient claims on admission, and therefore the number of cases would change the allocation of claims to months.
If you're looking at physician claims, then you are probably right to split cost between months, but you would be better off looking at the detail line level rather than the whole claim, since a physician service line won't have a date span, and you can just sum the dollars.
If you're looking at non-healthcare 'claims', do considerations like the above apply, where the business pays based on a determination at either the start or end of the service rendered? If so, you'll need to allocate appropriately.
Food for thought.
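For instance, if the inpatient rule applies, allocating whole claims to the discharge month is just a grouping change. A hypothetical sketch against the same #Claims table:
--===== A sketch (not from the original posts): allocate each whole claim
     -- to its discharge (EndDate) month instead of splitting it.
 SELECT DATEADD(mm,DATEDIFF(mm,0,c.EndDate),0) AS DischargeMonth,
        SUM(c.Amount)                          AS TotalAmount
   FROM #Claims c
  GROUP BY DATEADD(mm,DATEDIFF(mm,0,c.EndDate),0)
  ORDER BY DischargeMonth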
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009): I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
August 6, 2009 at 8:00 am
aktikt (8/6/2009)
Jeff, I have listed the results of running your query against existing Claims data containing 35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007: WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007.
Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild.
You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'
You would change the above to #Claims.enddate if that was what you wanted to use, etc.
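To make the contrast concrete, here's a sketch of the two predicate forms, assuming an index on StartDate (exact plans will vary):
--===== Non-SARGable: the function on the column defeats an index seek on StartDate
 SELECT COUNT(*) FROM #Claims c WHERE YEAR(c.StartDate) BETWEEN 2006 AND 2007
--===== SARGable: a bare column compared to constants lets an index on StartDate seek
 SELECT COUNT(*) FROM #Claims c WHERE c.StartDate >= '20060101' AND c.StartDate < '20080101'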
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009): I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
August 6, 2009 at 12:19 pm
nigel (8/6/2009)
Jeff Moden (8/5/2009)
Nigel will probably beat me to it, though.
Not a chance, I was sleeping while you were working on this.
Nice solution Jeff, I think I just about get it.
I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!
Thanks Nigel. It may still hurl pretty badly on a 12 million row solution... but so would a simple update.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2009 at 12:21 pm
jcrawf02 (8/6/2009)
Jeff, condolences to you and your friend's family.
Thanks. I very much appreciate that. I don't normally air my personal stuff but this one really got to me and I let it slip... he was 2 years younger than me and, from what I can tell, in better physical condition than me.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2009 at 12:24 pm
aktikt (8/6/2009)
Jeff, I have listed the results of running your query against existing Claims data containing 35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007: WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly, the 12 million row table (roughly 334 times as many rows) would take about 11 minutes to run. Besides, it turns out that long claims are rather rare and a claim length of 1 day is very common. Thus, it should perform even better on this type of data.
Thanks for your help. You too Nigel. 🙂
Great work!
aktikt
SQL Server Execution Times:
CPU time = 7296 ms, elapsed time = 1974 ms.
That's cool... thanks for the performance feedback. How many rows did it actually process?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2009 at 12:27 pm
jcrawf02 (8/6/2009)
aktikt (8/6/2009)
Jeff, I have listed the results of running your query against existing Claims data containing 35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007: WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007.
Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild.
You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'
You would change the above to #Claims.enddate if that was what you wanted to use, etc.
I absolutely agree... and, as you said, it also depends on where such a limit is placed in the code as to whether the entire 12 million rows will still be processed.
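To illustrate, here's a sketch of where the limit would go in the earlier query so that the Tally join never expands rows outside the reporting window:
--===== A sketch: the date limit sits on #Claims inside the derived table,
     -- BEFORE the triangular join expands each row into month segments.
 SELECT DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0) AS TheMonth,
        COUNT(*) AS InternalRows
   FROM #Claims c
  INNER JOIN Tally t ON t.N <= DATEDIFF(mm,c.StartDate,c.EndDate)+1
  WHERE c.StartDate >= '20060101'
    AND c.StartDate <  '20080101'
  GROUP BY DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)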
I also agree that Gail's blogs are some of the best.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 14, 2009 at 4:59 pm
Jcrawf02,
Your comment,
Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild. You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'
raises two more questions about statements I've been wrestling with. The first:
SELECT Cast(Cast(Round(100 * Sum(SiteCount))))
FROM Sites
And the second:
SELECT
p.FirstName +
CASE WHEN Len(p.Middlename) > 0
THEN ' ' + p.MiddleName + ' '
ELSE ' '
END +
p.lastName as FullName FROM Person
Thanks,
aktikt
August 17, 2009 at 10:21 am
Caveat Emptor - I'm hardly an expert, and would therefore suggest that you continue to read up on the articles and forums here to gain more knowledge. I just happened to see that what you were attempting fit nicely into Gail's explanation.
However, I am happy to play around with SQL statements to make them work better. In your first example, I'm not quite sure what you're trying to accomplish: we're inflating the SiteCount by 100 times, and then rounding to what precision? Casting to what datatype?
SELECT Cast(Cast(Round(100 * Sum(SiteCount))))
FROM Sites
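For what it's worth, here's one guess at a working form; both the ROUND precision and the target datatype are assumptions, since the original statement doesn't say:
--===== One guess at a working form (the precision and datatype are assumptions)
 SELECT CAST(ROUND(100 * SUM(SiteCount),0) AS INT) AS SiteCountTimes100
   FROM Sites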
In your second, I can see that you're trying to concatenate a full name out of first, middle, lastname fields, and want to avoid the extra space when there is no middlename.
SELECT
p.FirstName +
CASE WHEN Len(p.Middlename) > 0
THEN ' ' + p.MiddleName + ' '
ELSE ' '
END +
p.lastName as FullName FROM Person
I haven't tried this, but my first thought was that coalesce() or isnull() were the way to do this cleanly. If that function is giving you headaches, then your CASE seems to do the job. I've got a situation like this at work, gonna play around and see what I find.
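Along those lines, here's an untried sketch using ISNULL() with NULLIF(), which folds the NULL and empty middle name cases together:
--===== An untried sketch: NULLIF turns an empty MiddleName into NULL, and
     -- ISNULL then drops the extra space for both the NULL and empty cases.
 SELECT p.FirstName
      + ISNULL(' ' + NULLIF(p.MiddleName,''),'')
      + ' ' + p.LastName AS FullName
   FROM Person p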
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009): I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."