September 16, 2009 at 8:03 pm
I have a situation described below - let me know if more information is needed.
create table multrecs (
acct bigint,
cdt smalldatetime,
val varchar(20),
ndt smalldatetime
)
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-09')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-15')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-22')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-05-26')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-04')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-15')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-22')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-30')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-05-15')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-03-04')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-04-14')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-05-22')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-02-04')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-04-04')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-02')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-10')
I need to pull the records for each ACCT, CDT, VAL combination
such that I get the record that has minimum NDT and then
the next record whch is 38 days after it and then
the next record which is 38 days after it and so on
so for the first set above I need
702, '2009-04-06', 'tstval1', '2009-04-09' the minimum NDT
(702, '2009-04-06', 'tstval1', '2009-05-26') the one 38 days after it
For the second combo
(802, '2009-01-01', 'tstval2', '2009-01-04') The minimum NDT
(802, '2009-01-01', 'tstval2', '2009-03-22') 38 days after first one
(802, '2009-01-01', 'tstval2', '2009-05-15') 38 days after one above - 03/22/09
For the third combo
I will need all three - as they are 38 days apart
For the fourth
I will need first three
Any help will be greatly appreciated.
September 17, 2009 at 11:07 am
Would you mind sharing what you've tried so far and what you have trouble with?
Btw: strange requirement though.... what's the business reason behind it?
September 17, 2009 at 1:44 pm
SQLMAIN, here is a temp table solution. It works by determining the previous ndt date for each record in the group, then keeping a running sum of the interval between the ndt and the previous ndt. It is powered by Jeff Moden's running sum solution using the "quirky update". Admitedly, I don't fully grasp how it works exactly, so when I am using it, I have to add columns to the temp table for each of the variables I use, and populate the columns with the current value of each variable at various point for each record. That way, I can move stuff around, and see how it affects the value of each variable, and when in the code it does it. Anyhow, I think this works as per your specs, but there are no guarantees of any data beyond that. Also, I tried to to all the updates in one fell swoop, but I could not get it to work.
I can't find the link to Jeff's article, I have it on my work pc, but I am at home. Maybe someone could post it?
IF OBJECT_ID('TempDB..#multrecs','u') IS NOT NULL
DROP TABLE #multrecs
SELECT
ID = IDENTITY(INT,1,1),
acct,
cdt,
val,
ndt
INTO #multrecs
FROM multrecs
ORDER BY acct,cdt,val,ndt
--Add columns for later use
ALTER TABLE #multrecs
ADD MIN_ndt BIT,
PREV_NDT SMALLDATETIME,
INTERVAL INT,
INTERVAL_RUN INT
--Start by setting the MIN_ndt flag for the lowest ndt for each acct
UPDATE #multrecs
SET MIN_ndt = CASE WHEN t1.acct IS NOT NULL THEN 1 ELSE 0 END
FROM #multrecs m LEFT OUTER JOIN
(
SELECT
acct,
cdt,
val,
MIN_ndt = MIN(ndt)
FROM multrecs
GROUP BY acct,cdt,val
) t1
ON m.acct = t1.acct
AND m.cdt = t1.cdt
AND m.val = t1.val
AND m.ndt = t1.MIN_ndt
--Set the prev_ndt field = MIN_ndt for the first of each group
--I am assuming if the ndt is lowest, there are no previous.
--It will be use later for a datediff calculation to check for
--the 38 day interval.
UPDATE m
SET PREV_NDT = ndt
FROM #multrecs m
WHERE MIN_ndt = 1
--Declare the local variables for the updates
DECLARE @acct VARCHAR(20)
DECLARE @cdt SMALLDATETIME
DECLARE @ndt SMALLDATETIME
DECLARE @val VARCHAR(20)
DECLARE @prev_ndt SMALLDATETIME
DECLARE @interval_run INT
--First, populate the prev_ndt field for the rest of the records
UPDATE #multrecs
SET @prev_ndt =
CASE WHEN @acct = acct
AND @cdt = cdt
AND @val = val
AND prev_ndt IS NULL
THEN @ndt
ELSE prev_ndt
END,
prev_ndt = @prev_ndt,
@acct = acct,
@cdt = cdt,
@val = val,
@ndt = ndt
FROM #multrecs
--Now populate the interval field, which will tell you the interval
--between the ndt and the previous ndt
UPDATE m
SET interval = DATEDIFF(d,prev_ndt,ndt),
interval_run = 0
FROM #multrecs m
--Finally, update the interval_run field. It will show 0 for the first record in each
--acct. Then it will keep a running sum of the interval, and reset when it gets
--to be >= 38
UPDATE m
SET @interval_run = interval_run =
CASE WHEN acct = @acct AND @interval_run = 38
ORDER BY acct,cdt,val,ndt
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
September 17, 2009 at 3:18 pm
Here's the link to the article I referenced. http://www.sqlservercentral.com/articles/Advanced+Querying/61716/. However, it seems to be under construction now, but I am fairly certain it described pretty well how the quirky update works. Any how, the test scripts are still there, but be sure to be aware of Jeff's caveats.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
September 17, 2009 at 5:14 pm
Greg Snidow (9/17/2009)
Admitedly, I don't fully grasp how it works exactly
The article is under construction atm, but I've prettymuch got it memorized :hehe:. The whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read. It will usually work fine regardless, but without specifying the index, it *could* fail.
Greg Snidow (9/17/2009)
Also, I tried to to all the updates in one fell swoop, but I could not get it to work.
Here's an example of it all in one pass. (I inserted the data into a temp table #multrecs instead of his default multrecs)
Ugh, this thing murders my formatting :crying:
-------------------- Solution -----------------------------
ALTER TABLE #multrecs ADD Display bit
CREATE CLUSTERED INDEX Quirky ON #multrecs (acct,cdt,val,ndt) -- <----- Note this clustered index. Very important.
DECLARE @PrevAcctbigint,
@PrevCDTdatetime,
@PrevValvarchar(20),
@LDNDTdatetime,--Last Displayed NDT
@Displaybit-- Even though you don't use it, this won't function without the 3 part update
UPDATE #multrecs
SET @Display = Display = CASE WHENAcct = @PrevAcct
AND CDT = @PrevCDT
AND Val = @PrevVal
AND DATEDIFF(d,@LDNDT,NDT) < 38
THEN 0
ELSE 1
END,
@LDNDT = CASE WHENAcct = @PrevAcct
AND CDT = @PrevCDT
AND Val = @PrevVal
AND DATEDIFF(d,@LDNDT,NDT) < 38
THEN @LDNDT
ELSE NDT
END,
@PrevAcct = Acct,
@PrevCDT = CDT,
@PrevVal = Val
FROM #Multrecs WITH (INDEX(0))
SELECT * FROM #MultRecs
WHERE Display = 1
September 17, 2009 at 5:46 pm
Garadin the whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read.
The index! I knew I forgot something crutial to its working properly. I knew I had to specify the order on the insert into the table, but completely forgot about the blooming index. Thanks for the insight.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
September 17, 2009 at 6:11 pm
It's ok. Everybody forgets the index :-P.
I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.
September 17, 2009 at 7:46 pm
Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
September 17, 2009 at 8:10 pm
I did try the solution given above (with clustered index) but somehow it was not working for me.
I am not on my machine right now and do not have exactly what I was trying - but I will post the one I wrote and will appreciate if someone can point out the problem with my SQL.
I finally ended up doing a while loop - picking up MIN date and then the {{next which is 38 days or more after the one picked last for the group until no more left for the group}}
{{}} this statement above made my while loop.
Thanks for the solutions. And I will post my clustered index SQL for your expert perusal.
September 17, 2009 at 9:41 pm
First, apologies for not having the rewrite on the running total article done, yet.
Part of the reason it's not done yet is I've been testing the heck out of some very large examples. In the process of doing that, I've determined that specifying the index hint is absolutely NOT necessary on single table Quirky Updates. You do, of course, need a clustered index for the update to follow and it must use a 3 part update as well as following some other simple rules. Another of those rules is that it must be a single table update. By that I mean, no joins... period.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2009 at 9:47 pm
Garadin (9/17/2009)
It's ok. Everybody forgets the index :-P.I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.
It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.
I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2009 at 9:54 pm
Seth,
I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.
The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2009 at 9:55 pm
Greg Snidow (9/17/2009)
Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.
[Edit] Blast. [/Edit]
September 17, 2009 at 10:00 pm
Jeff Moden (9/17/2009)
Seth,I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.
The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.
Interesting. I had no idea it slowed it down. Ok. I edited out that huge paragraph of reasoning above because you just made it all irrelevant with one sentence =).
I'll start using the MAXDOP option immediately and scratch the index.
September 17, 2009 at 10:06 pm
Jeff Moden (9/17/2009)
Garadin (9/17/2009)
It's ok. Everybody forgets the index :-P.I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.
It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.
I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.
The CLR examples I've seen only really start to win at a very large number of rows. As I said though, I've only "heard" this, I have zero experience with CLR's.
Cursors stand no chance of beating this of course, but they're not as badly beaten as I'd have expected, and they're light years ahead of most "set based" solutions.
The data smears I'm not familiar with (At least I don't think I am, the double rownumber thing sounds familiar though.) but I'm very interested to find out. I think I've been close to using something very similar in trying to figure out a way to do this in a single select query, but I've just never *quite* gotten it. Every time I come to one of these, my first instinct is to use this quirky update, as it's one of my favorites, but I always try to think about a different way to do it first, as I sometimes find myself applying a single technique to a bunch of things it's not really needed for :cool:.
Viewing 15 posts - 1 through 15 (of 70 total)
You must be logged in to reply to this topic. Login to reply