Help writing a SQL

Question

Post reply

Help writing a SQL

SQLMAIN

SSC Eights!

Points: 936
More actions
September 16, 2009 at 8:03 pm

#210933

I have a situation described below - let me know if more information is needed.
create table multrecs (
acct bigint,
cdt smalldatetime,
val varchar(20),
ndt smalldatetime
)
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-09')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-15')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-22')
insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-05-26')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-04')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-15')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-22')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-30')
insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-05-15')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-03-04')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-04-14')
insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-05-22')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-02-04')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-04-04')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-02')
insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-10')
I need to pull the records for each ACCT, CDT, VAL combination
such that I get the record that has minimum NDT and then
the next record whch is 38 days after it and then
the next record which is 38 days after it and so on
so for the first set above I need
702, '2009-04-06', 'tstval1', '2009-04-09' the minimum NDT
(702, '2009-04-06', 'tstval1', '2009-05-26') the one 38 days after it
For the second combo
(802, '2009-01-01', 'tstval2', '2009-01-04') The minimum NDT
(802, '2009-01-01', 'tstval2', '2009-03-22') 38 days after first one
(802, '2009-01-01', 'tstval2', '2009-05-15') 38 days after one above - 03/22/09
For the third combo
I will need all three - as they are 38 days apart
For the fourth
I will need first three
Any help will be greatly appreciated.

Viewing 15 posts - 1 through 15 (of 70 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Would you mind sharing what you've tried so far and what you have trouble with?

Btw: strange requirement though.... what's the business reason behind it?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Greg Snidow SSCoach Points: 16108 More actions · Answer 2

SQLMAIN, here is a temp table solution. It works by determining the previous ndt date for each record in the group, then keeping a running sum of the interval between the ndt and the previous ndt. It is powered by Jeff Moden's running sum solution using the "quirky update". Admitedly, I don't fully grasp how it works exactly, so when I am using it, I have to add columns to the temp table for each of the variables I use, and populate the columns with the current value of each variable at various point for each record. That way, I can move stuff around, and see how it affects the value of each variable, and when in the code it does it. Anyhow, I think this works as per your specs, but there are no guarantees of any data beyond that. Also, I tried to to all the updates in one fell swoop, but I could not get it to work.

I can't find the link to Jeff's article, I have it on my work pc, but I am at home. Maybe someone could post it?

IF OBJECT_ID('TempDB..#multrecs','u') IS NOT NULL

DROP TABLE #multrecs

SELECT

ID = IDENTITY(INT,1,1),

acct,

cdt,

val,

ndt

INTO #multrecs

FROM multrecs

ORDER BY acct,cdt,val,ndt

--Add columns for later use

ALTER TABLE #multrecs

ADD MIN_ndt BIT,

PREV_NDT SMALLDATETIME,

INTERVAL INT,

INTERVAL_RUN INT

--Start by setting the MIN_ndt flag for the lowest ndt for each acct

UPDATE #multrecs

SET MIN_ndt = CASE WHEN t1.acct IS NOT NULL THEN 1 ELSE 0 END

FROM #multrecs m LEFT OUTER JOIN

(

SELECT

acct,

cdt,

val,

MIN_ndt = MIN(ndt)

FROM multrecs

GROUP BY acct,cdt,val

) t1

ON m.acct = t1.acct

AND m.cdt = t1.cdt

AND m.val = t1.val

AND m.ndt = t1.MIN_ndt

--Set the prev_ndt field = MIN_ndt for the first of each group

--I am assuming if the ndt is lowest, there are no previous.

--It will be use later for a datediff calculation to check for

--the 38 day interval.

UPDATE m

SET PREV_NDT = ndt

FROM #multrecs m

WHERE MIN_ndt = 1

--Declare the local variables for the updates

DECLARE @acct VARCHAR(20)

DECLARE @cdt SMALLDATETIME

DECLARE @ndt SMALLDATETIME

DECLARE @val VARCHAR(20)

DECLARE @prev_ndt SMALLDATETIME

DECLARE @interval_run INT

--First, populate the prev_ndt field for the rest of the records

UPDATE #multrecs

SET @prev_ndt =

CASE WHEN @acct = acct

AND @cdt = cdt

AND @val = val

AND prev_ndt IS NULL

THEN @ndt

ELSE prev_ndt

END,

prev_ndt = @prev_ndt,

@acct = acct,

@cdt = cdt,

@val = val,

@ndt = ndt

FROM #multrecs

--Now populate the interval field, which will tell you the interval

--between the ndt and the previous ndt

UPDATE m

SET interval = DATEDIFF(d,prev_ndt,ndt),

interval_run = 0

FROM #multrecs m

--Finally, update the interval_run field. It will show 0 for the first record in each

--acct. Then it will keep a running sum of the interval, and reset when it gets

--to be >= 38

UPDATE m

SET @interval_run = interval_run =

CASE WHEN acct = @acct AND @interval_run = 38

ORDER BY acct,cdt,val,ndt

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

Greg Snidow SSCoach Points: 16108 More actions · Answer 3

Here's the link to the article I referenced. http://www.sqlservercentral.com/articles/Advanced+Querying/61716/. However, it seems to be under construction now, but I am fairly certain it described pretty well how the quirky update works. Any how, the test scripts are still there, but be sure to be aware of Jeff's caveats.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

Garadin One Orange Chip Points: 29613 More actions · Answer 4

Greg Snidow (9/17/2009)
Admitedly, I don't fully grasp how it works exactly

The article is under construction atm, but I've prettymuch got it memorized :hehe:. The whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read. It will usually work fine regardless, but without specifying the index, it *could* fail.

Greg Snidow (9/17/2009)
Also, I tried to to all the updates in one fell swoop, but I could not get it to work.

Here's an example of it all in one pass. (I inserted the data into a temp table #multrecs instead of his default multrecs)

Ugh, this thing murders my formatting :crying:

-------------------- Solution -----------------------------

ALTER TABLE #multrecs ADD Display bit

CREATE CLUSTERED INDEX Quirky ON #multrecs (acct,cdt,val,ndt) -- <----- Note this clustered index. Very important.

DECLARE @PrevAcctbigint,

@PrevCDTdatetime,

@PrevValvarchar(20),

@LDNDTdatetime,--Last Displayed NDT

@Displaybit-- Even though you don't use it, this won't function without the 3 part update

UPDATE #multrecs

SET @Display = Display = CASE WHENAcct = @PrevAcct

AND CDT = @PrevCDT

AND Val = @PrevVal

AND DATEDIFF(d,@LDNDT,NDT) < 38

THEN 0

ELSE 1

END,

@LDNDT = CASE WHENAcct = @PrevAcct

AND CDT = @PrevCDT

AND Val = @PrevVal

AND DATEDIFF(d,@LDNDT,NDT) < 38

THEN @LDNDT

ELSE NDT

END,

@PrevAcct = Acct,

@PrevCDT = CDT,

@PrevVal = Val

FROM #Multrecs WITH (INDEX(0))

SELECT * FROM #MultRecs

WHERE Display = 1

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Greg Snidow SSCoach Points: 16108 More actions · Answer 5

Garadin the whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read.

The index! I knew I forgot something crutial to its working properly. I knew I had to specify the order on the insert into the table, but completely forgot about the blooming index. Thanks for the insight.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

Garadin One Orange Chip Points: 29613 More actions · Answer 6

It's ok. Everybody forgets the index :-P.

I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.

If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Greg Snidow SSCoach Points: 16108 More actions · Answer 7

Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

SQLMAIN SSC Eights! Points: 936 More actions · Answer 8

I did try the solution given above (with clustered index) but somehow it was not working for me.

I am not on my machine right now and do not have exactly what I was trying - but I will post the one I wrote and will appreciate if someone can point out the problem with my SQL.

I finally ended up doing a while loop - picking up MIN date and then the {{next which is 38 days or more after the one picked last for the group until no more left for the group}}

{{}} this statement above made my while loop.

Thanks for the solutions. And I will post my clustered index SQL for your expert perusal.

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 9

First, apologies for not having the rewrite on the running total article done, yet.

Part of the reason it's not done yet is I've been testing the heck out of some very large examples. In the process of doing that, I've determined that specifying the index hint is absolutely NOT necessary on single table Quirky Updates. You do, of course, need a clustered index for the update to follow and it must use a 3 part update as well as following some other simple rules. Another of those rules is that it must be a single table update. By that I mean, no joins... period.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 10

Garadin (9/17/2009)
It's ok. Everybody forgets the index :-P.
I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.

I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 11

Seth,

I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.

The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Garadin One Orange Chip Points: 29613 More actions · Answer 12

Greg Snidow (9/17/2009)
Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

[Edit] Blast. [/Edit]

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 13

Jeff Moden (9/17/2009)
Seth,
I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.
The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.

Interesting. I had no idea it slowed it down. Ok. I edited out that huge paragraph of reasoning above because you just made it all irrelevant with one sentence =).

I'll start using the MAXDOP option immediately and scratch the index.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 14

Jeff Moden (9/17/2009)
Garadin (9/17/2009)
It's ok. Everybody forgets the index :-P.
I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.
It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.
I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.

The CLR examples I've seen only really start to win at a very large number of rows. As I said though, I've only "heard" this, I have zero experience with CLR's.

Cursors stand no chance of beating this of course, but they're not as badly beaten as I'd have expected, and they're light years ahead of most "set based" solutions.

The data smears I'm not familiar with (At least I don't think I am, the double rownumber thing sounds familiar though.) but I'm very interested to find out. I think I've been close to using something very similar in trying to figure out a way to do this in a single select query, but I've just never *quite* gotten it. Every time I come to one of these, my first instinct is to use this quirky update, as it's one of my favorites, but I always try to think about a different way to do it first, as I sometimes find myself applying a single technique to a bunch of things it's not really needed for :cool:.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]