March 4, 2008 at 7:35 am
psst - Jack? (secret time)
Jeff's solution is referred to by MS tech-heads (some of their bloggers) as an "inner loop" technique. Meaning - you have to touch each record one at a time and "accumulate" the values as you pass through them (yes - just like a variable would act in a "real" loop). The @var being used is a mechanism to "pass info" between consecutive records.
And the "consecutive" part is where it gets tricky - and that's where the clustered index comes into play (since you need to control the order of the rows during processing - which is usually a big no-no in set-based processing).
So far - it smells and acts like a "regular" loop with a cursor, right? The difference here seems to be that the row updates don't require individual commits (like a "classic" cursor would). The update for all intents and purposes performs at the same speed as a "regular" update. So - you manage to con SQL Server into giving you the ordering benefits of a cursor, while maintaining the performance of a "traditional" set-based solution.
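For anyone trying to picture it, here's a minimal sketch of the kind of statement being described. The table and column names are made up for illustration, and it assumes a clustered index on the key so the "loop" walks the rows in that order:

DECLARE @RunningTotal INT
SET @RunningTotal = 0

-- hypothetical "inner loop" update: the variable carries the running
-- value from row to row as the update walks the clustered index, and
-- each row gets both the variable and the column assigned in one shot
UPDATE t
   SET @RunningTotal = RunningTotal = @RunningTotal + Amount
  FROM dbo.MyTable t WITH (TABLOCKX)
OPTION (MAXDOP 1) -- single-threaded, so the row order doesn't get split up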
In other words - it's not exactly set-based, due to the ordering and looping. But it performs at the same level as a good "set-based" solution, so - who really cares that it's technically not?
"Just ignore the man behind the curtain...."
(Actually, in my mind it shows the way a cursor SHOULD work. The "individual row update" part just kills me - there's no justification why the updates couldn't be kept in memory and "flushed" all at once. Heck - even procedural languages allow for that, so why doesn't SQL Server?)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 4, 2008 at 7:46 am
Thanks for the explanation Matt. This is basically how I thought it would work, but trying to understand it sets my head spinning because it is so different from how a "normal" update would work.
I just had an interesting thought: how would an update trigger handle the "inner loop"? Does it see it as a single update or as individual updates?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
March 4, 2008 at 7:48 am
Heh... ya beat me to it... thanks, Matt.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 7:49 am
Jack Corbett (3/4/2008)
Thanks for the explanation Matt. This is basically how I thought it would work, but trying to understand it sets my head spinning because it is so different from how a "normal" update would work. I just had an interesting thought: how would an update trigger handle the "inner loop"? Does it see it as a single update or as individual updates?
It's a single update operation and is treated as such...
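If you want to see it for yourself, a quick throwaway trigger makes the point (the names here are hypothetical) - it fires once for the whole statement, and the inserted pseudo-table holds every row the update touched:

CREATE TRIGGER trg_MyTable_Update ON dbo.MyTable
AFTER UPDATE
AS
BEGIN
   -- one firing per statement - count the rows this single update touched
   DECLARE @RowsTouched INT
   SELECT @RowsTouched = COUNT(*) FROM inserted
   PRINT 'Fired once for ' + CONVERT(VARCHAR(11), @RowsTouched) + ' row(s)'
END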
March 4, 2008 at 7:52 am
That would have been my assumption, but it never hurts to ask.
Thanks,
Jack Corbett
Consultant - Straight Path Solutions
March 5, 2008 at 9:45 am
Here's another approach that I use sometimes: pack all the data you want into a fixed-length string, aggregating on your desired ID, then unpack it from the string.
select
   Widget    = convert(int, substring(AggData, 1, 11)),
   DateValue = convert(datetime, substring(AggData, 12, 24))
from (
   -- pack Widget (char(11)) + DateValue (char(24), style 121) into one
   -- fixed-length string; MAX() then picks the latest DateValue per Widget
   select AggData = max(convert(char(11), Widget) + convert(char(24), DateValue, 121))
   from MyTable
   group by Widget
) t
I didn't actually test this sample since I don't have access to SQL Server at home, but it should work with minimal tweaking.
March 5, 2008 at 10:19 am
Jeff, Matt,
That has just blown my socks off! Have you got any links to further reading on this 'inner loop' technique?
March 5, 2008 at 10:24 am
Heh... yeah... I know a really good one 😉
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2008 at 11:13 am
I just got into work to try it, and it looks like I didn't understand the question. Sorry about that.
March 5, 2008 at 11:47 am
OK, the same technique of packing the aggregate into a string, but using a correlated subquery.
select
   Widget,
   DateValue
from (
   select
      Widget,
      DateValue,
      -- for each row, grab the latest earlier row, packed as 23 chars
      -- of DateValue (style 121) followed by 11 chars of Widget
      prev = (
         select max(convert(varchar(23), DateValue, 121) + convert(char(11), Widget))
         from #mytable t2
         where t2.DateValue < t.DateValue
      )
   from #mytable t
) t
-- keep the first row, plus any row whose previous row has a different Widget
where prev is null
   or Widget <> convert(int, substring(prev, 24, 11))
March 5, 2008 at 11:49 am
Heh... at least you tried. Thanks for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2008 at 11:51 am
Ralph Curry (3/5/2008)
OK, the same technique of packing the aggregate into a string, but using a correlated subquery.
select
Widget,
DateValue
from (
select
Widget,
DateValue,
prev=(
select max(convert(varchar, DateValue, 121) + convert(char(11), Widget))
from #mytable t2
where t2.DateValue < t.DateValue
)
from #mytable t
) t
where prev is null
or Widget <> convert(int, substring(prev,24,11))
Oh... be careful... that's got a triangular join in it... please see the following for why that can be so very bad...
http://www.sqlservercentral.com/articles/T-SQL/61539/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2008 at 12:25 pm
Thanks, that was a good read. I knew that the correlated subquery made it a RBAR approach; it's just packaged in a single query in case you're querying a system you don't have DDL rights on. I also assumed that an index on (DateValue, Widget) (maybe DESC) would handle the MAX in the correlated subquery more efficiently than the brute-force triangular join.
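Something like this, just to illustrate (index name is made up):

-- hypothetical supporting index for the MAX() in the correlated subquery
create index IX_mytable_DateValue_Widget
   on #mytable (DateValue desc, Widget)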
While I have your attention, how about this:
select
   identity(int, 1, 1) as seq_id,
   Widget,
   DateValue
into #myNewTable
from #mytable
order by DateValue, Widget

-- create a clustered index on (seq_id), e.g.:
create clustered index IX_myNewTable_seq_id on #myNewTable (seq_id)

-- join each row to the one just before it in sequence; keep the first
-- row and every row where the Widget value changes
select t.*
from #myNewTable t
left join #myNewTable t2
   on t.seq_id = t2.seq_id + 1
where t.seq_id = 1
   or t.Widget <> t2.Widget
March 6, 2008 at 4:47 am
That's one of the better ideas, Ralph. It takes 25 ms duration, 15 ms CPU time, and 214 logical reads.
The update method I showed, as long as it looks, takes 15 ms duration, 15 ms CPU time, 2 writes, and only 155 logical reads.
Either way, though, they're both lightning fast compared to any kind of a loop or correlated subquery...
Nice job.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 3:18 pm
Jeff, I was finally able to take a look at this, and was able to modify what you provided to generate my results.
Thanks for your help.
Dan