January 5, 2009 at 4:19 pm
TheSQLGuru (1/5/2009)
1) I see people do this regulary just out of habit and NOT use the ID column or use it in such a way that indexing isn't useful. They waste the overhead of creating the clustered index for nothing.
Truly. I think I'm using as intended here though!
TheSQLGuru (1/5/2009)
2) regarding a later comment you made about really liking table variables, in many instances they perform SIGNIFICANTLY worse than temp tables. I probably made $25-30K last year identifying and fixing this one issue for my clients. The inability to have statistics causes the optimizer to have HORIBLE plans from a performance perspective. On some few occassions the lack of index capability makes repeated table variable hits extraordinarily expensive too.
I didn't say they were better than temp tables - I just said being able to index them made them about 80x more useful for me. I would whole heartedly agree that temp tables can provide much more predictable results in high volume circumstances, but I wouldn't put people off table variables completely. They can be significantly useful in balancing load between cpu and io, and, for many small rowset tasks, will perform equitably, if indexed properly.
See my previous comment about rallying your dad's car. (not your dad specifically mind!! 🙂 )
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
January 5, 2009 at 6:22 pm
TheSQLGuru (1/5/2009)
1)I see people do this regulary just out of habit and NOT use the ID column or use it in such a way that indexing isn't useful. They waste the overhead of creating the clustered index for nothing.2) regarding a later comment you made about really liking table variables, in many instances they perform SIGNIFICANTLY worse than temp tables. I probably made $25-30K last year identifying and fixing this one issue for my clients. The inability to have statistics causes the optimizer to have HORIBLE plans from a performance perspective. On some few occassions the lack of index capability makes repeated table variable hits extraordinarily expensive too.
Well, the ID column is used correctly, but I saw no reason to print that part of the Stored Procedure. In this section, the split is based upon many other Business Rules and the use of a multiple key is both correct and necessary.
We have spent a great deal of time validating the use of @TableVariables where applicable. They have increased the speed of efficiency of many of our Stored Procedures. This required significant testing and validation - so I am most confident they are correct where used.
As a generalization, they work very, very well in most of our instances. We do not generally require anything like terabytes of data in our common Stored Procedures, hence they can handle the load quite well. I look forward to being able to use the ID column in an index!
The major issue I have seen with @TableVariables is they disappear too quickly when testing and the data must be loaded again if further information is necessary.
I wasn't born stupid - I had to study.
January 5, 2009 at 6:47 pm
TheSQLGuru (1/4/2009)
Jeff Moden (1/3/2009)
jacroberts (1/3/2009)
This method sounds like a bit of a dark art.Oh, it absolutely is... one that is condemned by many "experts" who have yet to get it to break. But one extra pinch of "Sand People Toe Nail" or leave out even 1 hair of a Jedi eye-lid, and it will burn you. However, do it all right and absolutely nothing can touch it for performance and it'll never break in the face of changing data.
This is an incorrect statement.
Partitioned tables CAN cause this UPDATE @var... 'trick' to give incorrect values, so there are cases where it will break. See here, the 2/1/2008 3:08am post from ALZDBA: http://www.sqlservercentral.com/Forums/Topic449802-203-2.aspx
Parallel Operations CAN cause incorrect results too:
http://www.sqlservercentral.com/Forums/Topic449802-203-3.aspx 2/6/2008 2:55pm post.
I proposed those as possible points of issue, as well as Enterprise Edition merry-go-round scans as well. Testing proved the first two concerns but the EE advanced scan one didn't come up as a 'give bad results' point but that still doesn't mean it won't break the query. Not making something happen in testing of this nature does NOT guarantee the assertion. Only documentation from Microsoft can do that.
Bottom line: DO NOT USE THIS UPDATE @VAR ... TRICK UNLESS YOU a) DON'T CARE ABOUT GETTING INCORRECT RESULTS OR b) YOU ARE VERY KNOWLEDGEABLE ABOUT YOUR SYSTEM, DATA AND THE PITFALLS OF THIS TRICK AND HOW TO HOPEFULLY WORK AROUND THEM.
I highly encourage all who read this thread to FULLY read that entire thread. It was one of the biggest ever on this site I think and contains a wealth of information.
No, it's absolutely a correct statement. There are, as you and I both mentioned, caveats to it's usage. But, once you have one of these babies working correctly, it will not break.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2009 at 7:05 pm
I've gotta agree with Kevin on the table variable thing. The only reason I'll ever use table variables is if I have to make a temporary storage area to work out a problem in a function and that's only because a temp table can't be used in a function. As he said, in most of the cases I've ever worked with, just the act of shifting a problem from a table variable to a temp table can cause substantial gains in performance. So does not using functions, in many cases...:P
But, that's not the only reason... in Query Analyzer and SMS, it's much easier to troubleshoot something with temp tables because they persist in the session. You can "see" and "play" with the contents of temp tables very easily because you don't have to run all the code to build it an populate it every bloody time you want to see something like you do with table variables. It's a shame that scalar variables don't persist the same way... troubleshooting would be "point-in-time" easy.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2009 at 7:18 pm
TheSQLGuru (1/4/2009)
Bottom line: DO NOT USE THIS UPDATE @VAR ... TRICK UNLESS YOU a) DON'T CARE ABOUT GETTING INCORRECT RESULTS OR b) YOU ARE VERY KNOWLEDGEABLE ABOUT YOUR SYSTEM, DATA AND THE PITFALLS OF THIS TRICK AND HOW TO HOPEFULLY WORK AROUND THEM.
Actually, despite what this sounds like, it's 100% spot on. The "pseudo-cursor" or "quirky update" has some very specific rules to follow pretty much as I laid out in the article. At the time I wrote the article, I didn't test against a partitioned table and Johan was cool enough to point it out bigtime. It's been a while mostly because of a wonderful fairly new job, but I need to do some testing against partitioned tables and see what happens under different scenarios. The concerns over parallelism are also noted and the use of MAXDOP, although probably not needed because of the type of scan we're forcing, will probably make that a non issue.
But, regardless of all that, I said it early with a bit of tongue-in-cheek humor and, now, Kevin has just hit everyone in the head with a bat and everyone really needs to pay attention...
IF YOU DO IT WRONG OR LEAVE EVEN A MINOR DETAIL OUT, IT WILL CREATE INCORRECT DATA. It's an advanced technique where all of the rules MUST be CORRECTLY followed every time without exception.
Heh... on the other hand, isn't that true of all code? 😉
Be careful, folks. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2009 at 7:25 pm
david.rowland (1/4/2009)
I think they {cursors} still have their place
Ok, I'm all for opposing opinions because that's what makes good discussions and I learn something from every single discussion.
Now, for the purposes of discussion and, hopefully, without a whole bunch of folks doing the "flame on" thingy, you mentioned that you work with tables that have millions of rows... what do YOU use cursors for in such an environment and why do you think a cursor is a better choice for those things?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2009 at 7:32 pm
Matt Whitfield (1/5/2009)
not afaik but Jeff will correct me if there is a way to do it.You can create a unique constraint across multiple columns, but not a clustered index over multiples...
One question though - in the provided data, ID was unique - is that not generically the case?
A clustered index and a PK, clustered or not, can both have multiple columns. As you well know (just saying it for folks who may not), you can only have 1 clustered index on a table at a time because a clustered index lives in the data, causes the data to be sorted (not reliable for SELECTs, folks... used ORDER BY on those), and, since the data can only be sorted one way at a time, means that you can only have 1 clustered index on table at a time. Again, it can be a multi-column index.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2009 at 7:52 pm
There is definitely one situation where table vars help - high-volumn sproc calls where recompiles kill performance. Those are pretty rare situations but they do exist.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 5, 2009 at 8:04 pm
TheSQLGuru (1/5/2009)
There is definitely one situation where table vars help - high-volumn sproc calls where recompiles kill performance. Those are pretty rare situations but they do exist.
I am really surprised by these replies to @TableVariables. They have substantially improved performance for many of our calls - including the use of functions and other manipulations. Maybe it is a situation of rows within the tables? We generally have smaller datasets of a few thousand records.
From my time here in the past, I am sure you'se guys are correct in your findings - it just surprises me how very different those findings are. I wrote a number of the test cases, ( seems to be a lost art for many folks 😉 ) and have seen the dramatic increase in speed. Hence, I do find this interesting that you'se guys are coming up with just the opposite.
Now - to the Clustered multiple key index on a @TableVariable. Because of the smaller size, this does not have to be Clustered - only one needed in my case. Can you give me some psuedo-code as to how to write a multiple key index?
Thanks
I wasn't born stupid - I had to study.
January 6, 2009 at 3:10 am
Jeff Moden (1/5/2009)
A clustered index and a PK, clustered or not, can both have multiple columns. As you well know (just saying it for folks who may not), you can only have 1 clustered index on a table at a time because a clustered index lives in the data, causes the data to be sorted (not reliable for SELECTs, folks... used ORDER BY on those), and, since the data can only be sorted one way at a time, means that you can only have 1 clustered index on table at a time. Again, it can be a multi-column index.
Sorry Jeff - wasn't clear in what I said. What I meant to say is that I don't know a way that you can create a non-unique clustered index over multiple columns... Do you know a way?
edit -> p.s. did you get my mail by the way? not sure if you received it or not?
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
January 6, 2009 at 3:49 am
Jeff Moden (1/5/2009)
I've gotta agree with Kevin on the table variable thing. The only reason I'll ever use table variables is if I have to make a temporary storage area to work out a problem in a function and that's only because a temp table can't be used in a function. As he said, in most of the cases I've ever worked with, just the act of shifting a problem from a table variable to a temp table can cause substantial gains in performance. So does not using functions, in many cases...:P
edit -> whoops! completely forgot the text. I have seen many times where table variables are abused, and this lowers performance to an unacceptable level. BUT... used correctly they can increase speed and be particularly useful in getting more oomph out of a server that may be already busting it's balls on disk i/o. I don't think it's valid to dismiss them out of hand.
For example - it took me all of 2 minutes to find an example where a table variable went faster than a temp table... the following code is a particularly mundane example, but it does involve 2^20 rows being inserted into a table.
This will probably run like a dog on your box though! 😀
23.921 seconds:
create TABLE #temptable (id int identity (1,1) primary key clustered, val1 int, val2 int, val3 int)
INSERT INTO #temptable (val1,val2,val3)
values (1,2,3)
declare @x int
set @x = 0
WHILE @x < 20
BEGIN
INSERT INTO #temptable (val1,val2,val3)
select val1,val2,val3 from #temptable
set @x = @x + 1
END
drop TABLE #temptable
16.031 seconds:
DECLARE @temptable table (id int identity (1,1) primary key clustered, val1 int, val2 int, val3 int)
INSERT INTO @temptable (val1,val2,val3)
values (1,2,3)
declare @x int
set @x = 0
WHILE @x < 20
BEGIN
INSERT INTO @temptable (val1,val2,val3)
select val1,val2,val3 from @temptable
set @x = @x + 1
END
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
January 6, 2009 at 5:19 am
Matt Whitfield (1/6/2009)
Sorry Jeff - wasn't clear in what I said. What I meant to say is that I don't know a way that you can create a non-unique clustered index over multiple columns... Do you know a way?edit -> p.s. did you get my mail by the way? not sure if you received it or not?
It's pretty much what you would expect: for example
Create clustered INDEX IX_C_Fred on Fred ( ColA asc, COlB desc, Colc)
where Fred is a table or a view.
It can't be done though for table variables, since the only indexes that can be defined are those caused by UNIQUE or PRIMARY KEY constraints, so they are inherently unique not non-unique.
Tom
Tom
January 6, 2009 at 6:05 am
Tom.Thomson (1/6/2009)
It can't be done though for table variables, since the only indexes that can be defined are those caused by UNIQUE or PRIMARY KEY constraints, so they are inherently unique not non-unique.
Yep - that's pretty much what I was asking in the first place, just didn't word it correctly. I was specifically asking about creating non-unique clustered indexes over multiple columns on a table variable.
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
January 6, 2009 at 6:35 am
Matt Whitfield (1/6/2009)
Jeff Moden (1/5/2009)
A clustered index and a PK, clustered or not, can both have multiple columns. As you well know (just saying it for folks who may not), you can only have 1 clustered index on a table at a time because a clustered index lives in the data, causes the data to be sorted (not reliable for SELECTs, folks... used ORDER BY on those), and, since the data can only be sorted one way at a time, means that you can only have 1 clustered index on table at a time. Again, it can be a multi-column index.Sorry Jeff - wasn't clear in what I said. What I meant to say is that I don't know a way that you can create a non-unique clustered index over multiple columns... Do you know a way?
edit -> p.s. did you get my mail by the way? not sure if you received it or not?
Ok... sorry. Understood.
No, I didn't get your email. I have a pretty agressive spam filter and it sometimes eats the wrong things. Send me a couple of test emails with the words "SQL Test" in the header and lemme see if I can train the bugger. Thanks, Matt.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 6, 2009 at 6:39 am
Jeff Moden (1/6/2009)
Ok... sorry. Understood.No, I didn't get your email. I have a pretty agressive spam filter and it sometimes eats the wrong things. Send me a couple of test emails with the words "SQL Test" in the header and lemme see if I can train the bugger. Thanks, Matt.
No need to apologise - was my dumb wording. That will serve me for posting up when i should really be in bed!
Emails on the way 🙂
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
Viewing 15 posts - 181 through 195 (of 272 total)
You must be logged in to reply to this topic. Login to reply