October 13, 2011 at 4:39 am
I have a query regarding Query optimization.[font="Arial"][/font]
Which one of the following is preferred considering the performance of the SQL query executed..??
Method 1
--------
UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id))
UPDATE Table1 SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id))
------------------------
------------------------
UPDATE Table1 SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))
Method 2
--------
UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id)),
SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id)),
------------------------
------------------------
SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))
October 13, 2011 at 5:12 am
The second one would be faster becuase it is single SET based update on the table compared to 10 different update statements.
The syntax in the second query seems to be incorrect.
thanks
sarat ๐
Curious about SQL
October 13, 2011 at 5:16 am
You could try running a test example using the estimated execution plan button and the percentage taken by each method. I can tell you that the single UPDATE affecting several columns took me 36% of the cost of the total batch from a simple test I made while the first method took me the remaining 64%.
Best regards,
Best regards,
Andre Guerreiro Neto
Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
October 13, 2011 at 5:26 am
codebyo (10/13/2011)
You could try running a test example using the estimated execution plan button and the percentage taken by each method. I can tell you that the single UPDATE affecting several columns took me 36% of the cost of the total batch from a simple test I made while the first method took me the remaining 64%.Best regards,
Even with the actual plan, this is completely unreliable. You need both CPU and io and the best way to do that is with actual timings or better still the profiler. If you are unsure of the veracity of this claim, I'll post up a couple of queries - one takes 4%, the other 96% of the total cost when they are run in the same batch. The 4% query however takes 3 times longer to run than the 96% cost query. They're both string-splitters. The 4% query uses a rCTE as a source of rows/numbers which is heavy on io.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2011 at 5:30 am
I'll post up a couple of queries
It would be beneficial for everyone here. Thanks!
October 13, 2011 at 5:35 am
Chris, whats on your mmind..Which Method is better..first or second..??
October 13, 2011 at 5:35 am
ChrisM@Work(10/13/2011)
Even with the actual plan, this is completely unreliable. You need both CPU and io and the best way to do that is with actual timings or better still the profiler. If you are unsure of the veracity of this claim, I'll post up a couple of queries - one takes 4%, the other 96% of the total cost when they are run in the same batch. The 4% query however takes 3 times longer to run than the 96% cost query. They're both string-splitters. The 4% query uses a rCTE as a source of rows/numbers which is heavy on io.
Oh, yes. I'm aware of that. Thanks for pointing that out. ๐
I was not saying that would be the best solution though.
It was just a quick test and in this particular UPDATE example it worked as expected because both of my tests were using the same columns and exactly the same code except for the single UPDATE vs several UPDATEs.
But I should have stated that more clearly. Thank you for the correction. ๐
Best regards,
Best regards,
Andre Guerreiro Neto
Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
October 13, 2011 at 5:59 am
No problem. This relies heavily on the testing framework which Jeff Moden et al set up for his excellent string splitter article[/url].
Here's the rCTE code:
CREATE FUNCTION [dbo].[DS8K_rCTEa](@pString [varchar](8000), @pDelimiter [char](1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
-- DS8K_rCTEa
WITH x AS (
SELECT
ItemNumber = 1,
pos = 0,
Nextpos = CHARINDEX(',',@pString,1)
UNION ALL
SELECT
ItemNumber = x.ItemNumber + 1,
pos = x.nextpos,
nextpos = ISNULL(NULLIF(CHARINDEX(',',@pString,x.nextpos+1),0),8000)
FROM x
WHERE x.nextpos < 8000 AND x.nextpos+x.pos>0
)
SELECT
ItemNumber,
Item = CASE WHEN pos+nextpos = 0 THEN @pString ELSE SUBSTRING(@pString,pos+1,Nextpos-(pos+1)) END
FROM x
Here's a (hard) tally table version:
ALTER FUNCTION [dbo].[DS8K_Tally1U](@pString [varchar](8000), @pDelimiter [char](1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
-- DS8K_Tally1U (one-based permanent tally table)
SELECT
ItemNumber= CAST(0 AS BIGINT),
Item= LEFT(@pString,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,1),0)-1,8000))
UNION ALL
SELECT
ItemNumber= ROW_NUMBER() OVER(ORDER BY pos),
Item= SUBSTRING(@pString,pos,[len])
FROM (
SELECT
pos= n+1,
[len]= ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,(n+1)),0)-(n+1),8000)
FROM dbo.Tally1 WITH (NOLOCK)
WHERE n <= ISNULL(DATALENGTH(@pString),0)
AND SUBSTRING(@pString,n,1) = @pDelimiter
) x
Here's the code to test the two functions in sequence:
USE tempdb
GO
SELECT * into JBMTest
FROM (
SELECT 0 as SomeID, NULL as SomeValue UNION ALL --1 NULL
SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
SELECT 4, ',' UNION ALL --2 b b (both are empty strings)
SELECT 5, '55555' UNION ALL --1 E
SELECT 6, ',55555' UNION ALL --2 b E
SELECT 7, ',55555,' UNION ALL --3 b E b
SELECT 8, '55555,' UNION ALL --2 b B
SELECT 9, '55555,1' UNION ALL --2 E E
SELECT 10, '1,55555' UNION ALL --2 E E
SELECT 11, '55555,4444,333,22,1' UNION ALL --5 E E E E E
SELECT 12, '55555,4444,,333,22,1' UNION ALL --6 E E b E E E
SELECT 13, ',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
SELECT 15, ' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
SELECT 16, 'This,is,a,test.' --E E E E
) d
SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
FROM JBMTest test CROSS APPLY dbo.DS8K_rCTEa(test.SomeValue,',') split
SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
FROM JBMTest test CROSS APPLY dbo.DS8K_Tally1U(test.SomeValue,',') split
Try this too, it's fun but not quite legit:
CREATE FUNCTION [dbo].[DS8K_CTE2U](@pString [varchar](8000),@pDelimiter [char](1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
-- DS8K_CTE2U (zero-based row-constructor tally table)
-- When a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order -
-- at least that's the "official" line ;)
SELECT
ItemNumber= CAST(0 AS BIGINT),
Item= CAST(LEFT(@pString, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, 1),0)-1,8000)) AS VARCHAR(8000))
UNION ALL
SELECT
ItemNumber= ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), -- row_number appears to be necessary
Item= SUBSTRING(@pString,n,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,n),0)-n,8000))
FROM (
SELECT n = n+1
FROM (
SELECT TOP (1+ISNULL(DATALENGTH(@pString),0))
n = (n1 + n2 + n3 + n4)
FROM (VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1)
CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2)
CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4) --
) n
WHERE SUBSTRING(@pString,n,1) = @pDelimiter
) d
Now, to test the duration of these two queries, your best bet is Jeff's test setup which is an attachment at the end of the article. This query yields a good summary of the results:
SELECT
SplitterName,
Min_ = MIN(Duration),
Max_ = MAX(Duration),
Sum_ = SUM(Duration),
Avg_ = AVG(Duration)
FROM (
SELECT *, Placing = ROW_NUMBER() OVER(PARTITION BY NumberOfElements, MaxElementLength ORDER BY Duration)
FROM dbo.TestResults -- 392 rows
) d
GROUP BY SplitterName
ORDER BY SUM(Duration)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2011 at 6:02 am
Junglee_George (10/13/2011)
Chris, whats on your mmind..Which Method is better..first or second..??
Easy - test them ๐
It's an unrestricted update though, one pass is likely to be better than two or more.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 13, 2011 at 8:09 am
Junglee_George (10/13/2011)
Which one of the following is preferred considering the performance of the SQL query executed..??
Actually, neither one is preferred. The first one is worst--as has already been mentioned--because it performs 10 separate updates as opposed to the 1 update for the second, but both of them are bad, because they both perform 10 separate selects from MainTable. Your query can be rewritten to only perform one select as well.
;
WITH MT AS (
SELECT
Col1 = Count(CASE WHEN Age BETWEEN 1 and 2 THEN [ID] END)
,Col2 = Count(CASE WHEN Age BETWEEN 2 AND 3 THEN [ID] END)
,Col3 = Count(CASE WHEN Age BETWEEN 3 AND 4 THEN [ID] END)
FROM MainTable
WHERE PID = 1
)
UPDATE Table1
SET Col1 = MT.Col1
,Col2 = MT.Col2
,Col3 = MT.Col3
FROM MT
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 13, 2011 at 11:23 pm
Thank You all for the responses...
That provides lots of insight...!!!
Thanks a lot...
October 14, 2011 at 3:18 pm
Actually , difference between 2 or 3 update statements depend on workload exist there and almost ,it could be such big deal for performance
But actually , I totally agree with the coding algorithm of Drew particularly more for big workload + huge data entity where it could be quite significant performance .
But as more simple + fast query , you could run the below:
DECLARE @COL1 INT , @COL2 INT,@COL3 INT
SELECT @COL1 = ISNULL( Count(CASE WHEN AGE BETWEEN 1 and 2 AND PID =1 THEN 1 ELSE NULL END) ,0)FROM TABLE1
SELECT @COL2= ISNULL( Count(CASE WHEN AGe BETWEEN 2 AND 3 AND PID =1 THEN 1 ELSE NULL END),0)FROM TABLE1
SELECT @COL3= ISNULL(Count(CASE WHEN AGe BETWEEN 3 AND 4 AND PID =1 THEN 1 ELSE NULL END),0)FROM TABLE1
update TABLE1 set col1 =@COL1
, COL2=@COL2
, COL3=@COL3
Where more specific parameters filtration of each count scope would be there thereby , it could benefit within huge data entity if you created 3 filtered nonclustered indexes according to the criteria above
[font="Arial Black"]Performance Guard[/font]
[font="Courier New"]Just ask me to get 0 sec for all queries[/font]
October 14, 2011 at 4:10 pm
Junglee_George (10/13/2011)
I have a query regarding Query optimization.[font="Arial"][/font]Which one of the following is preferred considering the performance of the SQL query executed..??
Method 1
--------
UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id))
UPDATE Table1 SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id))
------------------------
------------------------
UPDATE Table1 SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))
Method 2
--------
UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id)),
SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id)),
------------------------
------------------------
SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))
UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id))
UPDATE Table1 SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id))
...
You do realize that MT.Age when it is 2 it will be counted in both Col1 and Col2. The same is true for the other columns where the upper bound of one is equal to the lower bound of the next.
Same is true with method 2 as well.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply