August 20, 2008 at 4:09 pm
I have a query which updates columns in Table1 from max value in Table2 joining on Table1 and Table2 as follows:
UPDATE TAB1
SET TAB1.COL1 = TAB2.COL1,
TAB1.COL2 = TAB2.COL2
FROM TAB1 , TAB2 B
WHERE TAB1.COL3 = TAB2.COL3
AND TAB1.COL4 = TAB2.COL4
AND TAB2.COL1 = ( SELECT MAX(COL1)
FROM TAB2 C
WHERE TAB1.COL3 = C.COL3
AND TAB1.COL4 = C.COL4)
Although query is serving the requirement I have but I suspect there may be a better way to structure this query. Any suggestions?
August 21, 2008 at 6:21 am
What does your execution plan look like? Based on what I can see of the criteria I don't really see a better way. The only thing you might want to do is create a derived table in the from instead of a correlated subquery in the where. Something like this:
[font="Courier New"]UPDATE TAB1
SET TAB1.COL1 = TAB2.COL1,
TAB1.COL2 = TAB2.COL2
FROM
TAB1 JOIN
TAB2 B ON
TAB1.COL3 = TAB2.COL3 AND
TAB1.COL4 = TAB2.COL4 JOIN
(SELECT
MAX(COL1) AS COL1
FROM
TAB2 C
WHERE
TAB1.COL3 = C.COL3 AND
TAB1.COL4 = C.COL4) AS D ON
B.Col1 = D.Col1
[/font]
As always test both, checking the plans and stats.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 21, 2008 at 7:51 am
I think Jack meant something like:
UPDATE T1
SET Col1 = T2.Col1
    ,Col2 = T2.Col2
FROM Tab1 T1
    JOIN Tab2 T2
        ON T1.Col3 = T2.Col3
            AND T1.Col4 = T2.Col4
    JOIN
    (
        SELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1
        FROM Tab2 C
        GROUP BY C.Col3, C.Col4
    ) D
        ON T2.Col3 = D.Col3
            AND T2.Col4 = D.Col4
            AND T2.Col1 = D.Col1
I think, however, that the TSQL UPDATE syntax is dangerous in this instance.
If there are multiple rows of in TAB2 containing the same Col1, Col3 and Col4 values with
different values in Col2 then a random last value of Col2 will be updated.
To get around this, either the value of Col2 needs to be defined:
UPDATE T1
SET Col1 = T2.Col1
    ,Col2 = T2.Col2
FROM Tab1 T1
    JOIN Tab2 T2
        ON T1.Col3 = T2.Col3
            AND T1.Col4 = T2.Col4
    JOIN
    (
        SELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1
        FROM Tab2 C
        GROUP BY C.Col3, C.Col4
    ) D
        ON T2.Col3 = D.Col3
            AND T2.Col4 = D.Col4
            AND T2.Col1 = D.Col1
    JOIN
    (
        SELECT C2.Col3, C2.Col4, C2.Col1, Max(C2.col2) AS Col2
        FROM Tab2 C2
        GROUP BY C2.Col3, C2.Col4, C2.Col1
    ) D2
        ON D.Col3 = D2.Col3
            AND D.Col4 = D2.Col4
            AND D.Col1 = D2.Col1
            AND T2.Col2 = D2.Col2
or the ANSI UPDATE syntax should be used to report an error if there is more than one value of Col2
UPDATE Tab1
SET Col1 =
    (
        SELECT T21.Col1
        FROM Tab2 T21
        WHERE T21.Col3 = Tab1.Col3
            AND T21.Col4 = Tab1.Col4
            AND T21.Col1 =
            (
                SELECT MAX(T211.Col1)
                FROM Tab2 T211
                WHERE T211.Col3 = T21.Col3
                        AND T211.Col4 = T21.Col4
            )
    )
    ,Col2 =
    (
        SELECT T21.Col2
        FROM Tab2 T21
        WHERE T21.Col3 = Tab1.Col3
            AND T21.Col4 = Tab1.Col4
            AND T21.Col1 =
            (
                SELECT MAX(T211.Col1)
                FROM Tab2 T211
                WHERE T211.Col3 = T21.Col3
                        AND T211.Col4 = T21.Col4
            )
    )
WHERE EXISTS
(
    SELECT *
    FROM Tab2 T2
    WHERE T2.Col3 = Tab1.Col3
        AND T2.Col4 = Tab1.Col4
)
If the ANSI syntax looks a bit scary, then you could try a hybrid syntax:
UPDATE T1
SET Col1 = T2.Col1
    ,Col2 =
    (
        SELECT T21.Col2
        FROM Tab2 T21
        WHERE T21.Col3 = D.Col3
            AND T21.Col4 = D.Col4
            AND T21.Col1 = D.Col1
    )
FROM Tab1 T1
    JOIN Tab2 T2
        ON T1.Col3 = T2.Col3
            AND T1.Col4 = T2.Col4
    JOIN
    (
        SELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1
        FROM Tab2 C
        GROUP BY C.Col3, C.Col4
    ) D
        ON T2.Col3 = D.Col3
            AND T2.Col4 = D.Col4
            AND T2.Col1 = D.Col1
August 21, 2008 at 12:47 pm
Thanks for the suggestions and the comment about possible multiple values of Col2 for Tab2. I will try different structures and check which execution plan looks better.
August 21, 2008 at 4:16 pm
The execution plan for my original query and the derived table query is exactly the same with 50%-50% cost. I am sure that if I add the complexity for multiple values for Tab2.Col2 the execution plan will change.
I have following non-clustered indexes on tables, which the execution plan is using:
CREATE INDEX [IX_Tab2_01] ON [dbo].[Tab2]
(
[Col3] ASC,
[Col4] ASC,
[Col1] ASC
)
INCLUDE
(
[Col2]
)
CREATE INDEX [IX_Tab1_01] ON [dbo].[Tab1]
(
[Col3] ASC,
[Col4] ASC
)
August 24, 2008 at 10:13 pm
please help me solve this problem:
ALTER proc [dbo].[S_RptCustProd]
@StartPrd varchar(6),
@EndPrd varchar(6),
@Class varchar(15),
@Tipe varchar(1)
as
if exists(select * from tempdb.dbo.sysobjects where id=object_id('tempdb.dbo.#summary'))
drop table #summary
create table #summary (
CustNo varchar(20) null,
CustName varchar(100) null,
Industry varchar(50) null
)
declare @Start int, @End int, @ColName varchar(50), @StartStr varchar(50), @InvoicePajak varchar(50)
set @Start = left(@StartPrd,4)
set @End = left(@EndPrd,4)
insert into #summary (CustNo, CustName, Industry)
select distinct B.CustNo, B.CustName, B.Industry
from SalesData A
left outer join MsCustomer B on A.CustNo = B.CustNo
where A.InvoiceDate between @StartPrd+'01' and dateadd(dd,-1,dateadd(mm,1,@EndPrd+'01'))
and A.ItemClass like @Class + '%'
and A.CustNo like @Tipe + '%'
while @Start <= @End begin
set @ColName = 'C' + convert(varchar,@Start)
set @StartStr = convert(varchar,@Start)
exec ('alter table #summary add ' + @ColName + ' numeric(17,2) null')
exec ('update #summary set ' + @ColName + ' = isnull(' +
'(select sum(isnull(OriPrice,0)-isnull(OriDisc,0)-isnull(OriOutSource,0)- isnull((select sum(OriPPh) from PPh where InvoicePajak = '''+@InvoicePajak+''' and Period between '''+@StartPrd+''' and '''+@EndPrd+'''),0)) ' +
' from SalesData B ' +
' where #summary.CustName = B.Customer ' +
' and B.InvoiceDate between ''' + @StartPrd + '01'' and dateadd(dd,-1,dateadd(mm,1,''' + @EndPrd + '01'')) ' +
' and year(B.InvoiceDate) = ' + @StartStr + ' ' +
' and B.ItemClass like ''' + @Class + '%'' ' +
' and B.CustNo like '''+ @Tipe + '%''),0) ') /*+
' group by Customer),0) ')*/
set @Start = @Start + 1
end
select * from #summary
i use that procedure but there is an error like this:
Msg 130, Level 15, State 1, Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
please help me.... thx...
August 24, 2008 at 11:39 pm
KB (8/21/2008)
The execution plan for my original query and the derived table query is exactly the same with 50%-50% cost.
Ummmm.... never trust the "cost" of a batch in the execution plan. Not only is it frequently wrong, but sometimes it's downright misleading. Here's an example of code where one part takes 0% and the other takes 100%... but not in real life. If you actually measure the parts with statistics on, the reverse is true...
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2008-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 5, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
... and if you don't have a Tally table yet, now's the time...
http://www.sqlservercentral.com/articles/TSQL/62867/
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2008 at 11:41 pm
pakaw_man (8/24/2008)
please help me solve this problem:
I'm off to bed just now... but I have a suggestion for you... start your own thread for this problem. It'll get more attention, that way. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply