June 9, 2011 at 12:56 pm
Hi All,
Thanks in advance.
I have a situation where I'm selecting data into a table and needing to update fields from another table. Currently, I am doing multiple update statements and using a join to filter data.
Rather than issue multiple updates, I was wanting to use a CTE.
Here's an excerpt of update(s):
SELECT Data Into Temp Table
UPDATE
pa
SET
Company = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = '01'
UPDATE
pa
SET
BusinessUnit = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = '02'
UPDATE
pa
SET
Department = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = '03'
And so on, there's six or seven of these. Rather than have multiple updates, I would like to do using CTE.
I know this will work:
WITH GLA_Segment_Update (GLId,SegmentQualifier,SegmentValue) AS
(
SELECT
GLId,SegmentQualifier,SegmentValue
FROM
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
WHERE
gla.SegmentQualifier = '01'
)
UPDATE
#tmpPaidAdvice
SET
#tmpPaidAdvice.Company = GLA_Segment_Update.SegmentValue
WHERE
#tmpPaidAdvice.GLId = GLA_Segment_Update.GLId
WITH GLA_Segment_Update (GLId,SegmentQualifier,SegmentValue) AS
(
SELECT
GLId,SegmentQualifier,SegmentValue
FROM
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
WHERE
gla.SegmentQualifier = '02'
)
UPDATE
#tmpPaidAdvice
SET
#tmpPaidAdvice.BusinessUnit = GLA_Segment_Update.SegmentValue
WHERE
#tmpPaidAdvice.GLId = GLA_Segment_Update.GLId
And so on for each field to update.
I would really like to do the updates in one fell swoop. How to do?
Will this work?
WITH GLA_Segment_Update (GLId,SegmentQualifier,SegmentValue) AS
(
SELECT
GLId,SegmentQualifier,SegmentValue
FROM
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
WHERE
gla.SegmentQualifier IN ('01','02','03','04','06')
)
UPDATE
#tmpPaidAdvice
SET
#tmpPaidAdvice.Company = GLA_Segment_Update.SegmentValue,
#tmpPaidAdvice.BusinessUnit = GLA_Segment_Update.SegmentValue
#tmpPaidAdvice.Department = GLA_Segment_Update.SegmentValue
WHERE
#tmpPaidAdvice.GLId = GLA_Segment_Update.GLId
The primary key on GL table is GLId + SegmentValue so you could have:
GLID SegmentValue
01 01
01 02
01 03
01 04
And so on.
If defining multiple CTEs is the way, what are the advantage(s)? Is a CTE faster than a JOIN?
I'm a noobie to CTE so I hope I followed the rules of posting. (ie. Did not post table structs and inserts as I am thinking someone has already done this and will know how to help out)...
Thanks.
Howard
June 9, 2011 at 1:48 pm
Here are two possible scenarios. But since I don't have anything to test against regarding correct results and performance (e.g. comparing execution plans), you'd need to test both and compare.
As a side note: I removed te NOLOCK hint since I'd like to avoid dealing with "dirty reads".
If there's a reason you have it added, please elaborate and provide related information (table def including indexes and execution plan of the queries in question).
--alternative 1: use a CASE statement in the UPADTE part
UPDATE pa
SET
Company = CASE WHEN gla.SegmentQualifier = '01' THEN gla.SegmentValue ELSE Company END,
BusinessUnit = CASE WHEN gla.SegmentQualifier = '02' THEN gla.SegmentValue ELSE BusinessUnit END,
Department = CASE WHEN gla.SegmentQualifier = '03' THEN gla.SegmentValue ELSE Department END
FROM #tmpPaidAdvice pa
INNER JOIN GLAccounting.dbo.GL_AccountSegment gla
ON pa.GLID = gla.GLID AND gla.SegmentQualifier IN('01','02','03')
--alternative 2: PIVOT (or CrossTab) the gla.SegmentQualifier using a cte
;WITH cte_gla AS
(
SELECT
GLID,
MAX(CASE WHEN SegmentQualifier = '01' THEN SegmentValue ELSE NULL END) AS col01,
MAX(CASE WHEN SegmentQualifier = '02' THEN SegmentValue ELSE NULL END) AS col02,
MAX(CASE WHEN SegmentQualifier = '03' THEN SegmentValue ELSE NULL END) AS col03
FROM GL_AccountSegment
WHERE SegmentQualifier IN('01','02','03')
GROUP BY GLID
)
UPDATE pa
SET
Company = cte_gla.col01,
BusinessUnit = cte_gla.col02 ,
Department = cte_gla.col03
FROM #tmpPaidAdvice pa
INNER JOIN cte_gla
ON pa.GLID = cte_gla.GLID
Regarding your CTE questions
If defining multiple CTEs is the way, what are the advantage(s)? Is a CTE faster than a JOIN?
To start with the second one: a CTE as used in the given scenario is just a different way of writing a subquery, just better readability (at least to some of us 😉 ). So, in the given scenario it is not faster than the equivalent subquery.
The way you tried using a CTE won't really help performance, since the table in question still are queried more than once.
The big advantage of a CTE over a subquery is the option tu use the data recursive (aka recursive CTE). The description in BOL (BooksOnLine, the SQL Server help system usually installed together with SQL Server) has a good explanation as well as sample code to demonstrate it.
Regarding your last question
Will this work?
No, it won't, since the update won't differentiate between the different "interpreation" of the SegmentValue.
June 9, 2011 at 2:11 pm
Thanks Lutz.
DerBeDer, sometimes I am a doofii (plural of doofus). ie. My thought patterns are so linear. Don't know why I did not think of doing update like you did vs multiple.
I was just thinking of using a CTE to reduce the number of update(s). Your first solution did that. Since there is no *real* performance gain, I will use the single query.
For anyone interested:
When running multiple updates vs CTEs, the performance was the same. There's actually 5 of each but I am just posting 2 for brevity:
Both the update and CTEs ran in approximately 50ms.
UPDATE
pa
SET
Company = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = '01'
UPDATE
pa
SET
BusinessUnit = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = '02'
VS
;WITH GLA_Segment (GLId,SegmentValue) AS
(
SELECT
GLId,SegmentValue
FROM
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
WHERE
gla.SegmentQualifier = '01'
)
UPDATE
#tmpPaidAdvice
SET
Company = GLA_Segment.SegmentValue
FROM
GLA_Segment
WHERE
#tmpPaidAdvice.GLId = GLA_Segment.GLId
--AND
-- GLA_Segment.SegmentQualifier = '01'
;WITH GLA_Segment (GLId,SegmentValue) AS
(
SELECT
GLId,SegmentValue
FROM
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
WHERE
gla.SegmentQualifier = '02'
)
UPDATE
#tmpPaidAdvice
SET
BusinessUnit = GLA_Segment.SegmentValue
FROM
GLA_Segment
WHERE
#tmpPaidAdvice.GLId = GLA_Segment.GLId
--AND
-- GLA_Segment.SegmentQualifier = '02'
VS
Your query produced results in 30ms.
UPDATE
pa
SET
Company = CASE WHEN gla.SegmentQualifier = '01' THEN gla.SegmentValue ELSE Company END,
BusinessUnit = CASE WHEN gla.SegmentQualifier = '02' THEN gla.SegmentValue ELSE BusinessUnit END,
Department = CASE WHEN gla.SegmentQualifier = '03' THEN gla.SegmentValue ELSE Department END
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla
ON
pa.GLID = gla.GLID
AND
gla.SegmentQualifier IN ('01','02','03')
Some might think this is a nominal gain, but should scale appropriately.
Your CTE was actually a little slower at 87ms. Also a slightly higher degree of difficulty.
So, in closing, the single update had the best performance.
Thanks again,
Howard
June 9, 2011 at 3:06 pm
Glad I could help 😀
Did you verify the actual exection plan of the current solution to see if there might be an index to even speed it up further?
Regarding the duration itself: how many rows are in the GL_AccountSegment table?
June 9, 2011 at 3:32 pm
I can't add any indexes to database. I'm just a lowly contractor, so I have to get as much speed as possible with what I have. There's 16m or so records.
I am getting something odd here. Maybe it's just late in the day and I'm thick. The update you provided is not returning same data set as individual update/CTEs.
Here's the data from Account_Segment for specified GLIds:
2728063 A1 17
2728063 A3 20052005
2728063 A4 1106
2728063 A5 27T0
2728063 B2 00027
2728063 C1 5CB0MEA52005
2728063 D4 M00027
2728063 D6 4
2728063 F1 2200
2728063 J1 M0002705CB0MEA5
2728063 K6 AA
2728063 L1 067443
2728063 N1 2D
2868620 A3 TAC_N157
3345961 A1 17
3345961 A3 20052005
3345961 A4 1106
3345961 A5 27T0
3345961 B2 00027
3345961 C1 5CB0MFAK2005
3345961 D4 M00027
3345961 D6 4
3345961 F1 2500
3345961 J1 M0002705CB0MFAK
3345961 K6 AA
3345961 L1 067443
3345961 N1 2D
Your update is below:
UPDATE
pa
SET
Company = CASE WHEN gla.SegmentQualifier = 'C1' THEN gla.SegmentValue ELSE Company END,
BusinessUnit = CASE WHEN gla.SegmentQualifier = 'A1' THEN gla.SegmentValue ELSE BusinessUnit END,
Department = CASE WHEN gla.SegmentQualifier = 'K6' THEN gla.SegmentValue ELSE Department END,
NaturalAccount = CASE WHEN gla.SegmentQualifier = 'F1' THEN gla.SegmentValue ELSE NaturalAccount END,
Project = CASE WHEN gla.SegmentQualifier = 'A3' THEN gla.SegmentValue ELSE Project END
---SELECT pa.Glid,gla.Glid,gla.*, pa.*
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla
ON
pa.GLID = gla.GLID
AND
gla.SegmentQualifier IN ('C1','A1','K6','F1','A3')
WHERE
gla.Glid IN (2728063,2868620,3345961)
It updates the data to:
GLIdCompanyBusinessUnitDepartmentNaturalAccountProject
2728063NULL17NULLNULLNULL
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
3345961NULL17NULLNULLNULL
3345961NULLNULLAANULLNULL
3345961NULL17NULLNULLNULL
3345961NULLNULLAANULLNULL
3345961NULL17NULLNULLNULL
3345961NULLNULLAANULLNULL
3345961NULL17NULLNULLNULL
3345961NULLNULLAANULLNULL
3345961NULL17NULLNULLNULL
When I run individual CTE or Updates one at a time, it produces the following output:
UPDATE
pa
SET
Company = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = 'C1'
UPDATE
pa
SET
BusinessUnit = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = 'A1'
UPDATE
pa
SET
Department = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = 'K6'
UPDATE
pa
SET
NaturalAccount = gla.SegmentValue
FROM
#tmpPaidAdvice pa
INNER JOIN
GLAccounting.dbo.GL_AccountSegment gla WITH (NOLOCK)
ON
pa.GLID = gla.GLID AND gla.SegmentQualifier = 'F1'
CTEs give same results as below.
Data set:
GLIdCompanyBusinessUnitDepartmentNaturalAccountProject
27280635CB0MEA5200517AA2200NULL
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
2868620NULLNULLNULLNULLTAC_N157
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
33459615CB0MFAK200517AA2500NULL
Thanks.
Howard
June 9, 2011 at 5:06 pm
Ouch. My fault. That's what's happening if I start posting untested code...
Go for the 2nd version I posted (using the CTE and CrossTab approach).
If you're dealing with a multi-mill table make sure to compare the execution plans on a much larger sample than you're using currently (usually do the test based on a million row sample).
Making a decision for processing a mill rows based on just such a small sample might lead to false conclusions...
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply