February 7, 2013 at 10:52 am
I've been playing around with CTE's and when I run the code below it executes instantaneously.; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL
)
, Txn AS (
SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID
FROM dbo.etl_txn_Staging1
WHERE
tranid IN (SELECT tranid FROM PaymentsTxn)
)
SELECT * FROM Txn The IO stats are as follows:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Staging1'. Scan count 1, logical reads 5728, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Accounts'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Impressive and I like it 🙂 See attached execution plan "SSC_CTESelectExample"
When I go to actually use the data from the CTE's and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive 🙁 Here's the code:
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL
)
, Txn AS (
SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID
FROM dbo.etl_txn_Staging1
WHERE
tranid IN (SELECT tranid FROM PaymentsTxn)
)
UPDATE
dbo.etl_txn_Staging1
SET
TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'
ELSE COALESCE(TxnFlag1, 'M')
END
WHERE TxnFlag IS NULL The results from IO stats are below and the updated execution plan is attached "SSC_CTEUpdateExample". The table "ETL_TXN_Staging1" is a rather wide (175 columns) and does have some "small" non-clustered indexes on it (<20MB), as well as a 2 columned clustered index - again, this is small (700MB)
Table 'etl_txn_Staging1'. Scan count 2, logical reads 6999519, physical reads 0, read-ahead reads 10048, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 1461317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Accounts'. Scan count 18, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There's about 500,000 records in the table. Even running a simple update takes over 10 secondsUPDATE etl_txn_Staging1 SET txnflag = NULL
My question is: Am I doing something wrong with the CTE in performing an update like this or would your suspicions tend to lean toward the performance issue being with the table being updated?
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
February 7, 2013 at 11:01 am
Since it looks like you are going over the network to another instance, use the CTE to populate a local temporary table and use that table in your update statement.
February 7, 2013 at 11:25 am
I opted to your temp tables, creating an index where needed and got the update down to about 23sec. - do you think I would see any further improvement by actually using a physical table? If not, I'll leave that one as is...
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
February 8, 2013 at 9:14 am
You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 8, 2013 at 9:30 am
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables /* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn
; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn
The run time for the entire query was 3+ mins, using the code below I have gotten it down to just over 1 minute...does anyone seem any further performance improvements?/* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
CREATE TABLE #PayDescs (TranID int, PayDesc char(80))
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn
; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn
/* Create index on temp table to help with below queries */
CREATE NONCLUSTERED INDEX idx_TranID ON #PaymentsTxn([TraniD])
/* Perform Updates as needed */
UPDATE dbo.etl_txn_Staging
SET TxnFlag =
CASE WHEN TxnFlag1 = '1' THEN 'E'
WHEN tranid IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'
ELSE COALESCE(TxnFlag1, 'M')
END
WHERE TxnFlag IS NULL
UPDATE dbo.etl_txn_Staging
SET TxnFlagDesc =
CASE WHEN txnflag IN ('0', '5', '3') THEN 'Card Transactions'
WHEN txnflag = '1' THEN 'Tire Programs'
WHEN txnflag = '2' THEN 'Plus Chek Transactions'
WHEN txnflag = '4' THEN 'Permit Transaction'
WHEN EDTxnFlag = '1' THEN 'ED'
END
WHERE TxnFlagDesc IS NULL
INSERT INTO #PayDescs
SELECT pp.tranid, yy.LutDescription
FROM [NAS2-DBR].COREISSUE.dbo.CCardLookUp AS yy WITH (READUNCOMMITTED)
INNER JOIN #PaymentsTxn AS pp WITH (READUNCOMMITTED)
ON yy.lutcode = pp.pmtpaymenttype
AND yy.lutid = 'pmtpaymenttype'
WHERE
pp.tranid IN (SELECT TranID FROM #Txn)
AND pp.TranID IS NOT NULL
UPDATE dbo.etl_txn_Staging
SET TransactionDescription =
CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)
THEN a.PayDesc
ELSE TransactionDescription
END
FROM #PayDescs a INNER JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID
UPDATE dbo.etl_txn_Staging
SET TxnFlagDesc =
CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)
THEN a.PayDesc
ELSE 'Miscellaneous Activity'
END
FROM #PayDescs a RIGHT JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID
/* Destroy objects in TempDB */
D-ROP TABLE #PaymentsTxn
D-ROP TABLE #Txn
D-ROP TABLE #PayDescs
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
February 8, 2013 at 10:20 am
MyDoggieJessie (2/8/2013)
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables/* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn
; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn
I haven't had a chance to review the code, but you're not gaining anything by using CTEs here. Since you're not filtering or joining the CTE in the final INSERT statement, the result of the outermost SELECT statement is equivalent to the result of the CTE, and you can just cut out the CTE.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 9, 2013 at 9:16 pm
drew.allen (2/8/2013)
You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.
The optimizer is very good at removing stuff that isn't referenced, so apart from the query text being larger than it needs to be, and a few extra microseconds of compilation time, there are no adverse effects from the redundant common table expression. The worktable mentioned in the STATISTICS IO output relates to the Eager Table Spool in the plan, required for Halloween Protection.
February 9, 2013 at 9:39 pm
MyDoggieJessie (2/7/2013)
When I go to actually use the data from the CTEs and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive 🙁
You can determine most of the reasons for the slow performance from the execution plan. I took the liberty of viewing the plans using the free Plan Explorer tool from SQL Sentry, because it makes certain things easier to see:
The first thing to focus on is the Remote Query:
This is executed 289,465 times (to return a grand total of 18 rows!), which does not compare well to the single trip to the remote server shown in the SELECT plan.
The second thing is the Eager Table Spool I mentioned to Drew. This may or may not be avoidable, since it is there to prevent incorrect results or an infinite loop due to the well-known Halloween Problem. Nevertheless, careful design can sometimes remove the need for HP, or allow a more efficient implementation than writing the full result set to a hidden tempdb worktable, before reading it all back again (which is what an Eager Table Spool does).
The third thing is the Sort before one of the nonclustered index updates:
This has a greater number of rows than estimated, which may mean the memory allocated for the sort operation turns out to be insufficient at run time. Memory grant is fixed and allocated before execution starts, and cannot be increased during execution regardless of the amount of free memory your instance may have available. If the memory turns out to be too small, one or more sort runs (often the full input set of rows) is spilled to physical tempdb disk. You are running SQL Server 2008 so you will not see any indication of a sort spill in the execution plan (SQL Server 2012 has new warnings for this) so you would need to use Profiler to monitor the Sort Warnings event class to determine if this was happening.
I would expect the huge number of remote query executions to be the biggest factor in the poor performance here.
February 9, 2013 at 10:18 pm
As far as optimizing the original query is concerned, I would be interested to see the execution plans and performance information for:
CREATE TABLE #Temp
(
TranID integer PRIMARY KEY
);
INSERT #Temp (TranID)
SELECT DISTINCT
p.TranID
FROM
[NAS2-DBR].COREISSUE.dbo.payments AS p WITH (READUNCOMMITTED)
WHERE
p.TranID IS NOT NULL
AND EXISTS
(
SELECT 1
FROM dbo.etl_txn_Accounts AS a
WHERE
a.FleetNumber = p.FleetNumber
)
OPTION (HASH JOIN, MERGE JOIN);
and then:
UPDATE Staging
SET TxnFlag =
CASE
WHEN EXISTS (SELECT 1 FROM #Temp AS t WHERE t.TranID = Staging.TranID) THEN 'P'
ELSE 'M'
END
FROM dbo.etl_txn_Staging1 AS Staging
WHERE
Staging.TxnFlag IS NULL;
/*
UPDATE Staging
SET TxnFlag = 'P'
FROM dbo.etl_txn_Staging1 AS Staging
JOIN #Temp AS t ON
t.TranID = Staging.TranID;
UPDATE Staging
SET TxnFlag = 'M'
WHERE
Staging.TxnFlag IS NULL;
*/
DROP TABLE #Temp;
You could try the separate UPDATEs shown commented out above instead of the single UPDATE; sometimes the difference can be worthwhile, and sometimes any differences in execution plans can be illuminating too. I don't promise to have the semantics exactly right though, so do please check that as well.
February 10, 2013 at 10:15 am
Paul, I will try the code and post back the results soon. As for obvious issues with the linked server (remote query) this will go away. All of this is current in testing and when it's approved for production use, it will all be local to the server.
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
February 10, 2013 at 1:12 pm
MyDoggieJessie (2/10/2013)
Paul, I will try the code and post back the results soon. As for obvious issues with the linked server (remote query) this will go away. All of this is current in testing and when it's approved for production use, it will all be local to the server.
OK, but that piece of information changes the game completely - having the table locally will very likely completely change the way the optimizer finds an execution plan. Local tables have different performance characteristics and much better statistical information is available. The rewrite I provided primarily issues the remote access problem, though some of the other simplifications might benefit the query plan as well, I suppose. It's hard to know exactly what problem you're trying to solve here - and how you expect to address it well when your test environment bears little resemblance to production...?
February 11, 2013 at 7:52 am
I wasn't positive it was a linked server problem or if it was the way I was presenting the code (clearly the poor performance is the direct result of going across the pipe). I will get permission to move the project to the Production server and do some more testing with the code. I sincerely appreciate everyone's input! The execution plans for the code you posted are attached as well (just in case you're still interested in the results
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply