September 24, 2023 at 10:48 am
hello,
my query is select * into #TmpA from tblA (nolock). tblA contains 116000000 records.
This query sometime takes 6 minutes and some time takes less 2 minutes. I observed CXConsumer wait types mostly and sometimes CXPacket. On observing the execution plan, parallel execution is taking place.
kindly guide what can be done in this case
Regards,
Saumik Vora
September 25, 2023 at 11:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
September 25, 2023 at 3:05 pm
I don't think there is a pre-baked recipe for you, but FWIW, Brent Ozar has a few recommandations, which you can read about in this blog post:
https://blog.quest.com/demystifying-cxpacket-and-cxconsumer-wait-types-in-sql-server/
September 25, 2023 at 9:19 pm
Why are you selecting everything into a temp table? You should filter that down to just the columns you need - and only the rows that are needed.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 26, 2023 at 9:08 am
You could try the maxdop hint to avoid parallellism
September 30, 2023 at 5:05 pm
Jo Pattyn why on earth would you want to avoid parallellism as that is in every case that I am aware a majorly good thing that helps a Query complete faster provided the hardware on which the database resides is up to snuff and if they are dealing with that much data in a single table I surely hope their servers can properly handle parallellism.
Note: However Jo Pattyn (or anyone), if there is actually a solid reason for this please educate me as I am more than willing to learn something knew.
Also saum70 I would reiterate Jeffrey Williams why are you creating a temporary table that has all the data found within the primary table as that is completely redundant and a horrible use of resources. The purpose of a temp table is usually to work with a smaller subset of the primary table and NOT its entire contents.
October 1, 2023 at 6:10 am
Hello,
@Jeffery and @dennis, The reason i am creating temp table with all the columns is because it will be used multiple times in the stp and all the columns will be referred in the final output. also there is code to find max of serial no against each ordernumber (one ordernumber will have multiple rows).
So instead of referring the physical table again and again, temp table is created to avoid IO performance issue.
It was also observed that the mentioned query when executed in sql query analyzer takes 1.5 mins where when called from front end application (through stp) it takes avg 6 mins
What can be done in this case
Regards,
Saumik
October 1, 2023 at 3:33 pm
By copying all rows and all columns into a temp table - you remove any possibility of utilizing indexes on the source table. So every time you query that temp table you have to read all rows.
You are not avoiding any IO issues by doing this - in fact, you are most likely increasing the amount of IO needed to perform the tasks because you are constantly performing a full scan of that table every time it is referenced.
Without seeing all of the code - I really can't recommend anything to solve your issues. I would guess there are many opportunities to optimize the code, but no way that can be done without seeing that code.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 1, 2023 at 5:22 pm
A Corollary to what Jeffrey Williams said -- perhaps you might need additional Indexes upon your parent table.
However, copying that table into a temp table simply does just that, it copies the table into the temporary harddrive (or SSD) storage and as such hitting that temp table is actually the same thing as hitting that parent table of course minus any of the indexes that parent table might have unless you duplicate those indexes as well which again is pointless since you already have the table and its indexes on the harddrive.
Lastly I concur with Jeffrey Williams about us not being able to help you because you have presented no code or metrics analysis of the query. So we have no way of helping you diagnose your current probleme let alone specifically help you solve your current problem. You are asking us to pull the proverbial rabbit out of the proverbial hat and while many might think we are workers of magic that is simply an illusion created by others. 😉
October 2, 2023 at 11:29 am
Hi,
thanks for replying back. due to policies i will not be able to provide actual code but i can replicate with dummy tables. Following is the details for your ref
tblA --- master table with 18900000 rows. Primary key is OrderNumber
tblB -- details table with 116000000. Primary key is OrderNumber, TranSerialNo, MessageCode
tblD -- master table with 36k rows. Primary key is DealerCode
code in stp is as below
select * into #TmpA from tblA(nolock)
select * into #TmpB from tblB(nolock)
create index #idxA on #TmpA(nOrderNumber)
create index #idxB on #TmpA(nOrderNumber, nTranSerialNo)
select DealerId into #TmpD from tblD(Nolock)
select a.OrderNumber,Max(b.TranSerialNo)
into #TmpLatestOrder
from #TmpA a
inner join #TmpB b
on a.OrderNumber = b.OrderNumber
where a.ReferenceNum=0
group by a.OrderNumber
final output query
Select all columns #TmpA and #TmpB from
#TmpA a
inner join #TmpLatestOrder b
on a.OrderNumber = b.OrderNumber
inner join #TmpB c
on b.OrderNumber = c.OrderNumber
and b.TranSerialNo=c.TranSerialNo
inner join #TmpD d
on a.nUserId = d.DealerId
Hope above is helpful
Regards,
Saumik
October 2, 2023 at 1:01 pm
how many rows of tbl A (18900000 total) have the ReferenceNum = 0 ?
and why aren't you filtering them when doing the insert into the temp table?
October 3, 2023 at 6:36 pm
Jo Pattyn why on earth would you want to avoid parallelism as that is in every case that I am aware a majorly good thing
Saum is specifically reporting parallelism waits, attributing it as a problem. whether or not it is actually a problem, not enough information has been provided. Given that Saum is doing a full table scan without any filters and inserting into a temporary heap, it is pretty unlikely that parallelism would be beneficial.
Parallelism is not always good. Excessive parallelism in some cases can make a query run longer than it would if it ran serial. In very common cases, excess parallelism uses up CPU resources that could otherwise be used by something else in queries that do not benefit from the parallelism.
@saum, I don't really have anything to say about your performance problem, but would strongly recommend against using select * into for anything. This is long running DDL and creates schema locks that are not released until the entire statement is complete. If you want to be flippant, you can do a select top 0 into #temp and then use dynamic SQL to do the insert into select from, but to me it is an infinitesimally small cost to just write the SQL that you are going to write one time and then reuse for however long the process is in place.
October 4, 2023 at 9:17 pm
Okay looking at what you shared here would be a more efficient way of doing it -- although you would need to test the second SELECT to be sure it was catching only what you wanted but I am pretty sure without testing it that it should be really close. Then again if it is off I am sure someone else will chime in to correct any mistakes I made.
-- tblA master table with 18,900,000 rows Primary key is OrderNumber
-- tblB details table with 116,000,000 rows Primary key is OrderNumber, TranSerialNo, MessageCode
-- tblD master table with 36,000 rows Primary key is DealerCode
SELECT *
INTO #tmpA
FROM tblA(NOLOCK) AS [a]
INNER JOIN tblD(NOLOCK) AS [d]
ON [d].[DealerId] = [a].[nUserId]
WHERE [a].[ReferenceNum] = 0;
GO
CREATE INDEX #idxA
ON #tmpA( nOrderNumber );
GO
CREATE INDEX #idxB
ON #tmpA( nOrderNumber, nTranSerialNo );
GO
-- This also assumes that #tmpA will now be smaller than tblB
-- Now I believe the ".[TranSerialNo] = (" will work in the
-- following but if it does not then change it to
-- ".[TranSerialNo] IN ("
SELECT *
INTO #tmpB
FROM #tmpA AS [a]
INNER JOIN tblB(NOLOCK) AS
ON .[OrderNumber] = [a].[OrderNumber]
AND .[TranSerialNo] = ( SELECT MAX( TranSerialNo ) AS TranSerialNo
FROM tblB(NOLOCK) AS [b2]
WHERE [b2].[OrderNumber] = [a].[OrderNumber]
GROUP [b2].[OrderNumber]
);
GO
-- Personally I would select all the items from #tmpA and
-- then only those items in #tmpB that are not already
-- contained within #tmpA. However, the following does what
-- yours did which means you will have duplicated column
-- names which is usually a bad thing
SELECT [a].*
FROM #TmpA AS [a]
UNION ALL
SELECT .*
FROM #TmpB AS ;
GO
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply