June 4, 2012 at 2:02 pm
Marius.D (6/4/2012)
Maybe I'm not getting this, but tableB does not get updated. The big question mark continues to be: with or without more indexes, how come the sp runs anywhere between 30 seconds and 4 hours? (And again, this is the sp itself, not run in SSMS, so parameter sniffing is excluded.) I am thinking the blocking report I ran last time over a 5-minute interval may not have caught whatever is blocking it? (I'm just afraid of running a 10-second update interval overnight.) Does anybody think this is a possibility, or should I cross that off?
It is not about more or fewer indexes; it is about the proper index. We understand that tableB does not get updated, but its indexes are still used in conjunction with the indexes on tableA to get a better-performing update.
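To make the shape of that concrete, here is a minimal sketch; tableA, tableB, joinKey and lookupValue are placeholder names, since the actual schema was not posted:

UPDATE a
SET    a.lookupValue = b.lookupValue
FROM   dbo.tableA AS a
JOIN   dbo.tableB AS b
  ON   b.joinKey = a.joinKey;
-- tableB is only read here, but an index on its join column lets the optimizer
-- seek the lookup side of the join instead of scanning it.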
Jared
CE - Microsoft
June 4, 2012 at 2:15 pm
Yes, there is a massive row count difference between tableA and tableB. I don't know how to check what it's waiting on, but I will try to figure it out (any pointers are welcome). I will add these indexes too. Thanks, everybody!
June 4, 2012 at 3:25 pm
Maybe I'm not getting this, but tableB does not get updated.
No, but tableB provides a lookup table to get the value with which to update the other table. That index should reduce the overhead of getting to the proper lookup rows.
An index on tableA is not applicable, since the entire table is being examined anyway.
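A rough sketch of that lookup index, using the same placeholder names as above (joinKey for the join column, lookupValue for the value being copied):

CREATE NONCLUSTERED INDEX IX_tableB_joinKey
    ON dbo.tableB (joinKey)
    INCLUDE (lookupValue);   -- covers the lookup so the update never has to touch tableB's base rows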
The big question mark continues to be: with or without more indexes, how come the sp runs anywhere between 30 seconds and 4 hours? (And again, this is the sp itself, not run in SSMS, so parameter sniffing is excluded.)
Agree with everyone else -- something else must be causing the delay. You would have to monitor the reason for the wait.
You're currently updating all of tableA, in one big transaction. Lots of different activities on tableA could force that update to wait.
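A minimal sketch of that kind of monitoring, using sys.dm_exec_requests; the column list is just a starting point:

SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,
       r.last_wait_type,
       r.blocking_session_id,
       t.text AS statement_text
FROM   sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE  r.session_id <> @@SPID;   -- everything except the monitoring session itself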
SQL DBA, SQL Server MVP (07, 08, 09). "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." -- "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies."
June 5, 2012 at 9:07 am
After adding/modifying the indexes and changing the ISNULL to an UPDATE, it ran for a record 5.8 hours last night. The changes seem to have shaved off a few seconds when the query is run in SSMS, so I believe they are beneficial. I know there are reports running against tableA at the same time; that's probably what explains the long run time. Someone suggested running the update WITH (ROWLOCK) to avoid locking the whole table, so I will try that next.
June 5, 2012 at 9:14 am
I recommend not playing with locking hints. If SQL is choosing page locks by default (which it probably would with a query that updates a fair portion of the table), then forcing the locks to start as row instead of page is just going to cause lock escalation to happen sooner.
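If you want to see whether escalation is already happening on the current setup, a quick sketch (dbo.tableA is a placeholder for the real table name):

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.index_id,
       s.row_lock_count,
       s.page_lock_count,
       s.index_lock_promotion_attempt_count,
       s.index_lock_promotion_count
FROM   sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.tableA'), NULL, NULL) AS s;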
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 5, 2012 at 9:51 am
Please try my last posted version.
It includes a check to avoid updating those rows that already have the value that would be assigned.
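Scott's actual version is in his earlier post; as a rough sketch of the "skip rows that already have the value" check, with the same placeholder names used above:

UPDATE a
SET    a.lookupValue = b.lookupValue
FROM   dbo.tableA AS a
JOIN   dbo.tableB AS b
  ON   b.joinKey = a.joinKey
WHERE  a.lookupValue <> b.lookupValue
   OR  a.lookupValue IS NULL;   -- rows that already match are skipped entirely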
SQL DBA, SQL Server MVP (07, 08, 09). "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." -- "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies."
June 12, 2012 at 11:28 am
Update: I noticed I had another sp doing the exact same thing, running for anywhere from 1 minute up to 4-5 hours. While re-running the query one morning, the same thing happened, this time in SSMS: it wouldn't finish. I ran a query I found against sys.dm_exec_sessions and sys.dm_exec_requests and discovered it was waiting on CXPACKET. I started reading about this; the system's parallelism settings were MaxDOP = 2 and Cost Threshold for Parallelism = 5. I raised the cost threshold to 25, and I haven't had a problem with the sp's since (last Friday)!
Now I see it is happening again with a report run by a user in Cognos; I wonder if I should raise the threshold a little bit more?
June 12, 2012 at 11:33 am
Why is your server MAXDOP set to 2? Why is your threshold set to 25? I'm not saying that they shouldn't be... I'm asking if you know why they are set the way they are. You should probably start there.
Jared
CE - Microsoft
June 12, 2012 at 11:55 am
That's what I read: Microsoft recommends MaxDOP set to 2 for servers with 4-8 procs (ours has 4). A Cost Threshold of 25 allows only longer-running queries to use parallelism, and changing it from 5 to 25 has definitely helped. But it looks like it still happens for queries such as those built by Cognos (where you have little to no control over user-run reports). The report has run for over 6 hours; wait_type is CXPACKET, wait_resource is blank, blocking_session_id is 0, and if I look in Activity Monitor it looks like it blocks itself (session id 79 is blocked by 79). So I wonder if raising the cost threshold further would help... thank you!
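One way to see what that self-blocking session is actually waiting on is sys.dm_os_waiting_tasks, which returns one row per waiting worker thread; a parallel query that appears to block itself usually resolves into CXPACKET waits between its own exec_context_ids. A quick sketch (79 is just the session id seen in Activity Monitor):

SELECT wt.session_id,
       wt.exec_context_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.blocking_exec_context_id,
       wt.resource_description
FROM   sys.dm_os_waiting_tasks AS wt
WHERE  wt.session_id = 79;   -- substitute the session id you see in Activity Monitor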
June 12, 2012 at 12:25 pm
Marius.D (6/12/2012)
That's what I read: Microsoft recommends MaxDOP set to 2 for servers with 4-8 procs (ours has 4). A Cost Threshold of 25 allows only longer-running queries to use parallelism, and changing it from 5 to 25 has definitely helped. But it looks like it still happens for queries such as those built by Cognos (where you have little to no control over user-run reports). The report has run for over 6 hours; wait_type is CXPACKET, wait_resource is blank, blocking_session_id is 0, and if I look in Activity Monitor it looks like it blocks itself (session id 79 is blocked by 79). So I wonder if raising the cost threshold further would help... thank you!
Personally, I would leave the threshold at 5 and increase your MAXDOP to 4. Not sure where you got your info, but BOL says
For servers that have eight or less processors, use the following configuration where N equals the number of processors: MAXDOP=0 to N.
Now, of course every server and situation is different, but if you don't have a known reason to set yours lower than the number of CPUs you have, why would you? Then I would start changing the threshold. Of course, I would do that based on a reason, such as knowing that some queries were using parallelism when I did not want them to.
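For reference, a sketch of checking and applying those settings; this is a server-wide change, so treat the values as the ones discussed in this thread rather than a general recommendation, and test first:

-- Current values:
SELECT name, value_in_use
FROM   sys.configurations
WHERE  name IN ('max degree of parallelism', 'cost threshold for parallelism');

-- Applying the suggestion above:
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max degree of parallelism', 4;
EXEC sys.sp_configure 'cost threshold for parallelism', 5;
RECONFIGURE;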
Jared
CE - Microsoft
June 12, 2012 at 12:49 pm
SQL Server only resorts to parallel plan processing for an UPDATE statement when there is a WHERE clause. I thought your original query did not even use a WHERE clause and had the same problem. All changes to the underlying data will be applied serially to keep transaction/log integrity.
I'd be looking at capturing the query plans used, to isolate any differences between the fast runs and the slow runs. If the same execution plan is used and only the run times differ, then maybe your code is just an unlucky victim of other factors. You could do this by running a trace and analyzing the output.
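You can also pull the cached plan and timing figures straight from the DMVs for comparison; a minimal sketch, where the LIKE filter is just a placeholder to match the statement or proc text:

SELECT TOP (10)
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
       qs.max_elapsed_time,
       st.text AS statement_text,
       qp.query_plan
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE  st.text LIKE '%tableA%'   -- placeholder filter
ORDER BY qs.max_elapsed_time DESC;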
Also, wait stats data is cumulative since the last time the statistics were reset or the server was restarted. I'd make sure you reset the stats, or at least collect and filter them, to get a clearer picture. Capture before running the code, then capture after running the code; the difference between the two would be the actual thread waits that occurred during your test.
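A sketch of that before/after capture; the snapshot approach leaves the cumulative counters alone, while the commented-out DBCC is the reset option:

-- Option A: reset the cumulative counters (instance-wide):
-- DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

-- Option B: snapshot before and after, then take the difference:
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO   #waits_before
FROM   sys.dm_os_wait_stats;

-- ... run the slow statement here ...

SELECT w.wait_type,
       w.wait_time_ms        - b.wait_time_ms        AS wait_time_ms_delta,
       w.waiting_tasks_count - b.waiting_tasks_count AS waiting_tasks_delta
FROM   sys.dm_os_wait_stats AS w
JOIN   #waits_before AS b ON b.wait_type = w.wait_type
WHERE  w.wait_time_ms - b.wait_time_ms > 0
ORDER BY wait_time_ms_delta DESC;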
Cognos is a HEAP'in nightmare; I feel your pain.
June 12, 2012 at 1:09 pm
I was reading a post by Pinal Dave on this topic.
But if you read Jonathan's post, it depends on the hardware and whether it's NUMA-based or not. The query
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
returns 0 and 64, suggesting it may have been set up to use NUMA.
So you are suggesting that using only 2 processors (out of 4) may be too few. This is a mixed system, mostly reporting, but it also gets updated constantly. It has only 4 processors, but it's very fast (the rest of the setup is ideal: 94 GB of RAM available to SQL, and the mdf, ldf, and tempdb files are all on separate drives). If anything, the bottleneck is the processors. Since my change to raise the threshold from 5 to 25 was definitely beneficial, I thought that's the direction I need to keep going, and now I'm wondering if increasing MAXDOP to 3 or 4 would help. This happened again today: the query hung for 6 hours with the system almost idling, and no deadlocks or anything else happening, which is what I find puzzling... The query had some calculations (SUM), but no WHERE clause.
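One thing worth double-checking before reading too much into the 0 and 64 result: as far as I know, memory node 64 is the node reserved for the dedicated admin connection (DAC), so its presence alone does not confirm hardware NUMA. A quick sketch of a cross-check:

SELECT node_id,
       node_state_desc,        -- the DAC node is identified here
       memory_node_id,
       online_scheduler_count
FROM   sys.dm_os_nodes;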
I will do more reading on this topic...
June 13, 2012 at 9:44 am
Can you post the wait types and the query that you used to get them (the same spreadsheet from the other post; the query, I'm guessing, is from Pinal Dave)? I don't want to detract from this thread until we figure out the issue. You don't know for sure whether the waits are the cause or a symptom.
Jared
CE - Microsoft
June 13, 2012 at 9:53 am
Okay, sorry for the mixup. I used this query from Jonathan Kehayias' 'A Performance Troubleshooting Methodology for SQL Server': http://www.simple-talk.com/sql/performance/a-performance-troubleshooting-methodology-for-sql-server/
query:
SELECT TOP 10
wait_type ,
max_wait_time_ms wait_time_ms ,
signal_wait_time_ms ,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )
AS percent_total_waits ,
100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )
AS percent_total_signal_waits ,
100.0 * ( wait_time_ms - signal_wait_time_ms )
/ SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0 -- remove zero wait_time
AND wait_type NOT IN -- filter out additional irrelevant waits
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',
'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',
'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',
'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',
'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',
'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',
'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DESC
June 13, 2012 at 9:59 am
No worries 🙂 Just easier to keep the whole story together, even if it is more reading. I am certainly not an expert in analyzing these waits, but I'm sure someone will look at them. What we do need to remember is that the percentages returned in the results are based on the TOP 10. If waits are not the real issue, the % of the top one means nothing anyway...
Jared
CE - Microsoft