January 5, 2017 at 9:18 am
Suppose I have a table with the following structure.
CREATE TABLE [TAB].[tblWorkOrderSegInvoices](
[CustomerNumber] [varchar](7) NOT NULL,
[WorkOrderNumber] [varchar](10) NOT NULL,
[SegmentNumber] [char](2) NOT NULL,
[PartsInvoiceAmount] [decimal](13, 2) NOT NULL,
[LaborInvoiceAmount] [decimal](13, 2) NOT NULL,
[MiscInvoiceAmount] [decimal](13, 2) NOT NULL,
[ETLDate] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [TAB].[tblWorkOrderSegInvoices] ADD CONSTRAINT [DF_tblWorkOrderSegInvoices_ETLDate] DEFAULT (getdate()) FOR [ETLDate]
GO
There is a stored procedure that runs daily that does the following with this table.
TRUNCATE TABLE TAB.tblWorkOrderSegInvoices
IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'idxCustWOSeg')
DROP INDEX idxCustWOSeg ON TAB.tblWorkOrderSegInvoices WITH ( ONLINE = OFF )
--do inserts into table; around 450,000 records each day
INSERT INTO TAB.tblWorkOrderSegInvoices WITH (TABLOCK)
(CustomerNumber, WorkOrderNumber, SegmentNumber, PartsInvoiceAmount, LaborInvoiceAmount, MiscInvoiceAmount)
SELECT .....
CREATE CLUSTERED INDEX idxCustWOSeg ON TAB.tblWorkOrderSegInvoices
(
CustomerNumber ASC,
WorkOrderNumber ASC,
SegmentNumber ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Then, after the truncate table, drop index, insert data, and create index, I have a SELECT query that joins to this table on CustomerNumber, WorkOrderNumber, and SegmentNumber.
All of last month, the query that joins to this table was running within an acceptable timeframe. But starting this month, it has been taking 4 times longer. The only difference that I can detect is that the number of records being inserted into it daily has decreased by around 140,000. Also, the number of records being joined into it has decreased by 100,000 or so.
In trying to troubleshoot this, I've tried running without the index, and the query completes within the normal, acceptable timeframe. But when I add the index, it takes 4x longer to run.
As a side note, we are about to move this database to a new server. We have it over on the new server and everything about this process performs as expected.
All of this leads me to believe this is some kind of index/plan/statistics problem. I'm not a DBA and don't claim to be. But I am responsible for this process and making sure it runs well. Any help would be great.
January 5, 2017 at 9:37 am
Can you post, as a .sqlplan attachment, the actual (as opposed to estimated) execution plan for "SELECT query that joins to this table", please? Thanks.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 5, 2017 at 10:03 am
Thank you Chris for offering to look at this. I'm running it now. I'm warning you now that it's not pretty. I'll upload as soon as it completes.
January 5, 2017 at 10:37 am
When is the last time you rebuilt statistics on the other tables that you're joining to this one?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 5, 2017 at 11:52 am
It looks like it may have been a while.
Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.
--Update: also updated plan when I've dropped the index.
January 5, 2017 at 5:36 pm
LeeFAR (1/5/2017)
It looks like it may have been a while. Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.
--Update: also updated plan when I've dropped the index.
In that case, before you continue troubleshooting/tuning, you really should rebuild stats on the tables involved, which will also cause the procs that use the table to recompile. You might have nothing else to do after that.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 6, 2017 at 1:51 am
LeeFAR (1/5/2017)
It looks like it may have been a while. Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.
--Update: also updated plan when I've dropped the index.
Thanks Lee. Here are the steps I'd take next:
Update stats, as Jeff recommends.
Change the two table variables to #temp tables. They are seriously screwing up the rowcount estimates (see the sketch after this list).
Grab the remote data into a #temp table; it's also screwing up estimates.
Post the actual plan, and also the modified query 3 (it's truncated in the plan file).
Some of the source tables are read multiple times - customers/stores/county as a group is read five times. If the query cannot be tweaked to reduce the duplication of effort then you might get some good mileage from putting this section of the query into a #temp table too.
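If it helps, here's a minimal sketch of the table-variable swap - the #WorkList name and its columns are just an illustration, not your actual definitions; use whatever your table variables currently hold:
-- Before: a table variable, which the optimizer typically estimates at one row
-- DECLARE @WorkList TABLE (CustomerNumber varchar(7) NOT NULL, PartsInvoiceAmount decimal(13,2) NOT NULL);

-- After: a #temp table, which gets real statistics and row counts
CREATE TABLE #WorkList
(
CustomerNumber varchar(7) NOT NULL,
PartsInvoiceAmount decimal(13,2) NOT NULL
);

INSERT INTO #WorkList (CustomerNumber, PartsInvoiceAmount)
SELECT CustomerNumber, PartsInvoiceAmount -- same population query you use for the table variable today
FROM TAB.tblWorkOrderSegInvoices;

-- Then reference #WorkList instead of the table variable in the problem query.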
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 6, 2017 at 8:35 am
Thanks Jeff and Chris. I cannot thank you enough for taking your time to help me.
Chris, I did see that the table vars are causing the estimated rows to be stupid crazy. I don't know if you looked at the other execution plan, but that is from when I dropped the index on the one table. In that plan the estimated rows are in line. It is also where the query runs within its normal amount of time. It's odd to me that dropping the index would cause this behavior.
I will start with updating stats. Would I be better off running sp_updatestats in the database or UPDATE STATISTICS on each table involved with the query?
January 6, 2017 at 8:48 am
One other thing I noticed. I went back to see when the stats were last built on the TAB.tblWorkOrderSegInvoices table, and it was today. I guess when I looked before, I didn't have the right object. The stats being built today makes sense based on my understanding of when stats are automatically updated: because I'm dropping the index, inserting a large amount of data, and adding the index back, the stats are current. My thinking now is that perhaps the stats are getting created out of whack.
SELECT name AS stats_name,
STATS_DATE(object_id, stats_id) AS statistics_update_date
FROM sys.stats
WHERE object_id = OBJECT_ID('TAB.tblWorkOrderSegInvoices')
ORDER BY statistics_update_date;
GO
stats_name                   statistics_update_date
idxCustWOSeg                 1/6/17 4:09 AM
_WA_Sys_00000003_6991A7CB    1/6/17 4:09 AM
_WA_Sys_00000002_6991A7CB    1/6/17 4:09 AM
_WA_Sys_00000001_6991A7CB    1/6/17 4:09 AM
January 6, 2017 at 8:53 am
I don't think you've posted the SQL select statement, but even though the final returned result is only 300k rows, just looking at the execution plan, I'm seeing table spool operations with row counts in the hundreds of millions. Is it by chance doing some type of cross join between tables and then filtering the result?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 6, 2017 at 9:08 am
Here is the T-SQL in the stored procedure that is running the query that is taking the time. The bad execution plan is what is produced based on this. If you do a find for "--primary query; this is the one that is having the performance problem", you will find the query that is taking the time. Again, if I don't create the idxCustWOSeg index, it seems to run in the 12 mins it has been. But if I have the index, it takes over an hour. And prior to Jan 1, it was running with the index at an average of 12 mins.
January 6, 2017 at 9:19 am
LeeFAR (1/6/2017)
Here is the T-SQL in the stored procedure that is running the query that is taking the time. The bad execution plan is what is produced based on this. If you do a find for "--primary query; this is the one that is having the performance problem", you will find the query that is taking the time. Again, if I don't create the idxCustWOSeg index, it seems to run in the 12 mins it has been. But if I have the index, it takes over an hour. And prior to Jan 1, it was running with the index at an average of 12 mins.
Get this...
FROM OPENQUERY(AS400,'
into a #temp table for starters, and time it. Then replace the OPENQUERY reference in your query with the #temp table.
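A rough sketch of that, assuming the linked server name AS400 from your script (the remote query text and column names below are just placeholders for whatever is inside your OPENQUERY call):
-- Pull the remote rows across once, into a local temp table
SELECT *
INTO #AS400Data
FROM OPENQUERY(AS400, 'SELECT CustomerNumber, WorkOrderNumber FROM SomeAS400Table'); -- placeholder remote query

-- Optionally index it on the column(s) the main query joins on
CREATE CLUSTERED INDEX ix_AS400Data ON #AS400Data (CustomerNumber, WorkOrderNumber);

-- ...then join to #AS400Data in query 3 instead of calling OPENQUERY inline.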
Edit: also, to vastly simplify your problem query, include whatever processing you can for the result set - this, for instance:
ISNULL(W.MachineManufacture,'Unknown') AS MachineManufacture, ISNULL(W.MachineFamily,'') AS MachineFamily, ISNULL(W.MachineModel,'') AS MachineModel,
ISNULL(W.MachineDisplayModel,'') AS MachineDisplayModel, ISNULL(W.ForeCastModel,'') AS ForeCastModel, ISNULL(W.ProductFamilyGroup,'') AS ProductFamilyGroup,
ISNULL(W.MachineIDNumber,'') AS MachineIDNumber, ISNULL(W.MachineManufactureYear,'') AS MachineManufactureYear,
ISNULL(W.AttachmentManufacture,'Unknown') AS AttachmentManufacture, ISNULL(W.AttachmentModel,'') AS AttachmentModel, ISNULL(W.AttachmentDisplayModel,'') AS AttachmentDisplayModel,
ISNULL(W.AttachmentDescription,'') AS AttachmentDescription, ISNULL(W.AttachmentIDNumber,'') AS AttachmentIDNumber, ISNULL(W.AttachmentManufactureYear,'') AS AttachmentManufactureYear,
ISNULL(SCA.Salesman1Name,'Unknown') AS Salesman1Name, ISNULL(SCA.Salesman1TypeDescription,'Unknown') AS Salesman1TypeDescription, ISNULL(SCA.Salesman1CategoryDescription,'Unknown') AS Salesman1CategoryDescription, SCA.Salesman1AssignDate,
ISNULL(SCA.Salesman2Name,'Unknown') AS Salesman2Name, ISNULL(SCA.Salesman2TypeDescription,'Unknown') AS Salesman2TypeDescription, ISNULL(SCA.Salesman2CategoryDescription,'Unknown') AS Salesman2CategoryDescription, SCA.Salesman2AssignDate,
can all be done at the #temp table creation stage.
Edit: as an aside, scalar functions such as dbo.fnFormatDBSToSQLDate will inhibit parallelism in all or part of the plan (all, in this case).
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 6, 2017 at 9:33 am
LeeFAR (1/6/2017)
Thanks Jeff and Chris. I cannot thank you enough for taking your time to help me. Chris, I did see that the table vars are causing the estimated rows to be stupid crazy. I don't know if you looked at the other execution plan, but that is from when I dropped the index on the one table. In that plan the estimated rows are in line. It is also where the query runs within its normal amount of time. It's odd to me that dropping the index would cause this behavior.
I will start with updating stats. Would I be better off running sp_updatestats in the database or UPDATE STATISTICS on each table involved with the query?
Update statistics on each table.
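For example, something along these lines for each table (tblWorkOrderSegInvoices is from your DDL; dbo.Customers is just a stand-in for the other tables in the join):
UPDATE STATISTICS TAB.tblWorkOrderSegInvoices WITH FULLSCAN;
-- Repeat for every table the problem query touches, e.g.
-- UPDATE STATISTICS dbo.Customers WITH FULLSCAN;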
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 6, 2017 at 9:44 am
An update. The first thing I did was to change the @CODA and @WarrPolCusts to temp tables instead of table vars. This took the query right at 10 mins to complete. This was with the index still in place on the TAB.tblWorkOrderSegInvoices. So it does appear these table vars had something to do with the problem.
Now, I'm certainly not discounting the fact there are some crappy things going on in this overall process. I've been trying to address a lot of it over time. And based on what you guys have been recommending, I have some good places to go. I welcome any additional advice you may have.
January 6, 2017 at 11:04 am
One thing I would look at is splitting the date-function calls out into a temp table.
e.g. get the distinct values of the dates that are being passed to the functions, then call the functions on that list and insert the results into a table holding the date itself and the result of each function.
This would get rid of both the len(..) = 8 check and the repeated function calls.
Something like:
Insert the openquery result into a temp table as suggested - that one is a killer for sure. Then create the date lookup table as:
select t.processdate
, case when len(t.processdate) = 8 then dbo.fnFormatDBSToSQLDate(t.processdate) else null end as dbtosqldate
, case when len(t.processdate) = 7 then dbo.fnFormatYearDaysToSQLDate(t.processdate) else null end as yeardaystosqldate
into #dates
from (
select distinct dates.processdate
from newopenquerytable
outer apply (select headeropendate as processdate
union
select headerclosedate
union
select invoicedate
union
select reopendate
-- ... add all required
) dates
where dates.processdate is not null
) t;
-- eventually collapse the two columns above into a single one if the output of the two functions is the same data type - it will make the remaining code cleaner and the temp table smaller.
-- the above assumes that the data types of the input fields are the same - adapt as needed
create clustered index ix_dates_processdate on #dates
(processdate
);
Then use it in the remaining code:
select ...
, d1.dbtosqldate as headeropendate
, d2.dbtosqldate as invoicedate
...
from newopenquerytable w
inner join #dates d1 -- or left join if nulls are allowed on the dates
on d1.processdate = w.headeropendate
inner join #dates d2
on d2.processdate = w.invoicedate