October 6, 2010 at 9:53 pm
I'm out of ideas on what else to try with this query, and I'm sure I'm missing something in reading the Showplan Statistics Profile output (attached). Query times range from as "short" as ten minutes up to about an hour. I could use some suggestions on things to try to get these query times down to under a minute.
The overall goal of the query is to move data from a prior processing month into the current processing month for data that has not received any updates in the current month. If there is a failure, the overall process retries, so this piece has code to ensure that it will not insert duplicates of data that may already be present. A duplicate can only be identified by a combination of two fields (ClaimNumber, ClaimAdjustedSequence). The data is inserted in blocks of 6 months of service in a loop that moves through the full set of service months, to keep the inserts between 150k and 400k records per insert and minimize log pressure. When done looping through service months, up to several million rows will have been inserted in total.
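For reference, the service-month loop skeleton looks roughly like this (a sketch only; the boundary months and the month arithmetic are placeholders, not the actual production code):

DECLARE @StartMonth int, @EndMonth int, @LastMonth int, @d datetime

SET @StartMonth = 200401   -- hypothetical first service month (YYYYMM)
SET @LastMonth  = 201009   -- hypothetical last service month to process

WHILE @StartMonth <= @LastMonth
BEGIN
    -- turn the YYYYMM integer into a date so DATEADD can do the month math
    SET @d = CONVERT(datetime, CONVERT(char(6), @StartMonth) + '01', 112)

    -- last month of the 6-month block, back in YYYYMM form
    SET @EndMonth = YEAR(DATEADD(month, 5, @d)) * 100 + MONTH(DATEADD(month, 5, @d))

    -- the INSERT ... SELECT below runs here for this @StartMonth/@EndMonth block

    -- advance to the next 6-month block
    SET @StartMonth = YEAR(DATEADD(month, 6, @d)) * 100 + MONTH(DATEADD(month, 6, @d))
END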
This is the only process running on the box. The box is a two-processor dual-core machine with 16 GB of RAM, running Windows 2003 Enterprise 64-bit and SQL Server 2005 Enterprise 64-bit.
The sp_help output for the tables is attached; there are only two tables involved in the query.
Side note: adding the "LagsDetailId > 91879813" filter to the outer query dramatically slowed the query. This was counterintuitive to me, as there are lots of records below that Id that otherwise meet the filter criteria.
[font="Courier New"]
INSERT INTO dbo.tblLagsDetail_AsOfMonth (
AsOfMonth, AsOfMonthDate, LagsDetailId
, ServiceMonth
, PayMonth
, PBPCode, ClaimTypeId, StateRevenueEntity
, OOAFlag, RefundFlag, LeakageFlag, MMRFlag, FraudFlag, WriteOffFlag, IASISFlag
, ClaimHeaderFlag
, HealthPlanId, IPAId, LPOId, CompanyId, SegmentId
, TotalAdjustedAmount, AdjustedClaimCalculatedNet
, AdjustedClaimFlag, OriginalPayMonth, OriginalServiceMonth, OriginalClaimNumber )
SELECT @AsOfMonth, @i_AsOfMonthDate, LDA.LagsDetailId
, ISNULL(LDA.ServiceMonth, ((YEAR(LD.FromDOS)*100) + MONTH(LD.FromDOS)))
, ISNULL(LDA.PayMonth, ((YEAR(LD.DatePaid)*100) + MONTH(LD.DatePaid)))
, LDA.PBPCode, LDA.ClaimTypeId, LDA.StateRevenueEntity
, LDA.OOAFlag, LDA.RefundFlag, LDA.LeakageFlag, LDA.MMRFlag, LDA.FraudFlag, LDA.WriteOffFlag, LDA.IASISFlag
, LDA.ClaimHeaderFlag
, LDA.HealthPlanId, LDA.IPAId, LDA.LPOId, LDA.CompanyId, LDA.SegmentId
, LDA.TotalAdjustedAmount, LDA.AdjustedClaimCalculatedNet
, LDA.AdjustedClaimFlag, LDA.OriginalPayMonth, LDA.OriginalServiceMonth, LDA.OriginalClaimNumber
FROM dbo.tblLagsDetail_AsOfMonth LDA
INNER JOIN dbo.tblLagsDetail LD
ON LD.LagsDetailId = LDA.LagsDetailId
INNER JOIN ( SELECT LD.ClaimNumber, LD.ClaimAdjustedSequence
FROM dbo.tblLagsDetail_AsOfMonth LDA WITH(NOLOCK)
INNER JOIN dbo.tblLagsDetail LD WITH(NOLOCK)
ON LD.LagsDetailId = LDA.LagsDetailId
WHERE LDA.AsOfMonth = @PriorMonth
AND LDA.IPAId = @IPAId
AND LDA.ClaimHeaderFlag = 1
AND LDA.SegmentId = @i_SegmentId
AND LDA.LagsDetailId > 91879813
AND LDA.ServiceMonth BETWEEN @StartMonth AND @EndMonth
EXCEPT
SELECT LD.ClaimNumber, LD.ClaimAdjustedSequence
FROM dbo.tblLagsDetail_AsOfMonth LDA WITH(NOLOCK)
INNER JOIN dbo.tblLagsDetail LD WITH(NOLOCK)
ON LD.LagsDetailId = LDA.LagsDetailId
WHERE LDA.AsOfMonth = @AsOfMonth
AND LDA.ClaimHeaderFlag = 1
AND LDA.SegmentId = @i_SegmentId
AND LDA.ServiceMonth BETWEEN @StartMonth AND @EndMonth) EF
ON EF.ClaimNumber = LD.ClaimNumber
AND EF.ClaimAdjustedSequence = LD.ClaimAdjustedSequence
WHERE LDA.AsOfMonth = @PriorMonth
AND LDA.IPAId = @IPAId
AND LDA.ClaimHeaderFlag = 1
AND LDA.SegmentId = @i_SegmentId
AND LDA.ServiceMonth BETWEEN @StartMonth AND @EndMonth
October 7, 2010 at 3:00 am
hahnorlaska (10/6/2010)
Side note: adding the "LagsDetailId > 91879813" filter to the outer query dramatically slowed the query. This was counterintuitive to me, as there are lots of records below that Id that otherwise meet the filter criteria.
Probably you don't have the right index to check this predicate. Can you attach the query plan?
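For example, something along these lines might help (just a sketch based on the predicates in your query; the key column order, and whether anything should be INCLUDEd, really depends on the plan):

-- hypothetical index; adjust column order after checking the actual plan
CREATE NONCLUSTERED INDEX IX_tblLagsDetail_AsOfMonth_Filter
    ON dbo.tblLagsDetail_AsOfMonth (AsOfMonth, IPAId, SegmentId, ClaimHeaderFlag, ServiceMonth)
    INCLUDE (LagsDetailId)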
You may also take a look at this article on how to post performance problems.
A better description of your problem can lead to a better description of a solution.
-- Gianluca Sartori
October 7, 2010 at 6:45 am
Without the execution plan it's just guesswork as to what's going on. Remember, just because you see an index referred to in the execution plan doesn't mean it's being used well. You might be getting scans instead of seeks, you might be seeing RID or KEY lookup operations, it might be creating work tables... any number of things can be going wrong, but without the execution plan, there's just no way of knowing.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 7, 2010 at 7:56 am
The attached showplan statistics output gives detailed information on what was being done to process the query (the physical operator, logical operator, argument, objects involved (indexes), row counts, IO estimates, rows, executions of that part of the query, etc.). This is pretty much the same information that you get from a query plan, only in a tabular format that is easier to load and search.
All index activity is a seek; I don't see any index scans or table scans in it.
On the indexing, the table has a clustered index on AsOfMonth, IPAId, LagsDetailId. Within the @PriorMonth data set, the @IPAId that took over 10 hours to complete (the other 17 @IPAId values processed in 7 minutes on average) has only 6,056 records with LagsDetailId > 91879813 and 3,816,995 with LagsDetailId < 91879814. The next largest @IPAId value has 1,453 for the greater-than count and 1,635,580 for the less-than count; that one completed in 1 hour and 36 minutes.
Because of that data distribution, I was shocked to see LagsDetailId > 91879813 slowing the query.
October 7, 2010 at 8:12 am
hahnorlaska (10/7/2010)
The attached showplan statistics output gives detailed information on what was being done to process the query (the physical operator, logical operator, argument, objects involved (indexes), row counts, IO estimates, rows, executions of that part of the query, etc.). This is pretty much the same information that you get from a query plan, only in a tabular format that is easier to load and search.
Maybe, but I find it a bit harder to read.
If possible, attach the query plan, and I'll be glad to give it a look.
-- Gianluca Sartori
October 7, 2010 at 11:24 am
Gianluca Sartori (10/7/2010)
hahnorlaska (10/7/2010)
The attached showplan statistics output gives detailed information on what was being done to process the query (the physical operator, logical operator, argument, objects involved (indexes), row counts, IO estimates, rows, executions of that part of the query, etc.). This is pretty much the same information that you get from a query plan, only in a tabular format that is easier to load and search.
Maybe, but I find it a bit harder to read.
If possible, attach the query plan, and I'll be glad to give it a look.
Ditto. I hate trying to dig through those things.
By the way, looking at the spreadsheet, what jumps out at me is the very large number of work tables in the query. This suggests, pretty strongly, that you're not getting good index usage.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 8, 2010 at 7:46 am
Don't have time to dig into this, but I wonder if dynamic SQL might help here. Date-range queries often benefit from having hard-coded values, because the optimal query plan for a 1-day range is VASTLY different from the query plan for a 3-year date range.
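As a rough sketch of the idea (the column lists are elided and the parameter data types below are guesses), you would inline the date range as literals while keeping the other values parameterized:

DECLARE @sql nvarchar(max)

SET @sql = N'
INSERT INTO dbo.tblLagsDetail_AsOfMonth ( /* column list */ )
SELECT /* column list */
FROM dbo.tblLagsDetail_AsOfMonth LDA
INNER JOIN dbo.tblLagsDetail LD
    ON LD.LagsDetailId = LDA.LagsDetailId
WHERE LDA.AsOfMonth = @PriorMonth
  AND LDA.IPAId = @IPAId
  AND LDA.ClaimHeaderFlag = 1
  AND LDA.SegmentId = @i_SegmentId
  AND LDA.ServiceMonth BETWEEN '
    + CONVERT(nvarchar(6), @StartMonth) + N' AND '
    + CONVERT(nvarchar(6), @EndMonth)

EXEC sp_executesql @sql,
    N'@PriorMonth int, @IPAId int, @i_SegmentId int',   -- data types are guesses
    @PriorMonth = @PriorMonth, @IPAId = @IPAId, @i_SegmentId = @i_SegmentId

With the literal BETWEEN values the optimizer can estimate the 6-month slice directly instead of reusing whatever plan the first execution's range produced.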
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 12, 2010 at 5:55 am
I suggest you break up the query into two parts using a temporary table.
First, select the duplicates into a temporary table.
Then insert all data not included in the duplicates table.
The #duplicates table will probably almost always be empty, so both generating it and checking it should be very quick.
Something like this:
SELECT LD.ClaimNumber,
LD.ClaimAdjustedSequence
INTO #duplicates
FROM dbo.tblLagsDetail_AsOfMonth LDA WITH(NOLOCK)
INNER JOIN dbo.tblLagsDetail LD WITH(NOLOCK)
ON LD.LagsDetailId = LDA.LagsDetailId
WHERE LDA.AsOfMonth = @AsOfMonth
AND LDA.ClaimHeaderFlag = 1
AND LDA.SegmentId = @i_SegmentId
AND LDA.ServiceMonth BETWEEN @StartMonth AND @EndMonth
INSERT INTO dbo.tblLagsDetail_AsOfMonth
-- removed for brevity
SELECT
-- removed for brevity
FROM dbo.tblLagsDetail_AsOfMonth LDA
INNER JOIN dbo.tblLagsDetail LD
ON LD.LagsDetailId = LDA.LagsDetailId
WHERE NOT EXISTS( SELECT *
                  FROM #duplicates d
                  WHERE d.ClaimNumber = LD.ClaimNumber
                    AND d.ClaimAdjustedSequence = LD.ClaimAdjustedSequence )
AND LDA.AsOfMonth = @PriorMonth
AND LDA.IPAId = @IPAId
AND LDA.ClaimHeaderFlag = 1
AND LDA.SegmentId = @i_SegmentId
AND LDA.ServiceMonth BETWEEN @StartMonth AND @EndMonth
If you still have performance problems it should be much easier to optimize each part of the query by itself.
/SG
October 12, 2010 at 10:10 am
The date range is a fixed number of months of data (6 months) for all but the first iteration of the loop. That first iteration covers a much larger date range, but in terms of data it is going to be among the lowest record counts. The amount of data that falls into that range varies widely based on which IPAId the loop is processing and where in the years covered the date block is.
If I move to dynamic SQL where I hard-code the date range instead of using a variable, would that be expected to make a significant difference given the date range is 6 months for the vast majority of the loops?
I could use an indexed temp table, but that temp table will hold millions of rows of data in the very near future. The data volumes involved have made me hesitant to move to that type of solution. That, and the data in the temp table is already the first two fields of an index on the permanent table.
I can't run an execution plan until I have a large window to clear data (it takes close to a full business day to delete the data and prep for starting the process over). The main issue is the first data set, which makes me suspect something changes as the code moves through iterations (statistics definitely, the query plan possibly). I don't know that a query plan captured with the data loaded will be the same as what ran for the query iteration that took 10 hours to complete.
October 12, 2010 at 11:09 am
hahnorlaska (10/12/2010)
I could use an indexed temp table, but that temp table will hold millions of rows of data in the very near future. The data volumes involved have made me hesitant to move to that type of solution. That, and the data in the temp table is already the first two fields of an index on the permanent table.
If you are talking about my solution, I think you might have misunderstood what it is doing.
The temp table will only hold duplicate rows. If everything goes well, the temp table will be empty (if I have understood your situation correctly). The temp table does not need any indexes.
I am fairly sure that generating the temp table will take just a few seconds for each iteration, and then inserting the resulting rows will also take just a few seconds (assuming that we are talking about up to 400k rows in each iteration).
It might be possible to speed up the original query, but since it is much more complex there is a much larger risk that something goes wrong with the optimization.
Splitting the query is easy to do, and I think it will work perfectly.
October 12, 2010 at 5:21 pm
Yes, I had misunderstood.
The issue with the EXCEPT is that it only exists to handle the relatively rare occasion where something went wrong and the process is being re-run. Most of the time the EXCEPT returns everything, with nothing being excluded. Your suggestion is to change it to treat it as an exclusion, where the expected majority case is to have no records excluded.
October 13, 2010 at 1:09 am
hahnorlaska (10/12/2010)
Yes, I had misunderstood.
The issue with the EXCEPT is that it only exists to handle the relatively rare occasion where something went wrong and the process is being re-run. Most of the time the EXCEPT returns everything, with nothing being excluded. Your suggestion is to change it to treat it as an exclusion, where the expected majority case is to have no records excluded.
Exactly!
If you try this method, please let me know how it worked.
/SG