December 17, 2014 at 3:30 am
I have a table of about 1.8 million violations that I need to associate with a sanction. The sanctions table has about 48000 sanctions on it. Each order in my violation table may have many (and I mean many) violations associated with it and each order may have many sanctions. The violations do not always have an end date hence the use of the start date as a date parameter. The sanction is triggered when the violations hit certain totals and there is no pattern to when these totals are reached. When a sanction is triggered the violations are reset and do not count towards any following sanctions. On the sample tables, violations 10052, 10053 and 18572 are dealt with on sanction 45222, violations 19624, 19755, 19854 and 25052 are dealt with on sanction 45224 and finally the remaining three violations are handled by sanction 45328.
What I need to find is the next sanction that was raised after each set of violations. I'm using the code below to do this and it does exactly what I need. The problem is it's horribly slow over the full table. Does anybody have any better ideas? I think a Windowing Function is the way to go but I'm not sure how.
create table #violation
(
OrderID int ----Many orders in the full table
,StartDate datetime2
,EndDate datetime2 null
,ViolationID int primary key --- Not consecutive for individual orders
)
create table #sanction
(
OrderID int ---Many orders in the full table
,IdentifiedDate datetime2
,SanctionID int primary key --- Not consecutive for individual orders
)
insert into #violation
select 111111,'2014-09-27 19:00:00','2014-09-27 19:11:00',10049 union all
select 222222,'2014-09-27 20:00:00','2014-09-28 03:16:00',10051 union all
select 222222,'2014-09-28 03:17:00','2014-09-28 08:00:00',10052 union all
select 111111,'2014-09-28 23:05:00','2014-09-28 23:25:00',10053 union all
select 111111,'2014-09-29 20:05:00',null,18572 union all
select 111111,'2014-10-01 19:25:00','2014-10-01 21:18:00',19624 union all
select 111111,'2014-10-04 20:45:00','2014-10-04 21:01:00',19755 union all
select 111111,'2014-10-06 19:00:00','2014-10-06 19:01:00',19854 union all
select 111111,'2014-10-15 20:09:00',null,25052 union all
select 111111,'2014-11-05 00:58:00','2014-11-06 06:56:00',26350 union all
select 111111,'2014-12-14 01:00:00','2014-12-14 05:15:00',27852 union all
select 111111,'2014-12-16 19:00:00','2014-12-16 23:11:00',28536
insert into #sanction
select 222222,'2014-09-29 11:16:00',44566 union all
select 111111,'2014-09-30 14:00:00',45222 union all
select 111111,'2014-10-16 09:00:00',45224 union all
select 100001,'2014-12-17 10:55:00',45328
select
s.orderid
,v.startdate
,v.enddate
,v.violationid
,s.identifieddate
,s.sanctionID
from #violation v
join #sanction s on s.orderid = v.orderid
where
s.identifieddate = (select min(s2.identifieddate) from #sanction s2
where s2.orderid = s.orderid
and s2.identifieddate > v.startdate)
drop table #violation, #sanction
This is the edited highlights of the full code but it's the SELECT MIN(date) in the sub-query that's slowing things down.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 17, 2014 at 4:19 am
Instead of MIN why not try a TOP 1 with an ORDER BY to get the dates in the correct order. You should even be able to have an index set in that order as well.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 17, 2014 at 5:15 am
Thanks Grant, I've tried the TOP 1 approach and replaced the CTE's I was using with indexed temp tables but it's still not returned anything after half an hour.
Is there any way to lose that sub-query completely I wonder? I've a suspicion this isn't going to be quick no matter what I do.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 17, 2014 at 7:00 am
Not seeing the execution plan it's hard to make suggestions as to where exactly it's running slow. Are you sure you have indexes in place that support the query you're running.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 17, 2014 at 7:10 am
I've put a clustered index on the Violations table that uses the StartDate and one on the Sanctions table using the dateidentified. I'll be honest though and say what I know about indexes and execution plans can be written on the back of cigarette packet. Will the execution plan for one order be the same as that for the whole set? I've not yet got the query to run to completion so as far as I know I can't produce the execution plan. If a sample set will produce the same plan I'll post one ASAP.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 17, 2014 at 7:21 am
You can get the estimated execution plan without ever running the query at all. That's the place to start.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 17, 2014 at 7:47 am
Of course I can :doze:
It makes repeated reference to creating an index on the TimeViolation table but that's read-only. The rest of it is so much hieroglyphics to me so if you could have a look over it please I'd appreciate it.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 17, 2014 at 9:25 am
First, you're estimated to be moving 2.9 million rows into temp tables and that's coming from four scans against your tables. That's just to load #temp_violation. That's going to be a giant performance hit and time suck, right there. You have a WHERE clause on the query, but it doesn't seem to do any filtering of the data at all. Based on the cardinality of the tables, you're not dealing with 2.9 million rows from each table, but you're multiplying them through this set of JOINs to arrive at that. This is the first thing I'd check. Also, a very common code smell is the DISTINCT operator. That usually indicates there is a problem with data, your structure, or your code putting together the data and the structure (or on really fun days, all three).
The next insert estimate is that it's moving one row, but it's still going from a couple of scans. There are suggested indexes on both these that are worth exploring. Hard to say how much this is hurting you. The estimate is only for a single row, but I don't trust that.
You might want to try building and loading the temp tables, then adding the clustered indexes to them. Right now, by having the indexes defined, you're having to maintain the index and the statistics while the data loads. That's also going to be a hefty performance hit.
You then have a whole series of functions. Looks like scalar functions for the most part. Depending on what these do, it's all processing time in the main query.
Query 24 is the SELECT statement. The WHERE clause is not leading to any filtering because most of the data access is scans. The worse one is the scan against the FilteredOrders table that is part of a nested loops join where it's going to scan 39k rows for every row that comes out of the rest of the query. That's not good. The temp tables are all being scanned too.
Pretty much, you need to work on the indexing. That's the worst thing going on at the moment. Your code isn't filtering in a way that the indexes support (or, your indexes don't support your filters).
First index I'd add, OrderID on the FilteredOrders table. But, that's the same as the suggestion for the missing indexes.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 17, 2014 at 9:26 am
First thing is to get the best clustering:
Cluster the violation table on ( OrderID, StartDate )
Cluster the sanction table on ( OrderID, IdentifiedDate )
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 17, 2014 at 9:41 am
Stupid f"£$ing RDP connection has shut itself down and I've lost everything I did previously!!!!!! I'll now have to pretty much start again but that may not be a bad thing. Thanks for your input gents, at least the starting again will have the improvements in there.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 17, 2014 at 10:00 am
Here's a possible re-write of the first temp table INSERT:
insert into #temp_violation
select
tv.OrderID
,tv.StartDate
,tv.EndDate
,ca1.HourID
from EMSTBLSN.dbo.TimeViolation tv
cross apply (
select distinct fh.HourID
from G4SFEP1.dbo.Service fs
join G4SFEP1.dbo.Hours fhon fh.ServiceID= fs.ServiceID
where
fs.LocationID = tv.SiteID and
exists(
select 1
from test.dbo.FilteredOrders fo
where fo.OrderId = fs.OrderID
)
) as ca1
where
tv.ExcludeFromATV= 0
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 31, 2014 at 2:55 am
I've finally got back to this after the Christmas break and a brief sojourn in proper work.
Thanks for your advice gents. I learned something very useful about execution plans yesterday, in that you can edit the code used to create them from a saved plan. That meant I could resurrect the original code I'd used after losing it in the RDP incident. I've also picked up a fair bit about indexing too, I put the indexing in place and it did speed things up a little. Then I had a flash of inspiration and I finally realised why a particular bit of code was used by others. It struck me that I didn't need to join to the Violations table to populate my Breach table. I needed to find the breaches for the order numbers that were in the violations table but I didn't need to link each breach to each individual violation. I used
where so.eai_orderno_int in (select orderno from #temp_violation)
rather than the join and it now runs in fifty or so seconds rather than fifty or so minutes. That construct always seemed a little pointless to me (a complete noob) but now it makes total sense.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
December 31, 2014 at 5:17 am
Can you post the execution plan?
Have you considered using EXISTS
WHERE EXISTS (SELECT 1 FROM #temp_violation t WHERE t.orderno = so.eai_orderno_int)
rather than IN
where so.eai_orderno_int in (select orderno from #temp_violation)
- performance is unlikely to be significantly different
- EXISTS allows you to use more than one correlating column between the inner and outer query
- extrapolating a little, you're less likely to get bitten by the NOT IN / NULL issue[/url]
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 31, 2014 at 5:37 am
ChrisM@Work (12/31/2014)
Can you post the execution plan? DoneHave you considered using EXISTS No I hadn't, see my earlier comment about being a noob 🙂 The fact it ran at all opened up another can of worms which needed work.
WHERE EXISTS (SELECT 1 FROM #temp_violation t WHERE t.orderno = so.eai_orderno_int)
rather than IN
where so.eai_orderno_int in (select orderno from #temp_violation)
- performance is unlikely to be significantly different Anything that returns a result in less than an hour will be considered a success!!
- EXISTS allows you to use more than one correlating column between the inner and outer query Not likely to be necessary for this but I'll bear it in mind
- extrapolating a little, you're less likely to get bitten by the NOT IN / NULL issue[/url] This is for ad-hoc runs on static data so it's unlikely to be a problem in this case but again I'll bear it mind
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 2, 2015 at 2:07 am
Thanks.
Query 1 (insert into #temp_violation etc.) has an odd filter:
and
(
cast(fh.startdate as date)<= cast(tv.startdate as date)
and
cast(fh.startdate as time)<= cast(tv.startdate as time)
)
This will capture datetimes from fh where they are less than or equal to tv.startdate (as date), but for any fh rows which match - which could be different days - then times after cast(tv.startdate as time) will be eliminated. I'm not sure this is what you want. Shouldn't you just be comparing the two datetimes directly?
[/fh.startdate <= tv.startdate code]
Can you create any indexes on the tables in query 1?
Query 2 (insert into #temp_breach) has an expensive key lookup to table IncidentExtensionBase. Can you add the columns which the key lookup collects to index IX_VI_OUTRES, or alternatively, create a new index which contains just the columns you need from the table? Better still, since you're only using a small number of columns from the view
(eai_orderno_int, vf_subjectorderid and vf_subjectid_lk from Vf_subjectorderExtensionBase)
do you actually need to reference the view at all?
- convert the scalar function EM_Exit.dbo.G4STime() to an iTVF.
Query 3 would be helped with an additional index on #temp_breach: orderid and DateIdentified
Edit:
Here's the missing index hint for Query 3. Table TimeViolation isn't referenced in the query.
/*
Missing Index Details from Ex_Plan.sqlplan
The Query Processor estimates that implementing the following index could improve the query cost by 59.6517%.
*/
/*
USE [EMSTBLSN]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[TimeViolation] ([ExcludeFromATV])
INCLUDE ([SiteID],[StartDate],[EndDate])
GO
*/
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply