February 26, 2009 at 9:37 am
Hi,
I have the schema of a table specified below, its part of a ETL application, so data is being loaded onto the table and queried are being executed from the web to pull records out.
I have been able to tune it so that Queries are quite fast, but the problem I am having at the moment is as follows:
Sorting slows down the queries
There are variations in the way users can request sorting, it can be by [ServiceDate] ASC or [ServiceDate] DESC, but DESC is more common as users want the latest data.
I am using a view, as a partitioned table would have been more suitable, but due to licencing constraints, this couldnt be implemented, the view was used so that when old records are to be removed, the corresponding monthly table can be removed and filegroup dropped, then the view updated, otherwise we would have had a situation where records would need to be deleted, database shrunk etc just to release some space.
CREATE TABLE [dbo].[PersonURN_200901](
[PersonURN] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ContactDN] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CampaignOut] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CampaignIn] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[InstanceURN] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ServiceURN] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ServiceDate] [datetime] NOT NULL,
[Location_id] [int] NULL,
[Client_id] [int] NULL
CONSTRAINT [PK_PersonURN_200901] PRIMARY KEY CLUSTERED
(
[ServiceDate] DESC,
[PersonURN] DESC
)WITH (IGNORE_DUP_KEY = ON) ON [Marketing_200901]
) ON [Marketing_200901]
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_ORDER_BY_KEYS_PersonURN_200901] ON [dbo].[PersonURN_200901]
(
[Client_id] DESC,
[ServiceURN] DESC,
[InstanceURN] DESC,
[ContactDN] DESC,
[ServiceDate] DESC
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_sort_ServiceDate] ON [dbo].[PersonURN_200901]
(
[ServiceDate] DESC
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
GO
The view schema is as follows
Create View PersonURN
as
Select * from [PersonURN_200901]
union all
Select * from [PersonURN_200902]
Typical Queries are as follows, please bear in mind that each table can hold about 150m rows.
SELECT [InstanceURN] ,
[ServiceURN] ,
CONVERT(VARCHAR(20),[ServiceDate],113),
[ContactDN]
FROM PersonURN
where CLIENT_ID = 106
AND [ServiceDate] BETWEEN '2009-01-05 17:06:00' AND '2009-02-21 17:06:00'
ORDER BY [ServiceDate] desc or ASC
OR
SELECT [InstanceURN] ,
[ServiceURN] ,
CONVERT(VARCHAR(20),[ServiceDate],113),
[ContactDN]
FROM PersonURN (nolock)
where CLIENT_ID = 106
and [InstanceURN] like '782245144%'
AND [ServiceDate] BETWEEN '2009-01-05 17:06:00' AND '2009-02-21 17:06:00'
ORDER BY [ServiceDate] desc or ASC
February 26, 2009 at 9:51 am
I would dump the second index you create, and replace the first one with one that started with service date, then client ID, and then use Include on the other columns you want. Would look like this:
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_ORDER_BY_KEYS_PersonURN_200901] ON [dbo].[PersonURN_200901]
(
[ServiceDate] DESC
[Client_id] )
INCLUDE
(
[ServiceURN] DESC,
[InstanceURN] DESC,
[ContactDN] DESC,
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
Try that index, see if it improves your results.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 26, 2009 at 10:09 am
cheers for the reply, I have already done this before and it didnt seem to make much difference.
Basically, on looking at the query plan, what I need to do is get rid of the sort operation, what i also noticed is that. If running a query spanning just one month, it doesnt so a sort. But if the Query spans over 2 or more months, a sort is used. This is what i need to get rid off, if i remove the order by in the query, the performance is awesome, but this is needed as data has to be sorted before being returned to the webpage.
Also, I only put in the sedond NC index thinking that this was going to help with the sorting, as if there is an index, I thought there wouldnt be any need for a sort operating in the plan.
February 26, 2009 at 11:50 am
Please post the execution plans for the query with the indexes you already have, and with the index I proposed (on each table).
The Union operator is what is going to force the sort. It can't rely on the index sequence, because what happens if a date from the second table is in between two dates from the first table (for example)?
If you have a check constraint on the date column in both tables, limiting them to a certain range (which would make sense in this case), it might be able to take that into account and get rid of the sort step. Not sure. Probably worth a try.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 26, 2009 at 12:14 pm
GSquared (2/26/2009)
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_ORDER_BY_KEYS_PersonURN_200901] ON [dbo].[PersonURN_200901]
(
[ServiceDate] DESC
[Client_id] )
INCLUDE
(
[ServiceURN] DESC,
[InstanceURN] DESC,
[ContactDN] DESC,
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
I'd do the key columns the other way around. ClientID is an equality match and ServiceDate is an inequality and the sort column. If serviceDate is first then SQL can only seek on that column and it'll have to do a secondary filter to eliminate rows where the ClientID doesn't match. If clientID is first, then the operation is a single seek to find the start of the range.
The Union operator is what is going to force the sort.
It shouldn't. It's a UNION ALL so no sort and no distinct.
PersonURN_200901
PersonURN_200902
What determines which table a row goes into (assuming this is a manual partitioning)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 26, 2009 at 12:23 pm
Gail, my point about the sort being caused by the Union (All) wasn't because of the operator directly, it's because the query is sorting the results of a union. That's why I suggested trying a check constraint on the date range in each table, which will let the server know that there's no overlap. Not sure if it will help, but it might.
On the sequence of the columns in the index, yeah, I got that one backwards. Was looking at the sort and thinking it would help if the index were sorted on that, but forgot the equality test. Of course, depending on the distribution, it may or may not matter, but better to assume it might.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 26, 2009 at 12:40 pm
GSquared (2/26/2009)
That's why I suggested trying a check constraint on the date range in each table, which will let the server know that there's no overlap. Not sure if it will help, but it might.
Dunno if the optimiser's that smart. If it was a union and there was a constraint on the dates, optimiser would know it didn't have to sort. I don't know if it's smart enough to realise it doesn't have to when there's an explicit ORDER BY. Worth a try, won't do any harm.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 26, 2009 at 4:03 pm
Hi Guys,
I forgot to say that the tables have a check constraint.
ALTER TABLE [dbo].[Marketing_200901] WITH CHECK ADD CONSTRAINT [CHK_Marketing_200901] CHECK (([ServiceDate]>='2009-01-01 00:00:00.000' AND [ServiceDate]<='2009-01-31 23:59:59.000'))
GO
ALTER TABLE [dbo].[Marketing_200901] CHECK CONSTRAINT [CHK_Marketing_200901]
GO
February 26, 2009 at 4:09 pm
(([ServiceDate]>='2009-01-01 00:00:00.000' AND [ServiceDate]<='2009-01-31 23:59:59.000'))
This will miss records with a datetime > '2009-01-31 23:59:59.000' and < '2009-02-01 00:00:00.000'.
The upper bound of this constraint really should be [ServiceDate] < '2009-02-01 00:00:00.000'.
March 2, 2009 at 3:10 am
Hi Guys,
I have tried the index with the INCLUDE statement, and although it doesnt seem to make much diffrence to be honest, in fact when used in the view, it seems to take a bit longer compared to using a covering index to cover all columns required.
But its still doing a sort and that is still slowing things down, especially when the query date spans between 2 or more months.
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_ORDER_BY_KEYS_PersonURN_200901] ON [dbo].[PersonURN_200901]
(
[Client_id],
[ServiceDate] DESC
)
INCLUDE
(
[ServiceURN] DESC,
[InstanceURN] DESC,
[ContactDN] DESC,
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
March 2, 2009 at 3:40 am
Dean Jones (3/2/2009)
I have tried the index with the INCLUDE statement, and although it doesnt seem to make much diffrence to be honest, in fact when used in the view, it seems to take a bit longer compared to using a covering index to cover all columns required.
Please post exec plan.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 2, 2009 at 5:01 am
you might notice some variations in the column, but its still thesame thing.
March 2, 2009 at 5:58 am
Dean Jones (3/2/2009)
Hi Guys,I have tried the index with the INCLUDE statement, and although it doesnt seem to make much diffrence to be honest, in fact when used in the view, it seems to take a bit longer compared to using a covering index to cover all columns required.
But its still doing a sort and that is still slowing things down, especially when the query date spans between 2 or more months.
CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_ORDER_BY_KEYS_PersonURN_200901] ON [dbo].[PersonURN_200901]
(
[ServiceDate] DESC
[Client_id] )
INCLUDE
(
[ServiceURN] DESC,
[InstanceURN] DESC,
[ContactDN] DESC,
)WITH (IGNORE_DUP_KEY = OFF) ON [Marketing_200901]
I've noticed in the above CREATE INDEX statement that you still have [ServiceDate] first. If your queries are looking for a specific [Client_id] within a specific [ServiceDate] range as you indicated in a previous post as a typical query, have you tried reversing the [ServiceDate] and [Client_id] in your index?
March 2, 2009 at 7:12 am
It's the union all operation that's slowing it down. Or so it seems to me.
Try dumping the data into a single table, in a test environment, with that index (with the first two reversed, as Gail suggested). See how that performs.
It may very well be that the solution is going towards a more standard table structure, instead of the dynamic tables you're currently using.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 2, 2009 at 7:24 am
GSquared (3/2/2009)
Try dumping the data into a single table, in a test environment, with that index (with the first two reversed, as Gail suggested). See how that performs.
I'd like to see the resulting plan with the index column order reversed first. The union all shouldn't be causing that much of a problem. It just concatenates two resultsets. The sorts are the big problem and as long as there's an ORDER BY, those aren't going to go away.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply