Indexed View ... Slow !?!?

Question

Post reply

Indexed View ... Slow !?!?

SeanNerd

Old Hand

Points: 369
More actions
January 16, 2009 at 11:42 am

#205536

I have a process that is very slow, and thought indexed views would be the solution. The indexed view seems to have made things even slower, much to my surprise. I'm hoping I made a silly error and someone can spot it, or suggest a way to make this faster.
The scenario: The transact table has sales for admission tickets for particular start dates. There are quantity limits on each show, so the show won't get over-sold. During the sale, we are providing the user with the quantity remaining. Some of the rules of Max4Sale have restrictions down to the item level, some restrict from the department or category level (So, you might have a departement called 'IMAX' and category of 'BUGS 11:30' and items like 'ADULT' and 'CHILD' - but you can only sell 500 seats of any kind - so the limit of 500 is on the category 'BUGS 11:30'.
Tables (simplified):
StaticDates (cDate CHAR(8))
Transact (Department CHAR(10), Category CHAR(10), Item CHAR(10), Quantity INT, Start_date DATETIME)
Max4Sale (ID INT, Department CHAR(10), Category CHAR(10), Item CHAR(10), Limit INT, Type TINYINT, StartTime DATETIME, EndTime DATETIME)
StaticDates is filled with one record for each calendar date for the next ten years - like '20090201', '20090202', etc.
Max4Sale.Type is 1 for department level limit, 2 for category level and 3 for item level.
So - I made this view to see what is available:
CREATE VIEW [dbo].[_v_Max4Sale] WITH SCHEMABINDING
AS
SELECT sd.CDate, m.id AS Max4SaleID
, SUM(t.quantity) AS Quantity
, COUNT_BIG(*) AS Records
FROM dbo.max4sale m
CROSS JOIN dbo.StaticDates sd
JOIN dbo.transact t
ON m.department = t.department
AND m.category = CASE WHEN m.type > 1 THEN t.category ELSE m.category END
AND m.item = CASE WHEN m.type > 2 THEN t.item ELSE m.item END
-- Had to use DATEADD amd DATEPART because CONVERT caused "imprecise" conversion errors.
AND t.start_date BETWEEN DATEADD(ss, DATEPART(ss, m.start_time),
DATEADD(mi, DATEPART(mi, m.start_time),
DATEadd(hh, datepart(hh, m.start_time), sd.Date)
)
)
AND DATEADD(ss, DATEPART(ss, m.end_time),
DATEADD(mi, DATEPART(mi, m.end_time),
DATEadd(hh, datepart(hh, m.end_time), sd.Date)
)
)
GROUP BY sd.CDate, m.id
GO
CREATE UNIQUE CLUSTERED INDEX [PK_cl_u_CDateMax4SaleID] ON [dbo].[_v_Max4Sale] ([CDate] ASC, [Max4SaleID] ASC)
GO
-- (Took 34 minutes to create index - which makes me think I've done it right)
ANSI NULLS and QUOTED_IDENTIFIERS and ARITHABORT are ON. So - what I expect is to be able to do something like this to get the availability for the day for all shows:
SELECT *
FROM _v_Max4Sale
WHERE cDate = '20081227'
Everything works, but it takes nearly 5 minutes to get the results - which is telling me that it isn't using the indexed view the way I expect. I've added additional non-clustered indexes on just the CDate portion (which I though was not necessary - but I was trying anything at that point).
Can anyone spot something I'm totally missing?
Thanks.
- Sean

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

John Rowan SSC Guru Points: 56440 More actions · Answer 1

Why the CROSS JOIN on your dates table? How many rows are in each of the tables?

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

SeanNerd Old Hand Points: 369 More actions · Answer 2

The CROSS JOIN is necessary to get a record for every day. So - say for instance you had an IMAX show for next Monday, but hadn't sold any tickets yet, then there wouldn't be any records in the transact table for that date - yet there would be 500 seats available. So - the cross join provides a way to get a result for a day where no sales have been made yet.

It also provides a way to get results for max4sale rules that span multiple days, but only the start date and end date are stored - so if you want results for the middle day of a three day limit, there will be a date to join on.

There are about 5000 rows in dates, about 600 rows in Max4Sale and millions of rows in transact.

- Sean

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 3

Can you post the execution plan for the original query?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 4

We need to see the execution plan to verify, but here are a couple of things:

1) cDate in StaticDates should be stored as a datetime instead of char(8).

2) The index you created won't help the query at all, since the query does not use the cDate column or the ID column to filter the results.

3) You are using functions in the join clause - this is guaranteed to prevent the usage of any indexes on those columns if they exist.

You should be able to modify that to:

AND t.start_date >= sd.Date + m.start_time

AND t.start_date < sd.DAte + m.end_time

This might allow for usage of indexes - but, if it doesn't it could be rewritten to filter on just the dates first, then use functions to filter on the times (e.g. get the time for start_date and see if that is >= the time from start_time and less than the time from end_time).

Also - I see the table definition has StaticDates as cDate - but, you are using sd.Date. Is this Date column a datetime column in StaticDates?

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

John Rowan SSC Guru Points: 56440 More actions · Answer 5

Jeffrey Williams (1/16/2009)
We need to see the execution plan to verify, but here are a couple of things:
1) cDate in StaticDates should be stored as a datetime instead of char(8).
2) The index you created won't help the query at all, since the query does not use the cDate column or the ID column to filter the results.
3) You are using functions in the join clause - this is guaranteed to prevent the usage of any indexes on those columns if they exist.
You should be able to modify that to:
AND t.start_date >= sd.Date + m.start_time
AND t.start_date < sd.DAte + m.end_time
This might allow for usage of indexes - but, if it doesn't it could be rewritten to filter on just the dates first, then use functions to filter on the times (e.g. get the time for start_date and see if that is >= the time from start_time and less than the time from end_time).
Also - I see the table definition has StaticDates as cDate - but, you are using sd.Date. Is this Date column a datetime column in StaticDates?

Just a note on #1 above, it is becomming a fairly common practice (at least in dimensional modeling for DW apps) to store the dates dimension with that YYYYMMDD format. I would, however, store it as an int, not char(8).

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 6

John Rowan (1/16/2009)
Just a note on #1 above, it is becomming a fairly common practice (at least in dimensional modeling for DW apps) to store the dates dimension with that YYYYMMDD format. I would, however, store it as an int, not char(8).

Even so, I would have to disagree that it is a good thing - even if it is becoming a common practice. Storing as either char(8) or int is going to force implicit conversion to datetime (or date in SQL Server 2008) - which is going to have an impact on your queries.

In most cases - an implicit conversion to datetime is going to prevent the usage of any indexes for the query and could cause the query to take much longer - and more resources to process.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

John Rowan SSC Guru Points: 56440 More actions · Answer 7

True. Looking back at the static dates table in this example, I see that this would be the case. What we do (for our DW) is use the YYYYMMDD int as the clustered PK on the dates dimension for more efficient joins with our large fact tables. The main difference is that we have a datetime column in our dates dimension that we can then use for any datetime comparison functions. That piece is missing here as the dates table in the example only consists of the char(8) column.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

SeanNerd Old Hand Points: 369 More actions · Answer 8

Jeffrey, Grant, John - thanks for your responses.

I've made some typos in translating this from the real situation to a more easily postable question, and messed up some field names. Sorry about that. I'll try to correct it.

1. Why is CHAR(8) wrong? Is it for space concerns? I used char(8) because in order to use the view, I need to do something like 'SELECT * FROM _v_max4Sale where cDate = convert(char(8), someDateTime, 112)' I figured that since I need to truncate the time portion, that I may as well store the other part of the comparison in CHAR format so I'd be using only one convert instead of two.

2. There is no where clause in the view - but there is a where clause in the SELECT on the view - as in my example - "SELECT * FROM _v_Max4Sale WHERE cDate = '20081227'" - isn't the index on the view to facilitate that?

3. Hmm. I think I may have a warped idea of how indexed views are supposed to work. Maybe you guys can straighten me out. Here is what I thought using a much more simple example:

If I have a transaction table that has millions of rows for a given year, and I want to report on daily totals for the month of December, I could say

SELECT CONVERT(CHAR(8), t.TransactionDate, 112) as ISODate, SUM(t.Quantity) as Total

FROM transact t

WHERE t.TransactionDate BETWEEN '20081201' AND '20081231 23:59:59.997'

GROUP BY CONVERT(CHAR(8), t.TransactionDate, 112)

But if that is slow, I can make an indexed view using that same select statement (without the WHERE clause), and create a clustered index on that view on the ISODate. Then I can say:

SELECT ISODate, Total

FROM _v_MyTransactView

WHERE ISODate between '20081201' and '20081231'

and I should get instantaneous results - because an indexed view is supposed to do some on-disk-magic when the index is created. Is that incorrect? Are the sums for quantity in this example not kept on the disk? I thought that was the point of indexed views ?! If not, then I should stop working on the view, and focus only on making the SELECT statement better.

Thanks for looking at this.

- Sean

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 10

Sean Lee (1/16/2009)1. Why is CHAR(8) wrong? Is it for space concerns? I used char(8) because in order to use the view, I need to do something like 'SELECT * FROM _v_max4Sale where cDate = convert(char(8), someDateTime, 112)' I figured that since I need to truncate the time portion, that I may as well store the other part of the comparison in CHAR format so I'd be using only one convert instead of two.

This is bad because it prevents usage of an index on the column 'someDateTime'. Instead of trying to remove the time portion, modify your query to do the following:

WHERE someDateColumn >= '20081231 00:00:00.000'

AND someDateColumn < '20090101 00:00:00.000'

This will include everything on 20081231 and use an index on someDateColumn if it exists.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

John Rowan SSC Guru Points: 56440 More actions · Answer 11

Lynn Pettis (1/16/2009)
I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.

Date datatype is an option in 2008 also, but....

1. Alternatively, you may wish to use the new DATE built-in data type as the surrogate key type for the date dimension. However, be aware that seeks into the fact table are not normally supported for operations that use date functions. For best performance, use =, BETWEEN, or IN ( … list of keys … ) to filter fact table rows by date, regardless of the type of the column used as the surrogate key for date.

Reference: http://technet.microsoft.com/en-us/library/cc719182.aspx

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

John Rowan (1/16/2009)
Lynn Pettis (1/16/2009)
I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.
Date datatype is an option in 2008 also, but....
1. Alternatively, you may wish to use the new DATE built-in data type as the surrogate key type for the date dimension. However, be aware that seeks into the fact table are not normally supported for operations that use date functions. For best performance, use =, BETWEEN, or IN ( … list of keys … ) to filter fact table rows by date, regardless of the type of the column used as the surrogate key for date.
Reference: http://technet.microsoft.com/en-us/library/cc719182.aspx

Never said I'd use functions on the date dimension table. I have learned over the past few years about quering data based on dates (with or without time values).

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 13

There is no need for a cross-join on the date table to get all of the dates... it should be an outer join, instead. That is likely your performance problem. Heh... well... that an storing dates in something other than the DATETIME datatype. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)