Indexed View ... Slow !?!?

  • I have a process that is very slow, and thought indexed views would be the solution. The indexed view seems to have made things even slower, much to my surprise. I'm hoping I made a silly error and someone can spot it, or suggest a way to make this faster.

    The scenario: The transact table has sales for admission tickets for particular start dates. There are quantity limits on each show, so the show won't get over-sold. During the sale, we are providing the user with the quantity remaining. Some of the rules of Max4Sale have restrictions down to the item level, some restrict from the department or category level (So, you might have a departement called 'IMAX' and category of 'BUGS 11:30' and items like 'ADULT' and 'CHILD' - but you can only sell 500 seats of any kind - so the limit of 500 is on the category 'BUGS 11:30'.

    Tables (simplified):

    StaticDates (cDate CHAR(8))

    Transact (Department CHAR(10), Category CHAR(10), Item CHAR(10), Quantity INT, Start_date DATETIME)

    Max4Sale (ID INT, Department CHAR(10), Category CHAR(10), Item CHAR(10), Limit INT, Type TINYINT, StartTime DATETIME, EndTime DATETIME)

    StaticDates is filled with one record for each calendar date for the next ten years - like '20090201', '20090202', etc.

    Max4Sale.Type is 1 for department level limit, 2 for category level and 3 for item level.

    So - I made this view to see what is available:

    CREATE VIEW [dbo].[_v_Max4Sale] WITH SCHEMABINDING

    AS

    SELECT sd.CDate, m.id AS Max4SaleID

    , SUM(t.quantity) AS Quantity

    , COUNT_BIG(*) AS Records

    FROM dbo.max4sale m

    CROSS JOIN dbo.StaticDates sd

    JOIN dbo.transact t

    ON m.department = t.department

    AND m.category = CASE WHEN m.type > 1 THEN t.category ELSE m.category END

    AND m.item = CASE WHEN m.type > 2 THEN t.item ELSE m.item END

    -- Had to use DATEADD amd DATEPART because CONVERT caused "imprecise" conversion errors.

    AND t.start_date BETWEEN DATEADD(ss, DATEPART(ss, m.start_time),

    DATEADD(mi, DATEPART(mi, m.start_time),

    DATEadd(hh, datepart(hh, m.start_time), sd.Date)

    )

    )

    AND DATEADD(ss, DATEPART(ss, m.end_time),

    DATEADD(mi, DATEPART(mi, m.end_time),

    DATEadd(hh, datepart(hh, m.end_time), sd.Date)

    )

    )

    GROUP BY sd.CDate, m.id

    GO

    CREATE UNIQUE CLUSTERED INDEX [PK_cl_u_CDateMax4SaleID] ON [dbo].[_v_Max4Sale] ([CDate] ASC, [Max4SaleID] ASC)

    GO

    -- (Took 34 minutes to create index - which makes me think I've done it right)

    ANSI NULLS and QUOTED_IDENTIFIERS and ARITHABORT are ON. So - what I expect is to be able to do something like this to get the availability for the day for all shows:

    SELECT *

    FROM _v_Max4Sale

    WHERE cDate = '20081227'

    Everything works, but it takes nearly 5 minutes to get the results - which is telling me that it isn't using the indexed view the way I expect. I've added additional non-clustered indexes on just the CDate portion (which I though was not necessary - but I was trying anything at that point).

    Can anyone spot something I'm totally missing?

    Thanks.

    - Sean

  • Why the CROSS JOIN on your dates table? How many rows are in each of the tables?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • The CROSS JOIN is necessary to get a record for every day. So - say for instance you had an IMAX show for next Monday, but hadn't sold any tickets yet, then there wouldn't be any records in the transact table for that date - yet there would be 500 seats available. So - the cross join provides a way to get a result for a day where no sales have been made yet.

    It also provides a way to get results for max4sale rules that span multiple days, but only the start date and end date are stored - so if you want results for the middle day of a three day limit, there will be a date to join on.

    There are about 5000 rows in dates, about 600 rows in Max4Sale and millions of rows in transact.

    - Sean

  • Can you post the execution plan for the original query?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • We need to see the execution plan to verify, but here are a couple of things:

    1) cDate in StaticDates should be stored as a datetime instead of char(8).

    2) The index you created won't help the query at all, since the query does not use the cDate column or the ID column to filter the results.

    3) You are using functions in the join clause - this is guaranteed to prevent the usage of any indexes on those columns if they exist.

    You should be able to modify that to:

    AND t.start_date >= sd.Date + m.start_time

    AND t.start_date < sd.DAte + m.end_time

    This might allow for usage of indexes - but, if it doesn't it could be rewritten to filter on just the dates first, then use functions to filter on the times (e.g. get the time for start_date and see if that is >= the time from start_time and less than the time from end_time).

    Also - I see the table definition has StaticDates as cDate - but, you are using sd.Date. Is this Date column a datetime column in StaticDates?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams (1/16/2009)


    We need to see the execution plan to verify, but here are a couple of things:

    1) cDate in StaticDates should be stored as a datetime instead of char(8).

    2) The index you created won't help the query at all, since the query does not use the cDate column or the ID column to filter the results.

    3) You are using functions in the join clause - this is guaranteed to prevent the usage of any indexes on those columns if they exist.

    You should be able to modify that to:

    AND t.start_date >= sd.Date + m.start_time

    AND t.start_date < sd.DAte + m.end_time

    This might allow for usage of indexes - but, if it doesn't it could be rewritten to filter on just the dates first, then use functions to filter on the times (e.g. get the time for start_date and see if that is >= the time from start_time and less than the time from end_time).

    Also - I see the table definition has StaticDates as cDate - but, you are using sd.Date. Is this Date column a datetime column in StaticDates?

    Just a note on #1 above, it is becomming a fairly common practice (at least in dimensional modeling for DW apps) to store the dates dimension with that YYYYMMDD format. I would, however, store it as an int, not char(8).

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (1/16/2009)


    Just a note on #1 above, it is becomming a fairly common practice (at least in dimensional modeling for DW apps) to store the dates dimension with that YYYYMMDD format. I would, however, store it as an int, not char(8).

    Even so, I would have to disagree that it is a good thing - even if it is becoming a common practice. Storing as either char(8) or int is going to force implicit conversion to datetime (or date in SQL Server 2008) - which is going to have an impact on your queries.

    In most cases - an implicit conversion to datetime is going to prevent the usage of any indexes for the query and could cause the query to take much longer - and more resources to process.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • True. Looking back at the static dates table in this example, I see that this would be the case. What we do (for our DW) is use the YYYYMMDD int as the clustered PK on the dates dimension for more efficient joins with our large fact tables. The main difference is that we have a datetime column in our dates dimension that we can then use for any datetime comparison functions. That piece is missing here as the dates table in the example only consists of the char(8) column.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Jeffrey, Grant, John - thanks for your responses.

    I've made some typos in translating this from the real situation to a more easily postable question, and messed up some field names. Sorry about that. I'll try to correct it.

    1. Why is CHAR(8) wrong? Is it for space concerns? I used char(8) because in order to use the view, I need to do something like 'SELECT * FROM _v_max4Sale where cDate = convert(char(8), someDateTime, 112)' I figured that since I need to truncate the time portion, that I may as well store the other part of the comparison in CHAR format so I'd be using only one convert instead of two.

    2. There is no where clause in the view - but there is a where clause in the SELECT on the view - as in my example - "SELECT * FROM _v_Max4Sale WHERE cDate = '20081227'" - isn't the index on the view to facilitate that?

    3. Hmm. I think I may have a warped idea of how indexed views are supposed to work. Maybe you guys can straighten me out. Here is what I thought using a much more simple example:

    If I have a transaction table that has millions of rows for a given year, and I want to report on daily totals for the month of December, I could say

    SELECT CONVERT(CHAR(8), t.TransactionDate, 112) as ISODate, SUM(t.Quantity) as Total

    FROM transact t

    WHERE t.TransactionDate BETWEEN '20081201' AND '20081231 23:59:59.997'

    GROUP BY CONVERT(CHAR(8), t.TransactionDate, 112)

    But if that is slow, I can make an indexed view using that same select statement (without the WHERE clause), and create a clustered index on that view on the ISODate. Then I can say:

    SELECT ISODate, Total

    FROM _v_MyTransactView

    WHERE ISODate between '20081201' and '20081231'

    and I should get instantaneous results - because an indexed view is supposed to do some on-disk-magic when the index is created. Is that incorrect? Are the sums for quantity in this example not kept on the disk? I thought that was the point of indexed views ?! If not, then I should stop working on the view, and focus only on making the SELECT statement better.

    Thanks for looking at this.

    - Sean

  • I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.

  • Sean Lee (1/16/2009)1. Why is CHAR(8) wrong? Is it for space concerns? I used char(8) because in order to use the view, I need to do something like 'SELECT * FROM _v_max4Sale where cDate = convert(char(8), someDateTime, 112)' I figured that since I need to truncate the time portion, that I may as well store the other part of the comparison in CHAR format so I'd be using only one convert instead of two.

    This is bad because it prevents usage of an index on the column 'someDateTime'. Instead of trying to remove the time portion, modify your query to do the following:

    WHERE someDateColumn >= '20081231 00:00:00.000'

    AND someDateColumn < '20090101 00:00:00.000'

    This will include everything on 20081231 and use an index on someDateColumn if it exists.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Lynn Pettis (1/16/2009)


    I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.

    Date datatype is an option in 2008 also, but....

    1. Alternatively, you may wish to use the new DATE built-in data type as the surrogate key type for the date dimension. However, be aware that seeks into the fact table are not normally supported for operations that use date functions. For best performance, use =, BETWEEN, or IN ( … list of keys … ) to filter fact table rows by date, regardless of the type of the column used as the surrogate key for date.

    Reference: http://technet.microsoft.com/en-us/library/cc719182.aspx

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (1/16/2009)


    Lynn Pettis (1/16/2009)


    I'd still use the datetime data type in my date dimension in my data warehouse. I'd just be sure the time portion was 00:00:00.000. If using SQL Server 2008 I'd change that to the date type.

    Date datatype is an option in 2008 also, but....

    1. Alternatively, you may wish to use the new DATE built-in data type as the surrogate key type for the date dimension. However, be aware that seeks into the fact table are not normally supported for operations that use date functions. For best performance, use =, BETWEEN, or IN ( … list of keys … ) to filter fact table rows by date, regardless of the type of the column used as the surrogate key for date.

    Reference: http://technet.microsoft.com/en-us/library/cc719182.aspx

    Never said I'd use functions on the date dimension table. I have learned over the past few years about quering data based on dates (with or without time values).

  • There is no need for a cross-join on the date table to get all of the dates... it should be an outer join, instead. That is likely your performance problem. Heh... well... that an storing dates in something other than the DATETIME datatype. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply