Do I need to index this table?

  • Hi,

    I've got a T-SQL query that is run in an sp on SQL2000. Recently, there was a complaint that the front-end was timing out before results were returned. I've determined that the problem is a table called Attendance with almost 400,000 rows. Here's the query:

    DECLARE @CurrDate AS nvarchar(10)

    SELECT @CurrDate = '2009-02-02'

    DECLARE @Dept AS int

    SELECT @Dept = 15

    (SELECT E.EmployeeID, E.FirstName, E.LastName, D.Department, @CurrDate

    FROM dbo.Employees E inner join Departments D

    ON E.DepartmentID = D.DepartmentID

    WHERE D.ATDRequired = 1

    AND E.DepartmentID = @Dept

    AND (EmployeeID NOT IN (Select EmployeeID From EmployeesToExclude WHERE EmployeeID IS NOT NULL) )

    AND (CONVERT(nvarchar(10), DateHired, 20) <= @CurrDate OR DateHired IS NULL)

    AND (EmployeeID NOT IN

    (SELECT UserEID

    FROM dbo.Attendance

    WHERE (Mode = '001') AND (UserEID IS NOT NULL) AND ((CONVERT(nvarchar(10), AttendanceDateTime, 20)) >= @CurrDate) AND (CONVERT(nvarchar(10), AttendanceDateTime, 20) < DATEADD(day,1,@CurrDate)))))

    Note that this query is simplified, but I'm sure the Attendance table is the issue - it will take almost 30 seconds to run just this. In the real world, it's running on a range of dates, so there's a loop with a different @CurrDate each time. Running the entire query for Feb. 2 to Feb. 17 takes 9 minutes to return 473 rows. The tables Employees, EmployeesToExclude and Departments all have primary keys. If I exclude the last condition ("EmployeeID NOT IN (SELECT UserEID....") it runs in a second or less. The Attendance table has only 3 fields: UserEID nvarchar(6), Mode nvarchar(3), and AttendanceDateTime datetime. This table is not indexed, has no primary or foreign keys and grows quite a bit every day. I suspect that what I need to do is index the table, but I'm not sure. Can anyone suggest anything?

    Thanks!

  • What is the data type for this column: AttendanceDateTime.

  • Here is my first shot with minor changes. I would also look at moving some of the subqueries into left joins in the FROM clause, but I'll wait a bit before doing that.

    DECLARE @CurrDate AS datetime

    SELECT @CurrDate = '2009-02-02'

    DECLARE @Dept AS int

    SELECT @Dept = 15

    SELECT

    E.EmployeeID,

    E.FirstName,

    E.LastName,

    D.Department,

    @CurrDate

    FROM

    dbo.Employees E

    inner join Departments D

    ON E.DepartmentID = D.DepartmentID

    WHERE

    D.ATDRequired = 1

    AND E.DepartmentID = @Dept

    AND (EmployeeID NOT IN (Select EmployeeID From EmployeesToExclude WHERE EmployeeID IS NOT NULL))

    AND (DateHired <= @CurrDate OR DateHired IS NULL)

    AND (EmployeeID NOT IN

    (SELECT

    UserEID

    FROM

    dbo.Attendance

    WHERE

    (Mode = '001') AND

    (UserEID IS NOT NULL)

    AND (AttendanceDateTime >= @CurrDate)

    AND (AttendanceDateTime < DATEADD(day,1,@CurrDate))))

  • And this is my code changes using left outer joins. Of course, with no tables and sample data, I can't test it:

    DECLARE @CurrDate AS datetime

    SELECT @CurrDate = '2009-02-02'

    DECLARE @Dept AS int

    SELECT @Dept = 15

    SELECT

    E.EmployeeID,

    E.FirstName,

    E.LastName,

    D.Department,

    @CurrDate

    FROM

    dbo.Employees E

    inner join Departments D

    ON E.DepartmentID = D.DepartmentID

    left outer join dbo.EmployeesToExclude ete

    ON E.EmployeeID = ete.EmployeeID

    left outer join dbo.Attendance att

    on (E.EmployeeID = att.UserEID

    and att.Mode = '001'

    and att.UserEID is not null

    and att.AttendanceDateTime >= @CurrDate

    and att.AttendanceDateTime < DATEADD(day,1,@CurrDate))

    WHERE

    D.ATDRequired = 1

    AND E.DepartmentID = @Dept

    AND ete.EmployeeID is null

    -- AND (EmployeeID NOT IN (Select EmployeeID From EmployeesToExclude WHERE EmployeeID IS NOT NULL))

    AND (DateHired <= @CurrDate OR DateHired IS NULL)

    AND att.UserEID is null

    -- AND (EmployeeID NOT IN

    -- (SELECT

    -- UserEID

    -- FROM

    -- dbo.Attendance

    -- WHERE

    -- (Mode = '001') AND

    -- (UserEID IS NOT NULL)

    -- AND (AttendanceDateTime >= @CurrDate)

    -- AND (AttendanceDateTime < DATEADD(day,1,@CurrDate))))

  • Lynn's suggested changes are a great way to start. What he failed to mention is... why. Even if you added a pot wad of indexes to the original code, the best you would be able to do is INDEX SCANS and the worst would be INDEX SCANS and maybe some Book Mark lookups. No indexes would probably work better than any indexes on the original code. The reason why is because most of the date-centric columns in the criteria have conversion formulas. That virtually guarantees no INDEX SEEKS because it must read each value, THEN do the conversion so see if it "fits".

    The WHERE NOT IN's are a lot less of a problem. They may have improved outer joins with NULL lookups recently, but the testing I did in 2004 showed than WHERE NOT IN had a bit of a performance advantage in complex code over outer joins with NULL lookups.

    The other question I would ask is, when will DateHired ever actually be a NULL? I can see it for something like DateTerminated, but not DateHired. That would be one of the things I'd check... can DateHired ever be NULL?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... another "tweek"....

    WHERE EmployeeID IS NOT NULL is a bit slower than WHERE EmployeeID > 0 assuming, of course, that you have no EmployeeID's <= 0... sometimes, quite a bit if it forces an index scan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/17/2009)


    Lynn's suggested changes are a great way to start. What he failed to mention is... why. Even if you added a pot wad of indexes to the original code, the best you would be able to do is INDEX SCANS and the worst would be INDEX SCANS and maybe some Book Mark lookups. No indexes would probably work better than any indexes on the original code. The reason why is because most of the date-centric columns in the criteria have conversion formulas. That virtually guarantees no INDEX SEEKS because it must read each value, THEN do the conversion so see if it "fits".

    The WHERE NOT IN's are a lot less of a problem. They may have improved outer joins with NULL lookups recently, but the testing I did in 2004 showed than WHERE NOT IN had a bit of a performance advantage in complex code over outer joins with NULL lookups.

    The other question I would ask is, when will DateHired ever actually be a NULL? I can see it for something like DateTerminated, but not DateHired. That would be one of the things I'd check... can DateHired ever be NULL?

    Only failed to explain why as my break at work got called short and I wasn't able to add the additional info. Not sure about the IN and NOT IN, I guess testing would determine which would be better in each case. Never hurts to try something and if it doesn't work switch back.

    Of course, the OP still hasn't answered my question yet either so it may be a moot point if it turns out the dates aren't stored as datetime values.

  • The AttendanceDateTime field is of type datetime, as I said in my initial post. I haven't had a chance to try any of your suggestions out yet, having just got into the office, but I'll take a look and let everybody know what the results are. I really, really appreciate everyone's input. 😀

  • I read your post several times and never caught the reference to AttendenceDateTime until you pointed out that you had mentioned it, if it had been a snake...

    Next question I have for you, are most of your queries against dbo.Attendance date related? You may want to put a clustered index on the AttendanceDateTime field. Ranged searches will benefit from the clustered index if that is one of the primary access paths on dbo.Attendance.

  • Lynn, your suggested query runs in a second, as opposed to minutes for mine. THANK YOU!!! 😀 I just have one question: I thought because DateHired and AttendanceDateTime are datetime fields and @CurrDate is an nvarchar that I had to use a CONVERT to compare them. Clearly I don't, but I don't understand why not. Can you explain this to me?

    Jeff, as to your suggestions, believe it or not UserEID can be NULL. This is because we have a Paris office that doesn't always send us all the info they should. 😉 As to EmployeeID, I believe the new LEFT OUTER JOIN that Lynn suggested means I do have to check for NULLs there, but correct me if I'm wrong.

    Thank you so much both of you for all your help!

  • Forgot to mention your question on the clustered index for AttendanceDateTime in the Attendance table. I'll research this; I really don't know. This is an app I inherited from someone who's no longer here. Aren't all the apps we work with in that category, though? 😛 I suspect, though, that it will turn out to be the case that AttendanceDateTime is used in almost all queries on that table, so I'll probably end up acting on your suggestion and put a clustered index on that field. Again, thanks tons.

  • Melanie Peterson (2/18/2009)


    Lynn, your suggested query runs in a second, as opposed to minutes for mine. THANK YOU!!! 😀 I just have one question: I thought because DateHired and AttendanceDateTime are datetime fields and @CurrDate is an nvarchar that I had to use a CONVERT to compare them. Clearly I don't, but I don't understand why not. Can you explain this to me?

    Jeff, as to your suggestions, believe it or not UserEID can be NULL. This is because we have a Paris office that doesn't always send us all the info they should. 😉 As to EmployeeID, I believe the new LEFT OUTER JOIN that Lynn suggested means I do have to check for NULLs there, but correct me if I'm wrong.

    Thank you so much both of you for all your help!

    First, if you take a close look at my code, you'll see that I changed the data type of @CurrDate to datetime. Also, if you compare a datetime field to a char/nchar field, SQL Server will do an implicit conversion of the char/nchar field to datetime. If the value can't be converted to a datetime value, you will get an error message.

  • Got it, thanks!

  • thanks

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply