Indexing a datetime column

  • I have a table with about 400 columns and 4 million rows.

    The only purpose of this table is to be used by a reporting tool. The table is refreshed (dropped and recreated) every night via a scheduled job, so there are no updates/inserts/deletes.

    There is a date column with datetime as its datatype. I have created a clustered index on this date column, but it only seemed to help a little. (There won't be any other conditions in the WHERE clause, so I haven't included any other columns in the index; see the sketch at the end of this post.)

    The query sent by the reporting tool looks like this:

    select * (all columns listed)

    from mytable

    where date >= '01/01/2010' and date <= '12/01/2010'

    It takes about 10 minutes to retrieve everything that falls within the above date range, which is about a million rows.

    I need to get this under a minute if I can, or as close to that as I can.

    If I can get some ideas that might help me achieve this, I would greatly appreciate it.

    I have tried the following, but with no significant performance gain:

    - changing the datatype from datetime to date/varchar/int

    - creating a nonclustered index on the same column

    - creating a clustered/nonclustered index including other columns to make it unique
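
    A minimal sketch of the clustered index described above; the table and column names (dbo.mytable, [date]) are assumptions taken from the posted query, and the index name is made up for illustration:

    -- Sketch only: substitute the real table, column, and index names.
    -- A clustered index on the date column stores rows in date order, so the
    -- date-range query becomes one contiguous range scan instead of scattered reads.
    CREATE CLUSTERED INDEX IX_mytable_date
        ON dbo.mytable ([date]);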

  • Try changing the select * to just the columns you need.

  • I would if I could, but the requirement is to get all the columns.

  • sql84 (12/10/2015) [quoting the original post above]

    I doubt you're going to get it any faster... Not without moving to faster (solid-state) hard drives, more RAM, and a faster network connection.

    The index is correct for the query, but with 400 columns... That could end up being a lot of data to gather up and send across a network connection.

    You say that there are 4M rows in the table but we don't know how many of those are being encompassed by an 11 month time span...

    Plus... We don't know how much of that time is being consumed by the processing of the receiving application.

  • sql84 (12/10/2015)


    I would if I could, but the requirement is to get all the columns.

    Have you tried it? If it's faster you can always take that to whoever gave you the requirement and either change the requirement or tell them it's not going to get better.

  • Jason A. Long (12/10/2015) [reply quoted above]

    12 months would return about 1 million rows (stated in the original post).

    The stat is from a direct database query (SSMS), not from the application.

  • Jason A. Long (12/10/2015) [reply quoted above]

    He did mention that the query for that date range returned about 1 million rows, which just confirms that expectations have to change, either about what the query needs to return, or about how quickly it "should" run.

    If you're returning 1 million rows, 400 columns each, you're going to be doing a LOT of reading (unless all the other columns are bits, which I somehow doubt, and even then that's a 50 MB result set :-)). What does sp_spaceused return for that table? (A sample call is sketched at the end of this post.)

    I'm guessing that the size of 1 million rows in that table will be enough that even if the data were all in memory you'd be looking at a lengthy response just for the reads. As Jason pointed out, that's not even considering the time it will take for a client application to receive and process that result set.

    In all likelihood, something will have to give. Either the existing query will have to be changed to return less data, or if for some reason it is determined that it is absolutely necessary to return what it currently returns, then expectations about its performance will have to be seriously revised.

    You might be able to throw some beefy hardware at it, but depending on how big those 1 million rows are, it could be prohibitively expensive to solve with hardware, landing us back in the camp of managing expectations.

    It's not the happiest situation, but as ZZartin and Jason have already said, it sounds like that's where you're at.

    Cheers!
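
    A sketch of the sp_spaceused check suggested above; the table name dbo.mytable is an assumption, so substitute the real one:

    -- Reports the row count plus reserved, data, and index space for the table,
    -- which gives a rough idea of how many megabytes a million-row range represents.
    EXEC sp_spaceused N'dbo.mytable';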

  • ZZartin (12/10/2015) [reply quoted above]

    If I select only 200 columns, it takes half the time it took for 400 columns. So is this just a data volume issue?

  • sql84 (12/10/2015)


    12 months would return about 1 million rows (stated in the original post).

    The stat is from a direct database query (SSMS), not from the application.

    Jacob Wilkins (12/10/2015)


    He did mention that the query for that date range returned about 1 million rows...

    Doh!... I read right over it... Sorry guys.

  • sql84 (12/10/2015)


    If I select only 200 columns, it takes half the time it took for 400 columns. So is this just a data volume issue?

    Yes. Your query isn't doing anything funky and you have the correct index in place. There's really nothing left to optimize, at least from the code standpoint. Now it's all about disk I/O, memory, and network bandwidth.

    If you're using SQL Server Enterprise Edition, you could define the index with DATA_COMPRESSION = PAGE and get a nice little boost, but nothing that's going to take you from 10 mins down to 1 min.
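
    A sketch of that option, assuming Enterprise Edition; the index and table names (IX_mytable_date, dbo.mytable) are placeholders:

    -- Rebuild the existing clustered index with page compression.
    -- Fewer pages on disk means fewer reads, at the cost of some CPU to decompress.
    ALTER INDEX IX_mytable_date ON dbo.mytable
        REBUILD WITH (DATA_COMPRESSION = PAGE);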

  • The clustered index is definitely correct. Date will be mildly more efficient than datetime, but it's not enough of a difference to be a big deal.

    Make sure the fillfactor on the table is 100% if the table is never updated. If it's very rarely updated, you can use either 99% or 100%. [At most, very rare updates seem very likely, since the table is dropped and rebuilt every time.] I've seen default fillfactors as low as 70%(!), which wastes at least 30% of the space. [But even the "typical" fill of 90% would require 10% more reads to get the whole table. Which, btw, is why you should never rely on the default fillfactor but explicitly set it for each index. And your default fill really should be higher than 90 -- most large tables shouldn't have 10+% free space in them.] See the sketch at the end of this post.

    Also look at the average row size; with 400 columns, it's likely to be very wide. Make sure it doesn't exceed 8000 bytes if at all possible, because if it does, SQL will force one or more [n]varchar columns onto overflow pages, and those are extremely slow to read. If it's just over 4000 bytes, see if you can get it down to where 2 rows will fit on a page instead of 1. Similarly, if it's just far enough over 2000 bytes to prevent 4 rows per page, see if you can reduce it to get 4. (The sketch at the end of this post shows one way to check the average row size.)

    As part of that, and if you still need reductions, see if you can encode any columns. That is, if a column stores standard strings like "Complete", "In Progress", etc., see if you can change them to codes to reduce the table size, although of course that won't reduce the size returned to the user, since you'll have to expand the codes again.

    Edit: Corrected typos in wording.

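
    A sketch of the fillfactor and row-size suggestions above, again with placeholder index and table names (IX_mytable_date, dbo.mytable):

    -- Rebuild with a 100% fillfactor: the table is dropped and recreated nightly and
    -- never updated in place, so free space on pages only adds pages to read.
    ALTER INDEX IX_mytable_date ON dbo.mytable
        REBUILD WITH (FILLFACTOR = 100);

    -- Check the average leaf-level row size to see how many rows fit on an 8 KB page.
    SELECT avg_record_size_in_bytes, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.mytable'), NULL, NULL, 'SAMPLED')
    WHERE index_level = 0;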
