More efficient way

  • Here is a question from one of my developers. Is there a faster way to do this?

     

    select  datepart(yyyy, start_date) as dmyear

    from Mydatabase.adm.activitylog

    GROUP BY datepart(yyyy, start_date)

    order by datepart(yyyy, start_date) desc

    Basically, it returns a list of only the years that exist in that table.  The problem is that the activitylog is 5.7 million records long, and growing, so this simple query takes about 20 seconds to return the list of 2005, 2004, 2003, & 2002.  Any more efficient ways to arrive at this list?

     

    Thanks!!!

  • Untested, but may be more efficient:

    select  Distinct datepart(yyyy, start_date) as dmyear

    from Mydatabase.adm.activitylog

    order by datepart(yyyy, start_date) desc

     

    [Edit] Tested with a small data sample, execution plan is the same.

    If this is really an issue, maybe you could create an indexed (materialized) view on the table with just " datepart(yyyy, start_date) as dmyear" as the only column in the view.

    Alternatively an index on thetable, with start_date as the 1st column in the index might help performance.

  • You may also want to think about making a reference table which consist of your distinct years.  How you will keep that table current should be thought through as updating it could be accomplished in a Trigger which checks the DATEPART(yyyy, start_date) for existence in the reference table.   

    I wasn't born stupid - I had to study.

  • It looks lke it runs a little quicker! Thanks for the help.

  • I dont know if this helps to know but it is likely this that is slowing down your qry:

    order by datepart(yyyy, start_date) desc

    Theoretically, if the clustered index is on date you can get away without it.

    the bigger your table gets the worse it gets. I agree with the others on aggregating the data in another source. I would presume anything before 2005 is static anyway.

  • In addition to "select distinct", put an index on the start_date column, if you don't already have one.  That might help it go a little faster.

     

  • Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.

     

     

  • I like this method as well which will take less space than an indexing method will take.


  • Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.


    Year() simply calls Datepart - so you are correct - it is a wash.

  • This is a case where brute force is your friend:

    select Year

    from (select 2003 as Year,

      cast('2003-01-01' as smalldatetime) as StartYear,

      cast('2004-01-01' as smalldatetime) as EndYear

     union all

      select 2004 as Year,

      cast('2004-01-01' as smalldatetime),

      cast('2005-01-01' as smalldatetime)

     union all

      select 2005 as Year,

      cast('2005-01-01' as smalldatetime),

      cast('2006-01-01' as smalldatetime)

     union all

      select 2006 as Year,

      cast('2006-01-01' as smalldatetime),

      cast('2007-01-01' as smalldatetime)

     union all

      select 2007 as Year,

      cast('2007-01-01' as smalldatetime),

      cast('2008-01-01' as smalldatetime)

     /**/) as Years

    where exists

     (select 1 from Mydatabase.adm.activitylog

      where start_date >= Years.StartYear and start_date < Years.EndYear )

    Notes:

    Add as many years to the union-all-derived-table Years as you think will be reasonably useful. Or, enough to last until you're no longer employed there.

    The union-all-derived-table trick only works up to, like 250 rows or something.

    If you have an Integers table, you can use that for the "years generating" part. This is left as an exercise for the reader.

    If you don't have an Integers table and always wanted one, there's one buried in the spt_values table. Hint: type = 'P  ' .

    If there is no index on start_date, performance will suck anyway.

    If you use smalldatetime, this will fail to work after 2079. Please let me know when that happens and I'll come up with an alternative.

    The /**/ is inserted to prevent idioticon from inserting itself in this post.

    If you expect dates to show up from the 1880's or 2525, this obviously won't work. Set tricorders on loop diagnostics to check for tacheon emissions if time travel is a problem for your organization.


    Chris Hofland

  • Or, try this. I would guess performance will be the same, but it's less typing.

    select top 1 2003 from adm.activitylog

    where start_date >= '2003-01-01' and start_date < '2004-01-01'

    union all

    select top 1 2004 from adm.activitylog

    where start_date >= '2004-01-01' and start_date < '2005-01-01'

    union all

    select top 1 2005 from adm.activitylog

    where start_date >= '2005-01-01' and start_date < '2006-01-01'

    union all

    select top 1 2006 from adm.activitylog

    where start_date >= '2006-01-01' and start_date < '2007-01-01'

    -- etc...

    Note: "union all" doesn't return a different result than "union" would, it's just a hint to the optimizer saying not to bother de-duplicating ('cause you know there can't be any dupes).


    Chris Hofland

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply