February 2, 2005 at 10:41 am
Here is a question from one of my developers. Is there a faster way to do this?
select datepart(yyyy, start_date) as dmyear
from Mydatabase.adm.activitylog
GROUP BY datepart(yyyy, start_date)
order by datepart(yyyy, start_date) desc
Basically, it returns a list of only the years that exist in that table. The problem is that the activitylog is 5.7 million records long, and growing, so this simple query takes about 20 seconds to return the list of 2005, 2004, 2003, & 2002. Any more efficient ways to arrive at this list?
Thanks!!!
February 2, 2005 at 10:47 am
Untested, but may be more efficient:
select Distinct datepart(yyyy, start_date) as dmyear
from Mydatabase.adm.activitylog
order by datepart(yyyy, start_date) desc
[Edit] Tested with a small data sample, execution plan is the same.
If this is really an issue, maybe you could create an indexed (materialized) view on the table with just " datepart(yyyy, start_date) as dmyear" as the only column in the view.
Alternatively an index on thetable, with start_date as the 1st column in the index might help performance.
February 2, 2005 at 10:53 am
You may also want to think about making a reference table which consist of your distinct years. How you will keep that table current should be thought through as updating it could be accomplished in a Trigger which checks the DATEPART(yyyy, start_date) for existence in the reference table.
I wasn't born stupid - I had to study.
February 2, 2005 at 11:09 am
It looks lke it runs a little quicker! Thanks for the help.
February 3, 2005 at 8:44 am
I dont know if this helps to know but it is likely this that is slowing down your qry:
order by datepart(yyyy, start_date) desc
Theoretically, if the clustered index is on date you can get away without it.
the bigger your table gets the worse it gets. I agree with the others on aggregating the data in another source. I would presume anything before 2005 is static anyway.
February 3, 2005 at 8:51 am
In addition to "select distinct", put an index on the start_date column, if you don't already have one. That might help it go a little faster.
February 3, 2005 at 8:56 am
Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.
February 3, 2005 at 8:57 am
I like this method as well which will take less space than an indexing method will take.
February 3, 2005 at 10:29 am
Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.
Year() simply calls Datepart - so you are correct - it is a wash.
February 3, 2005 at 12:33 pm
This is a case where brute force is your friend:
select Year
from (select 2003 as Year,
cast('2003-01-01' as smalldatetime) as StartYear,
cast('2004-01-01' as smalldatetime) as EndYear
union all
select 2004 as Year,
cast('2004-01-01' as smalldatetime),
cast('2005-01-01' as smalldatetime)
union all
select 2005 as Year,
cast('2005-01-01' as smalldatetime),
cast('2006-01-01' as smalldatetime)
union all
select 2006 as Year,
cast('2006-01-01' as smalldatetime),
cast('2007-01-01' as smalldatetime)
union all
select 2007 as Year,
cast('2007-01-01' as smalldatetime),
cast('2008-01-01' as smalldatetime)
/**/) as Years
where exists
(select 1 from Mydatabase.adm.activitylog
where start_date >= Years.StartYear and start_date < Years.EndYear )
Notes:
Add as many years to the union-all-derived-table Years as you think will be reasonably useful. Or, enough to last until you're no longer employed there.
The union-all-derived-table trick only works up to, like 250 rows or something.
If you have an Integers table, you can use that for the "years generating" part. This is left as an exercise for the reader.
If you don't have an Integers table and always wanted one, there's one buried in the spt_values table. Hint: type = 'P ' .
If there is no index on start_date, performance will suck anyway.
If you use smalldatetime, this will fail to work after 2079. Please let me know when that happens and I'll come up with an alternative.
The /**/ is inserted to prevent idioticon from inserting itself in this post.
If you expect dates to show up from the 1880's or 2525, this obviously won't work. Set tricorders on loop diagnostics to check for tacheon emissions if time travel is a problem for your organization.
Chris Hofland
February 3, 2005 at 1:11 pm
Or, try this. I would guess performance will be the same, but it's less typing.
select top 1 2003 from adm.activitylog
where start_date >= '2003-01-01' and start_date < '2004-01-01'
union all
select top 1 2004 from adm.activitylog
where start_date >= '2004-01-01' and start_date < '2005-01-01'
union all
select top 1 2005 from adm.activitylog
where start_date >= '2005-01-01' and start_date < '2006-01-01'
union all
select top 1 2006 from adm.activitylog
where start_date >= '2006-01-01' and start_date < '2007-01-01'
-- etc...
Note: "union all" doesn't return a different result than "union" would, it's just a hint to the optimizer saying not to bother de-duplicating ('cause you know there can't be any dupes).
Chris Hofland
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply