More efficient way

Question

Post reply

More efficient way

Chris Smalley

SSC Veteran

Points: 274
More actions
February 2, 2005 at 10:41 am

#74855

Here is a question from one of my developers. Is there a faster way to do this?

select datepart(yyyy, start_date) as dmyear
from Mydatabase.adm.activitylog
GROUP BY datepart(yyyy, start_date)
order by datepart(yyyy, start_date) desc
Basically, it returns a list of only the years that exist in that table. The problem is that the activitylog is 5.7 million records long, and growing, so this simple query takes about 20 seconds to return the list of 2005, 2004, 2003, & 2002. Any more efficient ways to arrive at this list?

Thanks!!!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

Untested, but may be more efficient:

select Distinct datepart(yyyy, start_date) as dmyear

from Mydatabase.adm.activitylog

order by datepart(yyyy, start_date) desc

[Edit] Tested with a small data sample, execution plan is the same.

If this is really an issue, maybe you could create an indexed (materialized) view on the table with just " datepart(yyyy, start_date) as dmyear" as the only column in the view.

Alternatively an index on thetable, with start_date as the 1st column in the index might help performance.

Farrell Keough SSCoach Points: 17414 More actions · Answer 2

You may also want to think about making a reference table which consist of your distinct years. How you will keep that table current should be thought through as updating it could be accomplished in a Trigger which checks the DATEPART(yyyy, start_date) for existence in the reference table.

I wasn't born stupid - I had to study.

Chris Smalley SSC Veteran Points: 274 More actions · Answer 3

It looks lke it runs a little quicker! Thanks for the help.

michanne SSCrazy Points: 2329 More actions · Answer 4

I dont know if this helps to know but it is likely this that is slowing down your qry:

order by datepart(yyyy, start_date) desc

Theoretically, if the clustered index is on date you can get away without it.

the bigger your table gets the worse it gets. I agree with the others on aggregating the data in another source. I would presume anything before 2005 is static anyway.

Edwin Frydendall SSCrazy Points: 2361 More actions · Answer 5

In addition to "select distinct", put an index on the start_date column, if you don't already have one. That might help it go a little faster.

dmbaker SSCertifiable Points: 5140 More actions · Answer 6

Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.

Antares686 SSC Guru Points: 125444 More actions · Answer 7

I like this method as well which will take less space than an indexing method will take.

michanne SSCrazy Points: 2329 More actions · Answer 8

Since you're just getting the year, would the year() function perform any better? Probably a wash, but maybe worth trying in addition to other suggestions.

Year() simply calls Datepart - so you are correct - it is a wash.

sql_servant Old Hand Points: 315 More actions · Answer 9

This is a case where brute force is your friend:

select Year

from (select 2003 as Year,

cast('2003-01-01' as smalldatetime) as StartYear,

cast('2004-01-01' as smalldatetime) as EndYear

union all

select 2004 as Year,

cast('2004-01-01' as smalldatetime),

cast('2005-01-01' as smalldatetime)

union all

select 2005 as Year,

cast('2005-01-01' as smalldatetime),

cast('2006-01-01' as smalldatetime)

union all

select 2006 as Year,

cast('2006-01-01' as smalldatetime),

cast('2007-01-01' as smalldatetime)

union all

select 2007 as Year,

cast('2007-01-01' as smalldatetime),

cast('2008-01-01' as smalldatetime)

/**/) as Years

where exists

(select 1 from Mydatabase.adm.activitylog

where start_date >= Years.StartYear and start_date < Years.EndYear )

Notes:

Add as many years to the union-all-derived-table Years as you think will be reasonably useful. Or, enough to last until you're no longer employed there.

The union-all-derived-table trick only works up to, like 250 rows or something.

If you have an Integers table, you can use that for the "years generating" part. This is left as an exercise for the reader.

If you don't have an Integers table and always wanted one, there's one buried in the spt_values table. Hint: type = 'P ' .

If there is no index on start_date, performance will suck anyway.

If you use smalldatetime, this will fail to work after 2079. Please let me know when that happens and I'll come up with an alternative.

The /**/ is inserted to prevent idioticon from inserting itself in this post.

If you expect dates to show up from the 1880's or 2525, this obviously won't work. Set tricorders on loop diagnostics to check for tacheon emissions if time travel is a problem for your organization.

Chris Hofland

sql_servant Old Hand Points: 315 More actions · Answer 10

Or, try this. I would guess performance will be the same, but it's less typing.

select top 1 2003 from adm.activitylog

where start_date >= '2003-01-01' and start_date < '2004-01-01'

union all

select top 1 2004 from adm.activitylog

where start_date >= '2004-01-01' and start_date < '2005-01-01'

union all

select top 1 2005 from adm.activitylog

where start_date >= '2005-01-01' and start_date < '2006-01-01'

union all

select top 1 2006 from adm.activitylog

where start_date >= '2006-01-01' and start_date < '2007-01-01'

-- etc...

Note: "union all" doesn't return a different result than "union" would, it's just a hint to the optimizer saying not to bother de-duplicating ('cause you know there can't be any dupes).

Chris Hofland

More efficient way

Cookies on SQLServerCentral