Query to find data over range of years

Question

Post reply

Query to find data over range of years

DB_Newbie2007

SSCarpal Tunnel

Points: 4449
More actions
January 25, 2008 at 8:01 am

#178563

I seem to be mind-numb today, as I cannot figure out a good way to do the following query. I am trying to find what customers have been with the company since 2003,
select distinct CustNumber from dbo.InvoicesView
where CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2003')
and CustNumber in (select CustNumber
from InvoicesView where Year(InvoiceDate) = '2004')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2005')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2006')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2007')
There must be a more elegant way to run this query... I am just not thinking of it at the moment! 🙁
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Adam Haines SSC-Insane Points: 23217 More actions · Answer 1

How about?

select distinct CustNumber

from dbo.InvoicesView

where Year(InvoiceDate) >= 2003

My blog: http://jahaines.blogspot.com

Lowell SSC Guru Points: 323518 More actions · Answer 2

since YEAR() function returns an integer, you could use where Year() > 2003 instead:

select distinct CustNumber

from dbo.InvoicesView

where CustNumber in

( select distinct CustNumber

from dbo.InvoicesView

where Year(InvoiceDate) >=2003)

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 3

Are you after continuous employment, i.e. no breaks?

In which case try

select CustNumber

from dbo.InvoicesView

where Year(InvoiceDate) between 2003 and 2007 -- 5 years

group by CustNumber

having count(distinct Year(InvoiceDate))=5

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 4

Unfortunately, I believe that query will give me all customers in the database. I specifically need to find customers that have invoices in 2003, 2004, 2005, 2006 and 2007.

So, if a customer has an invoice in year 2007 they will match the criteria in the query

select distinct CustNumber

from dbo.InvoicesView

where Year(InvoiceDate) >= 2003

🙂

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 5

It's hardly optimal, but you could:

SELECT...

FROM...

WHERE YEAR(OrderDate) IN (2001,2002,2003,2007)

You could add those values to a temp table & join on it. That would probably be better than than in IN statement.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 6

I like Mark's because it will only return those customers who have orders in all the included years. Much more precise.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 7

Just in case anyone is interested:

My query using all of the derived tables took 1 hour, 13 minutes to run.

Mark's query using the group/having clause took 14 minutes, 32 seconds.

The database that is being queried has 89+ Million records and is a very "WIDE" table (lots of fields per record).

Thanks to Mark for the suggestion! 🙂 🙂

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

GSquared SSC Guru Points: 260824 More actions · Answer 8

The simplest solution I can think of uses derived tables or CTEs and joins to them. But you said you don't want to do that. Any particular reason to not use those?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Jack Corbett SSC Guru Points: 184394 More actions · Answer 9

DB_Newbie2007 (1/25/2008)
Just in case anyone is interested:
My query using all of the derived tables took 1 hour, 13 minutes to run.
Mark's query using the group/having clause took 14 minutes, 32 seconds.
The database that is being queried has 89+ Million records and is a very "WIDE" table (lots of fields per record).
Thanks to Mark for the suggestion! 🙂 🙂

If this is going to be a regularly run query you may want to consider modifying your table to have an indexed year column as using the Year() function is limiting use of indexes (or indices whichever you prefer). What does your query plan look like?

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

GSquared SSC Guru Points: 260824 More actions · Answer 10

The other thing that's probably hurting the performance is the Distinct operator in the derived tables. That's a lot of extra work for the database engine, but doesn't actually affect the end result of the query.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 11

To answer GSquared question, my experience is that derived tables (like the ones I used) or CTEs are not as fast or efficient being more "creative" with the query (as you see from my last post, the "creative" query from Mark ran much faster). 🙂

To answer Jack, only a few queries will be run using this data (some data mining). If I was planning on running this daily or weekly, I would look into optimizing the query, test out some indexes and statistics, etc., look into the cost of the index versus disk space and other query performance, etc. (I hope I am learning from this site!!!!) 😉

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

GSquared SSC Guru Points: 260824 More actions · Answer 12

Just ran some tests.

Mark's solution was much faster than derived tables. Tried it on 50k rows of data. Difference of 14 seconds average (derived tables) vs a maximum run time of 128 milliseconds (group by).

Tried adding a computed, persisted column on the year of the date, querying against that cut the execution time in half (five runs with, five without). Scans was 1 in each case, both ways, computed column cut logical reads by c. 10%.

Added an index on the customer number and the computed year columns, cut execution time in half again. Average run time at this point of 28 milliseconds.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 13

When you want data from a date range, it is usually better to do the selection in the form of:

where MyDate >= @StartDatetime and MyDate < @EndDatetime

This will allow the query to take advantage of any index on the datetime column.

For your query, I would try this:

select

a.CustNumber

from

(

select

aa.CustNumber

from

dbo.InvoicesView aa

where

aa.InvoiceDate >= '20030101' and

aa.InvoiceDate < '20080101'

group by

aa.CustNumber,

year(aa.InvoiceDate)

) a

group by

a.CustNumber

having

count(*) = 5

order by

a.CustNumber