January 25, 2008 at 8:01 am
I seem to be mind-numb today, as I cannot figure out a good way to do the following query. I am trying to find what customers have been with the company since 2003,
select distinct CustNumber from dbo.InvoicesView
where CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2003')
and CustNumber in (select CustNumber
from InvoicesView where Year(InvoiceDate) = '2004')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2005')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2006')
and CustNumber in (select distinct CustNumber
from dbo.InvoicesView where Year(InvoiceDate) = '2007')
There must be a more elegant way to run this query... I am just not thinking of it at the moment! 🙁
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 25, 2008 at 8:06 am
How about?
select distinct CustNumber
from dbo.InvoicesView
where Year(InvoiceDate) >= 2003
January 25, 2008 at 8:07 am
since YEAR() function returns an integer, you could use where Year() > 2003 instead:
select distinct CustNumber
from dbo.InvoicesView
where CustNumber in
( select distinct CustNumber
from dbo.InvoicesView
where Year(InvoiceDate) >=2003)
Lowell
January 25, 2008 at 8:14 am
Are you after continuous employment, i.e. no breaks?
In which case try
select CustNumber
from dbo.InvoicesView
where Year(InvoiceDate) between 2003 and 2007 -- 5 years
group by CustNumber
having count(distinct Year(InvoiceDate))=5
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 25, 2008 at 8:15 am
Unfortunately, I believe that query will give me all customers in the database. I specifically need to find customers that have invoices in 2003, 2004, 2005, 2006 and 2007.
So, if a customer has an invoice in year 2007 they will match the criteria in the query
select distinct CustNumber
from dbo.InvoicesView
where Year(InvoiceDate) >= 2003
🙂
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 25, 2008 at 9:02 am
It's hardly optimal, but you could:
SELECT...
FROM...
WHERE YEAR(OrderDate) IN (2001,2002,2003,2007)
You could add those values to a temp table & join on it. That would probably be better than than in IN statement.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 25, 2008 at 9:04 am
I like Mark's because it will only return those customers who have orders in all the included years. Much more precise.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 25, 2008 at 10:04 am
Just in case anyone is interested:
My query using all of the derived tables took 1 hour, 13 minutes to run.
Mark's query using the group/having clause took 14 minutes, 32 seconds.
The database that is being queried has 89+ Million records and is a very "WIDE" table (lots of fields per record).
Thanks to Mark for the suggestion! 🙂 🙂
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 25, 2008 at 11:10 am
The simplest solution I can think of uses derived tables or CTEs and joins to them. But you said you don't want to do that. Any particular reason to not use those?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 25, 2008 at 11:29 am
DB_Newbie2007 (1/25/2008)
Just in case anyone is interested:My query using all of the derived tables took 1 hour, 13 minutes to run.
Mark's query using the group/having clause took 14 minutes, 32 seconds.
The database that is being queried has 89+ Million records and is a very "WIDE" table (lots of fields per record).
Thanks to Mark for the suggestion! 🙂 🙂
If this is going to be a regularly run query you may want to consider modifying your table to have an indexed year column as using the Year() function is limiting use of indexes (or indices whichever you prefer). What does your query plan look like?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 25, 2008 at 11:38 am
The other thing that's probably hurting the performance is the Distinct operator in the derived tables. That's a lot of extra work for the database engine, but doesn't actually affect the end result of the query.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 25, 2008 at 11:41 am
To answer GSquared question, my experience is that derived tables (like the ones I used) or CTEs are not as fast or efficient being more "creative" with the query (as you see from my last post, the "creative" query from Mark ran much faster). 🙂
To answer Jack, only a few queries will be run using this data (some data mining). If I was planning on running this daily or weekly, I would look into optimizing the query, test out some indexes and statistics, etc., look into the cost of the index versus disk space and other query performance, etc. (I hope I am learning from this site!!!!) 😉
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 25, 2008 at 12:05 pm
Just ran some tests.
Mark's solution was much faster than derived tables. Tried it on 50k rows of data. Difference of 14 seconds average (derived tables) vs a maximum run time of 128 milliseconds (group by).
Tried adding a computed, persisted column on the year of the date, querying against that cut the execution time in half (five runs with, five without). Scans was 1 in each case, both ways, computed column cut logical reads by c. 10%.
Added an index on the customer number and the computed year columns, cut execution time in half again. Average run time at this point of 28 milliseconds.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 25, 2008 at 1:34 pm
When you want data from a date range, it is usually better to do the selection in the form of:
where MyDate >= @StartDatetime and MyDate < @EndDatetime
This will allow the query to take advantage of any index on the datetime column.
For your query, I would try this:
select
a.CustNumber
from
(
select
aa.CustNumber
from
dbo.InvoicesView aa
where
aa.InvoiceDate >= '20030101' and
aa.InvoiceDate < '20080101'
group by
aa.CustNumber,
year(aa.InvoiceDate)
) a
group by
a.CustNumber
having
count(*) = 5
order by
a.CustNumber
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply