August 17, 2010 at 7:43 pm
Hello, I just had an interview where I was asked to write a query that only returned records when a member id occured 3 or more times.
Example: A table contains records of customer complaints with name, email, comments etc.
Return results where the customer name has appeared in the table 3 or more times.
I tried:
SELECT name, email, comments,
COUNT(name) AS NumOccurrences
FROM custComplaint
GROUP BY name, email, comments
HAVING ( COUNT(name) > 2 )
but NumOccurrances always equals 1 so no results.
If I change to HAVING ( COUNT(name) = 1 )
it returns all records.
Am I taking the wrong approach? Or is there something missing in my query?
August 17, 2010 at 10:15 pm
This approach seems correct. the NumOfOccurences always showing 1 means there are no duplicate names in your table. try and insert 2 duplicate values(in dev environment) and then re-run the query. it will return rows that have more than 3 occurences.
try below:
declare @t table (name varchar(255), email varchar(255), comments varchar(255))
insert into @t
select 'abc' as name, 'abc.abc' as email, 'Duplicate' as comments
insert into @t
select 'abc' as name, 'abc.abc' as email, 'Duplicate' as comments
insert into @t
select 'abc' as name, 'abc.abc' as email, 'Duplicate' as comments
insert into @t
select 'cba' as name, 'cba.cba' as email, 'Single' as comments
SELECT name, email, comments,
COUNT(name) AS NumOccurrences
FROM @t
GROUP BY name, email, comments
HAVING (COUNT(name) > 2)
August 17, 2010 at 11:21 pm
try to use ROWNUMBER() function , it will give you more flexibilty
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
August 18, 2010 at 12:12 pm
Thanks for the reply,
When I paste your query it works perfectly, but when I change it to apply to one of my existing tables NumOccurrences always = 1 even if there are obviuos duplicates.
The table I am using is :
tblContacts
With Columns:
CONID int
Type nvarchar(10)
LID int
CID int
ContactCode int
Notes text
Date datetime
EmpID int
Query:
SELECT CONID, Type, LID, CID, ContactCode,
COUNT(CID) AS NumOccurrences
FROM tblContacts
GROUP BY CONID, Type,LID, CID, ContactCode, Date, EmpID
HAVING (COUNT(CID) > 2 )
Returns an empty table.
When I change > 2 to = 1 I see why.
The results I am getting:
CONID TYPE LID CID ContactCode NumOccurrences
30772 Group 12 1 0 1
31249 Group 12 1 3 1
31840 Group 15 3 0 1
31511 Group 10 7 0 1
31515 Group 10 7 3 1
31251 Group 13 8 3 1
31061 Group 13 8 0 1
31374 Group 13 8 3 1
As you can see there are many instances where CID duplicates occur and yet the NumOccurrences = 1
August 18, 2010 at 12:46 pm
The problem here is that you are grouping by "CONID, Type,LID, CID, ContactCode, Date, EmpID", so only repetitions of same data for all those rows would "add" to the count you are doing.
You probably want to aggregate the data first and then display those and the other columns you want to display.
August 18, 2010 at 2:24 pm
gspeedtech
To assist you in understanding the problem (Or at least I hope this help)
Execute this command
SET SHOWPLAN_ALL ON
GO
1. Execute your query
2. Examine the execution plan. Note the Query Optimizer has elected to use a "Stream Aggregate Operator".
3. Using Books On Line subject "Stream Aggregate Showplan Operator " execute the example given.
4. Review what this implies;(Emphasis added by the poster)
the query optimizer uses the Stream Aggregate operator to group the rows in the SalesPerson table by the TerritoryID column and then calculate the values for the AVG(Bonus) and SUM(SalesYTD) columns based on that grouping.
I hope this assists you in gaining an understanding of what is happening "under the covers" - so to speak.
If you are further interested in learning what value an execution plan can be to understanding what is happening vist:
http://www.sqlservercentral.com/articles/books/65831/
The above link will allow you to download in PDF format an excellent book by Grant Fritchey.
If the above does not assist you in understand what is happening post again and some one more familiar with T-SQL might be able to assist you further.
August 18, 2010 at 2:33 pm
Eureka! I Found it.
I created a temp table and Inner Joined it to produce the desired results.
SELECT CID,
COUNT(CID) AS NumOccurrences
Into tblCID FROM tblContacts
GROUP BY CID
HAVING (COUNT(CID) > 2)
order by CID
Select CONID, Type, LID, tblContacts.CID, ContactCode
from tblcontacts
inner Join tblCID
on tblContacts.CID = tblCID.CID
order by tblContacts.CID
Drop Table tblCID
Please let me know if you can see why my approach would be ill advised or offer inconsistant results.
Thanks to all for the help.
August 18, 2010 at 2:34 pm
bitbucket-25253
I will definitely look into this to get a better understanding.
Thanks for the reply!
August 18, 2010 at 5:19 pm
Review the effect of your "SELECT INTO"
From BOL:
Note that your last posting of the T-SQL that works:
The SELECT INTO statement creates a new table and populates it with the result set of the SELECT statement
IF, and that is a big assumption on my part.... looking at your last posted T_SQL the table
SELECT CID,
COUNT(CID) AS NumOccurrences
Into tblCID FROM tblContacts
table "tblCID" does not exist until your T-SQL executes and since the query optimizer is attempting to process the data in the most efficient manner - BEFORE the query is actually executed it of course can not find the nonexistent table. You might want to use a CREATE TABLE statement and then insert the data into it... Check CREATE TABLE in BOL and by the way read up on a true temporary table defined as #tablename .. or #tblCid - which must be created and then populated.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply