December 3, 2013 at 10:32 am
I can figure a number of Big Hammer methods of doing this job, but I"m trying to improve myself and stop hitting things with hammers.
Given:
USE [tempdb]
GO
CREATE TABLE [Person]
(
IPCode INT
);
CREATE TABLE [Profile]
(
IPCode INT
, ID NVARCHAR(32)
);
INSERT INTO [Person] VALUES (1);
INSERT INTO [Person] VALUES (2);
INSERT INTO [Person] VALUES (3);
INSERT INTO [Person] VALUES (4);
INSERT INTO [Profile] VALUES (1,'AAAA');
INSERT INTO [Profile] VALUES (1,'AAAA');
INSERT INTO [Profile] VALUES (2,'BBBB');
INSERT INTO [Profile] VALUES (2,'BBBB');
INSERT INTO [Profile] VALUES (2,'BBBC');
INSERT INTO [Profile] VALUES (3,'CCCC');
INSERT INTO [Profile] VALUES (3,'CCCC');
INSERT INTO [Profile] VALUES (3,'CCCC');
INSERT INTO [Profile] VALUES (4,'DDDD');
INSERT INTO [Profile] VALUES (4,'DDDA');
INSERT INTO [Profile] VALUES (4,'DDDB');
INSERT INTO [Profile] VALUES (4,'DDDD');
INSERT INTO [Profile] VALUES (4,'DDDD');
SELECT *
FROM [Person] AS p
JOIN [Profile] AS pr
ON [p].[IPCode] = [pr].[IPCode];
DROP TABLE [Person];
DROP TABLE [Profile];
How would I make the query return only the rows from [Profile] with the same [IPCode] and different [ID]'s. i.e. - I should get back:
2,BBBB
2,BBBC
4,DDDD
4,DDDA
4,DDDB
In the application involved [Person] has about 500K rows, about 10K of which have rows in [Profile] and there are rarely more than 10 rows in [Profile] for a [Person].
[Meta-question: is this an effective way to ask this question in this community?]
December 3, 2013 at 11:21 am
This should get you what you need:
WITH
xxx AS
(
SELECTpe.IPCode AS pe_ip,
pr.IPCode AS pr_ip,
pr.ID,
DENSE_RANK() OVER (PARTITION BY pr.IPCode ORDER BY ID) AS x
FROM Person pe
CROSS APPLY [Profile] pr
WHERE pe.IPCode=pr.IPCode
),
qualifiers AS
(
SELECT DISTINCT pr_ip
FROM xxx
WHERE x>1
)
SELECT q.pr_ip , xxx.ID
FROM qualifiers q
CROSS APPLY xxx
WHERE q.pr_ip=xxx.pr_ip
GROUP BY q.pr_ip, ID
Both tables should also have clustered indexes and non-clustered index on the join keys (e.g. on IP_Code). You would also want to include NULL constraints; it's a best practice to make all your columns not-nullable whenever possible.
[Meta-question: is this an effective way to ask this question in this community?]
Including DDL and sample data is super-helpful. It's good to include any constraints and indexes that are present too (if you have not done so).
-- Itzik Ben-Gan 2001
December 3, 2013 at 11:31 am
OK, that's going to take some looking at :-D. I can feel self-improvement coming on.
Many thanks.
December 3, 2013 at 11:54 am
This might do the trick as well.
SELECT DISTINCT IPCode, ID
FROM Profile p
WHERE EXISTS( SELECT IPCode
FROM Profile x
WHERE p.IPCode = x.IPCode
GROUP BY IPCode HAVING COUNT( DISTINCT ID) > 1)
December 3, 2013 at 1:48 pm
I ran both solutions on your sample data and Luis' solution was WAY faster, almost double. But the solution provided by Alan uses so many different methods to arrive at the same answer it is worth the study to see how it all comes together. 2 ctes, 1 Dense_Rank, and 2 cross applys on a small dataset provide an excellent learning opportunity.
Run the first cte by itself and study its output with cross apply. Then add the 2nd cte. Try it without the x>1 conditional to see why that is important.
Great stuff here.
__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/
December 3, 2013 at 2:49 pm
It's a great thing that you found an answer that will perform correctly and will be fast and simple to understand and an answer that will help you to learn more about different techniques.
I'm sure there are many possible variations to obtain what you need and that's the greatest thing about SQL.:-D
December 3, 2013 at 3:04 pm
I'm pretty happy with the responses! I look forward to learning from both of them.
December 3, 2013 at 5:37 pm
And another slightly different way for the sake of it...
;with data as
(
SELECT p.IPCode,pr.ID,count(*) over(partition by p.IPCode) as UID
FROM [Person] AS p
JOIN [Profile] AS pr
ON [p].[IPCode] = [pr].[IPCode]
group by p.IPCode,pr.ID
)
select IPCode,ID
from data
where UID>1
Got it slightly wrong, I realised...
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 3, 2013 at 5:54 pm
Here's another way that avoids DISTINCT:
SELECT a.IPCode, ID
FROM Profile a
JOIN
(
SELECT IPCode
FROM Profile
GROUP BY IPCode
HAVING MAX(ID) <> MIN(ID)
) b ON a.IPCode = b.IPCode
GROUP BY a.IPCode, ID;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 6, 2013 at 1:15 pm
I can't rival the other guys on this board, but I am using this as practice for myself and would like (constructive but friendly) feedback on the solution I came up with if anyone is willing. 🙂
WITH CTE1 AS
(
SELECT IPCode, ID
FROM [Profile]
GROUP BY IPCode, ID
),
CTE2 AS
(
SELECT IPCode
FROM CTE1
GROUP BY IPCode
HAVING COUNT(*) > 1
)
SELECT IPCode, ID
FROM [Profile]
WHERE IPCode IN
(SELECT IPCode FROM CTE2)
GROUP BY IPCode, ID
December 10, 2013 at 5:36 pm
autoexcrement (12/6/2013)
I can't rival the other guys on this board, but I am using this as practice for myself and would like (constructive but friendly) feedback on the solution I came up with if anyone is willing. 🙂
Actually you have done quite well my young Jedi! Witness the following 1,000,000 row test harness incorporating the 4 solutions suggested.
CREATE TABLE [Person]
(
IPCode INT
);
CREATE TABLE [Profile]
(
IPCode INT
, ID NVARCHAR(32)
);
--CREATE INDEX id1 ON [Profile](IPCode, ID);
--INSERT INTO [Person] VALUES (1);
--INSERT INTO [Person] VALUES (2);
--INSERT INTO [Person] VALUES (3);
--INSERT INTO [Person] VALUES (4);
WITH Tally (n) AS
(
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO [Person]
SELECT n
FROM Tally;
--INSERT INTO [Profile] VALUES (1,'AAAA');
--INSERT INTO [Profile] VALUES (1,'AAAA');
--INSERT INTO [Profile] VALUES (2,'BBBB');
--INSERT INTO [Profile] VALUES (2,'BBBB');
--INSERT INTO [Profile] VALUES (2,'BBBC');
--INSERT INTO [Profile] VALUES (3,'CCCC');
--INSERT INTO [Profile] VALUES (3,'CCCC');
--INSERT INTO [Profile] VALUES (3,'CCCC');
--INSERT INTO [Profile] VALUES (4,'DDDD');
--INSERT INTO [Profile] VALUES (4,'DDDA');
--INSERT INTO [Profile] VALUES (4,'DDDB');
--INSERT INTO [Profile] VALUES (4,'DDDD');
--INSERT INTO [Profile] VALUES (4,'DDDD');
WITH Tally (n) AS
(
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO [Profile]
SELECT a.n, RIGHT(10000+b.n/20, 4)
FROM Tally a
CROSS APPLY
(
SELECT n
FROM Tally
WHERE n <= 100
) b;
PRINT 'Alan.B';
SET STATISTICS IO, TIME ON;
WITH
xxx AS
(
SELECTpe.IPCode AS pe_ip,
pr.IPCode AS pr_ip,
pr.ID,
DENSE_RANK() OVER (PARTITION BY pr.IPCode ORDER BY ID) AS x
FROM Person pe
CROSS APPLY [Profile] pr
WHERE pe.IPCode=pr.IPCode
),
qualifiers AS
(
SELECT DISTINCT pr_ip
FROM xxx
WHERE x>1
)
SELECT q.pr_ip , xxx.ID
FROM qualifiers q
CROSS APPLY xxx
WHERE q.pr_ip=xxx.pr_ip
GROUP BY q.pr_ip, ID;
SET STATISTICS IO, TIME OFF;
PRINT 'Luis Cazares';
SET STATISTICS IO, TIME ON;
SELECT DISTINCT IPCode, ID
FROM Profile p
WHERE EXISTS( SELECT IPCode
FROM Profile x
WHERE p.IPCode = x.IPCode
GROUP BY IPCode HAVING COUNT( DISTINCT ID) > 1)
SET STATISTICS IO, TIME OFF;
PRINT 'Dwain.C';
SET STATISTICS IO, TIME ON;
SELECT a.IPCode, ID
FROM Profile a
JOIN
(
SELECT IPCode
FROM [Profile]
GROUP BY IPCode
HAVING MAX(ID) <> MIN(ID)
) b ON a.IPCode = b.IPCode
GROUP BY a.IPCode, ID;
SET STATISTICS IO, TIME OFF;
PRINT 'Autoexcrement';
SET STATISTICS IO, TIME ON;
WITH CTE1 AS
(
SELECT IPCode, ID
FROM [Profile]
GROUP BY IPCode, ID
),
CTE2 AS
(
SELECT IPCode
FROM CTE1
GROUP BY IPCode
HAVING COUNT(*) > 1
)
SELECT IPCode, ID
FROM [Profile]
WHERE IPCode IN
(SELECT IPCode FROM CTE2)
GROUP BY IPCode, ID;
SET STATISTICS IO, TIME OFF;
GO
DROP TABLE [Person];
DROP TABLE [Profile];
Without the INDEXI've proposed the results are as follows:
Alan.B
(60000 row(s) affected)
Table 'Person'. Scan count 10, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4461 ms, elapsed time = 4203 ms.
Luis Cazares
(60000 row(s) affected)
Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2073 ms, elapsed time = 938 ms.
Dwain.C
(60000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3526 ms, elapsed time = 1330 ms.
Autoexcrement
(60000 row(s) affected)
Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2199 ms, elapsed time = 892 ms.
With the INDEX, the results are this:
Alan.B
(60000 row(s) affected)
Table 'Person'. Scan count 10, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4791 ms, elapsed time = 3098 ms.
Luis Cazares
(60000 row(s) affected)
Table 'Profile'. Scan count 2, logical reads 7508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 405 ms, elapsed time = 470 ms.
Dwain.C
(60000 row(s) affected)
Table 'Profile'. Scan count 10, logical reads 7776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 998 ms, elapsed time = 584 ms.
Autoexcrement
(60000 row(s) affected)
Table 'Profile'. Scan count 2, logical reads 7508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 405 ms, elapsed time = 456 ms.
So yours is the fastest in both cases, no doubt due to the fact that you aggregate the aggregate results.
Nicely done!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 10, 2013 at 7:07 pm
Holy cow!
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply