Filter a table to get unique rows in a certain logic

Question

Filter a table to get unique rows in a certain logic

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 1

mister.magoo (1/24/2011)
Dave,
I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....

BWAA-HAA!!!... I'm just the humble tester here. The OP posted the original guidelines. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 2

Jeff Moden (1/24/2011)
mister.magoo (1/24/2011)
Dave,
I haven't tested yours, but can see that it does not meet with Sir Jeff's strict guidelines....
BWAA-HAA!!!... I'm just the humble tester here. The OP posted the original guidelines. 😛

Thought you might like that!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

WayneS SSC Guru Points: 95461 More actions · Answer 3

mister.magoo (1/24/2011)
TABLOCKX on mvts knock about 0.5 second off the time (14%) for a Pool of 10,000
I will add this tweak to the code when I run the performance tests later.
Good call Wayne!

I have to admit... the problem initially had me thinking "quirky update" until I realized that it wouldn't handle it. But the QU stuck in my head, and with the talk about the partitioning and the maxdop 1, it popped back up to the top, and I started wondering if any of the hints in that process would help out here. I'm glad it makes a difference, and I'm interested to see the results of it on the larger set. (Just for the heck of it - try it on the temp table also!)

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

mister.magoo SSC-Forever Points: 47174 More actions · Answer 4

WayneS (1/24/2011)
I have to admit... the problem initially had me thinking "quirky update" until I realized that it wouldn't handle it. But the QU stuck in my head, and with the talk about the partitioning and the maxdop 1, it popped back up to the top, and I started wondering if any of the hints in that process would help out here. I'm glad it makes a difference, and I'm interested to see the results of it on the larger set. (Just for the heck of it - try it on the temp table also!)

I also started out thinking QU but soon realised that the potentially massive recursion required would be a double-hard problem for that.

What is needed is a "Double-Barrelled Quirky Update" or an "Ultimate Quirky Update" or something....

I will try the TABLOCKX on both tables and on larger sets.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 5

Here is another approach. Instead of searching for "no duplicate" I keep track of which values that have been used.

SET NOCOUNT ON

CREATE TABLE#Sample

(

ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

idMvt1 INT NOT NULL,

idMvt2 INT NOT NULL,

idMvt3 INT NOT NULL

)

INSERT#Sample

(

idMvt1,

idMvt2,

idMvt3

)

SELECTTOP(1000000)

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3

FROMsys.All_Columns ac1,

sys.All_Columns ac2

-- Solution here

CREATE TABLE#Values

(

idMvt INT PRIMARY KEY NONCLUSTERED

)

INSERT#Values

(

idMvt

)

SELECT DISTINCTu.idMvt

FROM#Sample AS s

UNPIVOT(

idMvt

FOR Col IN (s.idMvt1, s.idMvt2, s.idMvt3)

) AS u

CREATE TABLE#Result

(

ID INT PRIMARY KEY CLUSTERED

)

DECLARE@CurrID INT,

@OldID INT,

@idMvt1 INT,

@idMvt2 INT,

@idMvt3 INT

SELECT TOP(1)@CurrID = ID,

@idMvt1 = idMvt1,

@idMvt2 = idMvt2,

@idMvt3 = idMvt3

FROM#Sample

ORDER BYID

INSERT#Result

(

ID

)

VALUES(

@CurrID

)

DELETE

FROM#Values

WHEREidMvt IN (@idMvt1, @idMvt2, @idMvt3)

WHILE @CurrID IS NOT NULL

BEGIN

SELECT@OldID = @CurrID,

@CurrID = NULL

SELECT TOP(1)@CurrID = s.ID,

@idMvt1 = s.idMvt1,

@idMvt2 = s.idMvt2,

@idMvt3 = s.idMvt3

FROM#Sample AS s

WHEREEXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt1)

AND EXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt2)

AND EXISTS(SELECT * FROM #Values AS v WHERE v.idMvt = s.idMvt3)

AND s.ID > @OldID

ORDER BYs.ID

IF @CurrID IS NOT NULL

BEGIN

INSERT#Result

(

ID

)

VALUES(

@CurrID

)

DELETE

FROM#Values

WHEREidMvt IN (@idMvt1, @idMvt2, @idMvt3)

END

SELECTID

FROM#Result

DROP TABLE#Sample,

#Values,

#Result

N 56°04'39.16"
E 12°55'05.25"

mister.magoo SSC-Forever Points: 47174 More actions · Answer 6

Look at us, all doing WHILE loops ! Someone should take a photograph to preserve the moment 😛

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47174 More actions · Answer 7

Performance Stats as promised

First my solution WITH and WITHOUT TABLOCKX on MVTS and #accepted:

And next a comparison with SwePeso's solution which shows an interesting variance between the two solutions for small and large sets....

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

How about this as a solution?

SET NOCOUNT ON

CREATE TABLE #Sample

(

ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

idMvt1 INT NOT NULL,

idMvt2 INT NOT NULL,

idMvt3 INT NOT NULL

)

INSERT #Sample

(

idMvt1,

idMvt2,

idMvt3

)

SELECT TOP(1000000)

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,

ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3

FROM sys.all_columns ac1,

sys.all_columns ac2;

-- Solution here

set statistics io on;

set statistics time on;

with MyIDs (

Mvt,

ID

) as (

select

idMvt1,

ID

from

#Sample

union

select

idMvt2,

ID

from

#Sample

union

select

idMvt3,

ID

from

#Sample

),

UniqueMvtIDs (

Mvt,

ID

) as (

select

Mvt,

min(ID)

from

MyIDs

group by

Mvt

)

select distinct

s.*

from

#Sample s

inner join UniqueMvtIDs umi

on (s.ID = umi.ID)

;

set statistics time off;

set statistics io off;

drop table #Sample;

Simple testing using statistics time on and io on:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Sample_____________________________________________________________________________________________________________000000000050'. Scan count 4, logical reads 9403, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 5875 ms, elapsed time = 6341 ms.

!!!! Nevermind, doesn't work !!!!

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 9

Lynn Pettis (1/24/2011)
How about this as a solution?
SET NOCOUNT ON
CREATE TABLE #Sample
(
ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
idMvt1 INT NOT NULL,
idMvt2 INT NOT NULL,
idMvt3 INT NOT NULL
)
INSERT #Sample
(
idMvt1,
idMvt2,
idMvt3
)
SELECT TOP(1000000)
ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt1,
ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt2,
ABS(CHECKSUM(NEWID())) % 10000 + 1 AS idMvt3
FROM sys.all_columns ac1,
sys.all_columns ac2;
-- Solution here
set statistics io on;
set statistics time on;
with MyIDs (
Mvt,
ID
) as (
select
idMvt1,
ID
from
#Sample
union
select
idMvt2,
ID
from
#Sample
union
select
idMvt3,
ID
from
#Sample
),
UniqueMvtIDs (
Mvt,
ID
) as (
select
Mvt,
min(ID)
from
MyIDs
group by
Mvt
)
select distinct
s.*
from
#Sample s
inner join UniqueMvtIDs umi
on (s.ID = umi.ID)
;
set statistics time off;
set statistics io off;
drop table #Sample;
Simple testing using statistics time on and io on:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sample_____________________________________________________________________________________________________________000000000050'. Scan count 4, logical reads 9403, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5875 ms, elapsed time = 6341 ms.
!!!! Nevermind, doesn't work !!!!

The problem is, it doesn't work correctly. I changed the number of rows to 20 and the random number generators to 10 and here's what I got very first time out...

ID idMvt1 idMvt2 idMvt3

----------- ----------- ----------- -----------

1 8 9 1

2 10 3 10

3 10 1 4

4 5 2 6

7 7 9 6

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 10

Agh! Sorry, Lynn, I missed where you said...

!!!! Nevermind, doesn't work !!!!

... at the bottom of your post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

Jeff Moden (1/24/2011)
Agh! Sorry, Lynn, I missed where you said...
!!!! Nevermind, doesn't work !!!!
... at the bottom of your post.

That's okay, I think I fixed it!

Here is the corrected code:

with MyIDs (

Mvt,

ID

) as (

select

idMvt1,

ID

from

#Sample

union all

select

idMvt2,

ID

from

#Sample

union all

select

idMvt3,

ID

from

#Sample

),

UniqueMvtIDs (

Mvt,

ID

) as (

select

Mvt,

min(ID)

from

MyIDs

group by

Mvt

)

select distinct

s.*

into

#Results

from

#Sample s

where

s.idMvt1 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

and s.idMvt2 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

and s.idMvt3 in (select umi.Mvt from UniqueMvtIDs umi where s.ID <= umi.ID)

;

Use the original test suite from my previous post.

And here is the simple timing of the code with output to the screen:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Sample_____________________________________________________________________________________________________________000000000059'. Scan count 30, logical reads 31020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 22015 ms, elapsed time = 12492 ms.

And writing to a temp table (#Results):

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Sample_____________________________________________________________________________________________________________00000000005C'. Scan count 30, logical reads 31020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 21873 ms, elapsed time = 11964 ms.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 12

Excellent solution. Good Job Mr. Magoo

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

mister.magoo SSC-Forever Points: 47174 More actions · Answer 13

CirquedeSQLeil (1/26/2011)
Excellent solution. Good Job Mr. Magoo

Thank you - it would be good to know how the OP got on..hint.hint...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 14

Okay, tested my solution against that of mister.magoo and his won. Mine returns the same result set with the same sample data.

I guess I'm going to have to start playing more with pivot and unpivot.

Well done, sir!

moadh.bs SSC Veteran Points: 299 More actions · Answer 15

Hi everyone !!

like I've said I opted for a CLR solution, even if we do prefer a sql one. it was basically because I was short in time. but right now I'm working on these solutions and I have to say The results were quite a surprise!

I attached the set I'm using for testing, it contains 64178 rows, this is far from a set with millions of rows but I have the expected results here

I tested 4 solutions :

- Jeff Moden & Peso's solution within 12 sec it throws a max recursivity exception

- the first solution of Mister Magoo and Lynn Pettis's solutions were good but doesn't return the expected result

- ColdCoffee's first solutio turn in 608 ms but it returns only 158 rows out of the expected 163

- Mister Magoo's final solution was perfect (even if I couldn't turn it on temporay table or table variable) it returns the 163 combinations in 3 seconds

I will shortly post the result on a larger scale.