Filter a table to get unique rows in a certain logic

Question

Filter a table to get unique rows in a certain logic

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 1

ColdCoffee (1/22/2011)
@Jeff, u are absolutely right.. i mis-read the requirement; from morning (its morning in INDIA now, as i write) i was thinking about that only.. So i sat and put together my piece of code to find the unique row . Hope this turns out to be correct
; WITH ROW_NUMBERED_LIST AS
(
SELECT ROW_NUM = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) ,
idMvt1,idMvt2,idMvt3
FROM MVTS
),
Unpivoted_List AS
(
SELECT ROW_NUM , ValInCol
FROM
( SELECT ROW_NUM ,idMvt1,idMvt2,idMvt3 FROM ROW_NUMBERED_LIST ) AS PIVOT_TABLE
UNPIVOT
( ValInCol FOR ColName IN (idMvt1,idMvt2,idMvt3) ) AS PIVOT_HANDLE
--ORDER BY ROW_NUM
) ,
Ranked_List AS
(
SELECT ROW_NUM , ValInCol,
RANKING = ROW_NUMBER() OVER( PARTITION BY ValInCol ORDER BY ROW_NUM )
FROM UNPIVOTED_LIST
) ,
UNIQUE_IDS AS
(
SELECT ROW_NUM
FROM Ranked_List
GROUP BY ROW_NUM
HAVING COUNT( DISTINCT RANKING ) = 1
)
SELECT R_N_LST.ROW_NUM, R_N_LST.idMvt1,R_N_LST.idMvt2,R_N_LST.idMvt3
FROM ROW_NUMBERED_LIST R_N_LST
INNER JOIN UNIQUE_IDS UNIQ
ON R_N_LST.ROW_NUM = UNIQ.ROW_NUM

Very cool, CC... that correctly produces rows 1 and 4 on the original test data. Now I have to study what you did because, frankly, I couldn't get my cold-medicine-induced head around this one. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 2

Gah! Sorry CC... although your code works with the original data, it doesn't work when we insert a new row. Try yours with the following data...

DROP TABLE MVTS

CREATE TABLE MVTS

(

ID INT IDENTITY(1,1),

idMvt1 int,

idMvt2 int,

idMvt3 int

);

INSERT INTO MVTS

(idMvt1,idMvt2,idMvt3)

SELECT 271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

SELECT 271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

SELECT 268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

SELECT 268, 001, 204 UNION ALL [font="Arial Black"]--New row added to see if Cold Coffee's algo works. It doesn't.[/font]

SELECT 268, 253, 202 UNION ALL --4 This SHOULD show because even though 268 and 253 are in the previous row,

-- none of the elements live in 1 which is the only active row at this point.

SELECT 268, 256, 204 UNION ALL --5 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

SELECT 268, 256, 202 --6 Should NOT show because at least 1 element lives in 4 which is "active"

Heh... the good part about this is that I don't feel like such an idiot on this problem, now. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 3

Can I throw in a bone too?

--===== Peso's code

;WITH cteSource(ID, SequenceID)

AS (

SELECTu.ID,

ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID

FROMdbo.Mvts AS s

UNPIVOT(

theValue

FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)

) AS u

)

SELECTID

FROMcteSource

GROUP BYID

--HAVINGSUM(SequenceID) = MIN(SequenceID) * COUNT(*)

HAVINGMIN(SequenceID) = MAX(SequenceID)

N 56°04'39.16"
E 12°55'05.25"

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 4

SwePeso (1/23/2011)
Can I throw in a bone too?
--===== Peso's code
;WITH cteSource(ID, SequenceID)
AS (
SELECTu.ID,
ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID
FROMdbo.Mvts AS s
UNPIVOT(
theValue
FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)
) AS u
)
SELECTID
FROMcteSource
GROUP BYID
--HAVINGSUM(SequenceID) = MIN(SequenceID) * COUNT(*)
HAVINGMIN(SequenceID) = MAX(SequenceID)

Yep. It's just the bone is missing a bit of marrow. With the following data, it should return ID's 1 AND 5 (labeled as "4" in the comments because I added an extra row to break CC's code).

DROP TABLE MVTS

CREATE TABLE MVTS

(

ID INT IDENTITY(1,1),

idMvt1 int,

idMvt2 int,

idMvt3 int

);

INSERT INTO MVTS

(idMvt1,idMvt2,idMvt3)

SELECT 271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

SELECT 271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

SELECT 268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

SELECT 268, 001, 204 UNION ALL --New row added to see if Cold Coffee's algo works. It doesn't.

SELECT 268, 253, 202 UNION ALL --4 This SHOULD show because even though 268 and 253 are in the previous row,

-- none of the elements live in 1 which is the only active row at this point.

SELECT 268, 256, 204 UNION ALL --5 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

SELECT 268, 256, 202 --6 Should NOT show because at least 1 element lives in 4 which is "active"

--===== Peso's code

;WITH cteSource(ID, SequenceID)

AS (

SELECT u.ID,

ROW_NUMBER() OVER (PARTITION BY u.theValue ORDER BY u.ID) AS SequenceID

FROM dbo.Mvts AS s

UNPIVOT (

theValue

FOR theCol IN (s.idMvt1, s.idMvt2, s.idMvt3)

) AS u

)

SELECT ID

FROM cteSource

GROUP BY ID

--HAVING SUM(SequenceID) = MIN(SequenceID) * COUNT(*)

HAVING MIN(SequenceID) = MAX(SequenceID)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 5

DECLARE@Sample TABLE

(

ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

idMvt1 int,

idMvt2 int,

idMvt3 int

)

INSERT@Sample

(

idMvt1,

idMvt2,

idMvt3

)

SELECT271, 204, 136 UNION ALL --1 This will always show up because it's the "first"

SELECT271, 202, 136 UNION ALL --2 This should NOT show because 271 and 136 is contained in 1 (which is the only "active" row at this point)

SELECT268, 253, 204 UNION ALL --3 This should NOT show because 204 is contained in 1 (which is the only "active" row at this point)

SELECT268, 001, 204 UNION ALL --4 New row added to see if Cold Coffee's algo works. It doesn't.

SELECT268, 253, 202 UNION ALL --5 This SHOULD show because even though 268 and 253 are in the previous row,

-- none of the elements live in 1 which is the only active row at this point.

SELECT268, 256, 204 UNION ALL --6 Should NOT show because at least 1 element lives in 1 or 4 which are "active"

SELECT268, 256, 202 --7 Should NOT show because at least 1 element lives in 4 which is "active"

--===== Peso's code

;WITH cteSource

AS (

SELECTID,

idMvt1,

idMvt2,

idMvt3

FROM@Sample

WHEREID = 1

UNION ALL

SELECTID,

idMvt1,

idMvt2,

idMvt3

FROM(

SELECTs.ID,

s.idMvt1,

s.idMvt2,

s.idMvt3,

ROW_NUMBER() OVER (ORDER BY s.ID) AS YakID

FROM@Sample AS s

INNER JOINcteSource AS c ON c.ID < s.ID

WHEREs.idMvt1 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

AND s.idMvt2 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

AND s.idMvt3 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

) AS d

WHEREYakID = 1

)

SELECTID,

idMvt1,

idMvt2,

idMvt3

FROMcteSource

N 56°04'39.16"
E 12°55'05.25"

mister.magoo SSC-Forever Points: 47174 More actions · Answer 6

SwePeso (1/23/2011)
...

Sorry Peso, but while testing mine for more complicated datasets, I came across this one that your latest does not cope with...

INSERT INTO MVTS

(idMvt1,idMvt2,idMvt3)

SELECT 1, 2, 3 UNION ALL

SELECT 4, 5, 6 UNION ALL

SELECT 2, 1, 3 -- this row should NOT be selected.

Hope you can work around this because mine is way too slooooowww! and ugly!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 7

Darned good try, Peter. But it doesn't hold up for other data ({edit} Magoo beat me to it while I was writing this post). I know... it's a tough problem to do even with a RBAR loop. Even a "Lasagna" loop is going to take a long time across, say, 5 million rows.

DECLARE @Sample TABLE

(

ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

idMvt1 int,

idMvt2 int,

idMvt3 int

);

--===== Build the test data

INSERT INTO @Sample

(idMvt1, idMvt2, idMvt3)

SELECT TOP 20

idMvt1 = ABS(CHECKSUM(NEWID()))%10+1,

idMvt2 = ABS(CHECKSUM(NEWID()))%10+1,

idMvt3 = ABS(CHECKSUM(NEWID()))%10+1

FROM sys.All_Columns ac1,

sys.All_Columns ac2

;

-----------------------------------------------------------------------------------------------------------------------

--===== Peso's code

;WITH cteSource

AS (

SELECT ID,

idMvt1,

idMvt2,

idMvt3

FROM @Sample

WHERE ID = 1

UNION ALL

SELECT ID,

idMvt1,

idMvt2,

idMvt3

FROM (

SELECT s.ID,

s.idMvt1,

s.idMvt2,

s.idMvt3,

ROW_NUMBER() OVER (ORDER BY s.ID) AS YakID

FROM @Sample AS s

INNER JOIN cteSource AS c ON c.ID < s.ID

WHERE s.idMvt1 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

AND s.idMvt2 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

AND s.idMvt3 NOT IN (c.idMvt1, c.idMvt2, c.idMvt3)

) AS d

WHERE YakID = 1

)

SELECT ID,

idMvt1,

idMvt2,

idMvt3

FROM cteSource

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 8

mister.magoo (1/23/2011)
Hope you can work around this because mine is way too slooooowww! and ugly!

Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 9

Jeff Moden (1/23/2011)
mister.magoo (1/23/2011)
Hope you can work around this because mine is way too slooooowww! and ugly!
Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

At the moment, my code works but has been running for three minutes and not completed on a 1M row set....:pinch:

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47174 More actions · Answer 10

Jeff Moden (1/23/2011)
mister.magoo (1/23/2011)
Hope you can work around this because mine is way too slooooowww! and ugly!
Glad to hear that my code isn't the only slow and ugly one on this problem. :blush:

Ok, this may be ugly, but it processes 1M rows in 1 second and correctly processes all the test cases posted so far.

IF OBJECT_ID ('tempdb..#accepted') IS NOT NULL DROP TABLE #accepted;

CREATE TABLE #accepted

(

ID INT IDENTITY(1,1) ,

VALUE INT

);

CREATE INDEX ix1 ON #accepted(VALUE);

--= allow identity inserts - this is key to the solution

SET IDENTITY_INSERT #accepted ON;

--= pop the first row in

INSERT #accepted(ID,VALUE)

SELECT ID,VALUE

FROM mvts AS o

UNPIVOT (VALUE FOR c IN (idMvt1,idMvt2,idMvt3)) upvt

WHERE id=1;

--== now do a yucky WHILE loop to process the rows one at a time into the accepted table.

--== ROWCOUNT=3 because I unpivot the values

WHILE @@ROWCOUNT=3

INSERT #accepted(ID,VALUE)

SELECT ID, VALUE

FROM (

SELECT TOP 1 ID,idMvt1,idMvt2,idMvt3

FROM mvts AS o

WHERE NOT EXISTS (SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt1)

AND NOT EXISTS ( SELECT1 FROM #accepted AS a WHEREa.value = o.idMvt2)

AND NOT EXISTS ( SELECT 1 FROM #accepted AS a WHERE a.value = o.idMvt3)

AND id>SCOPE_IDENTITY() --= using SCOPE_IDENTITY to tell me the last ID I stored in the table.

ORDER BY id

) NextRow(ID,idMvt1,idMvt2,idMvt3)

UNPIVOT (

VALUE FOR c IN (idMvt1,idMvt2,idMvt3)

) upvt

SET IDENTITY_INSERT #accepted OFF;

SELECT mvts.id,mvts.idMvt1,mvts.idMvt2,mvts.idMvt3

FROM mvts AS mvts

WHERE EXISTS (SELECT 1 FROM #accepted AS a WHERE a.id=mvts.id)

ORDER BY mvts.id

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 11

Ugly is OK... especially if Ugly works fast. I'll give it a try.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 12

OMG! How did the old cartoon series used to say it? "Ahhhh, Magoo... You've done it again!". Well done. NOW I can go to bed! 😛

Even on my 9 year old, single 1.8Ghz P4 CPU, it still only takes 14 seconds on a million rows. Freakin' awesome job, Magoo. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 13

Ingenious use. 🙂

One word of warning though set MAXDOP 1 , scope_identity can return incorrect results in a parallel query.

Clear Sky SQL
My Blog[/url]

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 14

Dave Ballantyne (1/23/2011)
Ingenious use. 🙂
One word of warning though set MAXDOP 1 , scope_identity can return incorrect results in a parallel query.

And using partitions, right?

N 56°04'39.16"
E 12°55'05.25"

moadh.bs SSC Veteran Points: 299 More actions · Answer 15

Thx everyone I would never hoped for this number of solutions, I will shrotly post the results.