February 12, 2009 at 9:14 am
I have table with structure as
srnoidentity(1,1) int
Fname varchar(30),
lname varchar(30),
dob datetime,
batch varchar(10),
Id1int
Id2 int,
Id3 int
Now If the data exists as
1A B1/1/2000ABC99999999900
2A B1/1/2000ABC00999999999
(Above set will fall in duplicate as one of the ids is notnullable and that matches with the other id ex: Id1 in 1st rec matches Id3 in 2nd rec so a duplicate)
3A B1/1/2005ABC88888888800
4A B1/1/2005ABC000
(Above set wont be under duplicate as although fname, lname, dob and batch is same but id of 3rd rec which is not nullable doesnt match up with any ids in 4th rec so nota duplicate)
5J K1/1/2004AAA00777777777
6J K1/1/2004AAA7777777777777777770
(Above set will fall in duplicate as one of the ids is notnullable and that matches with the other id ex: Id3 in 5th rec matches Id1 and Id2 in 6th rec so a duplicate)
7X Y1/1/2003BBB04567891230
8X Y1/1/2003BBB00456789123
9X Y1/1/2003BBB000
(Above set will fall in duplicate as one of the ids is notnullable and that matches with the other id ex: Id2 in 7th rec matches Id3 in 8th rec so a duplicate, but 9th rec wont be under a duplicate as all ids are blank and doesnt match up with 7th or 8th rec)
Now what i tried doing is get the set of recs where fname, lname, dob and batch matches doing a group by on these column
select fname, lname, dob, batch from table1
group by fname, lname, dob, batch having count(*) >1
this gives me duplicates based on abve cols how from thsi list can i get only the duplicate recs considering the Id1, Id2 and Id3 columns.
Any help on this will be greatly apprecaited. TIA
February 12, 2009 at 9:18 am
Join the table to itself in a query where the record numbers don't match, but either the ID1, ID2 or ID3 do match.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 12, 2009 at 9:42 am
But unfortunately that gives my multiple records or am i doing something wrong
select a.fname, a.lname, a.dob, a.batch,a.id1, a.id2, a.id3
from table1 a, table1 b where ( a.fname = b.fname and a.lname = b.lname and a.dob=b.dob and a.batch = b.batch) and
((a.id1 = b.id1 or a.id1 = b.id2 or a.id1 = b.id3)
OR (a.id2 = b.id1 or a.id2 = b.id2 or a.id2 = b.id3)
OR (a.id3 = b.id1 or a.id3 = b.id2 or a.id3 = b.id3))
February 12, 2009 at 10:44 am
-- *** Test Data ***
DECLARE @t TABLE
(
    srno int NOT NULL
    ,FName varchar(20) NOT NULL
    ,LName varchar(20) NOT NULL
    ,dob datetime NOT NULL
    ,batch varchar(10) NOT NULL
    ,Id1 int NOT NULL
    ,Id2 int NOT NULL
    ,Id3 int NOT NULL
)
INSERT INTO @t
SELECT 1, 'A', 'B', '20000101', 'ABC', 999999999, 0, 0 UNION ALL
SELECT 2, 'A', 'B', '20000101', 'ABC', 0, 0, 999999999 UNION ALL
SELECT 3, 'A', 'B', '20050101', 'ABC', 888888888, 0, 0 UNION ALL
SELECT 4, 'A', 'B', '20050101', 'ABC', 0, 0, 0 UNION ALL
SELECT 5, 'A', 'B', '20040101', 'ABC', 0, 0, 777777777 UNION ALL
SELECT 6, 'A', 'B', '20040101', 'ABC', 777777777, 777777777, 0 UNION ALL
SELECT 7, 'A', 'B', '20040101', 'ABC', 0, 456789123, 0 UNION ALL
SELECT 8, 'A', 'B', '20040101', 'ABC', 0, 0, 456789123 UNION ALL
SELECT 9, 'A', 'B', '20040101', 'ABC', 0, 0, 0
-- *** End Test Data ***
SELECT T.*
FROM @t T
    LEFT JOIN
    (
        SELECT MIN(T1.srno) AS srno
            ,T1.FName, T1.LName, T1.dob, T1.batch, I.Id
        FROM @t T1
            JOIN
            (
                SELECT DISTINCT *
                FROM
                (
                    SELECT srno
                        ,CASE N.N
                            WHEN 1 THEN Id1
                            WHEN 2 THEN Id2
                            ELSE Id3
                        END AS Id
                    FROM @t
                        CROSS JOIN
                        (
                            SELECT 1 UNION ALL
                            SELECT 2 UNION ALL
                            SELECT 3
                        ) N(N)
                ) D
                WHERE Id > 0    
            ) I
                ON T1.srno = I.srno
        GROUP BY T1.FName, T1.LName, T1.dob, T1.batch, I.Id
    ) D
        ON T.srno = D.srno
WHERE D.srno IS NOT NULL
    OR (T.Id1 = 0 AND T.Id2 = 0 AND T.Id3 = 0)
February 12, 2009 at 12:51 pm
Thank so much Ken for your sql. Its gets me 1 existance of every record where either id1,id or id3 exists. howevere for a records as this
3 A B2005-01-01 00:00:00.000ABC888888888 0 0
4 A B2005-01-01 00:00:00.000ABC0 0 0
if you see these 2 recs these dont fall in duplicate category as 888888888 doesnt exists in 4th rec either as id1 or id2 or id3. how can this records be filtered out as these wont fall in duplicate record category. any thoughts on this.
TIA
February 13, 2009 at 2:59 am
Did you run the OR (T.Id1 = 0 AND T.Id2 = 0 AND T.Id3 = 0) part of the query?
February 16, 2009 at 12:32 pm
Thanks Ken. you are right if i exclude thsi part of the sql
OR (T.Id1 = 0 AND T.Id2 = 0 AND T.Id3 = 0)
i get the data as
1AB2000-01-01 00:00:00.000ABC99999999900
3AB2005-01-01 00:00:00.000ABC88888888800
5AB2004-01-01 00:00:00.000ABC00777777777
7AB2004-01-01 00:00:00.000ABC04567891230
However if you see original insert for a record with srno 3 the id1 has value as '888888888' but a duplicate of that record which is srno doesnt ahve a value for id1, id2 or id3. As a result this wont fall in duplicate record category what happens if a 0 is substitued by '' or with a null value. can teh sql be tweaked to take care of that situation also
February 17, 2009 at 1:38 am
OR (ISNULL(NULLIF(T.Id1, ''), 0) AND ... )
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply