Duplicate record count

  • - I've formatted it and added a new column [RowType] to help you to understand what's happening.

    -- Run it and investigate the output.

    When i execute above query it is displaying both duplicate and non duplicate records .

    Comment out the HAVING COUNT(*) > 0 line with two hyphens, same as this line,

    -- and run it again. Investigate the output. It should be the same as the last run.

    When i execute the query it is displaying both duplicate and non duplicate records .

    -- Uncomment HAVING COUNT(*) > 0 and change it to HAVING COUNT(*) > 1.

    -- run it and investigate the output. Report back what you see.

    When i execute the query it is displaying only duplicate records.

    what should i do next??

  • pilla.sree85 (7/17/2014)


    ...

    what should i do next??

    Remember what you've observed and give yourself some credit for learning it. Next:

    -- Having corrected the filter 'HAVING COUNT(*) > 1',

    -- change the table joins to be ANSI-compliant.

    -- Old-style *outer joins are no longer supported by SQL server

    -- Run the query and confirm that the results are the same as

    -- the last version you ran using old-style joins.

    SELECT

    COUNT(L1.AcctNo) totalcount,

    [RowType] = CASE

    WHEN COUNT(*) = 1 THEN 'Not a dupe'

    WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

    ELSE 'Nonsense' END,

    L1.AcctNo,

    L1.ReceivedFileID

    FROM Legacy L1

    INNER JOIN Legacy L2

    ON L2.AcctNo = L1.AcctNo

    INNER JOIN ReceivedFiles r

    ON r.ReceivedFileID = L1.ReceivedFileID

    WHERE L1.MarketCode NOT IN (SELECT MarketCode FROM Markets WHERE (AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' ))

    GROUP BY L1.AcctNo, L1.ReceivedFileID

    HAVING COUNT(*) > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • -- Next I've commented out the join to Legacy L2.

    -- It appears to be redundant for this purpose, but

    -- will add to the cost of running the query, i.e. how long it takes to run.

    -- when you run this, you should get the same number of rows returned as the last query,

    -- but the values in totalcount will be halved.

    -- Notice how easy it is to remove a table from the FROM list when you use ANSI-compliant joins.

    SELECT

    COUNT(L1.AcctNo) totalcount,

    [RowType] = CASE

    WHEN COUNT(*) = 1 THEN 'Not a dupe'

    WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

    ELSE 'Nonsense' END,

    L1.AcctNo,

    L1.ReceivedFileID

    FROM Legacy L1

    --INNER JOIN Legacy L2

    --ON L2.AcctNo = L1.AcctNo

    INNER JOIN ReceivedFiles r

    ON r.ReceivedFileID = L1.ReceivedFileID

    WHERE L1.MarketCode NOT IN (SELECT MarketCode FROM Markets WHERE (AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' ))

    GROUP BY L1.AcctNo, L1.ReceivedFileID

    HAVING COUNT(*) > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • i have learned lot from ur suggestions.

    i have executed above query it is giving results as previous

  • -- Next, a little reformatting. Note that I've changed the expression for totalcount,

    -- to indicate that you are counting rows. Aggregate functions COUNT, SUM, AVG etc ignore

    -- rows where the parameter is null. If you have any rows where AcctNo is null, then COUNT(*)

    -- will count them but COUNT(AcctNo) will not.

    -- I've also changed the filter on the Markets table, for two reasons; it's more readable, and it's

    -- now SARGable i.e. SQL Server can now use an appropriate index (if one exists) to speed up the subselect.

    SELECT

    [totalcount] = COUNT(*),

    [RowType] = CASE

    WHEN COUNT(*) = 1 THEN 'Not a dupe'

    WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

    ELSE 'Nonsense' END,

    l.AcctNo,

    l.ReceivedFileID

    FROM Legacy l

    INNER JOIN ReceivedFiles r

    ON r.ReceivedFileID = l.ReceivedFileID

    WHERE l.MarketCode NOT IN (

    SELECT MarketCode FROM Markets WHERE AllowDupes IN ('1','y','Y') --(AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' )

    )

    GROUP BY l.AcctNo, l.ReceivedFileID

    HAVING COUNT(*) > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • -- Next, check to see if table ReceivedFiles is required in the query,

    -- perhaps you can get away with checking to see if there is a value or not in l.ReceivedFileID

    -- I've also changed the Markets table check. Since EXISTS is safer than IN because of NULL handling,

    -- I tend to use EXISTS instead of IN and NOT EXISTS instead of NOT IN.

    SELECT

    [totalcount] = COUNT(*),

    [RowType] = CASE

    WHEN COUNT(*) = 1 THEN 'Not a dupe'

    WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

    ELSE 'Nonsense' END,

    l.AcctNo,

    l.ReceivedFileID

    FROM Legacy l

    --INNER JOIN ReceivedFiles r

    --ON r.ReceivedFileID = l.ReceivedFileID

    WHERE l.ReceivedFileID IS NOT NULL

    AND NOT EXISTS (

    SELECT 1

    FROM Markets m

    WHERE m.MarketCode = l.MarketCode

    AND m.AllowDupes IN ('1','y','Y')

    )

    GROUP BY l.AcctNo, l.ReceivedFileID

    HAVING COUNT(*) > 1

    -- So, is the table [ReceivedFiles] required?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Till now i am good.then how would i check the condition duplicate count >1 then i need to execute query for further processing in order to display output as Duplicate

  • pilla.sree85 (7/17/2014)


    Till now i am good.then how would i check the condition duplicate count >1 then i need to execute query for further processing in order to display output as Duplicate

    If you have finished modifying the last query I sent you, then post it here and folks will show you how to proceed.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply