Duplicate record count

Question

Post reply

Duplicate record count

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

pilla.sree85 SSC Enthusiast Points: 139 More actions · Answer 1

- I've formatted it and added a new column [RowType] to help you to understand what's happening.

-- Run it and investigate the output.

When i execute above query it is displaying both duplicate and non duplicate records .

Comment out the HAVING COUNT(*) > 0 line with two hyphens, same as this line,

-- and run it again. Investigate the output. It should be the same as the last run.

When i execute the query it is displaying both duplicate and non duplicate records .

-- Uncomment HAVING COUNT(*) > 0 and change it to HAVING COUNT(*) > 1.

-- run it and investigate the output. Report back what you see.

When i execute the query it is displaying only duplicate records.

what should i do next??

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

pilla.sree85 (7/17/2014)
...
what should i do next??

Remember what you've observed and give yourself some credit for learning it. Next:

-- Having corrected the filter 'HAVING COUNT(*) > 1',

-- change the table joins to be ANSI-compliant.

-- Old-style *outer joins are no longer supported by SQL server

-- Run the query and confirm that the results are the same as

-- the last version you ran using old-style joins.

SELECT

COUNT(L1.AcctNo) totalcount,

[RowType] = CASE

WHEN COUNT(*) = 1 THEN 'Not a dupe'

WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

ELSE 'Nonsense' END,

L1.AcctNo,

L1.ReceivedFileID

FROM Legacy L1

INNER JOIN Legacy L2

ON L2.AcctNo = L1.AcctNo

INNER JOIN ReceivedFiles r

ON r.ReceivedFileID = L1.ReceivedFileID

WHERE L1.MarketCode NOT IN (SELECT MarketCode FROM Markets WHERE (AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' ))

GROUP BY L1.AcctNo, L1.ReceivedFileID

HAVING COUNT(*) > 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

-- Next I've commented out the join to Legacy L2.

-- It appears to be redundant for this purpose, but

-- will add to the cost of running the query, i.e. how long it takes to run.

-- when you run this, you should get the same number of rows returned as the last query,

-- but the values in totalcount will be halved.

-- Notice how easy it is to remove a table from the FROM list when you use ANSI-compliant joins.

SELECT

COUNT(L1.AcctNo) totalcount,

[RowType] = CASE

WHEN COUNT(*) = 1 THEN 'Not a dupe'

WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

ELSE 'Nonsense' END,

L1.AcctNo,

L1.ReceivedFileID

FROM Legacy L1

--INNER JOIN Legacy L2

--ON L2.AcctNo = L1.AcctNo

INNER JOIN ReceivedFiles r

ON r.ReceivedFileID = L1.ReceivedFileID

WHERE L1.MarketCode NOT IN (SELECT MarketCode FROM Markets WHERE (AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' ))

GROUP BY L1.AcctNo, L1.ReceivedFileID

HAVING COUNT(*) > 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

pilla.sree85 SSC Enthusiast Points: 139 More actions · Answer 4

i have learned lot from ur suggestions.

i have executed above query it is giving results as previous

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

-- Next, a little reformatting. Note that I've changed the expression for totalcount,

-- to indicate that you are counting rows. Aggregate functions COUNT, SUM, AVG etc ignore

-- rows where the parameter is null. If you have any rows where AcctNo is null, then COUNT(*)

-- will count them but COUNT(AcctNo) will not.

-- I've also changed the filter on the Markets table, for two reasons; it's more readable, and it's

-- now SARGable i.e. SQL Server can now use an appropriate index (if one exists) to speed up the subselect.

SELECT

[totalcount] = COUNT(*),

[RowType] = CASE

WHEN COUNT(*) = 1 THEN 'Not a dupe'

WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

ELSE 'Nonsense' END,

l.AcctNo,

l.ReceivedFileID

FROM Legacy l

INNER JOIN ReceivedFiles r

ON r.ReceivedFileID = l.ReceivedFileID

WHERE l.MarketCode NOT IN (

SELECT MarketCode FROM Markets WHERE AllowDupes IN ('1','y','Y') --(AllowDupes = '1' OR UPPER(AllowDupes) = 'Y' )

)

GROUP BY l.AcctNo, l.ReceivedFileID

HAVING COUNT(*) > 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

-- Next, check to see if table ReceivedFiles is required in the query,

-- perhaps you can get away with checking to see if there is a value or not in l.ReceivedFileID

-- I've also changed the Markets table check. Since EXISTS is safer than IN because of NULL handling,

-- I tend to use EXISTS instead of IN and NOT EXISTS instead of NOT IN.

SELECT

[totalcount] = COUNT(*),

[RowType] = CASE

WHEN COUNT(*) = 1 THEN 'Not a dupe'

WHEN COUNT(*) > 1 THEN 'AcctNo duplicate'

ELSE 'Nonsense' END,

l.AcctNo,

l.ReceivedFileID

FROM Legacy l

--INNER JOIN ReceivedFiles r

--ON r.ReceivedFileID = l.ReceivedFileID

WHERE l.ReceivedFileID IS NOT NULL

AND NOT EXISTS (

SELECT 1

FROM Markets m

WHERE m.MarketCode = l.MarketCode

AND m.AllowDupes IN ('1','y','Y')

)

GROUP BY l.AcctNo, l.ReceivedFileID

HAVING COUNT(*) > 1

-- So, is the table [ReceivedFiles] required?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

pilla.sree85 SSC Enthusiast Points: 139 More actions · Answer 7

Till now i am good.then how would i check the condition duplicate count >1 then i need to execute query for further processing in order to display output as Duplicate

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

pilla.sree85 (7/17/2014)
Till now i am good.then how would i check the condition duplicate count >1 then i need to execute query for further processing in order to display output as Duplicate

If you have finished modifying the last query I sent you, then post it here and folks will show you how to proceed.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden