Identifying discrepancies

Question

Identifying discrepancies

ccmret

Right there with Babe

Points: 759
More actions
March 17, 2014 at 10:56 am

#283483

I have a series of records based on empid where I want to identify the empid that may have discrepancies listed. I have some empids that are listed more than once and have different DOB's. In the example I am trying to Create a DOB_ERROR column and either say yes if the DOB doesn't match the other records in the file with the same empid.
SELECT
Empid,
DOB,
CASE WHEN DOB = DOB THEN 'No' ELSE 'Yes' END AS DOB_ERROR,
City,
St,
Gender
FROM Emp
WHERE EMPID IN
('12335', '23456', '545432','231245')

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Adam Angelini SSCrazy Eights Points: 8548 More actions · Answer 1

Not completely sure what your end results need to be, but to strictly get a list of EMPIds that are listed more than once without the same DOB, this would work.

select Empid,count(DOB)

from Emp

group by Empid having count(distinct DOB) > 1

ccmret Right there with Babe Points: 759 More actions · Answer 2

I want to select the empid listed more than once and then mark DOB as an ERROR if the dates are not equal.

mickyT SSChampion Points: 10360 More actions · Answer 3

ccmret (3/17/2014)
I want to select the empid listed more than once and then mark DOB as an ERROR if the dates are not equal.

So something like this? If not then post up some sample data and expected result

WITH SampleData AS (

SELECT *

FROM (VALUES

(12335,CAST('19900101' AS DATETIME),'One'),

(12335,CAST('19900101' AS DATETIME),'Two'),

(23456,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900301' AS DATETIME),'Two'),

(545432,CAST('19900228' AS DATETIME),'Three'),

(231245,CAST('19900101' AS DATETIME),'One')

) AS EMP(EmpID, DOB, OtherStuff)

)

SELECT Empid,

CASE WHEN MAX(DOB) OVER (PARTITION BY EmpID) <> MIN(DOB) OVER (PARTITION BY EmpID) THEN 'Yes' ELSE 'No' END DOB_ERROR,

OtherStuff

FROM SampleData;

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

While MickyT's solution will work (thanks Micky for the sample data!), I tend to shy away from the window aggregate functions for performance reasons.

WITH SampleData AS

(

SELECT *

FROM (VALUES

(12335,CAST('19900101' AS DATETIME),'One'),

(12335,CAST('19900101' AS DATETIME),'Two'),

(23456,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900301' AS DATETIME),'Two'),

(545432,CAST('19900228' AS DATETIME),'Three'),

(231245,CAST('19900101' AS DATETIME),'One')

) AS EMP(EmpID, DOB, OtherStuff)

),

PreAggregate AS

(

SELECT EmpID, MinDOB=MIN(DOB), MaxDOB=MAX(DOB)

FROM SampleData

GROUP BY EmpID

)

SELECT a.Empid

,DOB_ERROR=CASE WHEN MaxDOB <> MinDOB THEN 'Yes' ELSE 'No' END

,OtherStuff

FROM PreAggregate a

JOIN SampleData b ON a.EmpID = b.EmpID;

While this query looks more complicated, there is ample reason to believe it will perform better:

The Performance of the T-SQL Window Functions [/url]

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

ccmret (3/17/2014)
I want to select the empid listed more than once and then mark DOB as an ERROR if the dates are not equal.

So if you have an empid with two rows, each with different DOB, they both get marked as error? This logic would mark every row (of empid dupes) as error so long as the DOB's are not identical for each row of the dupe set. It's going nowhere. Change the way you're thinking about this.

What if there are two rows in the dupe set with the same DOB - no mark?

With different DOB - mark one of them?

With three rows it gets more complicated

All three have same DOB - no mark

All three have different DOB - mark two of them as error

Two of them have the same DOB, the third is different - mark the third as error.

Here's a query which may help you decide what you really want to do:

-- sample data setup

DROP TABLE #SampleData

SELECT *

INTO #SampleData

FROM (VALUES

(12335,CAST('19900101' AS DATETIME),'One'),

(12335,CAST('19900101' AS DATETIME),'Two'),

(23456,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900101' AS DATETIME),'One'),

(545432,CAST('19900301' AS DATETIME),'Two'),

(545432,CAST('19900228' AS DATETIME),'Three'),

(545434,CAST('19900101' AS DATETIME),'One'),

(545434,CAST('19900101' AS DATETIME),'Two'),

(545434,CAST('19900228' AS DATETIME),'Three'),

(231245,CAST('19900101' AS DATETIME),'One')

) EMP (EmpID, DOB, OtherStuff)

-- useful query

SELECT

s.EmpID,

d.EmpID_Dupes,

s.DOB,

DOB_Dupes = COUNT(*),

DOB_Count = COUNT(*) OVER(PARTITION BY s.EmpID)

FROM #SampleData s

INNER JOIN (

SELECT

EmpID,

EmpID_Dupes = COUNT(*)

FROM #SampleData

GROUP BY EmpID

HAVING COUNT(*) > 1

) d ON d.EmpID = s.EmpID

GROUP BY s.EmpID, s.DOB, d.EmpID_Dupes

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden