Delete duplication with where clause condition

  • Hi,

    I want to delete those employees who have at least 02 marital status and their first marital status is not in logical order as following

    -- Marital Status Logical Order

    --1 = Single

    --2 = Married

    --3 = Widowed

    --4 = Divorced

    USE TEMPDB

    GO

    IF OBJECT_ID('temdb.dbo.#temp_delete_duplicate') IS NOT NULL

    DROP TABLE #temp_delete_duplicate

    CREATE TABLE #temp_delete_duplicate

    (

    --emphist int identity(1,1),

    empno int,

    marital_status char(8),

    status_change_date datetime default getdate()

    )

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (1, 'SINGLE', '01-JAN-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (2, 'MARRIED', '01-JAN-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (3, 'SINGLE', '01-JAN-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (3, 'DIVORCED', '31-DEC-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (4, 'WIDOWED', '01-JAN-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (4, 'SINGLE', '30-SEP-2010')

    INSERT INTO #temp_delete_duplicate (empno, marital_status, status_change_date) VALUES (5, 'WIDOWED', '01-JAN-2010')

    SELECT *

    FROM #temp_delete_duplicate

    order by

    empno,

    status_change_date

    select

    empno

    from #temp_delete_duplicate

    group by

    empno

    having

    count(1) > 1

    /* I have fixed this using following CTE: */

    ;with cte_marital_status

    as

    (

    select

    row_number() over(partition by empno order by empno,status_change_date) as marital_status_order,

    *

    from #temp_delete_duplicate

    )

    delete from temp

    from

    #temp_delete_duplicate temp

    inner join

    cte_marital_status cte

    on

    temp.empno = cte.empno

    and

    temp.marital_status = cte.marital_status

    where cte.marital_status_order = 1

    and cte.marital_status <> 'single'

    and exists

    (

    select 'x'

    from cte_marital_status

    where empno = cte.empno

    group by empno

    having count(1) > 1

    )

    SELECT *

    FROM #temp_delete_duplicate

    order by

    empno,

    status_change_date

    In this created example, Employee 3 and 4 have two statuses however, employee 4 in this example is not in logical order because his status is "single" after status "divorced"

    The similar logic can be applied on queries where we need to apply grouping condition and also to filter records with a where clause as well.

    Question: Any other thoughts/different logic then please share.

    Thanks

  • In the final output of the query, I see that an entry has been deleted for employee no as 3 well

    Can you explain the logic behind that deletion?


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • You want to be careful here. Life isn't nearly as simple as your model implies. For example, someone could get married, divorced, and then re-married. Their current status would be married, but your model would force them into a divorced status.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi,

    "Editing as requested by senior member for better clarity."

    "@Kingston Dhasian": There were 7 rows initially and after deletion there are 6 rows. So i can't understand your question.

    "@drew.allen": Please note that I have created this dataset and question for better clarity and keeping in view all my future aspects. So I am expecting answer based on this dataset assuming that this dataset is covering all future possibilities. So I am expecting answer based on this dataset only. Otherwise there are N number of possibilities 😀 you never know.

    Any other solution in your mind now ?

    Thank you.

  • iBar (4/30/2012)


    Hi,

    There were 7 rows and after deletion there are 6 rows. So i can't understand your question.

    Note: Question and Query is based on data at the time of query and assuming that current data is covering all future possibilities. So please remain focus on this data set.

    Any other solution in your mind now ?

    Thank you.

    I'm sorry, but to whom are you addressing this comment? It is a bit rude considering a previous poster was trying to identify a possible flaw in your logic.

  • Ohhhhh I had no intention to offend anyone but i just went with the flow. sorry!

    this has also happened because i could not memorises everyone names and switching back and forth just to check names is not easy.

    I will edit the above post now to clarify things.

    Cheers 🙂

  • his has also happened because i could not memorises everyone names and switching back and forth just to check names is not easy.

    Actually, it should be very easy. Just have a tab open with the thread and one with your answer. Or type your answer in Word and then paste it into the Reply box when you have finished it. This would have the added benefit of highlighting some of your typos so that you can fix them before you post them.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • iBar (4/30/2012)


    ..."@drew.allen": Please note that I have created this dataset and question for better clarity and keeping in view all my future aspects. So I am expecting answer based on this dataset assuming that this dataset is covering all future possibilities. So I am expecting answer based on this dataset only. Otherwise there are N number of possibilities 😀 you never know.

    ...

    Here is the problem with this, we've been there before. We have all posted answers to an OPs dataset and rules only to have the OP come back and say that solution didn't work and provide additional rules and an expanded dataset, This sime times turns into a vicious circle as the OP adds rules and exceptions that should have been know up front.

    In your case, Drew saw issue up front and brought it to your attention. Instead of chastizing us that we should stick to the dataset and rules provided to develop a solution, you really should have clarified and expanded on the dataset and rules to cover the new possibility. Yes, there may be other possibile combinations of data that could violate your initial rules and if they can be identified, you need to be able to adapt and change.

    We are volunteers on this site, and we have our own jobs as DBAs, Database Developers, Data Architects, etc., and we will bring our knowledge and experience to the table when helping others with questions. This means if we see something thay may have missed, we will bring it to their attention. We are trying to provide the best help possible, as well as trying to help improve the skills and knowledge of others.

    Drew telling you he sees a problem with your rules, is the same thing you should do when working with your users when they ask for something. If you see a possible flaw or ommision, it is necessary for you to bring it up early rather than later after the solution is developed.

  • Hi buddies,

    With the following script you can find the employees that violates the marital status changes common sense rules.

    ;WITH CivilStatus (StatusSequenceNumber,[Marital_Status])

    AS

    (

    SELECT 1,'SINGLE' UNION

    SELECT 2,'MARRIED' UNION

    SELECT 3,'DIVORCED' UNION

    SELECT 3,'WIDOWED'

    ),

    PersonCivilStatus

    AS

    (

    SELECT d.empno,d.marital_status,d.status_change_date,

    Sequence = row_number() over (partition by empno order by status_change_date),

    StatusSequenceNumber

    FROM #temp_delete_duplicate d

    INNER JOIN CivilStatus cs

    ON cs.Marital_Status = d.Marital_status

    )

    select os.empno,os.marital_status, os.sequence , fs.marital_Status,fs.sequence,

    StatusDifference = fs.StatusSequenceNumber - os.StatusSequenceNumber

    from PersonCivilStatus os

    left JOIN PersonCivilStatus fs

    ON os.sequence = fs.sequence - 1

    where (fs.StatusSequenceNumber - os.StatusSequenceNumber) <> 1

  • Now you can just delete those employees.

  • i think i overlooked one common rule, i will be back soon, but you can still check it tough.

  • Now this is better. If the first marital status recorded can be anything (not just single), delete the last condition of the where clause.

    ;WITH CivilStatus (StatusSequenceNumber,[Marital_Status])

    AS

    (

    SELECT 1,'SINGLE' UNION

    SELECT 2,'MARRIED' UNION

    SELECT 3,'DIVORCED' UNION

    SELECT 3,'WIDOWED'

    ),

    PersonCivilStatus

    AS

    (

    SELECT d.empno,d.marital_status,d.status_change_date,

    Sequence = row_number() over (partition by empno order by status_change_date),

    StatusSequenceNumber

    FROM #temp_delete_duplicate d

    INNER JOIN CivilStatus cs

    ON cs.Marital_Status = d.Marital_status

    )

    select os.empno,os.marital_status, os.sequence , fs.marital_Status,fs.sequence,

    StatusDifference = fs.StatusSequenceNumber - os.StatusSequenceNumber

    from PersonCivilStatus os

    left JOIN PersonCivilStatus fs

    ON os.sequence = fs.sequence - 1 and

    os.empno = fs.empno

    where (fs.StatusSequenceNumber - os.StatusSequenceNumber) <> 1 OR

    (fs.marital_status IS NULL and os.sequence = 1 and os.StatusSequenceNumber <> 1)

  • Hi,

    This is better and stick to the "at least two marital status" clause.

    ;WITH CivilStatus (StatusSequenceNumber,[Marital_Status])

    AS

    (

    SELECT 1,'SINGLE' UNION

    SELECT 2,'MARRIED' UNION

    SELECT 3,'DIVORCED' UNION

    SELECT 3,'WIDOWED'

    ),

    PersonCivilStatus

    AS

    (

    SELECT d.empno,d.marital_status,d.status_change_date,

    Sequence = row_number() over (partition by empno order by status_change_date),

    StatusSequenceNumber

    FROM #temp_delete_duplicate d

    INNER JOIN CivilStatus cs

    ON cs.Marital_Status = d.Marital_status

    )

    select os.empno,os.marital_status, os.sequence , fs.marital_Status,fs.sequence,

    StatusDifference = fs.StatusSequenceNumber - os.StatusSequenceNumber

    from PersonCivilStatus os

    inner JOIN PersonCivilStatus fs

    ON os.sequence = fs.sequence - 1 and

    os.empno = fs.empno

    where (fs.StatusSequenceNumber - os.StatusSequenceNumber) <> 1

  • well, no comments, im out..

  • Hi Lynn Pettis,

    Here is the problem with this, we've been there before. We have all posted answers to an OPs dataset and rules only to have the OP come back and say that solution didn't work and provide additional rules and an expanded dataset, This sime times turns into a vicious circle as the OP adds rules and exceptions that should have been know up front.

    In your case, Drew saw issue up front and brought it to your attention. Instead of chastizing us that we should stick to the dataset and rules provided to develop a solution, you really should have clarified and expanded on the dataset and rules to cover the new possibility. Yes, there may be other possibile combinations of data that could violate your initial rules and if they can be identified, you need to be able to adapt and change.

    We are volunteers on this site, and we have our own jobs as DBAs, Database Developers, Data Architects, etc., and we will bring our knowledge and experience to the table when helping others with questions. This means if we see something thay may have missed, we will bring it to their attention. We are trying to provide the best help possible, as well as trying to help improve the skills and knowledge of others.

    Drew telling you he sees a problem with your rules, is the same thing you should do when working with your users when they ask for something. If you see a possible flaw or ommision, it is necessary for you to bring it up early rather than later after the solution is developed.

    Yes i understood your point already and i appreciate that and i tried my best to explain my situation clearly and also tried to calm things down as i mentioned earlier:

    Ohhhhh I had no intention to offend anyone but i just went with the flow. sorry!

    this has also happened because i could not memorises everyone names and switching back and forth just to check names is not easy.

    However, as i felt Now that , like my initial abrupt reply, now your are throwing all your thoughts which made you uncomfortable at first point, without trying to understand my situation, my words and my apologies.

    Please take it a bit light. Cool down:-) please.

    Yes we are volunteers and volunteering our thoughts and sharing our problems to learn from experts like you, and we are contributing in SQL community so that novice could learn as well in future from these problems and solutions, however, if people keep on ignoring apologies and keep on insisting their point of view, just to prove that their thoughts are right, correct and better; will not help to solve anything.

    If you have really noticed in my first post of this question then one could see that i have spent sufficient time to prepare a sample data set and i have also provided my solution and desired output, according to some standards and just to ease out the volunteers and to get a quicker reply. So sample and desired output mean i have tried my best to communicate the requirement. After those much efforts, what you will expect? Like this long discussed, I must also suggest that you must also highlight and teach people to put attention to details as well - sometimes and which everyone miss almost sometimes.

    Still i learnt a lot from this discuss as well and i hope others will learn too.

    I don't want to drag this any further and again hereby request to accept my sincere apologies which made any member uncomfortable.

    Lets try to find a better solution and I believe the best is yet to come from your side soon.

    Thanks and Cheers.

    @Phil Parkin: Thanks for suggestions.

    @adrian.facio: I shall get back to you soon.

    Thanks to all.

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply