how to rewrite query to avoid duplicate records (across columns)

  • Hello Good Morning All,

    Happy New Year,

    Could you please help to rewrite the below two queries (so that i can avoid duplicates) i need to send email to everyone not the dup

    licated ones), please?

    Create table #MyPhoneList

    (

    AccountID int,

    EmailWork varchar(50),

    EmailHome varchar(50),

    EmailOther varchar(50),

    IsOffersToWorkEmail bit,

    IsOffersToHomeEmail bit,

    IsOffersToOtherEmail bit,

    IsValidEmailWork bit,

    IsValidEmailHome bit,

    IsValidEmailOther bit

    )

    --> In this table AccountID is uniquee

    --> email values could be null or repetetive for work / home / Other (same email can be used more than one columns for accountid)

    -- a new column will be created with name as Sourceflag( the value could be work, Home, Other depend on email coming from) then removes duplicates

    SELECT AccountID , Email, SourceFlag, ROW_NUMBER() OVER(PARTITION BY AccountID, Email ORDER BY Sourceflag desc) AS ROW

    INTO #List

    from (

    SELECTAccountID

    , EmailWorkAS EMAIL

    , 'Work'AS SourceFlag

    FROM#MyPhoneList (NoLock) eml

    WHEREIsOffersToWorkEmail= 1

    ANDEMAILWORKIS NOT NULL

    ANDIsValidEmailWork= 1

    UNION

    SELECTAccountID

    , EmailHome

    , 'Home'AS SourceFlag

    FROM#MyPhoneList (NoLock) eml

    WHEREIsOffersToHomeEmail= 1

    ANDEMAILHOMEIS NOT NULL

    ANDIsValidEmailHome= 1

    UNION

    SELECTAccountID

    , EmailOther

    , 'Other'AS Sourceflag

    FROM#MyPhoneList (NoLock) eml

    WHEREEmailOther= 1

    ANDEMAILOtherIS NOT NULL

    ANDIsValidEmailOther= 1

    ) eml --select count(*) from #email

    --Email dedupe on source

    SELECT * INTO #distinct FROM #List WHERE ROW = 1

    Thank you in Advance

    Dhani

  • Quick questions, could you provide some sample data? Why the NOLOCK hints? What is the definition of a unique entry in the set?

    😎

  • Hello

    Thank you for your inputs

    Nolock is ignore the locks (please ignore it)

    sure i will provide the sample data soon

    i mean accountid is primarykey, in table

    Thank you,

    Dhani

  • Please see below sample data

    insert into #MyPhoneList values (123456789,'King@travel.com',null,'b.king@gmail.com',0,0,1,1,0,1)

    insert into #MyPhoneList values (234567891,'drew@cocacola.com','drew@cocacola.com',null,1,1,1,1,0,1)

    insert into #MyPhoneList values (345678912,'bob@chase.com',null,'kuya@gmail.com',1,0,1,1,0,1)

    insert into #MyPhoneList values (456789123,null,'cmith@aix.com','bbc.king@gmail.com',1,1,1,1,1,1)

    insert into #MyPhoneList values (567891234,'Kate@walmart.com','boxwe@yahoo.com','simb.king@gmail.com',1,1,1,1,1,1)

    insert into #MyPhoneList values (678912345,'steve@shoprite.com','zoom@gmail.com','jr.steve@hotmail.com',1,1,1,1,1,1)

    Thanks a ton in advaance

  • A quick (self explanatory) solution which should be enough to get you passed this hurdle

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    GO

    /* Drop the sample table if it exists */

    IF OBJECT_ID(N'tempdb..#MyPhoneList') IS NOT NULL DROP TABLE #MyPhoneList;

    /* Create the sample table */

    Create table #MyPhoneList

    (

    AccountID INT NOT NULL CONSTRAINT PK_MYPHONELIST_ACCOUNTID PRIMARY KEY CLUSTERED

    ,EmailWork VARCHAR(50) NULL

    ,EmailHome VARCHAR(50) NULL

    ,EmailOther VARCHAR(50) NULL

    ,IsOffersToWorkEmail BIT NOT NULL

    ,IsOffersToHomeEmail BIT NOT NULL

    ,IsOffersToOtherEmail BIT NOT NULL

    ,IsValidEmailWork BIT NOT NULL

    ,IsValidEmailHome BIT NOT NULL

    ,IsValidEmailOther BIT NOT NULL

    );

    /* Insert sample data */

    INSERT INTO #MyPhoneList

    (

    AccountID

    ,EmailWork

    ,EmailHome

    ,EmailOther

    ,IsOffersToWorkEmail

    ,IsOffersToHomeEmail

    ,IsOffersToOtherEmail

    ,IsValidEmailWork

    ,IsValidEmailHome

    ,IsValidEmailOther

    )

    VALUES

    (123456789, 'King@travel.com' ,NULL , 'b.king@gmail.com' ,0,0,1,1,0,1)

    ,(234567891, 'drew@cocacola.com' ,'drew@cocacola.com', NULL ,1,1,1,1,0,1)

    ,(345678912, 'bob@chase.com' ,NULL , 'kuya@gmail.com' ,1,0,1,1,0,1)

    ,(456789123, NULL ,'cmith@aix.com' , 'bbc.king@gmail.com' ,1,1,1,1,1,1)

    ,(567891234, 'Kate@walmart.com' ,'boxwe@yahoo.com' , 'simb.king@gmail.com' ,1,1,1,1,1,1)

    ,(678912345, 'steve@shoprite.com','zoom@gmail.com' , 'jr.steve@hotmail.com',1,1,1,1,1,1)

    ;

    /* BASE_DATA CTE is where all the work is done, the reason for using

    a CTE is simply to be able to filter on the ROW_NUMBER function's

    output.

    */

    ;WITH BASE_DATA AS

    (

    SELECT

    MP.AccountID

    ,X.T_EMAIL

    ,X.T_OFFER

    ,X.T_VALID

    ,X.T_TYPE

    /* Counting instances whithin each partition of the set,

    setting an optional preference order (T_PREF)

    */

    ,ROW_NUMBER() OVER

    (

    PARTITION BY MP.AccountID

    ,X.T_EMAIL

    ORDER BY X.T_PREF

    ) AS DD_RID

    FROM #MyPhoneList MP

    /* Cross-tab type unpivot */

    CROSS APPLY (

    SELECT EmailWork, IsOffersToWorkEmail, IsValidEmailWork, 'WORK' ,1 UNION ALL

    SELECT EmailHome, IsOffersToHomeEmail, IsValidEmailHome, 'HOME' ,2 UNION ALL

    SELECT EmailOther, IsOffersToOtherEmail, IsValidEmailOther, 'OTHER' ,3

    ) AS X(

    T_EMAIL

    ,T_OFFER

    ,T_VALID

    ,T_TYPE

    ,T_PREF

    )

    WHERE X.T_VALID = 1

    AND X.T_OFFER = 1

    AND X.T_EMAIL IS NOT NULL

    )

    /* Final output */

    SELECT

    BD.AccountID

    ,BD.T_EMAIL

    ,BD.T_OFFER

    ,BD.T_VALID

    ,BD.T_TYPE

    ,BD.DD_RID

    FROM BASE_DATA BD

    /* Filter for de-duplication */

    WHERE BD.DD_RID = 1;

    Results

    AccountID T_EMAIL T_OFFER T_VALID T_TYPE DD_RID

    ----------- --------------------- ------- ------- ------ --------

    123456789 b.king@gmail.com 1 1 OTHER 1

    234567891 drew@cocacola.com 1 1 WORK 1

    345678912 bob@chase.com 1 1 WORK 1

    345678912 kuya@gmail.com 1 1 OTHER 1

    456789123 bbc.king@gmail.com 1 1 OTHER 1

    456789123 cmith@aix.com 1 1 HOME 1

    567891234 boxwe@yahoo.com 1 1 HOME 1

    567891234 Kate@walmart.com 1 1 WORK 1

    567891234 simb.king@gmail.com 1 1 OTHER 1

    678912345 jr.steve@hotmail.com 1 1 OTHER 1

    678912345 steve@shoprite.com 1 1 WORK 1

    678912345 zoom@gmail.com 1 1 HOME 1

  • asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    To me it appears as the nolock hint is used here as a blanket solution, without understanding the potential effects of it, which is a very bad practice. In this case for example, using it makes no sense, the table being only accessible from the session it is created in so what is the purpose? Seriously suggest you to read into this!

    😎

  • Eirikur Eiriksson (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    To me it appears as the nolock hint is used here as a blanket solution, without understanding the potential effects of it, which is a very bad practice. In this case for example, using it makes no sense, the table being only accessible from the session it is created in so what is the purpose? Seriously suggest you to read into this!

    😎

    I agree. It makes no sense here. Since the table is a Temp Table, it would make more sense to use WITH(TABLOCKX) to keep from having to create the shared locking on each row and escalating to page locks and finally a table lock. That takes time and memory and it makes much more sense to cut to the chase by using WITH(TABLOCKX).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • asita (1/3/2015)


    Please see below sample data

    insert into #MyPhoneList values (123456789,'King@travel.com',null,'b.king@gmail.com',0,0,1,1,0,1)

    insert into #MyPhoneList values (234567891,'drew@cocacola.com','drew@cocacola.com',null,1,1,1,1,0,1)

    insert into #MyPhoneList values (345678912,'bob@chase.com',null,'kuya@gmail.com',1,0,1,1,0,1)

    insert into #MyPhoneList values (456789123,null,'cmith@aix.com','bbc.king@gmail.com',1,1,1,1,1,1)

    insert into #MyPhoneList values (567891234,'Kate@walmart.com','boxwe@yahoo.com','simb.king@gmail.com',1,1,1,1,1,1)

    insert into #MyPhoneList values (678912345,'steve@shoprite.com','zoom@gmail.com','jr.steve@hotmail.com',1,1,1,1,1,1)

    Thanks a ton in advaance

    Just a couple of suggestions...

    I think it ironic that the validation flags are "1" for some of the NULL emails. And, I'm don't know if you have any control over it, but the #MyPhoneList table has a couple of major problems as may the original table that it's been feed from.

    1. The name of the table is #MyPhoneList and, yet, the table is exclusively dedicated to email addresses.

    2. 50 characters will not be large enough for email addresses. I've seen some as large as 155 characters. I recommend VARCHAR(200) for email address columns.

    3. The table is de-normalized and will cause you much pain if someone comes up with more than 1 work, home, or "other" email address. The table shouldn't be constrained in such a fashion. Instead, it should be designed as follows...

    CREATE TABLE #AccountEmail

    (

    AccountID INT NOT NULL,

    EmailAddress VARCHAR(200) NOT NULL,

    EmailType CHAR(5) NOT NULL,

    IsOffers BIT NOT NULL DEFAULT 0,

    IsValid BIT NOT NULL DEFAULT 0,

    UsagePriority TINYINT NOT NULL DEFAULT 1,--Identifies which email for the AccountID is primary, secondary, etc

    PRIMARY KEY CLUSTERED (AccountID, EmailType, EmailAddress)

    )

    ;

    I might also add a start and end date to the table for each row so that you can replay history when it comes to it.

    EmailType should either have a constraint on it to limit it to Work, Home, or Other entries or, better yet, should be FK'd to an "EmailType" reference table.

    Yeah... I know. Sounds like a bit of extra work. It is because it wasn't designed like this to begin with but it will save you a shedload of troubleshooting and additional work down the line when the inevitable occurs on the original table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (1/6/2015)


    Jeff Moden (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

    Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)

    😎

  • Eirikur Eiriksson (1/6/2015)


    ScottPletcher (1/6/2015)


    Jeff Moden (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

    Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)

    😎

    Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.

    Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (1/6/2015)


    Eirikur Eiriksson (1/6/2015)


    ScottPletcher (1/6/2015)


    Jeff Moden (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

    Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)

    😎

    Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.

    Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.

    Still will take a shared lock while reading the data, if snapshot isolation it uses row version, what is the case for not taking the exclusive table lock if one is the only reader?

    😎

  • Eirikur Eiriksson (1/6/2015)


    ScottPletcher (1/6/2015)


    Eirikur Eiriksson (1/6/2015)


    ScottPletcher (1/6/2015)


    Jeff Moden (1/4/2015)


    asita (1/3/2015)


    Nolock is ignore the locks (please ignore it)

    Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

    Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

    Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)

    😎

    Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.

    Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.

    Still will take a shared lock while reading the data, if snapshot isolation it uses row version, what is the case for not taking the exclusive table lock if one is the only reader?

    😎

    Even a single exclusive table lock is a bit of overhead. Why deliberately spend resources on any locking you don't need?! I can't imagine not using NOLOCK on local temp tables -- what is the downside to it??

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply