how to rewrite query to avoid duplicate records (across columns)

Question

how to rewrite query to avoid duplicate records (across columns)

asita

SSCarpal Tunnel

Points: 4144
More actions
January 3, 2015 at 1:51 pm

#302181

Hello Good Morning All,
Happy New Year,
Could you please help to rewrite the below two queries (so that i can avoid duplicates) i need to send email to everyone not the dup
licated ones), please?
Create table #MyPhoneList
(
AccountID int,
EmailWork varchar(50),
EmailHome varchar(50),
EmailOther varchar(50),
IsOffersToWorkEmail bit,
IsOffersToHomeEmail bit,
IsOffersToOtherEmail bit,
IsValidEmailWork bit,
IsValidEmailHome bit,
IsValidEmailOther bit
)
--> In this table AccountID is uniquee
--> email values could be null or repetetive for work / home / Other (same email can be used more than one columns for accountid)
-- a new column will be created with name as Sourceflag( the value could be work, Home, Other depend on email coming from) then removes duplicates
SELECT AccountID , Email, SourceFlag, ROW_NUMBER() OVER(PARTITION BY AccountID, Email ORDER BY Sourceflag desc) AS ROW
INTO #List
from (
SELECTAccountID
, EmailWorkAS EMAIL
, 'Work'AS SourceFlag
FROM#MyPhoneList (NoLock) eml
WHEREIsOffersToWorkEmail= 1
ANDEMAILWORKIS NOT NULL
ANDIsValidEmailWork= 1
UNION
SELECTAccountID
, EmailHome
, 'Home'AS SourceFlag
FROM#MyPhoneList (NoLock) eml
WHEREIsOffersToHomeEmail= 1
ANDEMAILHOMEIS NOT NULL
ANDIsValidEmailHome= 1
UNION
SELECTAccountID
, EmailOther
, 'Other'AS Sourceflag
FROM#MyPhoneList (NoLock) eml
WHEREEmailOther= 1
ANDEMAILOtherIS NOT NULL
ANDIsValidEmailOther= 1
) eml --select count(*) from #email
--Email dedupe on source
SELECT * INTO #distinct FROM #List WHERE ROW = 1
Thank you in Advance
Dhani

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 1

Quick questions, could you provide some sample data? Why the NOLOCK hints? What is the definition of a unique entry in the set?

😎

asita SSCarpal Tunnel Points: 4144 More actions · Answer 2

Hello

Thank you for your inputs

Nolock is ignore the locks (please ignore it)

sure i will provide the sample data soon

i mean accountid is primarykey, in table

Thank you,

Dhani

asita SSCarpal Tunnel Points: 4144 More actions · Answer 3

Please see below sample data

insert into #MyPhoneList values (123456789,'King@travel.com',null,'b.king@gmail.com',0,0,1,1,0,1)

insert into #MyPhoneList values (234567891,'drew@cocacola.com','drew@cocacola.com',null,1,1,1,1,0,1)

insert into #MyPhoneList values (345678912,'bob@chase.com',null,'kuya@gmail.com',1,0,1,1,0,1)

insert into #MyPhoneList values (456789123,null,'cmith@aix.com','bbc.king@gmail.com',1,1,1,1,1,1)

insert into #MyPhoneList values (567891234,'Kate@walmart.com','boxwe@yahoo.com','simb.king@gmail.com',1,1,1,1,1,1)

insert into #MyPhoneList values (678912345,'steve@shoprite.com','zoom@gmail.com','jr.steve@hotmail.com',1,1,1,1,1,1)

Thanks a ton in advaance

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 4

A quick (self explanatory) solution which should be enough to get you passed this hurdle

😎

USE tempdb;

GO

SET NOCOUNT ON;

GO

/* Drop the sample table if it exists */

IF OBJECT_ID(N'tempdb..#MyPhoneList') IS NOT NULL DROP TABLE #MyPhoneList;

/* Create the sample table */

Create table #MyPhoneList

(

AccountID INT NOT NULL CONSTRAINT PK_MYPHONELIST_ACCOUNTID PRIMARY KEY CLUSTERED

,EmailWork VARCHAR(50) NULL

,EmailHome VARCHAR(50) NULL

,EmailOther VARCHAR(50) NULL

,IsOffersToWorkEmail BIT NOT NULL

,IsOffersToHomeEmail BIT NOT NULL

,IsOffersToOtherEmail BIT NOT NULL

,IsValidEmailWork BIT NOT NULL

,IsValidEmailHome BIT NOT NULL

,IsValidEmailOther BIT NOT NULL

);

/* Insert sample data */

INSERT INTO #MyPhoneList

(

AccountID

,EmailWork

,EmailHome

,EmailOther

,IsOffersToWorkEmail

,IsOffersToHomeEmail

,IsOffersToOtherEmail

,IsValidEmailWork

,IsValidEmailHome

,IsValidEmailOther

)

VALUES

(123456789, 'King@travel.com' ,NULL , 'b.king@gmail.com' ,0,0,1,1,0,1)

,(234567891, 'drew@cocacola.com' ,'drew@cocacola.com', NULL ,1,1,1,1,0,1)

,(345678912, 'bob@chase.com' ,NULL , 'kuya@gmail.com' ,1,0,1,1,0,1)

,(456789123, NULL ,'cmith@aix.com' , 'bbc.king@gmail.com' ,1,1,1,1,1,1)

,(567891234, 'Kate@walmart.com' ,'boxwe@yahoo.com' , 'simb.king@gmail.com' ,1,1,1,1,1,1)

,(678912345, 'steve@shoprite.com','zoom@gmail.com' , 'jr.steve@hotmail.com',1,1,1,1,1,1)

;

/* BASE_DATA CTE is where all the work is done, the reason for using

a CTE is simply to be able to filter on the ROW_NUMBER function's

output.

*/

;WITH BASE_DATA AS

(

SELECT

MP.AccountID

,X.T_EMAIL

,X.T_OFFER

,X.T_VALID

,X.T_TYPE

/* Counting instances whithin each partition of the set,

setting an optional preference order (T_PREF)

*/

,ROW_NUMBER() OVER

(

PARTITION BY MP.AccountID

,X.T_EMAIL

ORDER BY X.T_PREF

) AS DD_RID

FROM #MyPhoneList MP

/* Cross-tab type unpivot */

CROSS APPLY (

SELECT EmailWork, IsOffersToWorkEmail, IsValidEmailWork, 'WORK' ,1 UNION ALL

SELECT EmailHome, IsOffersToHomeEmail, IsValidEmailHome, 'HOME' ,2 UNION ALL

SELECT EmailOther, IsOffersToOtherEmail, IsValidEmailOther, 'OTHER' ,3

) AS X(

T_EMAIL

,T_OFFER

,T_VALID

,T_TYPE

,T_PREF

)

WHERE X.T_VALID = 1

AND X.T_OFFER = 1

AND X.T_EMAIL IS NOT NULL

)

/* Final output */

SELECT

BD.AccountID

,BD.T_EMAIL

,BD.T_OFFER

,BD.T_VALID

,BD.T_TYPE

,BD.DD_RID

FROM BASE_DATA BD

/* Filter for de-duplication */

WHERE BD.DD_RID = 1;

Results

AccountID T_EMAIL T_OFFER T_VALID T_TYPE DD_RID

----------- --------------------- ------- ------- ------ --------

123456789 b.king@gmail.com 1 1 OTHER 1

234567891 drew@cocacola.com 1 1 WORK 1

345678912 bob@chase.com 1 1 WORK 1

345678912 kuya@gmail.com 1 1 OTHER 1

456789123 bbc.king@gmail.com 1 1 OTHER 1

456789123 cmith@aix.com 1 1 HOME 1

567891234 boxwe@yahoo.com 1 1 HOME 1

567891234 Kate@walmart.com 1 1 WORK 1

567891234 simb.king@gmail.com 1 1 OTHER 1

678912345 jr.steve@hotmail.com 1 1 OTHER 1

678912345 steve@shoprite.com 1 1 WORK 1

678912345 zoom@gmail.com 1 1 HOME 1

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 5

asita (1/3/2015)
Nolock is ignore the locks (please ignore it)

To me it appears as the nolock hint is used here as a blanket solution, without understanding the potential effects of it, which is a very bad practice. In this case for example, using it makes no sense, the table being only accessible from the session it is created in so what is the purpose? Seriously suggest you to read into this!

😎

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 6

Eirikur Eiriksson (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
To me it appears as the nolock hint is used here as a blanket solution, without understanding the potential effects of it, which is a very bad practice. In this case for example, using it makes no sense, the table being only accessible from the session it is created in so what is the purpose? Seriously suggest you to read into this!
😎

I agree. It makes no sense here. Since the table is a Temp Table, it would make more sense to use WITH(TABLOCKX) to keep from having to create the shared locking on each row and escalating to page locks and finally a table lock. That takes time and memory and it makes much more sense to cut to the chase by using WITH(TABLOCKX).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 7

asita (1/3/2015)
Nolock is ignore the locks (please ignore it)

Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 8

asita (1/3/2015)
Please see below sample data
insert into #MyPhoneList values (123456789,'King@travel.com',null,'b.king@gmail.com',0,0,1,1,0,1)
insert into #MyPhoneList values (234567891,'drew@cocacola.com','drew@cocacola.com',null,1,1,1,1,0,1)
insert into #MyPhoneList values (345678912,'bob@chase.com',null,'kuya@gmail.com',1,0,1,1,0,1)
insert into #MyPhoneList values (456789123,null,'cmith@aix.com','bbc.king@gmail.com',1,1,1,1,1,1)
insert into #MyPhoneList values (567891234,'Kate@walmart.com','boxwe@yahoo.com','simb.king@gmail.com',1,1,1,1,1,1)
insert into #MyPhoneList values (678912345,'steve@shoprite.com','zoom@gmail.com','jr.steve@hotmail.com',1,1,1,1,1,1)
Thanks a ton in advaance

Just a couple of suggestions...

I think it ironic that the validation flags are "1" for some of the NULL emails. And, I'm don't know if you have any control over it, but the #MyPhoneList table has a couple of major problems as may the original table that it's been feed from.

1. The name of the table is #MyPhoneList and, yet, the table is exclusively dedicated to email addresses.

2. 50 characters will not be large enough for email addresses. I've seen some as large as 155 characters. I recommend VARCHAR(200) for email address columns.

3. The table is de-normalized and will cause you much pain if someone comes up with more than 1 work, home, or "other" email address. The table shouldn't be constrained in such a fashion. Instead, it should be designed as follows...

CREATE TABLE #AccountEmail

(

AccountID INT NOT NULL,

EmailAddress VARCHAR(200) NOT NULL,

EmailType CHAR(5) NOT NULL,

IsOffers BIT NOT NULL DEFAULT 0,

IsValid BIT NOT NULL DEFAULT 0,

UsagePriority TINYINT NOT NULL DEFAULT 1,--Identifies which email for the AccountID is primary, secondary, etc

PRIMARY KEY CLUSTERED (AccountID, EmailType, EmailAddress)

)

;

I might also add a start and end date to the table for each row so that you can replay history when it comes to it.

EmailType should either have a constraint on it to limit it to Work, Home, or Other entries or, better yet, should be FK'd to an "EmailType" reference table.

Yeah... I know. Sounds like a bit of extra work. It is because it wasn't designed like this to begin with but it will save you a shedload of troubleshooting and additional work down the line when the inevitable occurs on the original table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101097 More actions · Answer 9

Jeff Moden (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.

Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 10

ScottPletcher (1/6/2015)
Jeff Moden (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.
Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?

Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)

😎

ScottPletcher SSC Guru Points: 101097 More actions · Answer 11

Eirikur Eiriksson (1/6/2015)
ScottPletcher (1/6/2015)
Jeff Moden (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.
Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?
Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)
😎

Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.

Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 12

ScottPletcher (1/6/2015)
Eirikur Eiriksson (1/6/2015)
ScottPletcher (1/6/2015)
Jeff Moden (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.
Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?
Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)
😎
Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.
Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.

Still will take a shared lock while reading the data, if snapshot isolation it uses row version, what is the case for not taking the exclusive table lock if one is the only reader?

😎

ScottPletcher SSC Guru Points: 101097 More actions · Answer 13

Eirikur Eiriksson (1/6/2015)
ScottPletcher (1/6/2015)
Eirikur Eiriksson (1/6/2015)
ScottPletcher (1/6/2015)
Jeff Moden (1/4/2015)
asita (1/3/2015)
Nolock is ignore the locks (please ignore it)
Trust me Dhani, we know what it does and folks in-the-know can't just ignore it. It's normally a bad thing to do unless you're hitting a static table and if the table is static, there's usually no sense in using it. On Temp Tables, there's absolutely no sense in using it. Far better to use WITH(TABLOCKX) on Temp Tables as I explain above.
Huh? Wouldn't using "WITH (NOLOCK)" on a temp table avoid the shared locks that would normally be taken as the table as read? Doesn't that reduce overhead, if only slightly? How does reducing overhead make "no sense"?
Exclusive table lock makes more sense than any kind of a possible maybe situation, especially when only one user/spid/thread/connection can access the table anyway, on the scale the absence of lock escalation has less of an impact than a single spid locking it self out (which will never happen)
😎
Every lock is some amount of overhead. Thus, when you don't need a lock, why take one? If you have thousands of users, as we do on some of our servers, such locks can add up.
Yes, you have to use NOLOCK only when it's applicable, and it gets abused far too often. But saying it should never be used is too extreme. It's great for historical data and other truly, 100% static data.
Still will take a shared lock while reading the data, if snapshot isolation it uses row version, what is the case for not taking the exclusive table lock if one is the only reader?
😎

Even a single exclusive table lock is a bit of overhead. Why deliberately spend resources on any locking you don't need?! I can't imagine not using NOLOCK on local temp tables -- what is the downside to it??

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".