Creating Comma separated list of details from multiple columns

Question

doyle_31

SSC Rookie

Points: 34
June 3, 2010 at 3:03 am

#233320

Hi All,
I am trying to find a way to add to a table a flattened (comma-separated) list of email addresses, based on multiple columns of information in another table (joined by customer_full_name and postcode).
This is to highlight duplicate email addresses for people under the same customer_full_name and Postcode.
I have done this using a loop that concatenates the email addresses, but it takes 1 minute to process 1,000 rows. The table has 19,000 rows, so this isn't really acceptable. I have tried temp tables and table variables, and none of this seems to make any difference. I think that it is because I am joining on text columns?
I know you guys are MASSIVE fans of the tally table, but I have not quite got my head around how I can adapt it to make it work for this.
Any hints or tips would be massively appreciated.
Please find some sample data below.
MASSIVE thanks,
Matt
Create table #tempa
(
customer_Full_Name varchar(100),
Customer_Email varchar(100),
Postcode varchar(100),
AlternateEmail varchar(max)NULL
)
insert into #tempa (customer_full_name,customer_email,postcode)
select 'mark davis','mdavis@yahoo.co.uk','me18 7ed'
insert into #tempa (customer_full_name,customer_email,postcode)
select 'mark travis','mtravis@yahoo.co.uk','SE18 5et'
insert into #tempa (customer_full_name,customer_email,postcode)
select 'mike pervis','mpervis@hotmail.co.uk','SW1 5et'
insert into #tempa (customer_full_name,customer_email,postcode)
select 'Frank line','fline@Btconnect.co.uk','YR3 9RT'
Create table #tempb
(
Idcol int identity (1,1),
Customer_Full_Name varchar(100),
Customer_Email varchar(100),
Postcode varchar(100)
)
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mark davis','mdavis@yahoo.co.uk','me18 7ed'
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mark davis','mdavis@hotmail..co.uk','me18 7ed'
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mark davis','mark.davis@Gmail..co.uk','me18 7ed'
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mark davis','bigboymark@Gmail..co.uk','me18 7ed'
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mike pervis','mark.pervis@hotmail.co.uk','SW1 5et'
insert into #tempb (customer_full_name,customer_email,postcode)
select 'mike pervis','markyp@Gmail.co.uk','SW1 5et'

Paul White SSC Guru Points: 150467 · Answer 1

Hey Matt,

So that's reasonably clear, but what output do you want? Do you want a list of matches between the two tables on email address AND postcode, where there is more than one match? Do you want to update the alternate_email column with the comma-separated list of email addresses...or something else?

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

doyle_31 SSC Rookie Points: 34 · Answer 2

Hi Paul,

Thanks for the response.

What I am hoping for is to update the alternate email column with all of the email addresses that we have listed for that customer, i.e.

Mark.davis@gmail.co.uk, Mdavis@hotmail.co.uk, mdavis@yahoo.co.uk

I have this at the moment, which groups the customers together and ranks them, but I can't figure out (without a loop!) how to get this flattened down into the comma-separated list!

with cte as (
    select
        ROW_NUMBER() over(PARTITION BY b.customer_full_name order by b.customer_full_name) idcol,
        b.customer_full_name, b.customer_email
    from #tempb b join #tempa a on a.customer_full_name = b.customer_full_name
    group by b.customer_full_name, b.customer_email
)

ANY help is very much appreciated.

Thanks

Matt

Paul White SSC Guru Points: 150467 · Answer 3

Ok, I need to do some stuff with a hash and an index to make this fly for larger sets, but as a demonstration only:

Setup script (slightly modified)

CREATE TABLE #tempa
(
    customer_full_name VARCHAR(100) NOT NULL,
    customer_email VARCHAR(100) NOT NULL,
    postcode VARCHAR(100) NOT NULL,
    alternateemail VARCHAR(MAX) NULL,
);
GO
INSERT #tempa (customer_full_name,customer_email,postcode)
VALUES ('mark davis', 'mdavis@yahoo.co.uk', 'me18 7ed');
INSERT #tempa (customer_full_name,customer_email,postcode)
VALUES ('mark travis', 'mtravis@yahoo.co.uk', 'SE18 5et');
INSERT #tempa (customer_full_name,customer_email,postcode)
VALUES ('mike pervis', 'mpervis@hotmail.co.uk', 'SW1 5et');
INSERT #tempa (customer_full_name,customer_email,postcode)
VALUES ('Frank line', 'fline@Btconnect.co.uk', 'YR3 9RT');
GO
CREATE TABLE #tempb
(
    idcol INTEGER IDENTITY(1,1) PRIMARY KEY,
    customer_full_name VARCHAR(100),
    customer_email VARCHAR(100),
    postcode VARCHAR(100),
);
GO
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mark davis', 'mdavis@yahoo.co.uk', 'me18 7ed');
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mark davis', 'mdavis@hotmail..co.uk', 'me18 7ed');
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mark davis', 'mark.davis@Gmail..co.uk', 'me18 7ed');
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mark davis', 'bigboymark@Gmail..co.uk', 'me18 7ed');
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mike pervis', 'mark.pervis@hotmail.co.uk', 'SW1 5et');
INSERT #tempb (customer_full_name,customer_email,postcode)
VALUES ('mike pervis', 'markyp@Gmail.co.uk', 'SW1 5et');
GO
UPDATE A
SET alternateemail = ''
FROM #tempa A

Proposed solution:

UPDATE UpdateSet
SET alternateemail = csv
FROM (
    SELECT A.alternateemail,
           csv = STUFF(EmailList.csv, 1, 1, SPACE(0))
    FROM #tempa A
    CROSS APPLY (
        SELECT ',' + B.customer_email
        FROM #tempb B
        WHERE B.postcode = A.postcode
        AND B.customer_full_name = A.customer_full_name
        ORDER BY B.idcol
        FOR XML PATH(''), TYPE
    ) Concatenated (xml_emails)
    CROSS APPLY (
        SELECT Concatenated.xml_emails.value('./text()[1]', 'VARCHAR(MAX)')
    ) EmailList (csv)
) UpdateSet
WHERE csv IS NOT NULL;
GO
DROP TABLE #tempa, #tempb

Could you verify that it produces the results you want?

Paul

edit: added an ORDER BY to the FOR XML PATH, just cos I can 🙂

Gopi Muluka SSCarpal Tunnel Points: 4594 · Answer 4

Alternative Solution

UPDATE #TempA
SET AlternateEmail=
    LEFT(B.ALT,LEN(B.ALT)-1)
FROM
(
    SELECT Customer_Full_Name,
        CASE WHEN ISNULL([1],'')='' THEN ''
            ELSE [1]+',' END
        +CASE WHEN ISNULL([2],'')='' THEN ''
            ELSE [2]+',' END
        +CASE WHEN ISNULL([3],'')='' THEN ''
            ELSE [3]+',' END
        +CASE WHEN ISNULL([4],'')='' THEN ''
            ELSE [4]+',' END
        +CASE WHEN ISNULL([5],'')='' THEN ''
            ELSE [5]+',' END
        AS ALT, PostCode
    FROM
    ( SELECT ROW_NUMBER() OVER(PARTITION BY Customer_Full_Name ORDER BY Customer_Full_Name) Idcol
        ,Customer_Full_Name,Customer_Email,PostCode FROM #tempb) P
    PIVOT
    (
        MAX(Customer_Email)
        FOR Idcol
        IN ([1],[2],[3],[4],[5])
    ) PVT
) B
INNER JOIN #Tempa
    ON B.Customer_Full_Name=#Tempa.Customer_Full_Name
    AND B.PostCode=#Tempa.PostCode

SELECT * FROM #TempA

doyle_31 SSC Rookie Points: 34 · Answer 5

Hey guys,

Thanks to you both for replying and helping with the solution to this. They both work PERFECTLY :-D:-D

This is MASSIVELY appreciated. Thank you SO much.

Matt

Paul White SSC Guru Points: 150467 · Answer 6

Gopi,

That's good, but I don't think we can safely hard-code the IdCol values like that. I'm pretty sure there are more than 5 records...
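
One quick way to test whether five pivot columns would be enough is to count the rows per customer first. This is only a sketch, assuming the #tempb table from the setup script is still in place; the names PerCustomer, email_count and max_emails_per_name are made up for illustration. It groups by customer_full_name alone because that is what the ROW_NUMBER() in the pivot query partitions by.

-- Largest number of email rows recorded under any one customer_full_name.
-- If the result is greater than 5, rows numbered 6 and above fall outside
-- the hard-coded PIVOT list [1]..[5] and never reach the concatenated string.
SELECT MAX(PerCustomer.email_count) AS max_emails_per_name
FROM (
    SELECT COUNT(*) AS email_count
    FROM #tempb
    GROUP BY customer_full_name
) AS PerCustomer;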

Paul

Paul White SSC Guru Points: 150467 · Answer 7

Here's the slightly more optimised version I promised.

It uses an integer hash of the customer_full_name and postcode to quickly find potential matches without huge string matching.

Only potential matches are then compared string-to-string.
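
As a rough illustration of that idea only (not the tuned query itself), the match predicate can be sketched against the original #tempa and #tempb tables with CHECKSUM written inline. Inline CHECKSUM cannot be indexed, so this shows the shape of the comparison rather than the speed-up; a persisted or computed hash column is what makes the integer test cheap.

-- Sketch: integer hash comparison first, string comparison second.
SELECT A.customer_full_name, A.postcode, B.customer_email
FROM #tempa A
JOIN #tempb B
    -- Cheap integer test: rows with different hashes cannot match
    ON CHECKSUM(B.postcode, B.customer_full_name)
     = CHECKSUM(A.postcode, A.customer_full_name)
    -- Equal hashes can still be collisions, so confirm with the real columns
    AND B.postcode = A.postcode
    AND B.customer_full_name = A.customer_full_name;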

Setup: (hash computed column and indexes added)

CREATE TABLE #tempa

(

customer_full_name VARCHAR(100) NOT NULL,

customer_email VARCHAR(100) NOT NULL,

postcode VARCHAR(100) NOT NULL,

alternateemail VARCHAR(MAX) NULL,

hash_pc_cfn AS CHECKSUM(postcode, customer_full_name),