1-M join, concat all records

Question

Post reply

1-M join, concat all records

Tony++

SSCarpal Tunnel

Points: 4421
More actions
February 13, 2009 at 12:14 pm

#206511

drop table #product;drop table #category;
Create table #product (productID int, Categories varchar(4000) );
Create table #category (productID int, CategoryText varchar(4000) );
insert into #product values (1,'');
insert into #product values (2,'');
insert into #product values (5,'');
insert into #category values (1,'Pasta');
insert into #category values (1,'Noodle');
insert into #category values (2,'Fish');
insert into #category values (4,'');
insert into #category values (5,'Fruit');
Update Product
Set Product.Categories = Product.Categories + Category.CategoryText + ' '
From #product as Product INNER JOIN #category as Category
On Product.ProductID = Category.ProductID
select * from #product
select * from #category
Notes:
* Product.Categories starts out blank, not null.
* Category.CategoryText may be blank, but never null.
* It's OK to have a blank space on the end of Product.Categories. No need to code to avoid or strip it.
The goal is to end up with Product.Categories having
a space-delimited list of the CategoryText values for each product.
It does update Product.Categories, but for any records with a 1-M join, it only updates with the value from one of the records (seems to be the first record found for the join).
ProductID #1 is getting a value "Pasta" when it should have "Pasta Noodle".
How can I get the join & concatenation to use the values from ALL matching records?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 1

I adapted the following solution from this article by Jeff Moden:

http://www.sqlservercentral.com/articles/Test+Data/61572/

INSERT #product (ProductId, Categories)

SELECT

C1.ProductID,

Categories = (

SELECT LTRIM(C.CategoryText + ' ')

FROM #category C

WHERE (C.ProductId = C1.ProductId)

FOR XML PATH('')

)

FROM #category C1

GROUP BY C1.ProductId

Tony++ SSCarpal Tunnel Points: 4421 More actions · Answer 2

I can work with this idea. I'm not great with XML so the XML piece didn't occur to me.

This introduces a new problem:

I have a value like "Pasta & Sauces". The XML is converting this to "Pasta & Sauces". This data will be fed into an index (non-SQL) for web site search, so I can't introduce anything into the data.

"/" doesn't convert to anything funky. Not sure what other than "&" would get changed.

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 3

I think these are the XML characters that will most likely give you trouble (and it is hard work to get the entity encodings to display properly in this forum as well!)

& .... &

< .... <

> .... >

INSERT #product (ProductId, Categories)

SELECT

C1.ProductID,

Categories = REPLACE(REPLACE(REPLACE((

SELECT LTRIM(C.CategoryText + ' ')

FROM #category C

WHERE (C.ProductId = C1.ProductId)

FOR XML PATH('')), '<', '<'), '>', '>'), '&', '&')

FROM #category C1

GROUP BY C1.ProductId

Tony++ SSCarpal Tunnel Points: 4421 More actions · Answer 4

Very nice! It's working and performs well.

I'll have to try this in similar places using functions and CTEs, and see what performs better.

I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 5

Just thought of another way of dealing with the XML encoding, which is probably better than my previous offering. This method uses the XQuery value method to decode the generated XML text.

SELECT

C1.ProductID,

Categories = CONVERT(xml,

(SELECT C.CategoryText + ' '

FROM #category C

WHERE (C.ProductId = C1.ProductId)

FOR XML PATH(''))

).value('.', 'varchar(4000)')

FROM #category C1

GROUP BY C1.ProductId

Well safer at least, but performance is worse.

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 6

Tony (2/13/2009)
Very nice! It's working and performs well.
I'll have to try this in similar places using functions and CTEs, and see what performs better.
I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!

Ummm... ok.... help us be a bit smarter, please. You wrote...

The goal is to end up with Product.Categories having
a space-delimited list of the CategoryText values for each product.

Why??? This is one of the worst things you can to do data in a database. It's an extreme form of denormalization that will make your life miserable in the future. Please share with us what the business requirement to do this terrible thing is. And, no, not looking to slam you into the wall on this... I'm truly curious for two reasons...

1. The obvious... because it's such a terrible requirement.

2. If we know the real business reason for this, there may be a better way to do it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Tony++ SSCarpal Tunnel Points: 4421 More actions · Answer 7

The data is getting laid out in a denormalized form so it can be put into an external indexer (Lucene) for web sites to use for term searches.

There's a lot of data massage to take it from application tables & put it into a form easy for the index build (and also easy for the index update process to find changes).

The index is basically 1 entry per PK (product ID), and the rest of the data is indexable and searchable by the data in the field(s). The code I was trying to work out here is how to take multiple product description records and deliberately denormalize. Since I don't know in advance how many descriptions a product may have, the code spins then all into a single field.

The Lucene index is then able to search for text whether it's contained in 1 field or multiple.

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 8

Thanks Tony. Just one more question... is your goal to create a text file for Lucene or are you just going to pass it through the API?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Tony++ SSCarpal Tunnel Points: 4421 More actions · Answer 9

We're only loading through the Lucene (.net) API.

For what it's worth, we're also looking at dtSearch. Same concept for feeding data.

Advantages: testing shows full build in 20% of the time, can return suggested similar search terms (that actually have matches) when no matches were found.

Disadvantages: dtSearch is commercial and so has a pricetag(whereas Lucene is opensource).

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 10

Then, I think Andrew's solution may be the best. If it was a text file input, you could simply do a BCP out using a simple SELECT and use a "space" as the end of row teminator.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)