February 13, 2009 at 12:14 pm
IF OBJECT_ID('tempdb..#product') IS NOT NULL DROP TABLE #product;
IF OBJECT_ID('tempdb..#category') IS NOT NULL DROP TABLE #category;
Create table #product (productID int, Categories varchar(4000) );
Create table #category (productID int, CategoryText varchar(4000) );
insert into #product values (1,'');
insert into #product values (2,'');
insert into #product values (5,'');
insert into #category values (1,'Pasta');
insert into #category values (1,'Noodle');
insert into #category values (2,'Fish');
insert into #category values (4,'');
insert into #category values (5,'Fruit');
Update Product
Set Product.Categories = Product.Categories + Category.CategoryText + ' '
From #product as Product INNER JOIN #category as Category
On Product.ProductID = Category.ProductID
select * from #product
select * from #category
Notes:
* Product.Categories starts out blank, not null.
* Category.CategoryText may be blank, but never null.
* It's OK to have a blank space on the end of Product.Categories. No need to code to avoid or strip it.
The goal is to end up with Product.Categories having
a space-delimited list of the CategoryText values for each product.
It does update Product.Categories, but for any records with a 1-M join, it only updates with the value from one of the records (seems to be the first record found for the join).
ProductID #1 is getting a value "Pasta" when it should have "Pasta Noodle".
How can I get the join & concatenation to use the values from ALL matching records?
February 13, 2009 at 1:08 pm
I adapted the following solution from this article by Jeff Moden:
http://www.sqlservercentral.com/articles/Test+Data/61572/
INSERT #product (ProductId, Categories)
SELECT
C1.ProductID,
Categories = (
SELECT LTRIM(C.CategoryText + ' ')
FROM #category C
WHERE (C.ProductId = C1.ProductId)
FOR XML PATH('')
)
FROM #category C1
GROUP BY C1.ProductId
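For reference, and assuming the sample data above, the SELECT feeding this INSERT should produce something like the following (trailing spaces omitted; the order of categories within a row is not guaranteed without an ORDER BY in the subquery):
ProductID   Categories
1           Pasta Noodle
2           Fish
4           (blank)
5           Fruit
Note that because this INSERTs rather than UPDATEs, #product ends up holding both the original blank rows and these new ones (plus a new row for ProductID 4, which was never in #product); deleting the originals first, or switching to an UPDATE, avoids the duplicates.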
February 13, 2009 at 1:29 pm
I can work with this idea. I'm not great with XML so the XML piece didn't occur to me.
This introduces a new problem:
I have a value like "Pasta & Sauces". The XML is converting this to "Pasta &amp; Sauces". This data will be fed into an index (non-SQL) for web site search, so I can't introduce encoded entities like that into the data.
"/" doesn't convert to anything funky. Not sure what other than "&" would get changed.
February 13, 2009 at 1:44 pm
I think these are the XML characters that will most likely give you trouble (and it is hard work to get the entity encodings to display properly in this forum as well!)
& .... &amp;
< .... &lt;
> .... &gt;
INSERT #product (ProductId, Categories)
SELECT
C1.ProductID,
Categories = REPLACE(REPLACE(REPLACE((
SELECT LTRIM(C.CategoryText + ' ')
FROM #category C
WHERE (C.ProductId = C1.ProductId)
FOR XML PATH('')), '&lt;', '<'), '&gt;', '>'), '&amp;', '&')
FROM #category C1
GROUP BY C1.ProductId
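A quick illustration of why the REPLACE order matters (this example assumes a category name that literally contains the text "&lt;", which is not in the sample data): FOR XML encodes the ampersand, so a stored value of 'A &lt; B' comes out of the subquery as 'A &amp;lt; B'.
SELECT REPLACE(REPLACE(REPLACE('A &amp;lt; B', '&lt;', '<'), '&gt;', '>'), '&amp;', '&')
-- returns 'A &lt; B' -- the stored text, correctly restored
SELECT REPLACE(REPLACE('A &amp;lt; B', '&amp;', '&'), '&lt;', '<')
-- returns 'A < B' -- decoding '&amp;' first corrupts the data
Decoding '&amp;' last is what keeps the round trip safe.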
February 13, 2009 at 2:37 pm
Very nice! It's working and performs well.
I'll have to try this in similar places using functions and CTEs, and see what performs better.
I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!
February 13, 2009 at 3:04 pm
Just thought of another way of dealing with the XML encoding, which is probably better than my previous offering. This method uses the XQuery value method to decode the generated XML text.
SELECT
C1.ProductID,
Categories = CONVERT(xml,
(SELECT C.CategoryText + ' '
FROM #category C
WHERE (C.ProductId = C1.ProductId)
FOR XML PATH(''))
).value('.', 'varchar(4000)')
FROM #category C1
GROUP BY C1.ProductId
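The original goal was to populate #product.Categories in place, so here is a sketch (not from the thread) of the same XQuery decoding applied as an UPDATE; the ISNULL keeps Categories blank rather than NULL if a product ever has no category rows:
UPDATE P
SET P.Categories = CONVERT(xml, ISNULL(
        (SELECT C.CategoryText + ' '
         FROM #category C
         WHERE C.ProductId = P.ProductId
         FOR XML PATH('')), '')
    ).value('.', 'varchar(4000)')
FROM #product AS P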
Well safer at least, but performance is worse.
February 13, 2009 at 6:39 pm
Tony (2/13/2009)
Very nice! It's working and performs well. I'll have to try this in similar places using functions and CTEs, and see what performs better.
I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!
Ummm... ok.... help us be a bit smarter, please. You wrote...
The goal is to end up with Product.Categories having
a space-delimited list of the CategoryText values for each product.
Why??? This is one of the worst things you can do to data in a database. It's an extreme form of denormalization that will make your life miserable in the future. Please share with us what the business requirement to do this terrible thing is. And, no, not looking to slam you into the wall on this... I'm truly curious for two reasons...
1. The obvious... because it's such a terrible requirement.
2. If we know the real business reason for this, there may be a better way to do it.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2009 at 7:14 pm
The data is getting laid out in a denormalized form so it can be put into an external indexer (Lucene) for web sites to use for term searches.
There's a lot of data massage to take it from application tables & put it into a form easy for the index build (and also easy for the index update process to find changes).
The index is basically 1 entry per PK (product ID), and the rest of the data is indexable and searchable by the data in the field(s). The code I was trying to work out here is how to take multiple product description records and deliberately denormalize. Since I don't know in advance how many descriptions a product may have, the code spins them all into a single field.
The Lucene index is then able to search for text whether it's contained in 1 field or multiple.
February 13, 2009 at 7:55 pm
Thanks Tony. Just one more question... is your goal to create a text file for Lucene or are you just going to pass it through the API?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 7:35 am
We're only loading through the Lucene (.net) API.
For what it's worth, we're also looking at dtSearch. Same concept for feeding data.
Advantages: testing shows full build in 20% of the time, can return suggested similar search terms (that actually have matches) when no matches were found.
Disadvantages: dtSearch is commercial and so has a price tag (whereas Lucene is open source).
February 14, 2009 at 10:20 am
Then, I think Andrew's solution may be the best. If it was a text file input, you could simply do a BCP out using a simple SELECT and use a "space" as the end-of-row terminator.
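If a text file were ever needed, a bcp out along those lines might look like this (server, database, and table names are hypothetical; -c is character mode, -r sets the row terminator to a space so the values run together as one delimited list, -T uses a trusted connection):
bcp "SELECT CategoryText FROM MyDb.dbo.Category WHERE ProductId = 1 ORDER BY CategoryText" queryout categories.txt -c -r " " -T -S MyServer
(bcp can't read temp tables like #category directly, so this assumes the data lives in a permanent table.)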
--Jeff Moden
Change is inevitable... Change for the better is not.