1-M join, concat all records

  • drop table #product;drop table #category;

    Create table #product (productID int, Categories varchar(4000) );

    Create table #category (productID int, CategoryText varchar(4000) );

    insert into #product values (1,'');

    insert into #product values (2,'');

    insert into #product values (5,'');

    insert into #category values (1,'Pasta');

    insert into #category values (1,'Noodle');

    insert into #category values (2,'Fish');

    insert into #category values (4,'');

    insert into #category values (5,'Fruit');

    Update Product

    Set Product.Categories = Product.Categories + Category.CategoryText + ' '

    From #product as Product INNER JOIN #category as Category

    On Product.ProductID = Category.ProductID

    select * from #product

    select * from #category

    Notes:

    * Product.Categories starts out blank, not null.

    * Category.CategoryText may be blank, but never null.

    * It's OK to have a blank space on the end of Product.Categories. No need to code to avoid or strip it.

    The goal is to end up with Product.Categories having

    a space-delimited list of the CategoryText values for each product.

    It does update Product.Categories, but for any records with a 1-M join, it only updates with the value from one of the records (seems to be the first record found for the join).

    ProductID #1 is getting a value "Pasta" when it should have "Pasta Noodle".

    How can I get the join & concatenation to use the values from ALL matching records?

  • I adapted the following solution from this article by Jeff Moden:

    http://www.sqlservercentral.com/articles/Test+Data/61572/

    INSERT #product (ProductId, Categories)

    SELECT

    C1.ProductID,

    Categories = (

    SELECT LTRIM(C.CategoryText + ' ')

    FROM #category C

    WHERE (C.ProductId = C1.ProductId)

    FOR XML PATH('')

    )

    FROM #category C1

    GROUP BY C1.ProductId

  • I can work with this idea. I'm not great with XML so the XML piece didn't occur to me.

    This introduces a new problem:

    I have a value like "Pasta & Sauces". The XML is converting this to "Pasta & Sauces". This data will be fed into an index (non-SQL) for web site search, so I can't introduce anything into the data.

    "/" doesn't convert to anything funky. Not sure what other than "&" would get changed.

  • I think these are the XML characters that will most likely give you trouble (and it is hard work to get the entity encodings to display properly in this forum as well!)

    & .... &

    < .... &lt;

    > .... &gt;

    INSERT #product (ProductId, Categories)

    SELECT

    C1.ProductID,

    Categories = REPLACE(REPLACE(REPLACE((

    SELECT LTRIM(C.CategoryText + ' ')

    FROM #category C

    WHERE (C.ProductId = C1.ProductId)

    FOR XML PATH('')), '&lt;', '<'), '&gt;', '>'), '&amp;', '&')

    FROM #category C1

    GROUP BY C1.ProductId

  • Very nice! It's working and performs well.

    I'll have to try this in similar places using functions and CTEs, and see what performs better.

    I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!

  • Just thought of another way of dealing with the XML encoding, which is probably better than my previous offering. This method uses the XQuery value method to decode the generated XML text.

    SELECT

    C1.ProductID,

    Categories = CONVERT(xml,

    (SELECT C.CategoryText + ' '

    FROM #category C

    WHERE (C.ProductId = C1.ProductId)

    FOR XML PATH(''))

    ).value('.', 'varchar(4000)')

    FROM #category C1

    GROUP BY C1.ProductId

    Well safer at least, but performance is worse.

  • Tony (2/13/2009)


    Very nice! It's working and performs well.

    I'll have to try this in similar places using functions and CTEs, and see what performs better.

    I can end my week on a high note, having this task solved, and a bit smarter for it. Thanks!

    Ummm... ok.... help us be a bit smarter, please. You wrote...

    The goal is to end up with Product.Categories having

    a space-delimited list of the CategoryText values for each product.

    Why??? This is one of the worst things you can to do data in a database. It's an extreme form of denormalization that will make your life miserable in the future. Please share with us what the business requirement to do this terrible thing is. And, no, not looking to slam you into the wall on this... I'm truly curious for two reasons...

    1. The obvious... because it's such a terrible requirement.

    2. If we know the real business reason for this, there may be a better way to do it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The data is getting laid out in a denormalized form so it can be put into an external indexer (Lucene) for web sites to use for term searches.

    There's a lot of data massage to take it from application tables & put it into a form easy for the index build (and also easy for the index update process to find changes).

    The index is basically 1 entry per PK (product ID), and the rest of the data is indexable and searchable by the data in the field(s). The code I was trying to work out here is how to take multiple product description records and deliberately denormalize. Since I don't know in advance how many descriptions a product may have, the code spins then all into a single field.

    The Lucene index is then able to search for text whether it's contained in 1 field or multiple.

  • Thanks Tony. Just one more question... is your goal to create a text file for Lucene or are you just going to pass it through the API?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We're only loading through the Lucene (.net) API.

    For what it's worth, we're also looking at dtSearch. Same concept for feeding data.

    Advantages: testing shows full build in 20% of the time, can return suggested similar search terms (that actually have matches) when no matches were found.

    Disadvantages: dtSearch is commercial and so has a pricetag(whereas Lucene is opensource).

  • Then, I think Andrew's solution may be the best. If it was a text file input, you could simply do a BCP out using a simple SELECT and use a "space" as the end of row teminator.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply