String Concatenation of Data to Third Party App using FOR XML PATH

  • I’m working on getting some data put together to send to a another program. (Getting the addresses in the format to send to software that will presort it.)

    With some help from a friend, I have some code that I’m trying to understand before I put it into the production code. I have a few questions that I’m having a hard time wrapping my brains around and Books Online isn’t helping much.

    Here’s the code and sample data.

    if OBJECT_ID(N'tempdb..#Addresses') is not null

    drop table #Addresses;

    create table #Addresses (

    ID int identity primary key,

    ListID varchar(100),

    Address1 varchar(100),

    Address2 varchar(100),

    City varchar(100),

    State varchar(100),

    Zip char(10));

    Insert into #Addresses (ListID, Address1, Address2, City, State, Zip)

    Select 'PO Box 101', '', 'USA', '99999-9999' Union All

    Select 'PO Box 102', '', 'USA', '99999-9999' Union All

    Select 'PO Box 103', '', 'USA', '99999-9999' Union All

    Select 'PO Box 104', '', 'USA', '99999-9999' Union All

    Select 'PO Box 105', '', 'USA', '99999-9999' Union All

    Select 'PO Box 106', '', 'USA', '99999-9999' Union All

    Select 'PO Box 107', '', 'USA', '99999-9999' Union All

    Select 'PO Box 108', '', 'USA', '99999-9999' Union All

    Select 'PO Box 109', '', 'USA', '99999-9999' Union All

    Select 'PO Box 110', '123 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 111', '124 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 112', '125 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 113', '126 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 114', '127 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 115', '128 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 116', '129 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 117', '130 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 118', '131 Main St', 'USA', '99999-9999' Union All

    Select 'PO Box 119', '132 Main St', 'USA', '99999-9999';

    declare @BatchSize int,

    @FieldDelimiter char(1),

    @RecordDelimiter char(1);

    select @BatchSize = 5,

    @FieldDelimiter = CHAR(9),

    @RecordDelimiter = CHAR(10);

    ;with CTE as

    (select ROW_NUMBER() over (order by ID) as AddrNum,

    ROW_NUMBER() over (order by ID)/@BatchSize + 1 as Batch,

    (select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip+ @RecordDelimiter + '' for XML path('')) as Addr

    from #Addresses)

    select distinct Batch,

    (select Addr + @recordDelimiter

    from CTE as CTE2

    where CTE2.Batch = CTE.Batch

    for XML path(''))

    from CTE

    order by Batch;

    The expected output is to have in a single row all the records for that batch and this gives me that. But I have a question on the output. When I don’t have an Address2 (it’s an empty string), I get this "& a m p ; # x 0 9 ;" (spaced out cause it kept disappearing when I previewed the post) in the resulting string when I run it in Management Studio. Is this just the representation of the @FieldDelimiter on the screen and I don’t have to worry about it? Or is that the actual characters that will be in the data that gets sent to the outside program? And if it is just the representation of @FieldDelimiter, why don’t I have the same thing between the other fields?

    I also tried it using a single space in Address2 and I got the same results. The addresses with an Address2 (not an empty string or a space) don’t show this.

    Now, the other question I have is on the FOR XML PATH in the code. Specifically this construction that creates the third column of the CTE:

    (select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip+ @RecordDelimiter + '' for XML path('')) as Addr

    Before I use this construction, I want to understand what I’m doing. So, plucking that out of the CTE, I try this snippet of code

    declare @BatchSize int,

    @FieldDelimiter char(1),

    @RecordDelimiter char(1);

    select @BatchSize = 5,

    @FieldDelimiter = CHAR(9),

    @RecordDelimiter = CHAR(10);

    select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' for XML path('') as Addr

    from #Addresses

    And get the following error message:

    Msg 156, Level 15, State 1, Line 49

    Incorrect syntax near the keyword 'as'.

    Why am I getting this error message? What am I missing? If I put the FOR XML PATH after the FROM clause, I don’t get the error, but the results are in the hyperlink blue (which I guess is a result of not pairing it up with the other columns) which is different from when I run the whole thing. This leaves me with the question of why does the FOR XML PATH work in the SELECT part of the statement in the CTE and not in the regular select that I’m experimenting with.

    edit: To fix Code tags

    -- Kit

  • Just have time for super quick answer...

    Here's a working piece of concatenation code. Change ONE thing at a time and you'll understand where you're having issues. I remember struggling with this for hours untill I got it... so don't feel bad ;-).

    SELECT

    T.name,

    STUFF((

    SELECT

    ',' + name

    FROM

    sys.columns C

    WHERE C.object_id = T.object_id

    ORDER BY

    name

    FOR

    XML PATH('')

    ) , 1 , 1 , '') As Columns_

    FROM sys.tables T

    ORDER BY name

  • Taking your advice, I took another crack at taking it apart piece by piece. The code below is where I get lost as to why it's working when it seems to break a few rules I thought were set in stone. (The breaking sound is my brain trying to figure out what's going on and I don't know where to start looking to find out what SQL is doing.)

    Here's the code. (Temp table construction is in the first post.)

    (SELECT

    (SELECT cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' for XML path('')) as Addr

    --This is where I would expect another FROM clause.

    FROM #Addresses)

    This gives me results as just plain old string data, one row for each record in my dataset.

    What I don't understand is that I have a SELECT statement without a FROM clause and I don't understand how it is working (see comment in code). If I run that inner select, SQL complains because I don't have the FROM clause. When I add the FROM clause, it complains about the placement of the AS and when I move the FOR XML PATH to after the FROM clause, the code now runs, but gives results that look different and it's not using the alias as the column name.

    --inner select I'm talking about above

    SELECT cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' for XML path('')) as Addr

    ends up looking like this when it finally runs

    SELECT cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' as Addr

    from #Addresses

    for XML path('')

    This gives me the results as a hyperlink with "XML_<what looks like a GUID>" as the column name instead of "Addr".

    Can someone explain this behavior to me or point me to the reference that will explain it? My brain is getting fried trying to figure it out.

    -- Kit

  • Hi Kit,

    Try this:

    ;with CTE as

    (select ROW_NUMBER() over (order by ID) as AddrNum,

    ROW_NUMBER() over (order by ID)/@BatchSize + 1 as Batch,

    (select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip+ @RecordDelimiter + '' for XML path(''),TYPE).value('.','varchar(max)') as Addr

    from #Addresses)

    select distinct Batch,

    (select Addr + @recordDelimiter

    from CTE as CTE2

    where CTE2.Batch = CTE.Batch

    for XML path(''))

    from CTE

    order by Batch;

    For an explanation, see this article: Creating a comma-separated list (SQL Spackle)[/url]

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Wayne,

    Thanks for the reply but that doesn't quite answer my question. The code works. What I'm trying to figure out is WHY it works so I can duplicate it in the future.

    My question concerns the comment I put in the snippet that I pulled out of the CTE code about the missing FROM clause. In all of the examples that I've seen in BOL, the FOR XML construction has:

    SELECT ....

    FROM ....

    FOR XML PATH ....

    But the construction for the third column in that CTE has

    SELECT

    (SELECT ...

    /*--MISSING FROM CLAUSE--*/

    FOR XML PATH ... AS (column name)

    )

    FROM ....

    What I'm looking for is a reference that explains WHY it works in that select clause without the FROM clause.

    (Edit: to get the post right, posted before I was ready.)

    -- Kit

  • Kit G (6/1/2011)


    Before I use this construction, I want to understand what I’m doing. So, plucking that out of the CTE, I try this snippet of code

    declare @BatchSize int,

    @FieldDelimiter char(1),

    @RecordDelimiter char(1);

    select @BatchSize = 5,

    @FieldDelimiter = CHAR(9),

    @RecordDelimiter = CHAR(10);

    select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' for XML path('') as Addr

    from #Addresses

    And get the following error message:

    Msg 156, Level 15, State 1, Line 49

    Incorrect syntax near the keyword 'as'.

    Why am I getting this error message? What am I missing? If I put the FOR XML PATH after the FROM clause, I don’t get the error, but the results are in the hyperlink blue (which I guess is a result of not pairing it up with the other columns) which is different from when I run the whole thing. This leaves me with the question of why does the FOR XML PATH work in the SELECT part of the statement in the CTE and not in the regular select that I’m experimenting with.

    edit: To fix Code tags

    You need to have For XML either after a From clause, or after a list of variables/columns in a sub-query without a From clause.

    Change it to:

    declare @BatchSize int,

    @FieldDelimiter char(1),

    @RecordDelimiter char(1);

    select @BatchSize = 5,

    @FieldDelimiter = CHAR(9),

    @RecordDelimiter = CHAR(10);

    select (select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + @RecordDelimiter + '' for XML path('')) as Addr

    from #Addresses

    Should work without an error message.

    What that then does is pull in the column values from the outer query. An inner query (correlated sub-query, to be technical) has access to the columns of the outer query. But it needs to be its own Select statement, enclosed in parentheses.

    You can get a similar effect with variables:

    declare @Var1 varchar(10), @Var2 varchar(10);

    select @Var1 = 'Var1', @Var2 = 'Var2';

    select @Var1 + ' is the value for @Var1, while ', @Var2 + ' is the value for @Var2' for XML path('')

    In the sub-query, it's doing the same thing. The outer query (which is just the name for the part outside the parentheses, hence "outer") passes in the column values as if they were variables, and the For XML then does its thing on them just like this last example.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • On the other question about XML escape characters in the results (the & amp bit), you'll need to (a) wrap your fields in IsNull or Coalesce functions to make sure you don't end up with missing columns. NULL + delimiter = NULL, so you need to replace the Null with a zero-length string.

    Try that, see if it does what you need.

    Edit: Also, try different delimiters. XML doesn't like certain characters at all.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Kit G (6/1/2011)


    Hi Wayne,

    Thanks for the reply but that doesn't quite answer my question. The code works. What I'm trying to figure out is WHY it works so I can duplicate it in the future.

    My question concerns the comment I put in the snippet that I pulled out of the CTE code about the missing FROM clause. In all of the examples that I've seen in BOL, the FOR XML construction has:

    SELECT ....

    FROM ....

    FOR XML PATH ....

    But the construction for the third column in that CTE has

    SELECT

    (SELECT ...

    /*--MISSING FROM CLAUSE--*/

    FOR XML PATH ... AS (column name)

    )

    FROM ....

    What I'm looking for is a reference that explains WHY it works in that select clause without the FROM clause.

    (Edit: to get the post right, posted before I was ready.)

    As an extra note, the final paren in your sample would need to be BEFORE the AS, not after the column alias. The sub-query, starting with "(SELECT" and ending with "FOR XML PATH)" is being treated by the outer query as a single column in the outer query.

    For an example that might be simpler (lower gradient learning curve):

    select (select 1) as JustA1 from #Addresses;

    You can put any Select statement inside those parentheses, just so long as it only returns ONE row and ONE column of data. If it tries to return more than one of either, you'll get an error message.

    This example is the query as:

    select 1 as JustA1 from #Addresses;

    It's just written as a sub-query to illustrate the point.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the replies. So, because it's an inner SELECT that's pulling from the same table, I don't have to put in the FROM clause. That makes sense.

    As for another delimiter, not sure I can change the delimiter. I'll check into that.

    -- Kit

  • Thanks everyone for your replies. I understand what the code is doing much better now. With a bit more help from GSquared, I got it working so that I get the right output with those annoying control characters and ampersands. I've included the code below. Had to go into a bit of XQuery to get it work properly.

    if OBJECT_ID(N'tempdb..#Addresses') is not null

    drop table #Addresses;

    create table #Addresses (

    ID int identity primary key,

    ListID varchar(100),

    Address1 varchar(100),

    Address2 varchar(100),

    City varchar(100),

    State varchar(100),

    Zip char(10));

    Insert into #Addresses (ListID, Address1, Address2, City, State, Zip)

    Select '1001', 'PO Box 101', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1002', 'PO Box 102', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1003', 'PO Box 103', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1004', 'PO Box 104', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1005', 'PO Box 105', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1006', 'PO Box 106', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1007', 'PO Box 107', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1008', 'PO Box 108', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1009', 'PO Box 109', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1010', 'PO Box 110', ' ', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1011', 'PO Box 111', '124 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1012', 'PO Box 112', '125 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1013', 'PO Box 113', '126 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1014', 'PO Box 114', '127 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1015', 'PO Box 115', '128 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1016', 'PO Box 116', '129 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1017', 'PO Box 117', '130 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1018', 'PO Box 118', '131 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1019', 'PO Box 119', '132 Main St', 'Anywhere', 'USA', '99999-9999' Union All

    Select '1020', 'PO Box 120', '133 Main St', 'Anywhere', 'USA', '99999-9999';

    declare @BatchSize int,

    @FieldDelimiter char(1),

    @RecordDelimiter char(1);

    select @BatchSize = 20,

    @FieldDelimiter = CHAR(9),

    @RecordDelimiter = CHAR(10);

    ;with CTE as

    (select ROW_NUMBER() over (order by ID) as AddrNum,

    ROW_NUMBER() over (order by ID)/@BatchSize + 1 as Batch,

    (select cast(ID as varchar(10)) + @FieldDelimiter,

    ListID + @FieldDelimiter,

    Address1 + @FieldDelimiter,

    Address2 + @FieldDelimiter,

    City + @FieldDelimiter,

    State + @FieldDelimiter,

    Zip + '' for XML path(''), type) as Addr

    from #Addresses)

    select distinct batch,

    (select Addr.value('(.)[1]','varchar(max)') + @recordDelimiter

    from CTE as CTE2

    where CTE2.Batch = CTE.Batch

    for XML path(''), type).value('(.)[1]','varchar(max)') as Addr1

    from CTE

    order by batch;

    edit to fix code tags

    -- Kit

  • Remember to add in IsNull/Coalesce around the columns in the inner query. Otherwise you'll be missing columns in the output string for some rows.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply