Join via String of Integers

  • Hi

    There are two tables one with a varchar column with strings of integers that relate to the other table's ID column I need to produce a join and result set

    Here is some set up code describing the two tables with data and a third table showing the needed results.

    I also have a function that returns a table of the intgers in a comma seperated string of integers - my question how can I use it in a join to produce the result set?

    Many thanks

    Andy

    declare @C int

    CREATE TABLE #Table_S(

    IDs int IDENTITY(1,1) NOT NULL,

    strInts varchar(50) NULL)

    Insert #Table_S(strInts)

    values('1,3,6,11')

    Insert #Table_S(strInts)

    values('2,4,7,8,9,10')

    CREATE TABLE #Table_P(

    IDp int IDENTITY(1,1) NOT NULL,

    ProName varchar(10) NULL)

    set @C=1

    while @C<21

    begin

    Insert #Table_P(ProName)

    values('line'+cast(@c as varchar(10)))

    set @C=@c+1

    end

    CREATE TABLE #Table_R(

    IDp int,

    ProName varchar(50)

    )

    Insert #Table_R(IDp,ProName)

    select IDp, 'line'+cast(IDp as varchar(10)) from #Table_P where IDp in (1,3,6,11)

    Insert #Table_R(IDp,ProName)

    select IDp, 'line'+cast(IDp as varchar(10)) from #Table_P where IDp in (2,4,7,8,9,10)

    select * from #Table_S

    drop table #Table_S

    select * from #Table_P

    drop table #Table_P

    select * from #Table_R

    drop table #Table_R

    the UDF:

    CREATE FUNCTION [dbo].[iter_intlist_to_table] (@list varchar(4000))

    RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,

    number int NOT NULL) AS

    BEGIN

    DECLARE @pos int,

    @textpos int,

    @chunklen smallint,

    @STR nvarchar(4000),

    @tmpstr nvarchar(4000),

    @leftover nvarchar(4000)

    SET @textpos = 1

    SET @leftover = ''

    WHILE @textpos <= datalength(@list) / 2

    BEGIN

    SET @chunklen = 4000 - datalength(@leftover) / 2

    SET @tmpstr = ltrim(@leftover + substring(@list, @textpos, @chunklen))

    SET @textpos = @textpos + @chunklen

    SET @pos = charindex(',', @tmpstr)

    WHILE @pos > 0

    BEGIN

    SET @STR = substring(@tmpstr, 1, @pos - 1)

    INSERT @tbl (number) VALUES(convert(int, @STR))

    SET @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr)))

    SET @pos = charindex(',', @tmpstr)

    END

    SET @leftover = @tmpstr

    END

    IF ltrim(rtrim(@leftover)) <> ''

    INSERT @tbl (number) VALUES(convert(int, @leftover))

    RETURN

    END

    --use: iter_intlist_to_table(@ids) @ids as in the form of '1,2,3,4,5,6'

  • Well, here is an answer and pretty slick too 🙂

    select number, c.proname from #table_s a

    cross apply

    iter_intlist_to_table(a.strInts) as b

    inner join #table_p c

    on b.number = c.idp

    order by a.ids

    with thanks to hanbingl over on the SQLteam.com

  • I would recommend using an inline table-valued function instead.

    Like this:

    -- Generate numbers between @start and @end

    create function [dbo].[GetRange](@start int, @end int) returns table as

    return

    select top (@end-@start+1) (row_number() over(order by (select null)))-1+@start as n

    from sys.system_columns c1

    cross join sys.system_columns c2

    cross join sys.system_columns c3

    go

    create function [dbo].[SplitString](@f varchar(max)) returns table

    as

    return

    select

    substring(@f, n+1,

    case

    when charindex(',', @f, n+1)=0 then 0x7fffffff

    else charindex(',', @f, n+1)-n-1

    end) as value, n

    from dbo.getrange(0, len(@f))

    where substring(@f, n, 1)=',' or n=0

    go

    select IDp, ProName

    from #Table_S s1

    cross apply dbo.SplitString(s1.strInts) s2

    join #Table_P p on s2.value = p.idp

    ITVFs have very much better performance. 😎

    /SG

  • Even slicker Stefan_G - Many thanks!

  • EDIT: Please ignore this post. My performance test was flawed.

    It gave totally wrong results because I was using a test table with 1000000 identical strings. With that test data the SQL Server only split the string once and reused the results.

    In reality my original itvf-method is the fastest of the methods discussed here. xml is much, much slower than the other methods.


    I just made some performance tests and I realized that converting a comma-separated string to a table is much more efficient using XML than my previous method.

    Like this:

    create function SplitStringX(@f varchar(max)) returns table

    as

    return

    select v.value('.','int') as value

    from (select convert(xml, '<v>'+replace(@f,',','</v><v>')+'</v>') as f ) t1

    cross apply t1.f.nodes('v') r(v)

    go

    -- Performance test

    select '12,3,44,560,345556,22,1,,33' as f

    into #t

    from dbo.getrange(1,1000000)

    go

    --old method

    select max(value)

    from #t cross apply dbo.SplitString(f)

    -- 29 seconds

    -- new xml-based method

    select max(value)

    from #t cross apply dbo.SplitStringX(f)

    -- 4 seconds

    -- original multi-statement method

    select max(number)

    from #t cross apply dbo.[iter_intlist_to_table](f)

    -- 5 seconds

    So, my first mehod is much slower, the xml method is the fastest, and the original method is almost as fast as the xml method.

    Quite surprising actually.

    /SG

  • Are you sure that is correct? I get a different result from your original method.

    And it takes a lot longer than 29 secs - more like 79 on my system!

  • Andy Lucas (4/27/2010)


    Are you sure that is correct? I get a different result from your original method.

    And it takes a lot longer than 29 secs - more like 79 on my system!

    Sorry, I made a small change to my original method, I changed the datatype from varchar(max) to varchar(8000). That changed the time from 98 to 29 seconds on my system.

  • On string splitting and performance:

    There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:

    http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html

    The method I use and recommend is based on:

    http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

    Paul

  • Simply get the IDs into a table variable, and then the problem is solved.

    I encounter this scenario a lot with stored procedures, typically developed for reporting purposes, which accept a delimited string of IDs as an input parameter. What I do is first return the IDs from a table valued UDF into a table variable, and then join that table variable with the application table(s).

    -- create temp table with sample data:

    declare @table_p table

    (

    idp int identity(1,1) not null,

    proname varchar(10) null

    );

    declare @C int;

    set @C=1

    while @C<21

    begin

    insert @table_p(proname) values('line'+cast(@c as varchar(10)))

    set @C=@c+1

    end;

    -- first, return result from table valued udf into temp table:

    declare @intlist table

    (

    idp int not null primary key

    );

    insert into @intlist (idp)

    select number

    from iter_intlist_to_table('1,2,3,4,5,6');

    -- next, join the 2 temp tables:

    select p.idp, p.proname

    from @table_p p

    join @intlist i on i.idp = p.idp;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Yes thanks for that simple explanation.

    I was actually stumped because the csi's (comma seperated integers) are in another table column and I was confused as to how to make that join.

  • My app is working perfectly fine using Stefan_G's second production (it seems to be the fastest apart from Paul's SQLCLR masterpiece)

    Just a question based on this thread and the reason I first asked for help.

    How can I get to grips with joins - in my head?

    I have a fairly logical brain but I just seem to get confused when I try to work out joins.

    Is there a way i can view/visualize the process? How do you guys handle it so well?

    Is there a really simple book on it or something you can recommend?

    Thanks for your thoughts

  • Paul White NZ (4/27/2010)


    On string splitting and performance:

    There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:

    http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html

    The method I use and recommend is based on:

    http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

    Paul

    Really interesting thread.

    It is clear that the subject of high-performance string splitting is much more complex than it looks.

  • Andy Lucas (4/27/2010)


    My app is working perfectly fine using Stefan_G's second production (it seems to be the fastest apart from Paul's SQLCLR masterpiece)

    Please dont use the xml-based method if you are at all concerned with performance. My original performance test was seriously flawed. When using identical strings the SQL server is smart enough to split the string once and reuse the results.

    Splitting 2000 random 4000-character long strings takes 11 seconds using my SplitString function, but 3800 seconds for the SplitStringX function.

    Using CLR is much, much faster. Splitting the same 2000 random 4000-character strings takes less than a second.

    Sorry for the confusion

    /SG

  • Here is a well illustrated primer on SQL joins by Pinal Dave, one of the best SQL Server bloggers.

    http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/

    The specific situation you present here, which is parsing out a column containing delimited IDs and using them to join tables, is a classic example of "denormalization". As a result, the join syntax is now made more confusing and difficult, requiring a user defined function and derived table. The IDs should be contained in a related table as foreign keys.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply