Join via String of Integers

Question

Post reply

Join via String of Integers

Andy Lucas

SSC Veteran

Points: 295
More actions
April 26, 2010 at 8:50 am

#231988

Hi
There are two tables one with a varchar column with strings of integers that relate to the other table's ID column I need to produce a join and result set
Here is some set up code describing the two tables with data and a third table showing the needed results.
I also have a function that returns a table of the intgers in a comma seperated string of integers - my question how can I use it in a join to produce the result set?
Many thanks
Andy
declare @C int
CREATE TABLE #Table_S(
IDs int IDENTITY(1,1) NOT NULL,
strInts varchar(50) NULL)
Insert #Table_S(strInts)
values('1,3,6,11')
Insert #Table_S(strInts)
values('2,4,7,8,9,10')
CREATE TABLE #Table_P(
IDp int IDENTITY(1,1) NOT NULL,
ProName varchar(10) NULL)
set @C=1
while @C<21
begin
Insert #Table_P(ProName)
values('line'+cast(@c as varchar(10)))
set @C=@c+1
end
CREATE TABLE #Table_R(
IDp int,
ProName varchar(50)
)
Insert #Table_R(IDp,ProName)
select IDp, 'line'+cast(IDp as varchar(10)) from #Table_P where IDp in (1,3,6,11)
Insert #Table_R(IDp,ProName)
select IDp, 'line'+cast(IDp as varchar(10)) from #Table_P where IDp in (2,4,7,8,9,10)
select * from #Table_S
drop table #Table_S
select * from #Table_P
drop table #Table_P
select * from #Table_R
drop table #Table_R
the UDF:
CREATE FUNCTION [dbo].[iter_intlist_to_table] (@list varchar(4000))
RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
number int NOT NULL) AS
BEGIN
DECLARE @pos int,
@textpos int,
@chunklen smallint,
@STR nvarchar(4000),
@tmpstr nvarchar(4000),
@leftover nvarchar(4000)
SET @textpos = 1
SET @leftover = ''
WHILE @textpos <= datalength(@list) / 2
BEGIN
SET @chunklen = 4000 - datalength(@leftover) / 2
SET @tmpstr = ltrim(@leftover + substring(@list, @textpos, @chunklen))
SET @textpos = @textpos + @chunklen
SET @pos = charindex(',', @tmpstr)
WHILE @pos > 0
BEGIN
SET @STR = substring(@tmpstr, 1, @pos - 1)
INSERT @tbl (number) VALUES(convert(int, @STR))
SET @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr)))
SET @pos = charindex(',', @tmpstr)
END
SET @leftover = @tmpstr
END
IF ltrim(rtrim(@leftover)) <> ''
INSERT @tbl (number) VALUES(convert(int, @leftover))
RETURN
END
--use: iter_intlist_to_table(@ids) @ids as in the form of '1,2,3,4,5,6'

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Andy Lucas SSC Veteran Points: 295 More actions · Answer 1

Well, here is an answer and pretty slick too 🙂

select number, c.proname from #table_s a

cross apply

iter_intlist_to_table(a.strInts) as b

inner join #table_p c

on b.number = c.idp

order by a.ids

with thanks to hanbingl over on the SQLteam.com

Stefan_G SSCertifiable Points: 6609 More actions · Answer 2

I would recommend using an inline table-valued function instead.

Like this:

-- Generate numbers between @start and @end

create function [dbo].[GetRange](@start int, @end int) returns table as

return

select top (@end-@start+1) (row_number() over(order by (select null)))-1+@start as n

from sys.system_columns c1

cross join sys.system_columns c2

cross join sys.system_columns c3

go

create function [dbo].[SplitString](@f varchar(max)) returns table

as

return

select

substring(@f, n+1,

case

when charindex(',', @f, n+1)=0 then 0x7fffffff

else charindex(',', @f, n+1)-n-1

end) as value, n

from dbo.getrange(0, len(@f))

where substring(@f, n, 1)=',' or n=0

go

select IDp, ProName

from #Table_S s1

cross apply dbo.SplitString(s1.strInts) s2

join #Table_P p on s2.value = p.idp

ITVFs have very much better performance. 😎

/SG

Andy Lucas SSC Veteran Points: 295 More actions · Answer 3

Andy Lucas

SSC Veteran

Points: 295

April 27, 2010 at 4:13 am

#1158072

Even slicker Stefan_G - Many thanks!

Stefan_G SSCertifiable Points: 6609 More actions · Answer 4

EDIT: Please ignore this post. My performance test was flawed.

It gave totally wrong results because I was using a test table with 1000000 identical strings. With that test data the SQL Server only split the string once and reused the results.

In reality my original itvf-method is the fastest of the methods discussed here. xml is much, much slower than the other methods.

I just made some performance tests and I realized that converting a comma-separated string to a table is much more efficient using XML than my previous method.

Like this:

create function SplitStringX(@f varchar(max)) returns table

as

return

select v.value('.','int') as value

from (select convert(xml, '<v>'+replace(@f,',','</v><v>')+'</v>') as f ) t1

cross apply t1.f.nodes('v') r(v)

go

-- Performance test

select '12,3,44,560,345556,22,1,,33' as f

into #t

from dbo.getrange(1,1000000)

go

--old method

select max(value)

from #t cross apply dbo.SplitString(f)

-- 29 seconds

-- new xml-based method

select max(value)

from #t cross apply dbo.SplitStringX(f)

-- 4 seconds

-- original multi-statement method

select max(number)

from #t cross apply dbo.[iter_intlist_to_table](f)

-- 5 seconds

So, my first mehod is much slower, the xml method is the fastest, and the original method is almost as fast as the xml method.

Quite surprising actually.

/SG

Andy Lucas SSC Veteran Points: 295 More actions · Answer 5

Are you sure that is correct? I get a different result from your original method.

And it takes a lot longer than 29 secs - more like 79 on my system!

Stefan_G SSCertifiable Points: 6609 More actions · Answer 6

Andy Lucas (4/27/2010)
Are you sure that is correct? I get a different result from your original method.
And it takes a lot longer than 29 secs - more like 79 on my system!

Sorry, I made a small change to my original method, I changed the datatype from varchar(max) to varchar(8000). That changed the time from 98 to 29 seconds on my system.

Paul White SSC Guru Points: 150468 More actions · Answer 7

On string splitting and performance:

There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:

http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html

The method I use and recommend is based on:

http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Eric M Russell SSC Guru Points: 125562 More actions · Answer 8

Simply get the IDs into a table variable, and then the problem is solved.

I encounter this scenario a lot with stored procedures, typically developed for reporting purposes, which accept a delimited string of IDs as an input parameter. What I do is first return the IDs from a table valued UDF into a table variable, and then join that table variable with the application table(s).

-- create temp table with sample data:

declare @table_p table

(

idp int identity(1,1) not null,

proname varchar(10) null

);

declare @C int;

set @C=1

while @C<21

begin

insert @table_p(proname) values('line'+cast(@c as varchar(10)))

set @C=@c+1

end;

-- first, return result from table valued udf into temp table:

declare @intlist table

(

idp int not null primary key

);

insert into @intlist (idp)

select number

from iter_intlist_to_table('1,2,3,4,5,6');

-- next, join the 2 temp tables:

select p.idp, p.proname

from @table_p p

join @intlist i on i.idp = p.idp;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Andy Lucas SSC Veteran Points: 295 More actions · Answer 9

Yes thanks for that simple explanation.

I was actually stumped because the csi's (comma seperated integers) are in another table column and I was confused as to how to make that join.

Andy Lucas SSC Veteran Points: 295 More actions · Answer 10

My app is working perfectly fine using Stefan_G's second production (it seems to be the fastest apart from Paul's SQLCLR masterpiece)

Just a question based on this thread and the reason I first asked for help.

How can I get to grips with joins - in my head?

I have a fairly logical brain but I just seem to get confused when I try to work out joins.

Is there a way i can view/visualize the process? How do you guys handle it so well?

Is there a really simple book on it or something you can recommend?

Thanks for your thoughts

Stefan_G SSCertifiable Points: 6609 More actions · Answer 11

Paul White NZ (4/27/2010)
On string splitting and performance:
There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:
http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html
The method I use and recommend is based on:
http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx
Paul

Really interesting thread.

It is clear that the subject of high-performance string splitting is much more complex than it looks.

Stefan_G SSCertifiable Points: 6609 More actions · Answer 12

Andy Lucas (4/27/2010)
My app is working perfectly fine using Stefan_G's second production (it seems to be the fastest apart from Paul's SQLCLR masterpiece)

Please dont use the xml-based method if you are at all concerned with performance. My original performance test was seriously flawed. When using identical strings the SQL server is smart enough to split the string once and reuse the results.

Splitting 2000 random 4000-character long strings takes 11 seconds using my SplitString function, but 3800 seconds for the SplitStringX function.

Using CLR is much, much faster. Splitting the same 2000 random 4000-character strings takes less than a second.

Sorry for the confusion

/SG

Eric M Russell SSC Guru Points: 125562 More actions · Answer 13

Here is a well illustrated primer on SQL joins by Pinal Dave, one of the best SQL Server bloggers.

http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/

The specific situation you present here, which is parsing out a column containing delimited IDs and using them to join tables, is a classic example of "denormalization". As a result, the join syntax is now made more confusing and difficult, requiring a user defined function and derived table. The IDs should be contained in a related table as foreign keys.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho