April 26, 2010 at 8:50 am
Hi
There are two tables: one has a varchar column containing strings of comma-separated integers that relate to the other table's ID column. I need to produce a join and result set.
Here is some set-up code describing the two tables with data, and a third table showing the needed results.
I also have a function that returns a table of the integers in a comma-separated string. My question: how can I use it in a join to produce the result set?
Many thanks
Andy
declare @C int

CREATE TABLE #Table_S (
    IDs     int IDENTITY(1,1) NOT NULL,
    strInts varchar(50) NULL
)

Insert #Table_S (strInts) values ('1,3,6,11')
Insert #Table_S (strInts) values ('2,4,7,8,9,10')

CREATE TABLE #Table_P (
    IDp     int IDENTITY(1,1) NOT NULL,
    ProName varchar(10) NULL
)

set @C = 1
while @C < 21
begin
    Insert #Table_P (ProName) values ('line' + cast(@C as varchar(10)))
    set @C = @C + 1
end

CREATE TABLE #Table_R (
    IDp     int,
    ProName varchar(50)
)

Insert #Table_R (IDp, ProName)
select IDp, 'line' + cast(IDp as varchar(10)) from #Table_P where IDp in (1,3,6,11)

Insert #Table_R (IDp, ProName)
select IDp, 'line' + cast(IDp as varchar(10)) from #Table_P where IDp in (2,4,7,8,9,10)

select * from #Table_S
drop table #Table_S

select * from #Table_P
drop table #Table_P

select * from #Table_R
drop table #Table_R
the UDF:
CREATE FUNCTION [dbo].[iter_intlist_to_table] (@list varchar(4000))
RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
                    number  int NOT NULL) AS
BEGIN
    DECLARE @pos      int,
            @textpos  int,
            @chunklen smallint,
            @str      nvarchar(4000),
            @tmpstr   nvarchar(4000),
            @leftover nvarchar(4000)

    SET @textpos = 1
    SET @leftover = ''

    -- Note: datalength()/2 assumes an nvarchar input; with a varchar(4000)
    -- parameter the first 4000-character chunk still covers the whole list.
    WHILE @textpos <= datalength(@list) / 2
    BEGIN
        -- Read the next chunk and prepend anything left over from the last one
        SET @chunklen = 4000 - datalength(@leftover) / 2
        SET @tmpstr = ltrim(@leftover + substring(@list, @textpos, @chunklen))
        SET @textpos = @textpos + @chunklen

        -- Split out every complete comma-terminated element in this chunk
        SET @pos = charindex(',', @tmpstr)
        WHILE @pos > 0
        BEGIN
            SET @str = substring(@tmpstr, 1, @pos - 1)
            INSERT @tbl (number) VALUES (convert(int, @str))
            SET @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr)))
            SET @pos = charindex(',', @tmpstr)
        END

        SET @leftover = @tmpstr
    END

    -- The final element has no trailing comma
    IF ltrim(rtrim(@leftover)) <> ''
        INSERT @tbl (number) VALUES (convert(int, @leftover))

    RETURN
END
-- usage: dbo.iter_intlist_to_table(@ids), where @ids is a string in the form '1,2,3,4,5,6'
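For example, a quick check of the function's output (assuming it has been created as above):

select listpos, number
from dbo.iter_intlist_to_table('1,3,6,11')
-- returns (listpos, number) = (1,1), (2,3), (3,6), (4,11)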
April 26, 2010 at 10:26 am
Well, here is an answer and pretty slick too 🙂
select b.number, c.ProName
from #Table_S a
cross apply dbo.iter_intlist_to_table(a.strInts) as b
inner join #Table_P c
    on b.number = c.IDp
order by a.IDs

(Note the dbo. prefix - table-valued functions must be invoked with a two-part name.)
with thanks to hanbingl over on SQLteam.com
April 27, 2010 at 3:00 am
I would recommend using an inline table-valued function instead.
Like this:
-- Generate numbers between @start and @end
create function [dbo].[GetRange](@start int, @end int) returns table as
return
    select top (@end - @start + 1)
           row_number() over (order by (select null)) - 1 + @start as n
    from sys.system_columns c1
    cross join sys.system_columns c2
    cross join sys.system_columns c3
go

create function [dbo].[SplitString](@f varchar(max)) returns table
as
return
    select substring(@f, n + 1,
               case
                   when charindex(',', @f, n + 1) = 0 then 0x7fffffff -- no more commas: take the rest of the string
                   else charindex(',', @f, n + 1) - n - 1
               end) as value,
           n
    from dbo.GetRange(0, len(@f))
    where substring(@f, n, 1) = ',' or n = 0
go
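As a quick sanity check of the number generator (assuming GetRange has been created as above):

select n from dbo.GetRange(5, 8)
-- returns n = 5, 6, 7, 8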
select p.IDp, p.ProName
from #Table_S s1
cross apply dbo.SplitString(s1.strInts) s2
join #Table_P p on s2.value = p.IDp
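For reference, a standalone call to the splitter looks like this (assuming both functions above exist); note that value comes back as a string, and n is the zero-based offset of the comma preceding each element:

select value, n
from dbo.SplitString('1,3,6,11')
-- returns value = '1', '3', '6', '11' with n = 0, 2, 4, 6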
ITVFs give much better performance. 😎
/SG
April 27, 2010 at 4:13 am
Even slicker Stefan_G - Many thanks!
April 27, 2010 at 4:15 am
EDIT: Please ignore this post. My performance test was flawed.
It gave totally wrong results because I was using a test table with 1,000,000 identical strings. With that test data, SQL Server split the string only once and reused the results.
In reality my original itvf-method is the fastest of the methods discussed here. XML is much, much slower than the other methods.
I just made some performance tests and I realized that converting a comma-separated string to a table is much more efficient using XML than my previous method.
Like this:
create function SplitStringX(@f varchar(max)) returns table
as
return
    select v.value('.', 'int') as value
    from (select convert(xml, '<v>' + replace(@f, ',', '</v><v>') + '</v>') as f) t1
    cross apply t1.f.nodes('v') r(v)
go
-- Performance test
select '12,3,44,560,345556,22,1,,33' as f
into #t
from dbo.getrange(1,1000000)
go
--old method
select max(value)
from #t cross apply dbo.SplitString(f)
-- 29 seconds
-- new xml-based method
select max(value)
from #t cross apply dbo.SplitStringX(f)
-- 4 seconds
-- original multi-statement method
select max(number)
from #t cross apply dbo.[iter_intlist_to_table](f)
-- 5 seconds
So, my first method is much slower, the xml method is the fastest, and the original method is almost as fast as the xml method.
Quite surprising, actually.
/SG
April 27, 2010 at 5:29 am
Are you sure that is correct? I get a different result from your original method.
And it takes a lot longer than 29 secs - more like 79 on my system!
April 27, 2010 at 5:46 am
Andy Lucas (4/27/2010)
Are you sure that is correct? I get a different result from your original method. And it takes a lot longer than 29 secs - more like 79 on my system!
Sorry, I made a small change to my original method: I changed the datatype from varchar(max) to varchar(8000). That changed the time from 98 to 29 seconds on my system.
April 27, 2010 at 7:01 am
On string splitting and performance:
There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:
http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html
The method I use and recommend is based on:
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 27, 2010 at 8:31 am
Simply get the IDs into a table variable, and then the problem is solved.
I encounter this scenario a lot with stored procedures, typically developed for reporting purposes, which accept a delimited string of IDs as an input parameter. What I do is first return the IDs from a table-valued UDF into a table variable, and then join that table variable with the application table(s).
-- create a table variable with sample data:
declare @table_p table
(
    idp int identity(1,1) not null,
    proname varchar(10) null
);

declare @C int;
set @C = 1
while @C < 21
begin
    insert @table_p (proname) values ('line' + cast(@C as varchar(10)))
    set @C = @C + 1
end;

-- first, return the result from the table-valued udf into a table variable:
declare @intlist table
(
    idp int not null primary key
);

insert into @intlist (idp)
select number
from dbo.iter_intlist_to_table('1,2,3,4,5,6');

-- next, join the two table variables:
select p.idp, p.proname
from @table_p p
join @intlist i on i.idp = p.idp;
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
April 27, 2010 at 9:43 am
Yes thanks for that simple explanation.
I was actually stumped because the CSIs (comma-separated integers) are in another table's column, and I was confused about how to make that join.
April 27, 2010 at 9:53 am
My app is working perfectly using Stefan_G's second solution (it seems to be the fastest, apart from Paul's SQLCLR masterpiece).
Just a question based on this thread and the reason I first asked for help.
How can I get to grips with joins - in my head?
I have a fairly logical brain but I just seem to get confused when I try to work out joins.
Is there a way I can view/visualize the process? How do you guys handle it so well?
Is there a really simple book on it or something you can recommend?
Thanks for your thoughts
April 28, 2010 at 12:58 am
Paul White NZ (4/27/2010)
On string splitting and performance: There was a *huge* thread on this last year on SSC. Florian Reischl was good enough to post a very comprehensive set of results on his blog:
http://florianreischl.blogspot.com/2009/09/high-performance-string-split-functions.html
The method I use and recommend is based on:
Paul
Really interesting thread.
It is clear that the subject of high-performance string splitting is much more complex than it looks.
April 28, 2010 at 2:37 am
Andy Lucas (4/27/2010)
My app is working perfectly fine using Stefan_G's second production (it seems to be the fastest apart from Paul's SQLCLR masterpiece)
Please don't use the xml-based method if you are at all concerned with performance. My original performance test was seriously flawed. When using identical strings, SQL Server is smart enough to split the string once and reuse the results.
Splitting 2000 random 4000-character long strings takes 11 seconds using my SplitString function, but 3800 seconds for the SplitStringX function.
Using CLR is much, much faster. Splitting the same 2000 random 4000-character strings takes less than a second.
Sorry for the confusion
/SG
April 28, 2010 at 7:45 am
Here is a well illustrated primer on SQL joins by Pinal Dave, one of the best SQL Server bloggers.
http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/
The specific situation you present here, parsing out a column containing delimited IDs and using them to join tables, is a classic example of "denormalization". As a result, the join syntax is made more confusing and difficult, requiring a user-defined function and a derived table. The IDs should instead be contained in a related table as foreign keys.
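For illustration only, a normalized design might look something like this (the Table_S_Item name and its columns are hypothetical):

-- one row per related ID, instead of a '1,3,6,11' string in strInts:
create table dbo.Table_S_Item
(
    IDs int not null,   -- references Table_S
    IDp int not null,   -- references Table_P
    primary key (IDs, IDp)
);

-- the join then needs no splitting function at all:
select i.IDs, p.IDp, p.ProName
from dbo.Table_S_Item i
join dbo.Table_P p on p.IDp = i.IDp;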
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho