May 7, 2008 at 8:07 am
You folks are absolutely the Best! I'm humbled by your awesome compliments and the testing that a lot of you have taken the time to do and post.
I've got a lot of replies to make to all of you and I'll jump on those tonight... contrary to popular myth, I actually do work between answering posts 😛 Thanks again, folks! Keep 'em coming! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 7, 2008 at 8:11 am
What - only 12 useful tips in a single article? :hehe:
Stop it now - you're setting the bar too high for the rest of us....
(superlative as usual).....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user... All right - what was my emergency again?
May 7, 2008 at 8:37 am
I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):
[font="Courier New"]declare @param varchar(8000)
set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'
declare @result table(idx int identity(1,1), val int, ord datetime, fval float)
declare @raw table (idx int identity(1,1), val varchar(50))
set @param = ',' + @param + ','
insert into @raw (val)
select substring(@param, N+1, charindex(',',@param,N+1)-N-1)
from dbo.Tally where N<len(@param) and substring(@param,N,1)=','
insert into @result(val,ord,fval)
select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)
from dbo.Tally T
join @raw R1 on T.N=R1.idx
join @raw R2 on T.N+1=R2.idx
join @raw R3 on T.N+2=R3.idx
where (T.N-1)%3 = 0
select * from @result[/font]
As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.
My questions are:
1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?
2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?
Thanks for the extremely interesting and useful article, and for any assistance with these questions.
May 7, 2008 at 8:45 am
Thanks for taking the time to explain this, Jeff. It was very helpful.
May 7, 2008 at 8:46 am
Carl Federl (5/7/2008)
Another great article! Thanks!
The first reference for a tally table that I can recall is in "Guide to Sybase and SQL Server" by C. J. Date and D. McGoveran, published in June of 1992.
Does anyone know of an earlier reference ?
Carl, what is the page number of that? As it happens, I have the '92 edition in front of me (no idea why!) and couldn't find it readily.
It was a great book for its time; it was one of the first two books that I bought when we got 4.21a in the door.
Thanks!
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
May 7, 2008 at 8:52 am
Excellent article, Jeff! I came across an interesting date table article on Tech Republic that was linked here yesterday after it re-surfaced at BuilderAU.Com.au. I updated the loader to run from a tally table and the speed difference was just amazing.
I have not yet gotten in to string parsing with a tally table, but I'll get there eventually. I did have an amusing little time re-writing the Bizz/Buzz test using it:
[font="Courier New"]select case when (Number % 15 = 0) then 'BizzBuzz'
when (Number % 3 = 0) then 'Bizz'
when (Number % 5 = 0) then 'Buzz'
else cast(Number as varchar(10)) -- plain "char" defaults to char(30) and pads with spaces
end
from zNumbers
where Number < 100[/font]
Keep up the good work, Jeff!
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
May 7, 2008 at 8:52 am
For the use of a tally table or other mechanism, populating a variable with sample data like the above is completely wrong! Those delimited values would never be passed by an application. The proper approach for generating sample data is to use RANDOM values.
John,
Since the only important bit of the data for the performance test is the number of commas, using random data is overkill for this. I would, however, agree that using an ordered set is a mistake, since you don't notice that the routine ends up sorting the results instead of returning them in the order they were passed, as would be necessary for parameter passing.
--
JimFive
May 7, 2008 at 8:57 am
May 7, 2008 at 9:02 am
1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?
Probably, but I haven't figured it out yet. One thing I would consider is, instead of splitting everything out at once, first splitting the string into groups of 3 and then splitting those groups of 3 into your final table. That is, instead of selecting every comma, select every 3rd comma (using N%3 = 0), then use that query as the source for your actual query. This would also allow you to get rid of the identity column, since you are keeping your data grouped together. (See next response.)
2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?
Since tables are unordered, if you remove the identity then your data won't be kept adjacent in the intermediate table.
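A rough, untested sketch of that "every 3rd comma" idea, reusing the Tally table and assuming @param has already been wrapped in leading/trailing commas as in your original code:

```sql
-- Sketch only (untested): number every comma, then pair each record's
-- opening comma with the comma 3 positions later to cut out whole records.
;with bounds as (
    select row_number() over(order by N) as rn, N
    from dbo.Tally
    where N <= len(@param)
      and substring(@param, N, 1) = ','
)
select (b1.rn - 1) / 3 + 1 as recnum,
       substring(@param, b1.N + 1, b2.N - b1.N - 1) as rec
from bounds b1
join bounds b2 on b2.rn = b1.rn + 3
where (b1.rn - 1) % 3 = 0
```

Each rec should come out as a whole record like '101,2008.04.02,1.1', which could then be fed back through the same splitter and converted in one pass.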
--
JimFive
May 7, 2008 at 9:04 am
Jeff Marcus (5/7/2008)
I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):
[font="Courier New"]declare @param varchar(8000)
set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'
declare @result table(idx int identity(1,1), val int, ord datetime, fval float)
declare @raw table (idx int identity(1,1), val varchar(50))
set @param = ',' + @param + ','
insert into @raw (val)
select substring(@param, N+1, charindex(',',@param,N+1)-N-1)
from dbo.Tally where N<len(@param) and substring(@param,N,1)=','
insert into @result(val,ord,fval)
select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)
from dbo.Tally T
join @raw R1 on T.N=R1.idx
join @raw R2 on T.N+1=R2.idx
join @raw R3 on T.N+2=R3.idx
where (T.N-1)%3 = 0
select * from @result[/font]
As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.
My questions are:
1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?
2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?
Thanks for the extremely interesting and useful article, and for any assistance with these questions.
> 1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while
> still being somewhat readable and understandable)?
You're using the intermediate table to generate ordinal values. If you are using SQL Server 2005 you
can use ROW_NUMBER instead. Also you can replace the three joins with a single "group by" [in my experience this is more efficient].
with cte as (
select row_number() over(order by N) - 1 as num,
substring(@param, N+1, charindex(',',@param,N+1)-N-1) as val
from dbo.Tally
where N<len(@param) and substring(@param,N,1)=',')
insert into @result(val,ord,fval)
select cast(max(case when num%3 = 0 then val end) as int) as val,
convert(datetime, max(case when num%3 = 1 then val end), 102) as ord,
convert(float, max(case when num%3 = 2 then val end)) as fval
from cte
group by num/3
select * from @result
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537
May 7, 2008 at 9:14 am
John,
Since the only important bit of the data for the performance test is the number of commas, using random data is overkill for this.
I disagree a bit. It also affects the amount of parsing that is being done. Using random values of varying lengths is the proper approach. In addition, the length of the string will also affect the results, particularly when you cross over the 8,000 character (4,000 for Unicode) limit and get into [N]VARCHAR(MAX).
I've been involved in this tally table vs. a table-valued function discussion before. I did some benchmarking and found some very interesting results when the list of values was not random data and/or the length of the data was short vs. long.
I'm not knocking the Tally table. I'm just stating that proper performance testing requires the use of good random data. I've been around the block for a long time and have seen too many instances where the application worked fine in the performance tests with "generated" data but had major performance problems when "real" data was used -- as in customer data. In the "post-mortem", we'd find that the "generated" data was of such a nature that it was "helping out" the database.
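For what it's worth, here's one rough way to build such a string (an untested sketch; NEWID() with CHECKSUM is a common trick for per-row random values, since RAND() is evaluated only once per query):

```sql
-- Sketch (untested): build a random comma-delimited string of
-- varying-length numeric values instead of a fixed, ordered list.
declare @csv varchar(8000)
select @csv = isnull(@csv + ',', '')
            + cast(abs(checksum(newid())) % 1000000 as varchar(10))
from dbo.Tally
where N <= 500
select @csv as RandomCsv
```

The values vary in length from 1 to 6 digits, so the amount of parsing work per element varies as well, which is closer to what real customer data would do to the splitter.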
May 7, 2008 at 9:21 am
This is probably a ridiculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!
May 7, 2008 at 9:30 am
buktseat (5/7/2008)
This is probably a ridiculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!
N is a column (or rather: the column) in the Tally table.
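In case it helps: a Tally table is nothing more than a one-column table of sequential integers. One common way to build it (a sketch; the tables you cross join just need to supply enough rows, and the best source varies by SQL Server version):

```sql
-- Sketch: ~11,000 sequential integers; any sufficiently large
-- row source works for the cross join.
select top 11000 identity(int, 1, 1) as N
into dbo.Tally
from master.dbo.syscolumns sc1
cross join master.dbo.syscolumns sc2

-- A clustered primary key on N keeps the range scans fast.
alter table dbo.Tally
    add constraint PK_Tally_N primary key clustered (N)
```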
May 7, 2008 at 9:35 am
HEEEEYY!!! It's funny how it all snaps into focus once you get past those mundane details! 🙂 I knew it was something ridiculous I was missing...
Thanks!
May 7, 2008 at 9:46 am
Jeff,
Please don't forget to reply to me with the suggestions and help I requested on page 1 of this discussion.
Shamshad Ali.