September 25, 2008 at 4:54 am
I did my own optimization on simple tally tables, and it looks very good in my initial tests with a 99,881-character list of IDs I am splitting. The result came from adding a few new tweaks to an example I saw in another article, so I'd like some feedback!
Here is my code (if a smiley appears, read it as ')'; the forum adds it and it is not under my control):
create function dbo.fnTally( @input varchar(max) ) returns table
as
return
(
    select top ( isnull( len( @input ), 0 ) )
           row_number() over ( order by anchor.constant ) as N
    from ( select 0 as constant ) anchor
    cross join master.sys.all_columns as t1 with( nolock )
    cross join master.sys.all_columns as t2 with( nolock )
);
Test code (100001 chars):
declare @STR varchar(max)
set @STR = replicate( convert( varchar(max), '1,32,' ), 20000 ) + '5';
select len( @STR );
select * from dbo.fnTally( @STR ) order by N;
As you can see, I worked around the SQL Server restriction that a constant cannot appear directly in the mandatory OVER ( ORDER BY ... ) clause. The effect is that only the number of rows matters, not the values in them, which opens the way to (yet unexplored) denser source tables.
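The same restriction can also be satisfied with a constant subquery inside the OVER clause. This is an alternative sketch (not the code above, just a commonly seen variant) that avoids the extra derived table:

```sql
-- ROW_NUMBER() OVER (ORDER BY 0) is rejected by the parser, but a constant
-- subquery is accepted and imposes no real sort:
SELECT TOP (10)
       ROW_NUMBER() OVER ( ORDER BY (SELECT NULL) ) AS N
FROM   master.sys.all_columns;
```

Either form tells the optimizer the row order is irrelevant, which is exactly what a tally generator wants.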
Another thing I did instinctively, but have not measured the effect of, is to disable locking with a NOLOCK table hint, in a bid to keep read locks from coming into play and limiting resource use.
In my testing of the shown tally function on the test input, the estimated sub-tree cost of my whole query went from 0.0131634 to 0.0071212, an improvement of over 45%! The benefit comes entirely from eliminating a merge join that resulted from sorting on a column from the source table.
Another thing I wanted to try but have not yet done is to make the second cross join conditional by encapsulating it in a derived query that handles the condition. It would then return just one row when the full cross join is not needed. The optimizer might already do this silently, but it wouldn't hurt to try to cut out some more I/O. It is just an idea... it might not work at all or simply backfire. I will try it when I find some time, and if it works I will post here again!
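The conditional-cross-join idea could look something like the following. This is an untested sketch of the concept only; the function name, the CASE threshold, and the row counts are all illustrative, not measured:

```sql
-- Hypothetical variant: the second cross join collapses to a single row
-- whenever one pass over sys.all_columns can already supply enough numbers.
CREATE FUNCTION dbo.fnTallyConditional ( @input varchar(max) )
RETURNS TABLE
AS
RETURN
(
    SELECT TOP ( ISNULL(LEN(@input), 0) )
           ROW_NUMBER() OVER ( ORDER BY anchor.constant ) AS N
    FROM   ( SELECT 0 AS constant ) anchor
    CROSS JOIN master.sys.all_columns AS t1
    CROSS JOIN ( SELECT TOP ( CASE
                                WHEN ISNULL(LEN(@input), 0) >
                                     (SELECT COUNT(*) FROM master.sys.all_columns)
                                THEN 1000   -- expand only when really needed
                                ELSE 1      -- otherwise one row is enough
                              END )
                        0 AS c
                 FROM master.sys.all_columns ) AS t2
);
```

Whether the optimizer short-circuits the plain double cross join anyway (as Jeff suggests below a TOP-limited cross join already behaves linearly) is exactly what would need measuring.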
September 27, 2008 at 7:07 am
peter (9/25/2008)
In my testing of the shown tally function on the test input, the estimated sub-tree cost of my whole query went from 0.0131634 to 0.0071212, an improvement of over 45%! The benefit comes entirely from eliminating a merge join that resulted from sorting on a column from the source table.
Cool... let's see the code for the test setup you did.
Another thing I wanted to try but have not yet done is to make the second cross join conditional by encapsulating it in a derived query that handles the condition. It would then return just one row when the full cross join is not needed. The optimizer might already do this silently, but it wouldn't hurt to try to cut out some more I/O. It is just an idea... it might not work at all or simply backfire. I will try it when I find some time, and if it works I will post here again!
I've found that it's not necessary to do that. A cross join will first grab all the rows from just one of the tables before it starts on the other. Even after that, it's very linear.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2008 at 11:21 am
As you can see, Jeff, I have some *synthetic* test code in my post simulating the counting portion of the process. The 45% improvement is what the query plan shows when I compare a version with this optimization against one without. The version without does not use a derived query and performs an ORDER BY on a real column instead of the fabricated constant. The improvement in practice will be less, of course, as the counting is just part of a larger task. I mostly like the function for its simplicity, and for the option of replacing it with something better, if that comes along, without modifying existing queries.
Quietly I was hoping you could put this function into one of your no doubt many practical tests and see what impact it has. I just kind of bumped into this and have no non-synthetic problem at hand that is big enough to show a clear outcome. In fact, I have no access to any production server these coming days at all (moving office).
September 29, 2008 at 2:24 pm
Heh... I can show you a query plan for two queries... one comes up with 100% of the batch cost and the other 0%... yet when you run them, the 100% one takes a tiny, tiny fraction of the time the 0% one does. I don't trust percent of batch, nor do I trust sub-tree costs... and neither should you.
When I get home from work, I'll see if I can find those queries and post them.
I'm not sure that I'll actually have the time to test your good code, Peter... the new gig has me driving 1:15:00 one way.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2008 at 6:15 am
As I should, I will do some thorough testing with a recognizable real-world problem. It will take me a few days before I can do so, and I am already chewing on another idea :).
Stay tuned!
October 14, 2008 at 1:40 pm
Jeff, this is a great article.
I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
October 14, 2008 at 4:04 pm
GF (10/14/2008)
Jeff, this is a great article. I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
Hi GF,
If that's what you need, then forget about the solution using a table of numbers. That solution is great for breaking components up into several rows. If you need them in several columns, you need a different technique.
About two years ago, I wrote a blog post demonstrating such a technique. It uses the example of first name; middle initial; last name, but the techniques to be used remain the same. Here's the URL:
http://sqlblog.com/blogs/hugo_kornelis/archive/2006/10/12/Stuff.aspx
Edit: Fixed the URL
October 14, 2008 at 6:38 pm
GF (10/14/2008)
Jeff, this is a great article. I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
Thanks for the feedback, GF.
I can show you how to pivot the rows back into columns pretty easily, but the problem with addresses is that they don't always have the same components. For example, you may have an address that looks like 123 East 39th Street. I'd recommend purchasing something like ZP4 or some other CASS certification program that will not only do the split correctly for you, but will also validate whether the address is real.
For your "pivot", do you have a maximum number of columns an address would be split into?
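For reference, one common cross-tab pattern for pivoting a fixed number of split rows back into columns looks like this. This is a generic sketch, not necessarily Jeff's method; the table name `dbo.SplitResults` and the columns `AddressID`, `ItemNumber`, and `Item` are hypothetical stand-ins for whatever the splitter actually returns:

```sql
-- Pivot up to three split items per address back into columns,
-- assuming the splitter tagged each fragment with its position (ItemNumber).
SELECT AddressID,
       MAX(CASE WHEN ItemNumber = 1 THEN Item END) AS Number,
       MAX(CASE WHEN ItemNumber = 2 THEN Item END) AS Name,
       MAX(CASE WHEN ItemNumber = 3 THEN Item END) AS Suffix
FROM   dbo.SplitResults
GROUP BY AddressID;
```

Addresses with fewer components simply leave the later columns NULL, which is why knowing the maximum number of columns matters.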
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2008 at 6:41 pm
peter (9/30/2008)
As I should, I will do some thorough testing with a recognizable real-world problem. It will take me a few days before I can do so, and I am already chewing on another idea :). Stay tuned!
Peter, (or anyone else interested)...
Here's the code where the "Percent of Batch" is, quite literally, 100% wrong.
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2008-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 5, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2008 at 3:49 pm
Jeff, thank you for taking the time to listen.
My current needs are to split an address into two parts.
example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.
The problem I am having is when there is only a number, like 4567, and no street name.
example: 4567
So what I am trying to get to is this:
column1 column2
1234 south street
4567
876 west
What I currently have is
column1
123 north ave
Thanks in advance.
Gary
October 16, 2008 at 11:55 am
Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:
select left(Address, charindex(' ', Address) - 1) as StreetNum,
right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street
from [whatever]
where len(rtrim(Address)) > 1
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
October 28, 2008 at 9:27 pm
GF (10/15/2008)
Jeff, thank you for taking the time to listen. My current needs are to split an address into two parts.
example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.
The problem I am having is when there is only a number, like 4567, and no street name.
example: 4567
So what I am trying to get to is this:
column1 column2
1234 south street
4567
876 west
What I currently have is
column1
123 north ave
Thanks in advance.
Gary
Gary.... I didn't answer because I thought that Wayne West's code probably did it for you. But, I gotta check... are you all set?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 29, 2008 at 6:39 am
Wayne West (10/16/2008)
Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:
select left(Address, charindex(' ', Address) - 1) as StreetNum,
right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street
from [whatever]
where len(rtrim(Address)) > 1
Since LEN ignores trailing spaces, the RTRIM is not strictly necessary, but then who knows if MS will change this in the future 😉
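This behaviour is easy to verify; DATALENGTH, by contrast, counts the trailing spaces:

```sql
-- LEN ignores trailing spaces on the string; DATALENGTH (bytes) does not.
SELECT LEN('abc   ')        AS LenResult,        -- 3
       DATALENGTH('abc   ') AS DataLengthResult; -- 6 (varchar, one byte/char)
```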
...the problem I am having is when there is only a number like 4567 and no street name...
The above query will not cope with this situation.
This is my solution, based on Wayne's query:
SELECT LEFT(Address, CHARINDEX(' ', Address + ' ') - 1) AS [StreetNum],
SUBSTRING(Address, CHARINDEX(' ', Address + ' ') + 1, 255) AS [Street]
FROM [whatever]
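To illustrate the `Address + ' '` padding trick on Gary's sample values (the table variable is purely for demonstration):

```sql
-- Appending a space guarantees CHARINDEX finds a delimiter even for
-- number-only addresses, so LEFT never gets a -1 length.
DECLARE @t TABLE ( Address varchar(255) );
INSERT INTO @t VALUES ('1234 south street'), ('4567'), ('876 west');

SELECT LEFT(Address, CHARINDEX(' ', Address + ' ') - 1)           AS StreetNum,
       SUBSTRING(Address, CHARINDEX(' ', Address + ' ') + 1, 255) AS Street
FROM   @t;
-- StreetNum: '1234', '4567', '876'
-- Street:    'south street', '' (empty), 'west'
```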
Far away is close at hand in the images of elsewhere.
Anon.
October 29, 2008 at 9:29 am
Huh, somehow I never noticed the note in the documentation about LEN() ignoring trailing spaces. Then again, chances are I never opened BOL for LEN() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when doing any form of string manipulation.
Thanks for the catch, David. Nice solution. 🙂
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
October 29, 2008 at 9:35 am
Wayne West (10/29/2008)
Huh, somehow I never noticed the note in the documentation about LEN() ignoring trailing spaces. Then again, chances are I never opened BOL for LEN() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when doing any form of string manipulation. Thanks for the catch, David. Nice solution. 🙂
Thanks Wayne 😀
I don't know if or where it is documented; I just came across it when writing SQL in the past, and it is mentioned in threads on this forum.
I too trim data a lot as well, especially on data import.
Far away is close at hand in the images of elsewhere.
Anon.