September 25, 2008 at 4:54 am
I did my own optimization on simple tally tables, and it looks very good in my initial tests with a 99,881-character list of IDs I am splitting. The result came from adding a few new tweaks to an example I saw in another article, so I'd like some feedback!
Here is my code (if a smiley appears, read it as ')'; the forum adds it and it is not under my control):
create function dbo.fnTally( @input varchar(max) ) returns table
as
return
(
    select top ( isnull( len( @input ), 0 ) )
           row_number() over ( order by anchor.constant ) as N
    from ( select 0 as constant ) anchor
    cross join master.sys.all_columns as t1 with( nolock )
    cross join master.sys.all_columns as t2 with( nolock )
);
Test code (100001 chars):
declare @STR varchar(max)
set @STR = replicate( convert( varchar(max), '1,32,' ), 20000 ) + '5';
select len( @STR );
select * from dbo.fnTally( @STR ) order by N;
As you can see, I worked around the SQL Server restriction that a constant cannot appear directly in the mandatory OVER ( ORDER BY ... ) clause. The effect is that only the number of rows matters, not the values in them, which opens the way to (yet unexplored) denser source tables.
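The same restriction can also be satisfied with a constant subquery inside the OVER clause. This is an alternative sketch (not the code above, just a commonly seen variant) that avoids the extra derived table:

```sql
-- ROW_NUMBER() OVER (ORDER BY 0) is rejected by the parser, but a constant
-- subquery is accepted and imposes no real sort:
SELECT TOP (10)
       ROW_NUMBER() OVER ( ORDER BY (SELECT NULL) ) AS N
FROM   master.sys.all_columns;
```

Either form tells the optimizer the row order is irrelevant, which is exactly what a tally generator wants.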
Another thing I did instinctively, but have not measured the effect of, is to disable locking with a NOLOCK table hint, in a bid to keep read locks from coming into play and limiting resource use.
In my testing of the shown tally function on the test input, the estimated sub-tree cost of my whole query went from 0.0131634 to 0.0071212, an improvement of over 45%! The benefit comes entirely from eliminating a merge join that resulted from sorting on a column from the source table.
Another thing I wanted to try but have not yet done is to make the second cross join conditional by encapsulating it in a derived query that handles the condition. It would then return just one row when the full cross join is not needed. The optimizer might already do this silently, but it wouldn't hurt to try to cut out some more I/O. It is just an idea... it might not work at all or simply backfire. I will try it when I find some time, and if it works I will post here again!
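The conditional-cross-join idea could look something like the following. This is an untested sketch of the concept only; the function name, the CASE threshold, and the row counts are all illustrative, not measured:

```sql
-- Hypothetical variant: the second cross join collapses to a single row
-- whenever one pass over sys.all_columns can already supply enough numbers.
CREATE FUNCTION dbo.fnTallyConditional ( @input varchar(max) )
RETURNS TABLE
AS
RETURN
(
    SELECT TOP ( ISNULL(LEN(@input), 0) )
           ROW_NUMBER() OVER ( ORDER BY anchor.constant ) AS N
    FROM   ( SELECT 0 AS constant ) anchor
    CROSS JOIN master.sys.all_columns AS t1
    CROSS JOIN ( SELECT TOP ( CASE
                                WHEN ISNULL(LEN(@input), 0) >
                                     (SELECT COUNT(*) FROM master.sys.all_columns)
                                THEN 1000   -- expand only when really needed
                                ELSE 1      -- otherwise one row is enough
                              END )
                        0 AS c
                 FROM master.sys.all_columns ) AS t2
);
```

Whether the optimizer short-circuits the plain double cross join anyway (as Jeff suggests below a TOP-limited cross join already behaves linearly) is exactly what would need measuring.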
September 27, 2008 at 7:07 am
peter (9/25/2008)
In my testing of the shown tally function on the test input, the estimated sub-tree cost of my whole query went from 0.0131634 to 0.0071212, an improvement of over 45%! The benefit comes entirely from eliminating a merge join that resulted from sorting on a column from the source table.
Cool... let's see the code for the test setup you did.
Another thing I wanted to try but have not yet done is to make the second cross join conditional by encapsulating it in a derived query that handles the condition. It would then return just one row when the full cross join is not needed. The optimizer might already do this silently, but it wouldn't hurt to try to cut out some more I/O. It is just an idea... it might not work at all or simply backfire. I will try it when I find some time, and if it works I will post here again!
I've found that it's not necessary to do that. A cross join will first grab all the rows from just one of the tables before it starts on the other. Even after that, it's very linear.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2008 at 11:21 am
As you can see, Jeff, I have some *synthetic* test code in my post simulating the counting portion of the process. The 45% improvement is what the query plan shows when I compare a version with this optimization against one without. The version without does not use a derived query and performs an ORDER BY on a real column instead of the fabricated constant. The improvement in practice will be less, of course, as the counting is just part of a larger task. I mostly like the function for its simplicity, and for the option of replacing it with something better, if that comes along, without modifying existing queries.
Quietly I was hoping you could put this function into one of your no doubt many practical tests and see what impact it has. I just kind of bumped into this and have no non-synthetic problem at hand that is big enough to show a clear outcome. In fact, I have no access to any production server these coming days at all (moving office).
September 29, 2008 at 2:24 pm
Heh... I can show you a query plan for two queries... one comes up with 100% of the batch cost and the other 0%... yet when you run them, the 100% one takes a tiny, tiny fraction of the time the 0% one does. I don't trust percent of batch, nor do I trust sub-tree costs... and neither should you.
When I get home from work, I'll see if I can find those queries and post them.
I'm not sure that I'll actually have the time to test your good code, Peter... the new gig has me driving 1:15:00 one way.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2008 at 6:15 am
As I should, I will do some thorough testing with a recognizable real-world problem. It will take me a few days before I can do so, and I am already chewing on another idea :).
Stay tuned!
October 14, 2008 at 1:40 pm
Jeff, this is a great article.
I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
October 14, 2008 at 4:04 pm
GF (10/14/2008)
Jeff, this is a great article. I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
Hi GF,
If that's what you need, then forget about the solution using a table of numbers. That solution is great for breaking components up into several rows. If you need them in several columns, you need a different technique.
About two years ago, I wrote a blog post demonstrating such a technique. It uses the example of first name; middle initial; last name, but the techniques to be used remain the same. Here's the URL:
http://sqlblog.com/blogs/hugo_kornelis/archive/2006/10/12/Stuff.aspx
Edit: Fixed the URL
October 14, 2008 at 6:38 pm
GF (10/14/2008)
Jeff, this is a great article. I do have a question as I am trying to apply your solution to my needs.
I have hundreds of addresses in a single address column like this 1111 west street
When I employ your solution I get rows like this:
1111
west
street
Now I need to pivot the results into their own columns like this:
Number,name,suffix
1111,west,street
Would you be so kind as to show me how?
Thanks
Thanks for the feedback, GF.
I can show you how to pivot the rows back into columns pretty easily, but the problem with addresses is that they don't always have the same components. For example, you may have an address that looks like 123 East 39th Street. I'd recommend purchasing something like ZP4 or some other CASS certification program that will not only do the split correctly for you, but will also validate whether the address is real.
For your "pivot", do you have a maximum number of columns an address would be split into?
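For reference, one common cross-tab pattern for pivoting a fixed number of split rows back into columns looks like this. This is a generic sketch, not necessarily Jeff's method; the table name `dbo.SplitResults` and the columns `AddressID`, `ItemNumber`, and `Item` are hypothetical stand-ins for whatever the splitter actually returns:

```sql
-- Pivot up to three split items per address back into columns,
-- assuming the splitter tagged each fragment with its position (ItemNumber).
SELECT AddressID,
       MAX(CASE WHEN ItemNumber = 1 THEN Item END) AS Number,
       MAX(CASE WHEN ItemNumber = 2 THEN Item END) AS Name,
       MAX(CASE WHEN ItemNumber = 3 THEN Item END) AS Suffix
FROM   dbo.SplitResults
GROUP BY AddressID;
```

Addresses with fewer components simply leave the later columns NULL, which is why knowing the maximum number of columns matters.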
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2008 at 6:41 pm
peter (9/30/2008)
As I should, I will do some thorough testing with a recognizable real-world problem. It will take me a few days before I can do so, and I am already chewing on another idea :). Stay tuned!
Peter, (or anyone else interested)...
Here's the code where the "Percent of Batch" is, quite literally, 100% wrong.
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2008-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 5, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2008 at 3:49 pm
Jeff, thank you for taking the time to listen.
My current needs are to split an address into two parts.
example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.
The problem I am having is when there is only a number, like 4567, and no street name.
example: 4567
So what I am trying to get to is this:
column1 column2
1234 south street
4567
876 west
What I currently have is
column1
123 north ave
Thanks in advance.
Gary
October 16, 2008 at 11:55 am
Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:
select left(Address, charindex(' ', Address) - 1) as StreetNum,
right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street
from [whatever]
where len(rtrim(Address)) > 1
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
October 28, 2008 at 9:27 pm
GF (10/15/2008)
Jeff, thank you for taking the time to listen. My current needs are to split an address into two parts.
example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.
The problem I am having is when there is only a number, like 4567, and no street name.
example: 4567
So what I am trying to get to is this:
column1 column2
1234 south street
4567
876 west
What I currently have is
column1
123 north ave
Thanks in advance.
Gary
Gary.... I didn't answer because I thought that Wayne West's code probably did it for you. But, I gotta check... are you all set?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 29, 2008 at 6:39 am
Wayne West (10/16/2008)
Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:
select left(Address, charindex(' ', Address) - 1) as StreetNum,
right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street
from [whatever]
where len(rtrim(Address)) > 1
Since LEN ignores trailing spaces, the RTRIM is not strictly necessary, but then who knows if MS will change this in the future 😉
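This behaviour is easy to verify; DATALENGTH, by contrast, counts the trailing spaces:

```sql
-- LEN ignores trailing spaces on the string; DATALENGTH (bytes) does not.
SELECT LEN('abc   ')        AS LenResult,        -- 3
       DATALENGTH('abc   ') AS DataLengthResult; -- 6 (varchar, one byte/char)
```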
...the problem I am having is when there is only a number like 4567 and no street name...
The above query will not cope with this situation.
This is my solution, based on Wayne's query:
SELECT LEFT(Address, CHARINDEX(' ', Address + ' ') - 1) AS [StreetNum],
SUBSTRING(Address, CHARINDEX(' ', Address + ' ') + 1, 255) AS [Street]
FROM [whatever]
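To illustrate the `Address + ' '` padding trick on Gary's sample values (the table variable is purely for demonstration):

```sql
-- Appending a space guarantees CHARINDEX finds a delimiter even for
-- number-only addresses, so LEFT never gets a -1 length.
DECLARE @t TABLE ( Address varchar(255) );
INSERT INTO @t VALUES ('1234 south street'), ('4567'), ('876 west');

SELECT LEFT(Address, CHARINDEX(' ', Address + ' ') - 1)           AS StreetNum,
       SUBSTRING(Address, CHARINDEX(' ', Address + ' ') + 1, 255) AS Street
FROM   @t;
-- StreetNum: '1234', '4567', '876'
-- Street:    'south street', '' (empty), 'west'
```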
Far away is close at hand in the images of elsewhere.
Anon.
October 29, 2008 at 9:29 am
Huh, somehow I never noticed the note in the documentation about LEN() ignoring trailing spaces. Then again, chances are I never opened BOL for LEN() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when doing any form of string manipulation.
Thanks for the catch, David. Nice solution. 🙂
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
October 29, 2008 at 9:35 am
Wayne West (10/29/2008)
Huh, somehow I never noticed the note in the documentation about LEN() ignoring trailing spaces. Then again, chances are I never opened BOL for LEN() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when doing any form of string manipulation. Thanks for the catch, David. Nice solution. 🙂
Thanks Wayne 😀
I don't know if or where it is documented; I just came across it when writing SQL in the past, and it is mentioned in threads on this forum.
I too trim data a lot as well, especially on data import.
Far away is close at hand in the images of elsewhere.
Anon.