The "Numbers" or "Tally" Table: What it is and how it replaces a loop

Question

The "Numbers" or "Tally" Table: What it is and how it replaces a loop

Viewing 15 posts - 196 through 210 (of 511 total)

You must be logged in to reply to this topic. Login to reply

Wayne West SSC-Insane Points: 22586 More actions · Answer 1

David, I'm working on a blog post about importing data and have this little code fragment that you might find useful (I love playing with system tables):

select name + ' = ltrim(rtrim(' + name + ')),'

from syscolumns

where id = (select id

from sysobjects

where name = '[Whatever]')

and xtype in (35, 98, 99, 167, 175, 231, 239)

I always import files straight into varchars and do data conversion through a sproc. Add an Update [Whatever] at the start of the block, delete the final comma, and you've got a quickie cleanup of spaces for text fields.

Yeah, BOL says:

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

David Jackson SSCertifiable Points: 6414 More actions · Answer 2

Ok, I'm trying to use the tally as a loop.

The problem. In one of our databases we use a table to store among other things a reference (DocumentId) to a scanned document, the number of pages and a created date. The documents are stored in the location S:\Document\YYYY\MM\DD\DocumentId-PageNumber (no extension)

So a document created yesterday may have a document Id of 39757385977, number of pages = 3 and '5 Nov 2008' as a created date

This maps to 3 files held on the system as

S:\Document\2008\11\05\39757385977-1

S:\Document\2008\11\05\39757385977-2

S:\Document\2008\11\05\39757385977-3

We need to export the paths of these files as we are moving to a new system.

I have cracked it with two nested While loops, but as there are over 500,000 records, and some documents have up to 140 pages (Itemised phone bills), I guess it wont run well other than on my small test set. But then again it's a one time exercise, so don't sweat over this, but I'd like to use my Tally 🙂

Here's the test data setup

-- Set up test data

IF OBJECT_ID ( 'tempdb..#t1') IS NOT NULL

DROP TABLE #t1

CREATE TABLE #t1 (documentId VARCHAR(30), pages INT, DateCreated datetime)

INSERT INTO #t1 (documentId, pages, DateCreated)

select 39757385955, 1 ,'05/11/2008' UNION

select 39757385956, 2 ,'05/11/2008' UNION

select 39757385957, 2 ,'05/11/2008' UNION

select 39757385959, 1 ,'05/11/2008' UNION

select 39757385960, 1 ,'05/11/2008' UNION

select 39757385961, 1 ,'05/11/2008' UNION

select 39757385962, 3 ,'05/11/2008' UNION

select 39757385963, 1 ,'05/11/2008' UNION

select 39757385964, 1 ,'05/11/2008' UNION

select 39757385965, 3 ,'05/11/2008' UNION

select 39757385966, 3 ,'05/11/2008' UNION

select 39757385967, 1 ,'05/11/2008' UNION

select 39757385968, 9 ,'05/11/2008' UNION

select 39757385969, 5 ,'05/11/2008' UNION

select 39757385971, 5 ,'05/11/2008' UNION

select 39757385972, 1 ,'05/11/2008' UNION

select 39757385973, 3 ,'05/11/2008' UNION

select 39757385974, 2 ,'05/11/2008' UNION

select 39757385975, 1 ,'05/11/2008' UNION

select 39757385976, 1 ,'05/11/2008' UNION

select 39757385977, 6 ,'05/11/2008' UNION

select 39757385978, 1 ,'05/11/2008' UNION

select 39757385980, 10 ,'05/11/2008'

Here is my solution, using nested loops

-- My solution

IF OBJECT_ID ( 'tempdb..#t2') IS NOT NULL

DROP TABLE #t2

IF OBJECT_ID ( 'tempdb..#t3') IS NOT NULL

DROP TABLE #t3

CREATE TABLE #t2 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))

CREATE TABLE #t3 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))

INSERT INTO #t2

SELECT DocumentId, pages, 'S:\Document\'

+ CAST(YEAR(DateCreated) AS VARCHAR(10))

+ '\' + RIGHT('00' + CAST(MONTH(DateCreated) AS VARCHAR(30)),2)

+ '\' + RIGHT('00' + CAST(Day(DateCreated) AS VARCHAR(6)),2)

+ '\' + CAST(DocumentId AS VARCHAR(30)) AS docPath

FROM #t1

WHERE DateCreated > '3 nov 2008'

DECLARE @maxPage INT, @currentpage int

select @maxPage = MAX(pages) FROM #t2 t

WHILE @maxPage <> 0

BEGIN

SELECT @currentpage = @maxPage

WHILE @currentPage <> 0

BEGIN

INSERT INTO #t3 (documentId,

pages,

docPath )

SELECT documentId, pages, docPath + '-' + CAST(@currentPage AS VARCHAR(4)) docPath

FROM #t2

WHERE pages = @maxPage

SELECT @currentPage = @currentPage - 1

END

SELECT @maxPage = @maxPage - 1

END

SELECT * FROM #t3

ORDER BY docpath

Any thoughts? 😉

Dave J

http://glossopian.co.uk/
"I don't know what I don't know."

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 3

David Jackson (11/6/2008)
Any thoughts? 😉

How about this:

INSERT INTO #t3

(documentId, pages, docPath)

SELECT d.DocumentId, d.pages,

'S:\Document\' + REPLACE(CONVERT(char(10), DateCreated, 111),'/','\')

+ '\' + CAST(DocumentId AS varchar(30)) + '-'

+ CAST(n.Number AS varchar(4))

FROM #t1 AS d

INNER JOIN dbo.Numbers AS n

ON n.Number BETWEEN 1 AND d.pages;

(edit: Corrected an error in the code - I had forgotten to REPLACE the forward slashes from the date formatting with backward slashing)

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

David Jackson SSCertifiable Points: 6414 More actions · Answer 4

Excellent. 😀

I particularly like the date formatting trick. Why didn't I think of that! :w00t:

Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?

Cheers

Dave J

http://glossopian.co.uk/
"I don't know what I don't know."

David Jackson SSCertifiable Points: 6414 More actions · Answer 5

🙂 You'll like this

Hugo's Tally Method first, my hopeless loop second.

[font="Courier New"]

CPU Reads Writes Duration

--------------------------------------------------------------

014594 880797 0000 15203 (15 Seconds)

832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]

We'll call that a result then 😉

Dave J

(edited to try and get the results to line up)

http://glossopian.co.uk/
"I don't know what I don't know."

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 6

David Jackson (11/7/2008)
🙂 You'll like this

I do. Thanks for sharing your results, David! 🙂

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 7

David Jackson (11/7/2008)
🙂 You'll like this
Hugo's Tally Method first, my hopeless loop second.
[font="Courier New"]
CPU Reads Writes Duration
--------------------------------------------------------------
014594 880797 0000 15203 (15 Seconds)
832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]
We'll call that a result then 😉

Wow! Those results are amazing! Great solution, and the DateFormat replace was perfect!

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 8

David Jackson (11/7/2008)
Excellent. 😀
I particularly like the date formatting trick. Why didn't I think of that! :w00t:
Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?
Cheers
Dave J

My recommendation would be to NOT store files in a dated path... date name the files using the ISO yyyymmdd format with a suffix, instead. That way, you can get to all the files easily instead of having to go through a bazillion directories and they can still list in date sorted order.

The rest is spot on.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MikeM7 SSC Enthusiast Points: 142 More actions · Answer 9

I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.

Thanks!:Wow:

-Mike

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 10

MikeM7 (11/14/2008)
I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.
Thanks!:Wow:
-Mike

You bet... thanks for the good feedback, Mike.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Julie Breutzmann Hall of Fame Points: 3637 More actions · Answer 11

Thanks Jeff for an interesting, understandable and useful article. I just helped my colleague with a view were we made use of a Tally table and Row_number Over. It was a huge improvement over what we had originally planned in simplicity as well as time. And as a bonus, it's much easier to copy this process to future ones of a similar nature.

Julie

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 12

Outstanding. Thanks for taking the time to post your great feedback, Julie... it would be very interesting to see your code along with brief explanation if that's possible and you have the time. A lot of folks still don't understand the performance you can get out of some very simple Tally table code and the more they see, the more they'll understand. Thanks either way.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Julie Breutzmann Hall of Fame Points: 3637 More actions · Answer 13

A column is populated from a webform that lists the box(es) that were checked in response to a question. The items are in one column, separated by commas (for example: 'abc,dcba,xy' or 'abc'). We needed to create a single row for each item with a sequential number for each item per id. In this case, attribs_hs_activities is the column with the list. Using a tally table (Wrt_tally) with numbers N from 1 to 10000, we were able to create a row for each item in the list. We used row_number over to create a sequential list for each id. Note: The selection below was used to create our view. Then we used the view to insert the records into the desired table.

--=== If the test table already exist, drop it

If object_id('tempdb..#CAN_U','U') is not null

drop table #CAN_U

--=== create the test table with

create table #CAN_U

(

id_num int not null primary key clustered,

ATTRIBS_HS_ACTIVITIES varchar(240) null

)

INSERT INTO #CAN_U

(ID_NUM, ATTRIBS_HS_ACTIVITIES)

select '15431','ARGHS,ASNHS,ASGHS' union all

select '92093','ABDHS,ABBHS'

SELECT id_num,

SUBSTRING(','+attribs_hs_activities+',',N+1,CHARINDEX(',',','+attribs_hs_activities+',',N+1)-N-1) AS attrib_cde,

ROW_NUMBER() OVER(partition by id_num order by id_num ) as ATTRIB_SEQ

FROM #can_u

CROSS join wrt_Tally

where N < LEN(','+attribs_hs_activities+',')

AND SUBSTRING(','+attribs_hs_activities+',',N,1) = ','

and attribs_hs_activities IS NOT NULL

-- RESULTS

id_numattrib_cdeATTRIB_SEQ

15431ARGHS1

15431ASNHS2

15431ASGHS3

92093ABDHS1

92093ABBHS2

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 14

Very cool and great explanation of the problem you were trying to solve. Thanks a huge amount for taking the time to post it, Julie! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 15

By the way, Julie... what does the prefix "Wrt" stand for?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)