October 29, 2008 at 10:29 am
David, I'm working on a blog post about importing data and have this little code fragment that you might find useful (I love playing with system tables):
select name + ' = ltrim(rtrim(' + name + ')),'
from syscolumns
where id = (select id
from sysobjects
where name = '[Whatever]')
and xtype in (35, 98, 99, 167, 175, 231, 239)
I always import files straight into varchars and do data conversion through a sproc. Add an Update [Whatever] at the start of the block, delete the final comma, and you've got a quickie cleanup of spaces for text fields.
Yeah, BOL says:
Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
November 6, 2008 at 11:37 am
Ok, I'm trying to use the tally as a loop.
The problem. In one of our databases we use a table to store among other things a reference (DocumentId) to a scanned document, the number of pages and a created date. The documents are stored in the location S:\Document\YYYY\MM\DD\DocumentId-PageNumber (no extension)
So a document created yesterday may have a document Id of 39757385977, number of pages = 3 and '5 Nov 2008' as a created date
This maps to 3 files held on the system as
S:\Document\2008\11\05\39757385977-1
S:\Document\2008\11\05\39757385977-2
S:\Document\2008\11\05\39757385977-3
We need to export the paths of these files as we are moving to a new system.
I have cracked it with two nested While loops, but as there are over 500,000 records, and some documents have up to 140 pages (Itemised phone bills), I guess it wont run well other than on my small test set. But then again it's a one time exercise, so don't sweat over this, but I'd like to use my Tally 🙂
Here's the test data setup
-- Set up test data
IF OBJECT_ID ( 'tempdb..#t1') IS NOT NULL
DROP TABLE #t1
CREATE TABLE #t1 (documentId VARCHAR(30), pages INT, DateCreated datetime)
INSERT INTO #t1 (documentId, pages, DateCreated)
select 39757385955, 1 ,'05/11/2008' UNION
select 39757385956, 2 ,'05/11/2008' UNION
select 39757385957, 2 ,'05/11/2008' UNION
select 39757385959, 1 ,'05/11/2008' UNION
select 39757385960, 1 ,'05/11/2008' UNION
select 39757385961, 1 ,'05/11/2008' UNION
select 39757385962, 3 ,'05/11/2008' UNION
select 39757385963, 1 ,'05/11/2008' UNION
select 39757385964, 1 ,'05/11/2008' UNION
select 39757385965, 3 ,'05/11/2008' UNION
select 39757385966, 3 ,'05/11/2008' UNION
select 39757385967, 1 ,'05/11/2008' UNION
select 39757385968, 9 ,'05/11/2008' UNION
select 39757385969, 5 ,'05/11/2008' UNION
select 39757385971, 5 ,'05/11/2008' UNION
select 39757385972, 1 ,'05/11/2008' UNION
select 39757385973, 3 ,'05/11/2008' UNION
select 39757385974, 2 ,'05/11/2008' UNION
select 39757385975, 1 ,'05/11/2008' UNION
select 39757385976, 1 ,'05/11/2008' UNION
select 39757385977, 6 ,'05/11/2008' UNION
select 39757385978, 1 ,'05/11/2008' UNION
select 39757385980, 10 ,'05/11/2008'
Here is my solution, using nested loops
-- My solution
IF OBJECT_ID ( 'tempdb..#t2') IS NOT NULL
DROP TABLE #t2
IF OBJECT_ID ( 'tempdb..#t3') IS NOT NULL
DROP TABLE #t3
CREATE TABLE #t2 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))
CREATE TABLE #t3 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))
INSERT INTO #t2
SELECT DocumentId, pages, 'S:\Document\'
+ CAST(YEAR(DateCreated) AS VARCHAR(10))
+ '\' + RIGHT('00' + CAST(MONTH(DateCreated) AS VARCHAR(30)),2)
+ '\' + RIGHT('00' + CAST(Day(DateCreated) AS VARCHAR(6)),2)
+ '\' + CAST(DocumentId AS VARCHAR(30)) AS docPath
FROM #t1
WHERE DateCreated > '3 nov 2008'
DECLARE @maxPage INT, @currentpage int
select @maxPage = MAX(pages) FROM #t2 t
WHILE @maxPage <> 0
BEGIN
SELECT @currentpage = @maxPage
WHILE @currentPage <> 0
BEGIN
INSERT INTO #t3 (documentId,
pages,
docPath )
SELECT documentId, pages, docPath + '-' + CAST(@currentPage AS VARCHAR(4)) docPath
FROM #t2
WHERE pages = @maxPage
SELECT @currentPage = @currentPage - 1
END
SELECT @maxPage = @maxPage - 1
END
SELECT * FROM #t3
ORDER BY docpath
Any thoughts? 😉
Dave J
November 6, 2008 at 2:54 pm
David Jackson (11/6/2008)
Any thoughts? 😉
How about this:
INSERT INTO #t3
(documentId, pages, docPath)
SELECT d.DocumentId, d.pages,
'S:\Document\' + REPLACE(CONVERT(char(10), DateCreated, 111),'/','\')
+ '\' + CAST(DocumentId AS varchar(30)) + '-'
+ CAST(n.Number AS varchar(4))
FROM #t1 AS d
INNER JOIN dbo.Numbers AS n
ON n.Number BETWEEN 1 AND d.pages;
(edit: Corrected an error in the code - I had forgotten to REPLACE the forward slashes from the date formatting with backward slashing)
November 7, 2008 at 3:39 am
Excellent. 😀
I particularly like the date formatting trick. Why didn't I think of that! :w00t:
Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?
Cheers
Dave J
November 7, 2008 at 7:33 am
🙂 You'll like this
Hugo's Tally Method first, my hopeless loop second.
[font="Courier New"]
CPU Reads Writes Duration
--------------------------------------------------------------
014594 880797 0000 15203 (15 Seconds)
832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]
We'll call that a result then 😉
Dave J
(edited to try and get the results to line up)
November 7, 2008 at 7:39 am
David Jackson (11/7/2008)
🙂 You'll like this
I do. Thanks for sharing your results, David! 🙂
November 7, 2008 at 8:21 am
David Jackson (11/7/2008)
🙂 You'll like thisHugo's Tally Method first, my hopeless loop second.
[font="Courier New"]
CPU Reads Writes Duration
--------------------------------------------------------------
014594 880797 0000 15203 (15 Seconds)
832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]
We'll call that a result then 😉
Wow! Those results are amazing! Great solution, and the DateFormat replace was perfect!
November 7, 2008 at 6:12 pm
David Jackson (11/7/2008)
Excellent. 😀I particularly like the date formatting trick. Why didn't I think of that! :w00t:
Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?
Cheers
Dave J
My recommendation would be to NOT store files in a dated path... date name the files using the ISO yyyymmdd format with a suffix, instead. That way, you can get to all the files easily instead of having to go through a bazillion directories and they can still list in date sorted order.
The rest is spot on.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2008 at 8:41 am
I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.
Thanks!:Wow:
-Mike
November 14, 2008 at 10:57 am
MikeM7 (11/14/2008)
I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.Thanks!:Wow:
-Mike
You bet... thanks for the good feedback, Mike.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 8:38 am
Thanks Jeff for an interesting, understandable and useful article. I just helped my colleague with a view were we made use of a Tally table and Row_number Over. It was a huge improvement over what we had originally planned in simplicity as well as time. And as a bonus, it's much easier to copy this process to future ones of a similar nature.
Julie
January 16, 2009 at 10:12 am
Outstanding. Thanks for taking the time to post your great feedback, Julie... it would be very interesting to see your code along with brief explanation if that's possible and you have the time. A lot of folks still don't understand the performance you can get out of some very simple Tally table code and the more they see, the more they'll understand. Thanks either way.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 11:16 am
A column is populated from a webform that lists the box(es) that were checked in response to a question. The items are in one column, separated by commas (for example: 'abc,dcba,xy' or 'abc'). We needed to create a single row for each item with a sequential number for each item per id. In this case, attribs_hs_activities is the column with the list. Using a tally table (Wrt_tally) with numbers N from 1 to 10000, we were able to create a row for each item in the list. We used row_number over to create a sequential list for each id. Note: The selection below was used to create our view. Then we used the view to insert the records into the desired table.
--=== If the test table already exist, drop it
If object_id('tempdb..#CAN_U','U') is not null
drop table #CAN_U
--=== create the test table with
create table #CAN_U
(
id_num int not null primary key clustered,
ATTRIBS_HS_ACTIVITIES varchar(240) null
)
INSERT INTO #CAN_U
(ID_NUM, ATTRIBS_HS_ACTIVITIES)
select '15431','ARGHS,ASNHS,ASGHS' union all
select '92093','ABDHS,ABBHS'
SELECT id_num,
SUBSTRING(','+attribs_hs_activities+',',N+1,CHARINDEX(',',','+attribs_hs_activities+',',N+1)-N-1) AS attrib_cde,
ROW_NUMBER() OVER(partition by id_num order by id_num ) as ATTRIB_SEQ
FROM #can_u
CROSS join wrt_Tally
where N < LEN(','+attribs_hs_activities+',')
AND SUBSTRING(','+attribs_hs_activities+',',N,1) = ','
and attribs_hs_activities IS NOT NULL
-- RESULTS
id_numattrib_cdeATTRIB_SEQ
15431ARGHS1
15431ASNHS2
15431ASGHS3
92093ABDHS1
92093ABBHS2
January 16, 2009 at 12:27 pm
Very cool and great explanation of the problem you were trying to solve. Thanks a huge amount for taking the time to post it, Julie! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 12:33 pm
By the way, Julie... what does the prefix "Wrt" stand for?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 196 through 210 (of 511 total)
You must be logged in to reply to this topic. Login to reply