September 29, 2010 at 6:02 pm
Hi all, is there a smarter and quicker way to generate data based on an int column?
column1,column2,column3,column4,column5
234,ieifjrjfm,rwthtr,wh4hhtrh,6
235,ieifjrjfm,rwthtr,wh4hhtrh,2
236,ieifjrjfm,rwthtr,wh4hhtrh,3
237,ieifjrjfm,rwthtr,wh4hhtrh,1
238,ieifjrjfm,rwthtr,wh4hhtrh,8
I need to loop through this dataset and duplicate each row into another table according to column5: 234,ieifjrjfm,rwthtr,wh4hhtrh 6 times | 235,ieifjrjfm,rwthtr,wh4hhtrh 2 times | 236,ieifjrjfm,rwthtr,wh4hhtrh 3 times, etc.
I'm doing a double loop, which works but is very slow: about 1 hour to generate ~5 million rows.
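Roughly, the double loop looks like this (a simplified sketch; SourceTable and TargetTable are placeholder names, my real code differs):
DECLARE @Col1 INT, @Qty INT, @i INT;
 SELECT @Col1 = MIN(column1) FROM SourceTable; -- outer loop: one pass per source row
 WHILE @Col1 IS NOT NULL
 BEGIN
        SELECT @Qty = column5 FROM SourceTable WHERE column1 = @Col1;
        SET @i = 1;
        WHILE @i <= @Qty -- inner loop: one insert per unit of quantity
        BEGIN
               INSERT INTO TargetTable (column1, column2, column3, column4)
               SELECT column1, column2, column3, column4
                 FROM SourceTable
                WHERE column1 = @Col1;
               SET @i = @i + 1;
        END;
        SELECT @Col1 = MIN(column1) FROM SourceTable WHERE column1 > @Col1;
 END;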
September 29, 2010 at 6:39 pm
This should handle 5 million rows in just a couple of minutes...
--===== Set up a test table (THIS IS NOT A PART OF THE SOLUTION).
     -- The cross join just guarantees more than enough rows for TOP (1000).
 SELECT TOP (1000)
        column1 = IDENTITY(INT,1,1),
        column2 = 'ieifjrjfm',
        column3 = 'rwthtr',
        column4 = 'wh4hhtrh',
        column5 = ABS(CHECKSUM(NEWID()))%10+1 -- random quantity from 1 to 10
   INTO #YourFirstTable
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
--===== Show the test data (THIS IS NOT A PART OF THE SOLUTION)
SELECT TOP 100 *
FROM #YourFirstTable
ORDER BY column1
;
--===== Now, spawn the new rows according to the quantity in column 5.
-- Just change this to an insert and you're golden.
 SELECT yft.column1, yft.column2, yft.column3, yft.column4
   FROM #YourFirstTable yft
   JOIN dbo.Tally t
     ON t.N <= yft.column5
;
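In case it helps, that insert might look something like this (the target table name #YourSecondTable is just a placeholder; it assumes the table already exists with matching columns):
--===== Same spawn, written as an insert (#YourSecondTable is hypothetical).
 INSERT INTO #YourSecondTable (column1, column2, column3, column4)
 SELECT yft.column1, yft.column2, yft.column3, yft.column4
   FROM #YourFirstTable yft
   JOIN dbo.Tally t
     ON t.N <= yft.column5
;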
Now... two things. First, take a look at the article at the first link in my signature line below. With 481 points of your own, it's time you learned how to create a post that people will take more immediate interest in. 😉
Second, if you don't already know what a Tally table is, it's time to learn about that wonderful tool. Please read the following article to learn how it can replace certain WHILE loops, as I just did in the code above.
http://www.sqlservercentral.com/articles/T-SQL/62867/
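If you don't have a Tally table handy yet, a minimal one along the lines of that article might look like this (11,000 rows is an arbitrary size; make sure it covers your largest column5 value):
--===== Build a small permanent Tally table (a sketch; size it for your data).
 SELECT TOP 11000
        N = IDENTITY(INT,1,1)
   INTO dbo.Tally
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
--===== A clustered PK on N keeps the joins fast.
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
;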
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2010 at 6:50 pm
Oh yeah... almost forgot... make sure that the largest "N" in the Tally table is at least as big as the largest value of column5, or at least run a test to check, because the code I wrote won't work correctly if "N" is smaller and it won't warn you. It's too busy working to warn you. 😉
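A quick test along these lines (just a sketch) returns a row only when the Tally table is too small:
--===== Sanity check: a row comes back only if MAX(column5) exceeds MAX(N).
 SELECT MaxQty = MAX(yft.column5)
   FROM #YourFirstTable yft
 HAVING MAX(yft.column5) > (SELECT MAX(t.N) FROM dbo.Tally t)
;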
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2010 at 8:55 pm
Wow! Using this Tally approach, the same amount & type of records ran in ~10 mins.
I included the quantity field in an existing non-clustered index, which should speed things up.
Would it be quicker if this Tally table were permanent and not in tempdb?
I already figured out the max restriction on the qty field, thx anyway 😉
I would imagine that if the source table had very simple fields and not too many of them, it would run even quicker; my table has xml, date, etc.
I have read many books and attended courses and have never come across this Tally table approach; you should be commended.
Thanks.
September 30, 2010 at 11:27 pm
msarikas (9/29/2010)
Wow! Using this Tally approach, the same amount & type of records ran in ~10 mins. I included the quantity field in an existing non-clustered index, which should speed things up.
Would it be quicker if this Tally table were permanent and not in tempdb?
I already figured out the max restriction on the qty field, thx anyway 😉
I would imagine that if the source table had very simple fields and not too many of them, it would run even quicker; my table has xml, date, etc.
I have read many books and attended courses and have never come across this Tally table approach; you should be commended.
Thanks.
Thanks for the kudos, but the Tally table isn't my original idea. It was used way back in the '60s and it's frequently called a "Numbers" table now. Shoot... even Celko (the guy above) uses such a thing, except (IIRC) he prefers to call his a "Sequence" table, which is also an appropriate name for it.
And, I have been commended for it... you just did and you made my day. Thank you very much. 🙂
I have to agree with Celko, though. This is an extreme amount of denormalization. Why do you need to do this? I mean, why can't you just store one row with a quantity for each item?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2010 at 11:32 pm
msarikas (9/29/2010)
Wow! Using this Tally approach, the same amount & type of records ran in ~10 mins. I included the quantity field in an existing non-clustered index, which should speed things up.
Would it be quicker if this Tally table were permanent and not in tempdb?
I already figured out the max restriction on the qty field, thx anyway 😉
I would imagine that if the source table had very simple fields and not too many of them, it would run even quicker; my table has xml, date, etc.
I have read many books and attended courses and have never come across this Tally table approach; you should be commended.
Thanks.
Ah... sorry. Didn't answer the other questions. I usually create a permanent Tally table in a Util database and create a synonym to it in the other databases. Speed won't change noticeably whether the table is in TempDB, the local DB, or a Util DB with synonyms.
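For example, the synonym might be created like this ("Util" and "YourDB" are placeholder database names here):
--===== Point dbo.Tally in the local database at the permanent table in Util.
 USE YourDB;
 CREATE SYNONYM dbo.Tally FOR Util.dbo.Tally;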
You are correct... the fewer the number of columns, the faster things will be. That's usually true with any query, though.
Since you're doing a table scan, the addition of the quantity field to a non-clustered index likely won't do anything except slow down inserts.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2010 at 11:59 pm
I'm with you, Jeff, that in the DB world you need to store everything at the lowest grain (e.g. dates, products, etc.); it will make your DBA life easier and the system less prone to major changes.
Originally the data was housed with a quantity field as per the user specs; now we have to expand that qty field to extract each instance of the record. It's a case of changing minds plus users designing the system = disaster! Welcome to my world :hehe:
October 1, 2010 at 7:54 am
msarikas (9/30/2010)
I'm with you, Jeff, that in the DB world you need to store everything at the lowest grain (e.g. dates, products, etc.); it will make your DBA life easier and the system less prone to major changes. Originally the data was housed with a quantity field as per the user specs; now we have to expand that qty field to extract each instance of the record. It's a case of changing minds plus users designing the system = disaster! Welcome to my world :hehe:
Thanks for the feedback and I have an appreciation for that. Someone reads something somewhere on the internet or in a book and they suddenly become an evangelist for what may (or may not) be a bad method.
The expansion of rows like this isn't an uncommon request. If it's at all possible, I'd really like to know the business reason behind it. What utility does it provide?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2010 at 12:14 am
Seriously... I'd like to know just from an educational standpoint. The only reason anyone has ever told me why they wanted to do this was to supposedly make a FIFO inventory system a bit "easier" to manage. I'd like to know if there are other reasons.
Thanks in advance for any information you can share on the subject.
--Jeff Moden
Change is inevitable... Change for the better is not.