April 8, 2009 at 2:56 pm
I am new to SSIS so any ideas directing me in the correct direction is appreciated. I have a large flat file that I am normalizing into several files. I am creating a GUID to be used as the primary key value in say "Table A". I have other junction tables to fill which I use the GUID and the PK from another table. I'm using lookups to get these PK fields from the reference tables and unioning them with the GUID created so when I import this some 17 million records I won't have to use any joins to update the junction tables which would take entirely too long to run. I can just insert Table A first then import into the junction tables afterwards to avoid any FK contraint violations. Am I going about this in the best possible way? I know sooner or later I could end up creating a GUID that is already in the database and I won't be able to insert it. Is that just one of the breaks where I'll have to redirect records that don't import and rerun them to create other GUID IDs?
April 8, 2009 at 4:37 pm
well guids are supposed to be unique; although the algorythm got a little weeker when they disconnected it from the MAC address of a NIC card for the seed, it's still good.
I think you are worried about something that is not going to happen.
for fun, i created a table with 10 million rows, just to see if SQL Server would create a duplicate.
i ran it, checked for dupes, and then did the insert again, for 20 million rows in two passes.
you said you are creating your own GUIDS, I'm using SQL server for that.
no dupes. none. in 20 million rows. I could repeat till I'm out of disk space, but I don't foresee any dupes occurring.
here's my code :
CREATE TABLE #MyHead
( SomeId int identity(1,1) primary key,
SomeValue varchar(36)
)
--this makes tempdb grow like crazy: 4.7 gig,
--took 6:36 seconds on a single proc 1g ram 2.6GHtz machine
INSERT INTO #MyHead
SELECT TOP 10000000 --ten million
NEWID()
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--any dupes? ugly with no index but here goes
select SomeValue
from #MyHead
group by SomeValue
having count(*) > 1
--3:10 seconds no dupes, temp expanded to 5.25 gig
--repeating to get 10Mill more rows
INSERT INTO #MyHead
SELECT TOP 10000000 --ten million
NEWID()
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--took 4:03 seconds to add the second 10 mill
--tempdb is now at 10.1 gig
select SomeValue
from #MyHead
group by SomeValue
having count(*) > 1
3:41 seconds: no dupes.
==
Lowell
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply