May 5, 2011 at 9:12 pm
Wondering if someone could shed some light. I regularly imoprt data fom a flat file to a table. Pretty simple improt of a tab delimited file into 3 columns. One thing i noticed is in one column I get the same string but the other 2 columns are different.
I have sen some DB's where a vednor has created a "refernece table". They have 2 columns first column is a ref number and the second is a value. when importing data into a table it looks in the reference column to see if a value already exists (same value of course) and then inserts the ref number that coresponds to the value. If the value doesnt exist it creates it in the "reference table" and then inserts the new reference number into the original table.
Im sure this makes sense to those more seasoned SQL people. I was thinking I could use this in my DB to reduce the amout of data in my table. Can someone explin how this works or show me?
Thanks.
May 5, 2011 at 11:55 pm
Hi Lance,
It sounds as you want to normalize the data while importing.
Here is an example on how this can be done:
-- Reference table
DECLARE @cities TABLE (
idintIDENTITY(1,1),
namevarchar(50)
)
INSERT INTO @cities (name) VALUES ('Tokyo')
INSERT INTO @cities (name) VALUES ('Stockholm')
INSERT INTO @cities (name) VALUES ('Jerusalem')
-- Temp table, containing all data from the file
DECLARE @temp TABLE (
cityvarchar(50),
namevarchar(50),
numberint
)
INSERT INTO @temp VALUES ('Stockholm', 'Sven', 142)
INSERT INTO @temp VALUES ('New York', 'John', 87)
INSERT INTO @temp VALUES ('Stockholm', 'Arne', 93)
INSERT INTO @temp VALUES ('Berlin', 'Hans', 134)
INSERT INTO @temp VALUES ('New York', 'Jack', 72)
-- Data table (to import to)
DECLARE @persons TABLE (
idintIDENTITY(1,1),
namevarchar(50),
numberint,
cityIDint
)
-- Add new items to reference table
INSERT INTO @cities (name)
SELECT DISTINCT city
FROM @temp
WHERE city not in (SELECT name FROM @cities)
-- insert the data in the target table, with ref number to the reference table
INSERT INTO @persons (name, number, cityID)
SELECT T.name, T.number, C.id
FROM @temp T inner join
@cities C on T.city = C.name
SELECT *
FROM @persons
Good luck!
/Markus
May 10, 2011 at 2:20 am
Hi Hunterwood
I see that Lance is importing data from a flat file - something we regularly do. Can the references be set up during the import as in a package or is it done after getting the data into the database?
Thanks in Advance.
May 10, 2011 at 1:32 pm
>>Can the references be set up during the import as in a package or is it done after getting the data into the database?
Either way is possible. You can have an SSIS package do it all for you or you can import the data into a staging table and then do the processing in TSQL before the final import.
I prefer the latter and I do tend to discourage elaborate SSIS packages. This is based on my experience and the experience of the folks I work with, we have more TSQL experience than SSIS experience. I also find troubleshooting a stored procedure easier than a package SSIS.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply