November 8, 2004 at 12:12 pm
I have a DTS package that bulk inserts flat-file records into a table (tblStage1), then uses a data pump with transformations to split the data into columns (tblStage2). The problem is that the data is still flat. I took the time to create a normalized structure because the data is extremely redundant. Now that I have the structure, I don't know how to import the data into it. I created a view that looks like the original data, and someone suggested using an "instead of" trigger to insert the data through the view, but I just get more confused when I look up documentation on that. I know this is something that is done every day, but I am at a loss. Please help!
November 8, 2004 at 3:23 pm
I would guess they meant a Trigger rather than a View. Once one field in your tblStage2 is populated (obviously you would want to pick a field that is meaningful), the Trigger can then populate your normalized table(s) with the fields from that record that you want.
You probably want to wait until all of your necessary fields are populated before starting the Trigger; in other words, put the Trigger on the last field populated in tblStage2.
I wasn't born stupid - I had to study.
November 9, 2004 at 3:36 pm
Actually, there is such a thing as an "Instead Of" trigger. It supposedly can be used to insert data through a view (SQL2k and up). I think I can figure out this trigger thing if I read a bit (my rough idea of its shape is below the table layout). My main question is: if I insert just one column, fire the trigger, then insert the rest of the data into the respective tables, how do I make sure the data of each record stays intact as one record? I cheated on setting up my normalized table structure by creating the single table in Access and running the "Analyze Table" wizard to break it up and create the relationships. I then used the Upsizing Wizard to get it into SQL Server. Here's how my tables are set up:
tblStage1
--1 whole record in one column
tblStage2
--10 columns
tblMAIN
--1 data column
--1 ID column
--9 ID columns linked to each of the other tables
tbl1, 2, 3, etc...
--1 data column
--1 ID column
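From what I've read so far, the basic shape of the trigger seems to be something like this (the view and column names here are made up, and this is untested):

CREATE TRIGGER trg_vwFlat_Insert ON vwFlat
INSTEAD OF INSERT
AS
BEGIN
    -- rows aimed at the view land in the "inserted" pseudo-table;
    -- the trigger redirects them into the real table(s) instead
    INSERT INTO tblMAIN (DataColumn)
    SELECT DataColumn
    FROM inserted
END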
November 9, 2004 at 4:17 pm
I haven't heard of anything that automatically does that type of transformation. Forget the trigger; it just makes the work more complicated. Add some additional steps to copy the data in tblStage2 to the normalized tables. Unfortunately, you have to write those queries yourself, and they have to handle the IDs that link the tables.
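Something along these lines, using the tblMAIN / tbl1 names from your earlier post (the column names are made up since you didn't post them, and this is untested):

-- step 1: load the lookup table with any distinct values it doesn't have yet
INSERT INTO tbl1 (DataColumn)
SELECT DISTINCT s.Col1
FROM tblStage2 s
WHERE s.Col1 IS NOT NULL
  AND NOT EXISTS (SELECT * FROM tbl1 t WHERE t.DataColumn = s.Col1)

-- step 2: load tblMAIN, joining back to the lookup on the original
-- value to translate it into the matching ID
INSERT INTO tblMAIN (DataColumn, ID1)
SELECT s.MainCol, t.ID
FROM tblStage2 s
INNER JOIN tbl1 t ON t.DataColumn = s.Col1

The join in step 2 is also what keeps each record intact: all of a row's fields travel through the one SELECT together, with the values simply swapped for their IDs.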
November 9, 2004 at 4:20 pm
Your "normalized" table structure looks whacky.
November 10, 2004 at 7:06 am
I would just like to point out that I am posting in the newbie section because I am somewhat retarded when it comes to this stuff. I don't see anything wrong with my table structure but I would welcome any reasoning as to why it is whacky and how it should be done correctly. Also, if you guys could reply with some sort of examples to send me in the right direction I would appreciate it. I need a little hand holding here, sorry.
November 10, 2004 at 2:40 pm
OK, let me start over for clarification:
I am attempting to create a database to hold proxy log data for dynamic reporting.
I have imported the flat files into a single-field table using BULK INSERT (the statement is shown below the table definition).
tblStage1
[rawdata] varchar (8000) NULL
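The statement looks roughly like this (path and file name changed):

-- each log line becomes one row in the single rawdata column
BULK INSERT tblStage1
FROM 'C:\logs\proxy.log'
WITH (ROWTERMINATOR = '\n')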
I then parse the data from tblStage1 into tblStage2 (the gist of the split is sketched below the table definition).
tblStage2
[IP] varchar (15) NULL,
[Username] varchar (20) NULL,
[Division] varchar (20) NULL,
[Date] datetime NULL,
[Time] datetime NULL,
[URL] varchar (1000) NULL,
[Code] int NULL,
[bytes] int NULL
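The split itself is done with DTS transformations, but in T-SQL terms the first field comes out something like this (assuming space-delimited fields):

-- grab everything up to the first space; the appended space guards
-- against lines with no delimiter at all
SELECT LEFT(rawdata, CHARINDEX(' ', rawdata + ' ') - 1) AS [IP]
FROM tblStage1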
I'm not exactly sure what I'm doing at this point, but I know I would like to remove the duplication that exists in all the fields. This is how I was thinking I should split up the table structure (the kind of query it would take to read a record back out follows the definitions):
tblURLS
[URLID] int NOT NULL IDENTITY PRIMARY KEY,
[URL] varchar (1000) NOT NULL,
[IPID] int NULL,
[UID] int NULL,
[DIVID] int NULL,
[DID] int NULL,
[TID] int NULL,
[CID] int NULL,
[BID] int NULL
tblIP
[IPID] int NOT NULL IDENTITY PRIMARY KEY,
[IP] varchar (15) NOT NULL
tblUSER
[UID] int NOT NULL IDENTITY PRIMARY KEY,
[Username] varchar (20) NOT NULL
tblDIV
[DIVID] int NOT NULL IDENTITY PRIMARY KEY,
[Division] varchar (20) NOT NULL
tblDATE
[DID] int NOT NULL IDENTITY PRIMARY KEY,
[Date] datetime NOT NULL
tblTIME
[TID] int NOT NULL IDENTITY PRIMARY KEY,
[Time] datetime NOT NULL
tblCODE
[CID] int NOT NULL IDENTITY PRIMARY KEY,
[Code] int NOT NULL
tblBYTES
[BID] int NOT NULL IDENTITY PRIMARY KEY,
[bytes] int NOT NULL
My major problem here is that I don't know how to get the data from tblStage2 to the normalized structure without screwing up my data. I'm a beginner at SQL so please provide examples in your post. Any and all suggestions welcome.
November 10, 2004 at 8:28 pm
The actual data and how it is used really determine how to structure it. Unfortunately, I'm familiar with neither.
Just looking at tblStage2, I see no reason to split it into multiple tables. The only column of any real size is URL, with its 1,000 characters, but judging by tblURLS it is apparently unique, so it would not benefit from being in a separate table. The other columns look like attributes of the URL, and they're small enough byte-wise to already look like codes, so they don't need to be represented by yet another code (ID). On the surface, I don't see any relation between the columns that warrants pulling them out into separate tables.
Perhaps all you need are some good indexes to improve your reporting.
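For example, plain nonclustered indexes on the columns you filter or group on most (index names made up):

CREATE INDEX IX_Stage2_Division ON tblStage2 ([Division])
CREATE INDEX IX_Stage2_Username ON tblStage2 ([Username])
CREATE INDEX IX_Stage2_Date ON tblStage2 ([Date])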
November 11, 2004 at 8:47 am
I see what you're saying, but I don't really know how indexes work. I figured I'd split up the data to make queries faster. For example, if I wanted all divisions I could just SELECT Division FROM tblDIV instead of SELECT DISTINCT Division FROM tblStage2 (which would have to look through all records, and I'm looking at several million).
November 11, 2004 at 6:20 pm
I see your point and it's a good one. Do you need that functionality for date, time, and bytes?
If you are just reporting, I'd say just use the "select distinct". But maybe you're putting a UI in front of this reporting and letting users select division from a drop-down list. Generally, we have the UI (webserver) just cache the "select distinct" results.
Is the data always loaded in bulk? Or is the data being continually changed with row-by-row inserts/updates/deletes? Or will the data be bulk loaded initially, with further changes done row-by-row?
November 12, 2004 at 9:33 am
No, I'm guessing date, time, and bytes wouldn't need that... I was splitting them out just to reduce redundancy, but I guess that's kind of pointless, seeing that I would just be replacing them with redundant key values. I would like that functionality for all the other fields, though.
The reports are dynamic and will have a UI; which UI, I have yet to determine (I'll deal with that later).
Data is bulk inserted on a weekly basis, and records are never changed after they are loaded.
November 12, 2004 at 10:37 am
Keep tblStage2. Add an identity column and make that the primary key. Add indexes on the columns you will need for querying - probably ip, username, division, code, date/time.
If a list of divisions is needed for performance, use "select distinct" to fill a table of divisions: after the weekly bulk insert, truncate the division table and refill it. Make division itself the primary key of that table. Do the same for the similar data (ip, username, code).
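Untested, but the whole routine amounts to something like this (names made up, and redefining your tblDIV with division itself as the key):

-- one-time: give tblStage2 its key
ALTER TABLE tblStage2 ADD RowID int IDENTITY NOT NULL
    CONSTRAINT PK_tblStage2 PRIMARY KEY

-- one-time: a simple lookup table keyed on the value itself
CREATE TABLE tblDIV ([Division] varchar (20) NOT NULL PRIMARY KEY)

-- weekly, after the bulk insert: empty and refill the lookup
TRUNCATE TABLE tblDIV
INSERT INTO tblDIV ([Division])
SELECT DISTINCT [Division]
FROM tblStage2
WHERE [Division] IS NOT NULL

Since the lookup is rebuilt from scratch each week, there is no ID bookkeeping to get wrong.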
I understand the desire to reduce redundancy, but keep complexity in mind. This is a reporting database, so make it work best for reporting. All that "normalization" would make the weekly bulk insert difficult and would hurt most queries because of all the table joins.
November 12, 2004 at 10:45 am
Thanks for the suggestions. I think I jumped into normalization just for the sake of it.