June 30, 2014 at 11:33 am
We are trying to extract flat files into a staging database. The extracted data from the flat files should be "raw" in the Kimball sense. That means there shouldn't be any changes to the data as they are being extracted. The staging tables should have the exact format as the source flat files and there should be a table for each type of flat file.
The boss wants an additional table in the staging database: a data dictionary. There should be data flow task that populates the data dictionary table the same time as the staging tables are populated. I have copied and pasted the script that creates the data dictionary table below.
CREATE TABLE [dbo].[DataDictionary](
[ExtractDate] [datetime] NULL,
[TblName] [varchar](65) NOT NULL,
[ColName] [varchar](50) NOT NULL,
[MacroName] [varchar](30) NULL,
[LongName] [varchar](65) NULL,
[Description] [varchar](500) NULL,
[PKPos] [smallint] NOT NULL,
[Datatype] [varchar](50) NULL,
[Length] [int] NULL,
[Prec] [smallint] NULL,
[Scale] [smallint] NULL,
[IsNullable] [bit] NOT NULL,
[IsIdentity] [bit] NOT NULL,
[DefaultValue] [varchar](100) NULL,
[Active] [bit] NOT NULL,
[CreatedByObj] [varchar](50) NULL,
[IgnoreOnCompare] [char](1) NOT NULL,
[ChangeHashBytes] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Here are my questions: 1) do we need this data dictionary table in our staging database that is basically the flat files in tables? 2) Why or why not?
June 30, 2014 at 11:13 pm
imani_technology (6/30/2014)
We are trying to extract flat files into a staging database. The extracted data from the flat files should be "raw" in the Kimball sense. That means there shouldn't be any changes to the data as they are being extracted. The staging tables should have the exact format as the source flat files and there should be a table for each type of flat file.The boss wants an additional table in the staging database: a data dictionary. There should be data flow task that populates the data dictionary table the same time as the staging tables are populated. I have copied and pasted the script that creates the data dictionary table below.
CREATE TABLE [dbo].[DataDictionary](
[ExtractDate] [datetime] NULL,
[TblName] [varchar](65) NOT NULL,
[ColName] [varchar](50) NOT NULL,
[MacroName] [varchar](30) NULL,
[LongName] [varchar](65) NULL,
[Description] [varchar](500) NULL,
[PKPos] [smallint] NOT NULL,
[Datatype] [varchar](50) NULL,
[Length] [int] NULL,
[Prec] [smallint] NULL,
[Scale] [smallint] NULL,
[IsNullable] [bit] NOT NULL,
[IsIdentity] [bit] NOT NULL,
[DefaultValue] [varchar](100) NULL,
[Active] [bit] NOT NULL,
[CreatedByObj] [varchar](50) NULL,
[IgnoreOnCompare] [char](1) NOT NULL,
[ChangeHashBytes] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Here are my questions: 1) do we need this data dictionary table in our staging database that is basically the flat files in tables? 2) Why or why not?
It looks like a log table to me and I think it's a way to keep track of what files where loaded and how they were handled. Something like this is super important during audits and investigations.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 30, 2014 at 11:40 pm
+1 Jeff,
By the way, did your boss shared any information when he asked for this table. He might have shared something didn't he?
July 1, 2014 at 2:30 am
Yes there are uses for these files. It logs what has been delivered and lets others see what fields are available.
It also gives you a way of checking if the file exists before you try to import it.
July 1, 2014 at 10:53 pm
P Jones (7/1/2014)
Yes there are uses for these files. It logs what has been delivered and lets others see what fields are available.It also gives you a way of checking if the file exists before you try to import it.
Just to be clear, it's not a file. It's a table. And, sorry but I have to ask, how does the simple existence of a table help you with determining if the file exists before you try to import it?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2014 at 2:34 am
Jeff Moden (7/1/2014)
Just to be clear, it's not a file. It's a table. And, sorry but I have to ask, how does the simple existence of a table help you with determining if the file exists before you try to import it?
Presumably one could check the table to see whether a file of the same name had already been imported, but I agree, it won't do anything to determine the actual existence of the file. I'm not so sure that the purpose of the table is to track imported files though; it looks more like a table that holds data about other tables to me - a metadata table if you will.
Regards
Lempster
July 2, 2014 at 2:39 am
Sorry, it is a metadata table, not a logging table. It contains information about the structure of the source system. The question is whether a metadata table of the source system belong in a staging database. My boss (who has never done ETL) seems to think so. I'm not so sure.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply