Issue on Flat Files to tables with fixed positioning: Thanks

  • Hi,

    I do have couple of questions on the task which I trying to break my head and it would be greatly appreciated for any answer:

    I do have a flat file(csv) which comes has a input and I have to extract the data from the csv file and load into the destination table with a specific format based on position. For example if I have order_id,Total_sales,Date_Ordered with some data in it. I have to extract the data and load it in a table which has a format in which for the first field of fixed length of 2 with numeric as a datatype and total_sales into the column of total_sales in the table with numeric of length 10 and then date as datatime in format which would be different than that of the flat file like ccyy-mm-dd.hh.mm.ss.xxxxxxxx (here x has to be filled up with zeros).

    May be I am not getting it in right idea to solve this - any solution would be appreciated.

    I have tried using the following ways:

    - Used a flat file source to get the "csv" file and then gave it has an input to OLE db destination with a table of fixed data types created. But the problem here is that the columns are loaded but I have to fill up with zeros in case of the date when it is been loaded or in most of the columns if I am not utilizing the total length then it has to preceded with zeros in it for example if I have a Orderid of length 4 and in the flat file i have an order id like "201" then it has to be changed to "0201" when it is loaded in the table.

    - I also tried another way of using a flat file source and created a variable which takes the total row as an input and tried to separate it with derived columns I was to an extent successful in getting it but at last the data type in the derived column got fixed to Boolean type explicitly which I am not able to change to the data type which i would want to.

    Please give some suggestions on how to handle this issue..

    thanks in advance,

    varun

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • varun_2003 (7/28/2011)


    Hi,

    I do have couple of questions on the task which I trying to break my head and it would be greatly appreciated for any answer:

    I do have a flat file(csv) which comes has a input and I have to extract the data from the csv file and load into the destination table with a specific format based on position. For example if I have order_id,Total_sales,Date_Ordered with some data in it. I have to extract the data and load it in a table which has a format in which for the first field of fixed length of 2 with numeric as a datatype and total_sales into the column of total_sales in the table with numeric of length 10 and then date as datatime in format which would be different than that of the flat file like ccyy-mm-dd.hh.mm.ss.xxxxxxxx (here x has to be filled up with zeros).

    May be I am not getting it in right idea to solve this - any solution would be appreciated.

    I have tried using the following ways:

    - Used a flat file source to get the "csv" file and then gave it has an input to OLE db destination with a table of fixed data types created. But the problem here is that the columns are loaded but I have to fill up with zeros in case of the date when it is been loaded or in most of the columns if I am not utilizing the total length then it has to preceded with zeros in it for example if I have a Orderid of length 4 and in the flat file i have an order id like "201" then it has to be changed to "0201" when it is loaded in the table.

    - I also tried another way of using a flat file source and created a variable which takes the total row as an input and tried to separate it with derived columns I was to an extent successful in getting it but at last the data type in the derived column got fixed to Boolean type explicitly which I am not able to change to the data type which i would want to.

    Please give some suggestions on how to handle this issue..

    thanks in advance,

    varun

    I have to ask... what business rule is it that requires you to make the extremely severe mistake of storing formatted data in a table? And, no... I'm not slamming you... I just want you and the people you work for to understand that storing formatted data in a table is one of the absolute worst things you can do with a Database. It leads to more problems than you can shake a stick at.

    My recommendation is to store the data in an appropriate datatype for the data. The only time you should actually format the data is when you actually generate a report and then the formatting should usually be done in the presentation layer by the reporting tool.

    The only exceptions to that rule are 1) you don't have a reporting tool or 2) you're writing to a file.

    Like I said... I'm not calling anyone any names on this. I'm just trying to make you aware of the potentially huge mistake this will all be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for your response - @ Stewartc

    I am using SSIS - I will try your suggestion. I am in the process of more on flat files - could you please help in explaining the process of doing a task where you receive a flat file with some data as i told before like Orderid, totalsales, date - if there are lots of files in this format in a specific folder. I need to extract these files and load it into a db where orderid is a primary key - so when these files are loaded into a table it has to check with the primary key and update only the records which are not repeated. I hope this might be easy task but i would really appreciate your response on the same..thanks

  • @jeff - Thanks for your response and I do understand the reason behind why you had mentioned.. I am in the learning process and correct myself as needed with the suggestions.. I would take this and will discuss on why there is a need of formatted data - if i could get a valid reason i would definitely share - thanks..

    Could you please help in explaining the process of doing a task where you receive a flat file with some data as i told before like Orderid, totalsales, date - if there are lots of files in this format in a specific folder. I need to extract these files and load it into a db where orderid is a primary key - so when these files are loaded into a table it has to check with the primary key and update only the records which are not repeated. I hope this might be easy task but i would really appreciate your response on the same..thanks - Varun

  • >>if there are lots of files in this format in a specific folder. I need to extract these files and load it into a db where orderid is a primary key

    Use a Foreach Loop Container in SSIS

    Load the files into a staging table.

    Create an UPDATE statement that joins the staging table with the table you want to update on the key field.

  • varun_2003 (7/29/2011)


    @Jeff - Thanks for your response and I do understand the reason behind why you had mentioned.. I am in the learning process and correct myself as needed with the suggestions.. I would take this and will discuss on why there is a need of formatted data - if i could get a valid reason i would definitely share - thanks..

    Could you please help in explaining the process of doing a task where you receive a flat file with some data as i told before like Orderid, totalsales, date - if there are lots of files in this format in a specific folder. I need to extract these files and load it into a db where orderid is a primary key - so when these files are loaded into a table it has to check with the primary key and update only the records which are not repeated. I hope this might be easy task but i would really appreciate your response on the same..thanks - Varun

    My apologies, Varun. I could explain (and in quite some detail) but it would likely do you no good because I don't use SSIS for such things. For such things, I use only T-SQL especially for high volume ETL. Try running the same SSIS package more than once concurrently and see part of the reason why.

    I will tell you something that may help, though. Part of my process is to never import into the final table directly. Way too much can go wrong and it's not just with things like having a possibly duplicated OrderID being used as a PK. For example, what are you supposed to do if you have two or more OrderID's in the import file but the data they contain is different? How would you handle such "dupes". Heh... and don't say it'll never happen... carbon based life-forms are involved which means anything can happen. 😛

    I ALWAYS (and I mean that) import into a staging table which is almost always a Temp Table. Then, I can "clean" the data, check for dupes, isolate dupe keys that have different data, etc, etc, ad infinitum. As I'm cleaning/validating and doing other "pre-processes" on the data, I'm also leaving "breadcrumbs" in the form of a status column so I know what to do with each row (as part of a set) when I'm done cleaning/validating the data. Some will go into the final table(s). Some may go to a "recycle" table. Others may go to an "errata" table to be manually repaired. To be sure, though, none of the data moves from the staging table until all cleansing, validation, and other preprocessing is complete. Then, there's a very quick final thrust of inserts, updates, and, perhaps, archival steps that occur and I'm in and out of the final tables before any one can tell.

    And all of that works very, very well for the type of files and data you speak of and, rumor has it, that you can do such things quite nicely in SSIS. I just don't know how to do it there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff,,

    Thanks for your time and I am in the learning process - will understand the concept and implement the same at my work place..

    varun

  • Chrissy321 (7/29/2011)


    >>if there are lots of files in this format in a specific folder. I need to extract these files and load it into a db where orderid is a primary key

    Use a Foreach Loop Container in SSIS

    Load the files into a staging table.

    Create an UPDATE statement that joins the staging table with the table you want to update on the key field.

    Thanks for your response -- I did understand this task..

    I am trying to do another task which seems to give me wrong answer..

    For example I am declaring a var at the package level and reading the rows from the flat file into the variable like "1000,200,01/01/2011"(which is actually (Orderid,cost,date). How do you use derived column to separate it into different columns like for example I want to derive the orderid as a separate column with length of 5 with a 0 in front of it i tried to use the following expression to derive the orderid but it was giving a solution of False in the column.

    (DT_STR,5,1252)(@[User::Var] == REPLICATE("0",(5 - LEN(SUBSTRING([Orderid,Total,Date],1,FINDSTRING([Orderid,Total,Date],",",1))))) + SUBSTRING([Orderid,Total,Date],1,FINDSTRING([Orderid,Total,Date],",",1)))

    What would be the possible error in my above expression?

    Thanks a lot for your help in advance.

    -varun

  • varun_2003 (7/30/2011)


    For example I am declaring a var at the package level and reading the rows from the flat file into the variable like "1000,200,01/01/2011"(which is actually (Orderid,cost,date). How do you use derived column to separate it into different columns

    You shouldn't and there's no need to. I don't know precisely what it's called in SSIS but I think it's called something like a Bulk Load task or some such. It will easily load comma separated values into a staging table. THEN you can add the required formatting, such as leading zero's, either by using the computed columns previously suggested or on th fly during a transfer to the final table.

    Again... I'm no Ninja in SSIS but, having worked with some folks that are, I know this is possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply