DTS - CSV to Database

  • Hi,

I am trying to import a CSV file into a database through DTS. Every day I get a file named <file_name>.yyyymmdd.csv, and I have to run a DTS job daily to load the data into the table. Before importing the data into the DB, I need to check for duplicate data, and if any is found, the entire import has to be cancelled.

Example: the file has columns x, y, z and the table also has columns x, y, z. Before inserting data, I need to check whether x+y (for every row in the CSV file) already exists in the database.

How should I go about it? I want to avoid opening an Excel file object to do the checking, and also avoid putting the data into a temporary table before actually inserting it. Is it possible to loop through the CSV columns and write a T-SQL query within DTS to check for duplicates?

     

Any help on this would be highly appreciated.

     

    thanks,

    deb

     

     

Usually you import the data into a temp table in the first DTS steps. Then you can do validation and import or cancel the operation as needed from there.

  • Thanks for the suggestion

How can I create a temporary table every time the DTS package is run, without having a permanent temporary table, and store the data from the CSV file into it? Could you please outline how it can be done.

     

    regards,

    deb

Why can't you have a permanent table? It's standard practice for loading data.

If you can't have a permanent table, then add an Execute SQL step at the start to create the table and another Execute SQL step at the end to drop the table.
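As a rough sketch, those two bracketing Execute SQL steps could look like this (the table and column names are placeholders, not from the thread):

```sql
-- Step at the start (Execute SQL task): create the staging table.
-- Column names/types are placeholders assumed to match the CSV layout.
CREATE TABLE dbo.CSVStaging (
    ColX varchar(50),
    ColY varchar(50),
    ColZ varchar(50)
)

-- ... the import and validation steps run against dbo.CSVStaging ...

-- Step at the end (Execute SQL task): drop the staging table.
DROP TABLE dbo.CSVStaging
```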

     

    --------------------
    Colt 45 - the original point and click interface

  • Hi,

    This seems to be the exact job that I am doing for a client.

The .csv files come in and the DTS packages process them. In between, a temp table gets created each time to validate, etc., before data is inserted or updated in the main live tables.

    So, you can create temp tables.

    Hope this is a good explanation.

  • "how can  i create a temoprary table every time the DTS is run without haivng a permanent temporary table and store the data from csv file into ti. Could you please outline that how it can be done."

     

    If I understand your question, I think you are asking how to use a table each time you run the DTS package without having the data from the previous run still in a table with the same name.

A temporary table is dropped as soon as it goes out of scope. If you name the table #CSVImport in your code, the # sign tells SQL Server to create a temporary table named #CSVImport, which you can use just like a permanent table. When the package completes, this table is automatically dropped (deleted). When your code runs the next time, it is as if the first instance of the table never existed.

    Hope this is what you were asking.  If I can get a few minutes today, I will work up a short example.
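A minimal sketch of the idea (the column names are invented for illustration):

```sql
-- The leading # marks this as a temporary table: SQL Server creates it
-- in tempdb and drops it automatically when the session ends.
CREATE TABLE #CSVImport (
    ColX varchar(50),
    ColY varchar(50),
    ColZ varchar(50)
)

-- While the session is open it behaves like any other table.
INSERT INTO #CSVImport (ColX, ColY, ColZ)
VALUES ('a', 'b', 'c')

SELECT * FROM #CSVImport
```

One thing to watch in DTS: the temp table only lives as long as the connection that created it, so the tasks that create, load, and query #CSVImport need to run over the same connection.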

Thanks everybody. Superb responses. I am going with the temp table approach.

1. Execute SQL task (create a temp table in the same DB).

2. On success -> migrate the data into that table.

3. Check for duplicates within the temp table itself (which is basically the .csv file data) with an inner join.

4. On success -> execute a VB script to check the data in the temp table against the existing data in the DB (a join with the existing table in the DB). Avoided both cursors and recordsets. I can also get the duplicate rows, if any, and write them to a log file.

5. On success (no dupes) -> Execute SQL task to do a plain insert from the temp table into the existing table, and drop the temp table.

6. On error in any of the steps above, a VBScript (using CDONTS) sends mail with appropriate error messages.
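For step 4, the duplicate check itself can be a single join; something along these lines (LiveTable and the column names are placeholders, not from the thread):

```sql
-- Rows from the staged CSV data whose x+y key already exists in the
-- live table; if this returns anything, the import should be cancelled.
SELECT t.ColX, t.ColY, t.ColZ
FROM #CSVImport t
INNER JOIN LiveTable l
    ON l.ColX = t.ColX
   AND l.ColY = t.ColY
```

The same result set can be inserted into a log table (or returned to the VBScript) to record which rows were duplicated.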

     

That's it. Please let me know if it is an optimal solution.

    Thanks again to everybody.

    Regards,

    Deb

  • Hello,

This approach should work, but you might want to consider using T-SQL for steps 4 and 5 combined. I'm not sure what the advantage would be in using VBScript to check for duplicates.

    jg

     

  • Deb,

I took basically the same approach you describe, but I combined steps 3, 4, and 5 in one SQL task (text below).

If you want to use VB, I'm always in favor of using the method the developer finds easiest to maintain. Unless your datasets are huge or you do something really inefficient, the savings in processing time are usually erased by maintenance hassles.

This is the SQL. It assumes a previous step imported the CSV file into a table named NewData with the same structure as OldData (the existing table), and that you only want to cancel if Col X + Col Y is duplicated. Obviously you can alter the join condition if you are also concerned with Col Z.

    IF EXISTS (SELECT N.ColX
               FROM OldData O
               INNER JOIN NewData N
                  ON N.ColX = O.ColX AND N.ColY = O.ColY)
    BEGIN
        TRUNCATE TABLE NewData
    END
    ELSE
    BEGIN
        INSERT INTO OldData SELECT * FROM NewData
    END

     

    Good luck.
