ADD column storeID while bulk inserting

  • Hi All

    I have a question that is realy Urgent!!! Please reply

    I have flat files to be inserted to a table using Bulk Insert.

    Each flat file is coming from a different sources(Retail stores) and each file names has the retail store Id

    on it like "SalesData0024"....0024 to be the store ID. Each flat file may have one or more rows in it(comma separated ','and new line '').

    THE question is: while we import the data it is required to extract the storeId from the file name and add the storeID as a new column.If the file has more than one row the same store id should be repeated for each row.

    Let me give you the stored procedure which perfectly loops through the files and insert the rows to a table.

    alter procedure [dbo].[BINS_ImportMultipleFiles] @filepath varchar(500),

    @pattern varchar(100), @TableName varchar(128)

    as

    set quoted_identifier off

    declare @query varchar(1000)

    declare @max1 int

    declare @count1 int

    Declare @filename varchar(100)

    set @count1 =0

    create table #x (name varchar(200))

    set @query ='master.dbo.xp_cmdshell ''dir '+@filepath+@pattern +' /b'''

    insert #x exec (@query)

    delete from #x where name is NULL

    select identity(int,1,1) as ID, name into #y from #x

    drop table #x

    set @max1 = (select max(ID) from #y)

    While @count1 <= @max1

    begin

    set @count1=@count1+1

    set @filename = (select name from #y where [id] = @count1)

    set @Query ='BULK INSERT '+ @Tablename + ' FROM '''+ @Filepath+@Filename+'''

    WITH ( FIELDTERMINATOR = '','',ROWTERMINATOR = '''')'

    exec (@query)

    end

    drop table #y

    Note: SSIS cannot be used. If single rows had been in each file , creating identity column and joining it to store table which will have storeId and identity column would have been easier.

    sql 2000

    Thanks,

    woine21@yahoo.com

  • Use an XML format file and OPENROWSET BULK. This allows you to modify the bulk load stream without losing any of the benefits of bulk loading - it will still be minimally logged if you follow the rules.

    You did not provide data for me to test with, so I won't both providing you with a tailored solution. You can find a pretty full example I wrote on a recent thread here on SSC:

    http://www.sqlservercentral.com/Forums/FindPost916633.aspx

  • Thanks for responding

    please see sample of data given and the result required:

    CREATE TABLE [dbo].[OrdersBulk](

    [CustomerID] [int] NULL,

    [CustomerName] [varchar](32) NULL,

    [OrderID] [int] NULL,

    [OrderDate] [smalldatetime] NULL

    [storeID] varchar(5) not null

    ) ON [PRIMARY]

    sample data

    fileName1: saleData0024

    1,foo,5,20031101

    3,blat,7,20031101

    5,foobar,23,20031104

    fileName2: saleData0056

    12,abc,5,20031101

    result set expected :

    CustomerIDCustomerNameOrderIDOrderDatestoreId

    1 foo 52003-11-01 00:00:000024

    15 blat72003-11-01 00:00:000024

    5 foobar 232003-11-04 00:00:00 0024

    12 abc 52003-11-01 00:00:00 0056

    and ofcourse there are hundreds of files coming from retail stores

    thanks

  • Why don't you have a decent crack at it first?

    See how you get on.

  • Paul White NZ (5/29/2010)


    Use an XML format file and OPENROWSET BULK. This allows you to modify the bulk load stream without losing any of the benefits of bulk loading - it will still be minimally logged if you follow the rules.

    You did not provide data for me to test with, so I won't both providing you with a tailored solution. You can find a pretty full example I wrote on a recent thread here on SSC:

    http://www.sqlservercentral.com/Forums/FindPost916633.aspx

    Did you notice near the bottom of the original post where it says "SQL 2000"? I hate it when they do that on a 2k8 forum.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/29/2010)


    Did you notice near the bottom of the original post where it says "SQL 2000"? I hate it when they do that on a 2k8 forum.

    Gah! No I didn't - but looking back, I'm not surprised I missed it! And,yes, it is a bit irritating. I did notice the 'SSIS cannot be used' comment, but I figured that was just a lack of knowledge thing. Sigh.

    You're right then: The BULK option was added to OPENROWSET in 2005, so it's not going to help here 🙁

    I can't immediately think of a neat solution for this problem in 2000.

    I know there are a lot of 2000 systems out there, but come on people, you're two-and-a-half major versions back now, and out of support!

  • Unfortunatley our company uses DB2 as main database ,

    but they have sql server on side which is version 2000.

    I ask apology for posting it here,but that was the only option I had to get quick answer, since it was very urgent need to my company..........:-D:hehe:

    any intelegent idea to solve this would be advisable

    Thanks Guys

  • daniarchitect (5/30/2010)


    Unfortunatley our company uses DB2 as main database ,

    but they have sql server on side which is version 2000.

    I ask apology for posting it here,but that was the only option I had to get quick answer, since it was very urgent need to my company..........:-D:hehe:

    any intelegent idea to solve this would be advisable

    Thanks Guys

    Wrong statement.

    With just a little more effort you could have figured that this is a SS2K8 forum and that there is a SS2K forum, too.

    Furthermore, since you have a request marked as "urgent" you might want to get a consultant in. Not only this person would be available almost immediately, you could rely on a fast answer, too. Please note that it's a holiday weekend in the U.S. This probably will limit the number of people seeing your post.

    And yet another option to get help faster is to provide ready to use sample data...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Why is SSIS not an option? Your databases do not need to be SQL 2005 to be able to use SSIS on them.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • daniarchitect (5/30/2010)


    Unfortunatley our company uses DB2 as main database ,

    but they have sql server on side which is version 2000.

    I ask apology for posting it here,but that was the only option I had to get quick answer, since it was very urgent need to my company..........:-D:hehe:

    any intelegent idea to solve this would be advisable

    Thanks Guys

    As you can see, posting in the wrong forum really doesn't help get an answer quicker. It fact, it makes things worse because people offer solutions that won't work in the version you want (even though you posteded the correct version near the end of your post) and it ends up pissing people off that they wasted their time trying to help someone.

    You've also asked for an ETL system to handle hundreds of files meaning that it would be nice if it were automatic and you shouldn't expect "quick" answer on such a thing.

    I'm working on a simple solution that you'll need to expand upon a bit... be patient. "intelegent" ideas take a bit. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As requested, here's a very simple SQL Server 2000 solution and it will load "hundreds of files" provided that they are, in fact, all named the same except for StoreID and all have the same data structure.

    I normally don't bring data directly into a "final" table. I normally bring the data into a "staging" table, validate the data, and THEN insert from the staging table to the final table. This problem is no exception. The only thing I haven't done is to write any validation code. You can do that based on whatever your requirements are.

    As usual, the comments in the code are important. Make sure you read them.

    As a side bar, if you have problems that require sample files, you should attach them to your post to save the people who are trying to help you some time. And don't try to "cheat" a quick answer anymore. Most of us monitor all forums. Always post your question in the correct forum.

    {EDIT} Note that most of the following code is NOT dynamic SQL but the software on this forum seems to thing it is. I had to use CHAR(92) in one spot where I could have just used a backslash to "make it pretty" on this forum. You, of course, could change it back.

    Here's the code...

    --===========================================================================

    -- Presets

    --===========================================================================

    --===== Conditionally dropany Temp Tables to make reruns easier.

    -- Notice that #FileList will be built on the fly using SELECT/INTO

    IF OBJECT_ID('TempDB..#DirList','U') IS NOT NULL

    DROP TABLE #DirList

    ;

    IF OBJECT_ID('TempDB..#FileList','U') IS NOT NULL

    DROP TABLE #FileList

    ;

    IF OBJECT_ID('TempDB..#OrdersBulkStaging','U') IS NOT NULL

    DROP TABLE #OrdersBulkStaging

    ;

    --===== This is the same as your original import table

    -- with one small change

    CREATE TABLE #OrdersBulkStaging

    (

    [CustomerID] [int] NULL,

    [CustomerName] [varchar](32) NULL,

    [OrderID] [int] NULL,

    [OrderDate] [smalldatetime] NULL

    )

    ;

    --===== This table will collect all file names and directory

    -- names from the "current" directory.

    CREATE TABLE #DirList

    (

    ObjectName SYSNAME,

    Depth TINYINT,

    IsFile TINYINT

    )

    ;

    --===== Declare the local variables

    DECLARE @Counter INT,

    @FileCount INT,

    @StoreID VARCHAR(5),

    @sql NVARCHAR(4000)

    ;

    --===========================================================================

    -- Get the file names we need to process. In the process, we also

    -- isolate the StoreID from the file name.

    --===========================================================================

    --===== Collect the file names in the "current" directory.

    -- OBVIOUSLY, THE DIRECTORY NAME WILL NEED TO CHANGE.

    INSERT INTO #DirList

    (ObjectName, Depth, IsFile)

    EXEC Master.dbo.xp_DirTree 'C:\Temp' , 1 , 1

    ;

    --===== Isolate only the file names we want to work with.

    -- Notice how the StoreID is split out at this time.

    -- This is were #FileList is built/populated on the fly.

    SELECT IDENTITY(INT,1,1) AS RowNum,

    ObjectName AS FileName,

    SUBSTRING(ObjectName, 9, 4) AS StoreID

    INTO #FileList

    FROM #DirList

    WHERE ObjectName LIKE 'saleData[0-9][0-9][0-9][0-9]'

    ;

    --===== Remember how many files names that we need to address

    SELECT @FileCount = @@ROWCOUNT

    ;

    --===== Since we're working with SQL Server 2000, we don't have VARCHAR(MAX).

    -- Instead, we need to loop through the file names. We use the "loader

    -- view" here to skip the missing StoreID information.

    -- As a side bar, this is one of the very few places where a WHILE LOOP

    -- should actually be used in SQL Server 2000.

    SELECT @Counter = 1

    ;

    WHILE @Counter <= @FileCount

    BEGIN

    --===== This loads the data, sets up the StoreID variable, and increments the counter

    -- OBVIOUSLY, THE DIRECTORY NAME WILL NEED TO CHANGE.

    SELECT @sql = N'BULK INSERT #OrdersBulkStaging FROM ''C:\Temp' + CHAR(92) + FileName + ''' '

    + 'WITH (FIELDTERMINATOR = '','',ROWTERMINATOR = '''', CODEPAGE = ''RAW'', DATAFILETYPE = ''CHAR'')',

    @StoreID = StoreID,

    @Counter = @Counter + 1

    FROM #FileList

    WHERE RowNum = @Counter

    EXEC (@SQL)

    ;

    --===== Do the insert to the final table from the staging table.

    -- This step also applies the correct StoreID

    INSERT INTO dbo.OrdersBulk

    (CustomerID, CustomerName, OrderID, OrderDate, StoreID)

    SELECT CustomerID, CustomerName, OrderID, OrderDate, @StoreID

    FROM #OrdersBulkStaging

    ;

    --===== All done for this file. Truncate the staging table.

    TRUNCATE TABLE #OrdersBulkStaging

    END

    ;

    --===== Show that we've successfully loaded the OrdersBulk table from all files

    -- with names like 'saleData[0-9][0-9][0-9][0-9]'

    SELECT * FROM dbo.OrdersBulk

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CirquedeSQLeil (5/30/2010)


    Why is SSIS not an option? Your databases do not need to be SQL 2005 to be able to use SSIS on them.

    Even if SSIS isn't per se an option, I haven't head anything that couldn't be done through DTS.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (5/30/2010)


    CirquedeSQLeil (5/30/2010)


    Why is SSIS not an option? Your databases do not need to be SQL 2005 to be able to use SSIS on them.

    Even if SSIS isn't per se an option, I haven't head anything that couldn't be done through DTS.....

    True enough... someone please post a complete DTS solution or the steps to accomplish it. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I no longer have a functional version of DTS (my 2K server is history). That said (from memory, so bear with me):

    - First would be an ActiveX task, to check the designated directory and pull all files names. Dump the files into a table

    - upon completion of step 1, fire a data-driven task, set up to fire a BulkInsert task for each row in the table populated by step one (clearly you'd need a completion status, etc...).

    - if you're inserting into a staging table - once everything has inserted - scrub the data appropriately, then push to the permanent table. That would be a "regular" sql Script task, then a SQL destination.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (5/30/2010)


    I no longer have a functional version of DTS (my 2K server is history). That said (from memory, so bear with me):

    - First would be an ActiveX task, to check the designated directory and pull all files names. Dump the files into a table

    - upon completion of step 1, fire a data-driven task, set up to fire a BulkInsert task for each row in the table populated by step one (clearly you'd need a completion status, etc...).

    - if you're inserting into a staging table - once everything has inserted - scrub the data appropriately, then push to the permanent table. That would be a "regular" sql Script task, then a SQL destination.

    Well done, especially from memory. You'd also need some loop code to decide if you where done or not and the actual "go back to the beginning" loop code. It all takes "Active X" (VBS) just about every step of the way. A "decent" example of how to do this in DTS can be found at http://www.sqldts.com/246.aspx. I put "decent" in quotes because it takes a lot of code to make those little icons actually work properly.

    I think it's long, ugly, requires more than 1 language, and totally unnecessary. The tricks you have to go though in DTS (or even SSIS) to do something like a simple loop have always amazed me as to how easy they could have been instead of having to write a module to enable/disable legs. Even though SSIS is an improvement, I feel much the same way about it... long, ugly, requires more than 1 language, and totally unnecessary. Heh... Even a looping DOS batch file would be simpler. 😛

    I guess I'm going to have to pick back up on VB.NET or C#, write a decent and permanent replacement for xp_DirTree, add a tight little "archive move" module, and make it so that it's so easy to do in T-SQL (already really easy) that no one will even want to import flat files or XML files using DTS or SSIS ever again.

    Now, if we could just get MS to write a decent replacement for the JET drivers, we'd be all set. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply