Adding RowID to existing table = Wildly Inconsistent Results!

  • I have a really STRANGE issue that's driving me nuts. I am getting inconsistent results when BULK INSERTING data from a tab-delimited text file. As part of my testing, I run the same code on the same file again and again, and I get different results every time! I get this on SQL 2005 and SQL 2012 R2.

    We have an application that imports data from a spreadsheet. The sheet contains section headers with account numbers and detail rows with transactions by date:

    AAAA.1234 /* (account number)*/

    1/1/2015 $150 First Transaction

    1/3/2015 $24.233 Second Transaction

    BBBB.5678

    1/1/2015 $350 Third Transaction

    1/3/2015 $24.233 Fourth Transaction

    My import program saves this spreadsheet as tab-delimited text, then I use BULK INSERT to bring the data into a generic table of varchar(255) fields. There are about 90,000 rows in each day's data; after the BULK INSERT, about half of them are removed for various reasons.

    Next I add a RowID column to the table with the IDENTITY (1,1) property. This gives my raw data unique row numbers.

    I then run a routine that converts and copies those records into another holding table that's a copy of the final destination table. That routine parses through the data, assigning the account number in the section header to each detail row. It ends up looking like this:

    AAAA.1234 1/1/2015 $150 First Transaction

    AAAA.1234 1/3/2015 $24.233 Second Transaction

    BBBB.5678 1/1/2015 $350 Third Transaction

    BBBB.5678 1/3/2015 $24.233 Fourth Transaction

    My technique: I use a cursor to get the starting RowID for each account number, then use the upper and lower RowIDs to do an INSERT into the final table (a set-based sketch of that assignment step follows my code below). The query that finds the section headers looks like this:

    SELECT RowID, SUBSTRING(RowHeader, 6, 4) + '.UBC1' AS AccountNumber
    FROM GenericImportTable
    WHERE RowHeader LIKE '____.____%'

    But every time I run the routine, I get different numbers! Needless to say, my results are not accurate. It's maddening!

    Here is my code, with table, field and account names changed for business confidentiality. I’m open to suggestions; I’m truly stuck. This is a high profile project at my company; any help would be deeply appreciated.

    TRUNCATE TABLE GenericImportTable;

    ALTER TABLE GenericImportTable DROP COLUMN RowID;

    BULK INSERT GenericImportTable
    FROM '\\SERVER\General\Appname\DataFile.2015.05.04.tab.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 6);

    ALTER TABLE GenericImportTable ADD RowID int IDENTITY(1,1) NOT NULL;

    SELECT RowID, SUBSTRING(RowHeader, 6, 4) + '.UBC1' AS AccountNumber
    FROM GenericImportTable
    WHERE RowHeader LIKE '____.____%';
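    For reference, here is a rough set-based sketch of what that assignment step is trying to accomplish, assuming the RowID values really do follow the file order. It's an illustration only, not my production routine, and any names not shown above are placeholders:

    -- Sketch: attach the nearest preceding section header's account number
    -- to each detail row, using RowID as the ordering key.
    SELECT  d.RowID,
            h.AccountNumber,
            d.RowHeader AS DetailText
    FROM    GenericImportTable AS d
    CROSS APPLY
    (
        SELECT TOP (1) SUBSTRING(x.RowHeader, 6, 4) + '.UBC1' AS AccountNumber
        FROM   GenericImportTable AS x
        WHERE  x.RowID <= d.RowID
          AND  x.RowHeader LIKE '____.____%'   -- header rows only
        ORDER BY x.RowID DESC                  -- the closest header above this row
    ) AS h
    WHERE   d.RowHeader NOT LIKE '____.____%'; -- detail rows only

    This does in one pass what the cursor does with upper and lower RowID boundaries, but it still depends entirely on RowID matching the original file order.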

  • Are you expecting the RowIDs to be consistent with the physical order of the rows in the file? Unfortunately, you would need to set up an ordering mechanism to KEEP them in that order during the initial bulk-load process, since the text file is the only thing that has any form of physical order.

    Tables do not have a "natural" or physical order, so just because the data was organized a certain way in the text file does not guarantee that the table will retain that order. Additionally, queries are optimized for performance and will use whatever fetching mechanism gets the data back most expediently, so you don't always get rows back in the same order UNLESS you enforce it with an ORDER BY clause (there's a quick example of that below the link).

    That said, depending on how you're ingesting the content, there are ways to insert into the initial table so as to establish an ordering key. If you are using SSIS, you might search for "row number generation SSIS" on Google, or grab an existing component such as this one:

    http://www.sqlis.com/sqlis/post/Row-Number-Transformation.aspx
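    As a quick illustration of the ORDER BY point, using the column names from the post above: even once a reliable RowID exists, you still have to ask for the order explicitly.

    SELECT RowID, RowHeader
    FROM   GenericImportTable
    ORDER BY RowID;   -- without this, the order of the returned rows is undefined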

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right, what was my emergency again?

  • Yes, that was what I was expecting. This was a hard lesson, but boy did I learn it!

    My VB.NET app creates the source file by saving an Excel spreadsheet to tab-delimited text; I'll add a bit that inserts a row number into that text file before importing it. At that point my rows will be numbered properly.

    Thanks!

    Barry

  • It would work as expected if you did your BULK INSERT into a table that already has the IDENTITY column in place. You would need to either put a view on the table that excludes the IDENTITY column, or create a BCP format file that tells BULK INSERT to skip it (rough sketch below).

    That notwithstanding, it's better if your VB app adds the numbering as it reads the rows in a serial fashion.
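    If you take the view route, a rough sketch would look something like the following. The extra column names and the view name are only illustrative, since the real table layout wasn't posted; the point is that the table is created with its IDENTITY column up front and BULK INSERT targets a view that leaves that column out.

    -- Sketch: pre-create the table with the IDENTITY column, then load
    -- through a view that exposes everything except RowID.
    CREATE TABLE GenericImportTable
    (
        RowID     int IDENTITY(1,1) NOT NULL,
        RowHeader varchar(255) NULL,
        Amount    varchar(255) NULL,
        Detail    varchar(255) NULL
    );
    GO

    CREATE VIEW GenericImportTable_Load
    AS
    SELECT RowHeader, Amount, Detail   -- no RowID here
    FROM   GenericImportTable;
    GO

    BULK INSERT GenericImportTable_Load
    FROM '\\SERVER\General\Appname\DataFile.2015.05.04.tab.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 6);

    The format-file alternative accomplishes the same thing by mapping the file's fields to every column except RowID.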

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
