Data in tthe same order in which it is inserted

  • As a side bar, it's always amazed me that people find importance in the order of the data in a file without actually having something in the file to preserve that order during transfers and actual usage.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/30/2012)


    As a side bar, it's always amazed me that people find importance in the order of the data in a file without actually having something in the file to preserve that order during transfers and actual usage.

    that was my biggest hurdle in "getting" RDBMS theory. the Physical order does not mater and if it does its most likely bad design. once i got that if i want a logical order to things i better add it to the data every thing clicked.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Matt Miller (#4) (3/30/2012)


    Lynn Pettis (3/29/2012)


    pietlinden (3/29/2012)


    add an incrementing identity column to the destination table before running inserts?

    And then when you select from the table be sure to use an ORDER BY clause specifying that identity column.

    Since we're coming in from an external flat file - one way to do this without the order by is...to use SSIS. And no - that doesn't mean to incorporate your existing SQL script in SSIS, you need rebuild the data file import process using the SSIS components.

    Using a script component, you can implement a script-based row number which will in essence create the sequence number in the order of the rows within the file. You don't have to drop your batch commit size to 1 in this scenario.

    You'd end up with the data ordered, which you then use in some way to bring it in to permanent table.

    Now this will bring in an entirely different can of worms, so be sure the effort is worth it. That said - it does the job.

    It will actually do it properly, provided the commit size is 1. I said batch size, and this is logically correct, but in terms of tools like SSIS, bcp and BULK INSERT it is known as commit size.

    Here is a complete proof of concept I did a while back. The file has been lost, but essentially the file was just a text file with a number per line corresponding to it's relative line number in the file. It shows you can load a file and have it loaded in an orderly fashion without first pre-processing the file.

    http://www.sqlservercentral.com/Forums/FindPost1127805.aspx

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jeff Moden (3/30/2012)


    As a side bar, it's always amazed me that people find importance in the order of the data in a file without actually having something in the file to preserve that order during transfers and actual usage.

    Amazes me too why data consumers don't think to ask for this when asking providers for files. Sometimes the consumer only thinks order is important, or realizes it too late. Other times, to the consumer order is in fact very important, and it is important to the data provider as well, but the data provider couldn't care less how the data consumer deals with it.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • capn.hector (3/30/2012)


    PaulB-TheOneAndOnly (3/30/2012)


    opc.three (3/30/2012)


    The only way to ensure the data is not scrambled when inserting into a table is to reduce the "bacth commit size" to 1. Whether bcp, SSIS or BULK INSERT, there is an option to reduce the batch size to 1. Now, what does that do to performance? It kills it! It's analogous to calling an insert proc n-times in a row...but, it will avoid the overhead of adding a line number to the file to use to correlate your data file line number to the database row.

    Please correct me if I'm wrong by I do not think rdbms engine will gently insert rows in order just because batch_commit_size is set to 1 - rows will be inserted into the proper table and that's all we can be sure will happen. Physical order can only be preserved by having a clustered unique index on a column that defines the order you want the rows to be physically stored.

    and even that is not bullet proof. the best way to accomplish this is some how adding an identity column to your tables. with SSIS you can do some transformation of the data (Add the identity column) and base that off of data all ready in a table (read last identity number and start there +1). or alter the database for an identity column and hope setting MAXDOP (1) will keep every thing in order. i know which one i would go with if i had to be sure every thing was in the same order.

    of course every query on the table will have to have an order by (ColmnToSortBy) to ensure the order comming out of the table.

    You are right about needing an IDENTITY column to provide a reference, i.e. to tag the data in the order it was imported, but you do not need to pre-process the file or do any custom coding to achieve the ordered insert via SSIS. Please see the proof of concept in my earlier post.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jeff,

    I thought I said that... maybe he didn't believe me. :hehe:

    (Believe me now?)

  • pietlinden (3/30/2012)


    Jeff,

    I thought I said that... maybe he didn't believe me. :hehe:

    (Believe me now?)

    not just jeff. every? poster has said he needs to some way add identity to the data. some have offered various methods all of which will work (to some extent or another).


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Hector,

    very true.

    Horse, water.

    Water, horse.

  • pietlinden (3/30/2012)


    Jeff,

    I thought I said that... maybe he didn't believe me. :hehe:

    (Believe me now?)

    Said what?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I said that to maintain the insert order of the records, he needed an Identity column. Looked like he didn't believe me.

  • He may not have believed you because you only gave him a partial solution. As the OP explained they had experienced, after only adding an identity column to the table and changing nothing else about their bulk load settings, in certain scenarios the data coming from the flat file will still be committed to the table in a different order than it was read from the file. Complete solutions provided so far were:

    1. Mazharuddin Ehsan proposed rewriting the data file adding a line # to each data line before bulk loading

    2. Matt Miller (#4) proposed pre-processing the data in SSIS within a Script Component to append a line # to each data line before bulk loading

    3. I proposed loading the data from the file into a table with an IDENTITY column one row at a time using a 'max commit size' of 1 per the built-in options on either bcp, SSIS OLE DB Destination or BULK INSERT

    Take your pick. There are probably others too. These three have their pluses and minuses, but all should get the desired result.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    Thanks alot for the suggestions.

    The table already had identity column, SSIS inserts the data to this table every week. but it was not following the sequence which i was expecting.

    I took the backup and deleted all the data, recreated the index and inserted the data back, since then, whenever i was inserting the data, i could find it in the same order when I sort by identity:) . But it was not doing before. I guess, somone has done the delete operation on that table which created gaps between the identity. May be that is why, it was inserting in different way.

    Thanks!!!!!!!

  • opc.three (3/30/2012)


    It will actually do it properly, provided the commit size is 1

    It's been a while but, IIRC, if you set MAXDOP to 1 and you have no indexes to start with, you can get an IDENITY column to number the rows correctly. Of course, that pretty much defines a staging table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mohammed_suhan (3/30/2012)


    Hi,

    Thanks alot for the suggestions.

    The table already had identity column, SSIS inserts the data to this table every week. but it was not following the sequence which i was expecting.

    I took the backup and deleted all the data, recreated the index and inserted the data back, since then, whenever i was inserting the data, i could find it in the same order when I sort by identity:) . But it was not doing before. I guess, somone has done the delete operation on that table which created gaps between the identity. May be that is why, it was inserting in different way.

    Thanks!!!!!!!

    Does your SQL Server have more than one CPU? If so, empty table or not, table with gaps in the identity or not, if you do not set your max commit size to 1 you will eventually see your file be committed to the table in a different order than it exists in your file. It may be committed in order 99 times out of 100, but if it is important to the correctness of your process then that one time it is not committed in the same order as the file will create an issue for you.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jeff Moden (3/30/2012)


    opc.three (3/30/2012)


    It will actually do it properly, provided the commit size is 1

    It's been a while but, IIRC, if you set MAXDOP to 1 and you have no indexes to start with, you can get an IDENITY column to number the rows correctly. Of course, that pretty much defines a staging table.

    Sorry, I missed this post because it came in at the time when I was writing my last post and I just saw the email notification.

    I think you're right, Jeff! I knew there had to be other ways as well. If my understanding is correct the MAXDOP 1 hint will force SQL Server to BULK INSERT the data using only 1 CPU, thereby guaranteeing no parallelism, and no commits happening out of order from what is in the file. As far as I know this type of setting is not available when using SSIS or bcp (although -h ORDER BY looks interesting) which is why you must set the commit size to 1. If it were available in SSIS or bcp I would love to know.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply