Order by While inserting

  • Matt Miller (#4) (2/14/2012)


    ACtually - using an ORDER BY during the insert would guarantee the ID's be generated in the correct order. Conor Cunningham confirmed that to be true a long while back.

    Note that that is the ONLY guarantee. Has nothing to do with which order they were physicially being inserted in.

    So the functionality the OP is looking for IS in fact accomplished by adding an order by into the INSERT.

    ...

    (Yes - you still need the ORDER by in the results if you want the rows to show up in the correct order, but the ID's were assigned in the expected order this time).

    So replacing SELECT ... INTO with INSERT would yield the expected results? (as MSDN is pretty explicit that ORDER BY doesn't work on SELECT INTO)

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Be careful with phrasing: MSDN doesn't say it doesn't work, they say it's not guaranteed to work.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • It just seems to be too much work when you still need to retrieve results in some order. If you don't then who cares?

    You should care when inserting into a table with a clus index and you can ORDER BY the clus index column(s). You should do so, to avoid random INSERTs, page splits, etc..

    Unless for a given INSERT statement, you want random activity vs sequential inserts ... not sure when that would occur, but I guess it's possible.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Koen Verbeeck (2/14/2012)


    Matt Miller (#4) (2/14/2012)


    ACtually - using an ORDER BY during the insert would guarantee the ID's be generated in the correct order. Conor Cunningham confirmed that to be true a long while back.

    Note that that is the ONLY guarantee. Has nothing to do with which order they were physicially being inserted in.

    So the functionality the OP is looking for IS in fact accomplished by adding an order by into the INSERT.

    ...

    (Yes - you still need the ORDER by in the results if you want the rows to show up in the correct order, but the ID's were assigned in the expected order this time).

    So replacing SELECT ... INTO with INSERT would yield the expected results? (as MSDN is pretty explicit that ORDER BY doesn't work on SELECT INTO)

    I'm basing this on Conor's assurances (Conor was in charge of the team that would have implemented this stuff at the time). Here's the reference to it:

    http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx

    I have not needed this a lot, but it's held true in the few instances where I HAVE used it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • ScottPletcher (2/14/2012)


    Be careful with phrasing: MSDN doesn't say it doesn't work, they say it's not guaranteed to work.

    I believe that in this case, the clustered index is on the identity. So, the order of insertion does not matter upon insertion because it will by definition be in the order of the clus index.

    Jared
    CE - Microsoft

  • Very interesting post by Conor.

    I referred to the clus index only. I assumed you can't control any other index(es), nor a heap (the ORDER BY would be irrelevant to a heap anyway).

    I know you get much better performance, and far less clus index fragmentation, by doing an ORDER BY as part of the SELECT for the INSERT. I understand it's not guaranteed to insert in that exact sequence, but it happens often enough to be worthwhile.

    I should have stated explicitly that I assumed the table was not partitioned, and the INSERT was not running in parallel.

    I'm not sure how parallel operations would work with this, although I would hope that each parallel "segment" would typically in and of itself be in clus index (sort) order, even though of course each "segment" could have key value(s) that overlapped with other "segment".

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • SQLKnowItAll (2/14/2012)


    ScottPletcher (2/14/2012)


    Be careful with phrasing: MSDN doesn't say it doesn't work, they say it's not guaranteed to work.

    I believe that in this case, the clustered index is on the identity. So, the order of insertion does not matter upon insertion because it will by definition be in the order of the clus index.

    Oh, be careful now. It's not the order in the table that you're trying to establish. It's the values of the IDENTITY column compared to the order of the ORDER BY column(s) that you're trying to establish. The order in the table simply doesn't matter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher (2/14/2012)


    Very interesting post by Conor.

    I referred to the clus index only. I assumed you can't control any other index(es), nor a heap (the ORDER BY would be irrelevant to a heap anyway).

    I know you get much better performance, and far less clus index fragmentation, by doing an ORDER BY as part of the SELECT for the INSERT. I understand it's not guaranteed to insert in that exact sequence, but it happens often enough to be worthwhile.

    I should have stated explicitly that I assumed the table was not partitioned, and the INSERT was not running in parallel.

    I'm not sure how parallel operations would work with this, although I would hope that each parallel "segment" would typically in and of itself be in clus index (sort) order, even though of course each "segment" could have key value(s) that overlapped with other "segment".

    The truth is, none of that matters. You actually don't care about what order the inserts occur. You only care that the values in the IDENTITY column are created in the same order as the ORDER BY column(s). As I said previously, there are only two things in a non-partitioned table that are likely to interfere with that order. Parallelism (which can be fixed using MAXDOP) and indeterminant values in the ORDER BY.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/14/2012)


    SQLKnowItAll (2/14/2012)


    ScottPletcher (2/14/2012)


    Be careful with phrasing: MSDN doesn't say it doesn't work, they say it's not guaranteed to work.

    I believe that in this case, the clustered index is on the identity. So, the order of insertion does not matter upon insertion because it will by definition be in the order of the clus index.

    Oh, be careful now. It's not the order in the table that you're trying to establish. It's the values of the IDENTITY column compared to the order of the ORDER BY column(s) that you're trying to establish. The order in the table simply doesn't matter.

    I guess my point was that it does not matter for splitting the CI. I still have a hard time understanding WHY someone would want to do this. Many cases, it is because they don't understand CIs and how SQL stores data and assume they can then drop an ORDER BY from future statements. Which saves them about 1.2 seconds, depending on your typing speed. However, I could see someone using the identity in a temp table to order separate batches differently so that they can be inserted and grouped and finally ordered according to different sorting; i.e. sorting a list of veggies by name alpha, but a list of fruits by color (I didn't say my EXAMPLE was practical). However, other techniques could still be used for that.

    Jared
    CE - Microsoft

  • I think the order does matter for the clus index and whether it gets fragmented or not. SQL is coded to recognize sequential inserts and not cause page splits for those, but if the inserts are random, you will get page splits. Assuming the clus index key is not a "default" identity column but a more useful key with business values.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (2/15/2012)


    I think the order does matter for the clus index and whether it gets fragmented or not. SQL is coded to recognize sequential inserts and not cause page splits for those, but if the inserts are random, you will get page splits. Assuming the clus index key is not a "default" identity column but a more useful key with business values.

    Absolutely! I think we were agreeing without really knowing it. I was saying "IF" the CI is on the identity column, then it is silly to order beforehand if only for the reason of preventing non-sequential inserts. I guess knowing more about the OP's business case we cannot really understand the point here or point to a better solultion or understand the practicality.

    Jared
    CE - Microsoft

  • ScottPletcher (2/15/2012)


    I think the order does matter for the clus index and whether it gets fragmented or not. SQL is coded to recognize sequential inserts and not cause page splits for those, but if the inserts are random, you will get page splits. Assuming the clus index key is not a "default" identity column but a more useful key with business values.

    IIRC, a single insert of many rows will be done in the clustered index order auto-magically.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IIRC, a single insert of many rows will be done in the clustered index order auto-magically.

    Really? Interesting. I don't recall that. Would be nice tho.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jeff Moden (2/15/2012)


    ScottPletcher (2/15/2012)


    I think the order does matter for the clus index and whether it gets fragmented or not. SQL is coded to recognize sequential inserts and not cause page splits for those, but if the inserts are random, you will get page splits. Assuming the clus index key is not a "default" identity column but a more useful key with business values.

    IIRC, a single insert of many rows will be done in the clustered index order auto-magically.

    I love auto-magic. Its the bestest!

    Jared
    CE - Microsoft

  • I thought it depended on whether SQL thought it was "necessary" or not.

    Whether the number of rows, etc., met whatever magic criteria SQL went by -- which are not documented and can change from release to release.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply