Error when using the IN Keyword on 100.000 elements

  • Ron - at some point if someone decides to place the clustered index on another column then that's the one that will be "ordered by" IF THERE ARE NO OTHER indexes present in the table...

    I know that I have several tables with one clustered index and several thousand rows and I've always gotten the right resultsets without specifying "order by"....ie - it has been ordered by clustered index column!

    It has now become my Life's mission to find out the different permutations and combinations of indexes and their retrieval order/scan vs. seek etc....I have this burning (academic) desire to know if I just got lucky with my tables and if (like Joel) I'll get unordered resultsets (regardless of clustered index) after I hit a certain # of rows....

    Maybe I'm belaboring the point too much....just want to look under the hood and try to understand is all....will get back in a couple of days and tabulate my results!







    **ASCII stupid question, get a stupid ANSI !!!**

  • Good luck... we'll be waiting.

  • Ron - at some point if someone decides to place the clustered index on another column then that's the one that will be "ordered by" IF THERE ARE NO OTHER indexes present in the table...

    And if you were not aware of the clustered index change, the order of your results may be very different from what was expected.

    You can dig away under the hood all you want but at the end of the day the only thing you can really count on is the ORDER BY clause if you want rows returned in a specific order.

     

  • "dig away under the hood all you want".....

    dig - dig - dig......will get back to y'all when I reach China!







    **ASCII stupid question, get a stupid ANSI !!!**

  • sushila, I am not in any way involved in this great site, other than that I write articles once in a while and try to hang out here at the forums. I think they read most of what is posted here though, so I think they might see it themselves.

    However, there is absolutely nothing to write about. Now, once and for all, there is no guarantee whatsoever of how the rows in a resultset will be ordered if you do not use an ORDER BY clause in the query. No matter what indexes exist or do not exist. For all we know a programmer at Microsoft could decide to add a Random Order Generator[TM] based on the current system time or something, that would be executed for every query (not using ORDER BY of course) right before returning the results. It would still be perfectly 'legal' according to ANSI SQL and the relational model.

    As you have noted, for many queries the rows will actually be returned in the order of the clustered index, but you must not ever rely on that being true. A number of things could change the order of the rows, even when executing the same query multiple times after each other. For instance, there is a feature in Enterprise Edition of SQL Server that lets several concurrent threads make use of a scan operation. Lets say that session A executes a query where the plan requires a clustered index scan for object C1. As you have tested yourself, most often session A will receive it's output ordered by the order of clustered index C1, even if the query does not have an order by clause. However, if session B executes the same query while the scan of clustered index C1 is under way, after it has started but before it is finished, then the query engine will 'hook in' session B to the same results. Now session B will get the rows that follow in the scan, and after the scan is finished it will only need to rescan the rows at the start that it missed. And the output will seem to be in 'random' order. I will demonstrate this in another way:

    Lets say we have the time, marked by moments called T1, T2...Tn. We have a table with a clustered index containing rows from R1, R2...Rn that are to be scanned and returned to the client that executes the query. We have to clients here, sessions S1 and S2. Here is how it goes:

    T1. S1 executes query, and the optimizer finds the plan that says scan.

    T2. R1 is scanned and placed in the result for S1.

    T3. R2 is scanned and placed in the result for S1.

    T4. S2 executes the same query, and gets the same plan.

    T5. R3 is scanned, and placed in the results for both S1 and S2.

    T6. R4 is scanned, and placed in the results for both S1 and S2.

    T7. Rn is scanned, and placed in the results for both S1 and S2.

    T8. Results are returned to S1. The order of the results are R1, R2, R3, R4..Rn.

    T9. The results for S2 are not complete, so R1 is now scanned again and placed in S2 results.

    T10. R2 is scanned for S2.

    T11. Results are returned to S2. The order of the results are R3, R4..Rn, R1, R2.

    Do you see now why you can never rely on the clustered index or anything else to make the results of a query to always be returned in the same order? If you need the rows that are returned from a query to be in a certain order you must always use an ORDER BY clause.

Viewing 5 posts - 31 through 34 (of 34 total)

You must be logged in to reply to this topic. Login to reply