Time for a ROWID?

  • I like the way you think Steve, but you are much more informed than I am.

    Would a ROWID save us from those dastardly heaps? So a heap would no longer considered an oversight and would be respectable? Or, maybe SQL would, in effect, be heapless? 🙂

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Might be a nice option when creating a table but I would not like it to be there whether it was needed/wanted or not.

    Personally, I don't see why you can't just add an identity column to every table if it makes you feel all warm and fuzzy to have one. I actually see that a lot in table designs by people that don;t know any better (they must be teaching that in college courses these days...) I don't prescribe to that philosophy myself. Why waste space and resources on a column you don't need?

    If there is no natural primary key and the row needs to be uniquely identified for updates or whatever then fine put one in there. I suppose it really doesn't matter for small databases/tables but when you get into millions of rows and high transaction rates every bit of overhead needs to be justifiable or it is out.

    The probability of survival is inversely proportional to the angle of arrival.

  • Steve Jones - SSC Editor (7/27/2011)


    Eric M Russell (7/27/2011)


    I can see how the physical row id would be useful for debugging or performance tuning purposes, like perhaps identifying which table or table partition a specific row in a resultset was contained in. However, it can't be stored anywhere else and used as a real key value, because it's an actual physical locator. Only a logical identifier, like an identity or timestamp, could move with the row as the table is reorganized.

    Can anyone suggest a practical use for it in an application or analytical query; some scenario where it would be more applicable than an identity, rank(), or row_number()?

    Replication is the big one I can see it being important for.

    I'm not sure that an application would need it, though I would expect it could be exposed through some function.

    The row id would have to be something both static and transferrable to the replicated instance, so it couldn't be a matter of simply exposing the the physical location id that already exists internally. It sounds like what we're suggesting would be something like an indentity columns that gets added by default to every table. Celko would call it a retro 1960's attempt at adding a "record number" to each row.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Steve Jones - SSC Editor (7/27/2011)


    The Dixie Flatline (7/27/2011)


    Steve, maybe it's because I haven't had enough caffeine this morning, but I have a question: How does the ROWID column that you propose differ from an identity column?

    Identity isn't necessarily unique, it's a property that can be changed. I'd envision something a little stronger that identifies the row.

    Okay.... if I understand correctly, you're saying you want an int or bigint that is attached to each row at insertion time and cannot be changed afterwards. This would have no relationship to anything except the order of insertion.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • mtillman-921105 (7/27/2011)


    I like the way you think Steve, but you are much more informed than I am.

    Would a ROWID save us from those dastardly heaps? So a heap would no longer considered an oversight and would be respectable? Or, maybe SQL would, in effect, be heapless? 🙂

    I would think so

  • I believe that SQL Server should finally catch up to (or exceed) Oracle and DB2, both of which allow the DBA to select a block size on a granular basis (DB2 allows page size on at least a per tablespace basis, and Oracle allows multiple block sizes), and allow the DBA's to choose block sizes on a per-database or, more logically, per-filegroup basis.

    Leave the default at a reasonable setting, but let the pure OLTP random folks have their 1KB or 2KB pages, and let the big sequential access folks have 256KB or 1MB pages.

    The best option is to have a good default, but let careful DBA's put each heap or index on a filegroup with the best page size for their particular environment, based on their own testing.

    DB2 table space reference: http://www.ibm.com/developerworks/data/library/techarticle/0212wieser/[/url]

    Oracle reference on selecting multiple block sizes in specific situations: http://www.dba-oracle.com/t_multiple_blocksizes_summary.htm

    ETA: I would also say that the ~8000 byte row limit should be based on page size as well; big pages can have big rows within the page. I.e. a 256KB page would have meaningful fillfactor choices even for 20KB rows.

  • sturner (7/27/2011)


    Might be a nice option when creating a table but I would not like it to be there whether it was needed/wanted or not.

    Personally, I don't see why you can't just add an identity column to every table if it makes you feel all warm and fuzzy to have one. I actually see that a lot in table designs by people that don;t know any better (they must be teaching that in college courses these days...) I don't prescribe to that philosophy myself. Why waste space and resources on a column you don't need?

    You don't get to create every table. Lots of applications don't use/want identities. Replication fails here.

    Not having it consistent is what we have now.

  • Eric M Russell (7/27/2011)


    The row id would have to be something both static and transferrable to the replicated instance, so it couldn't be a matter of simply exposing the the physical location id that already exists internally. It sounds like what we're suggesting would be something like an indentity columns that gets added by default to every table. Celko would call it a retro 1960's attempt at adding a "record number" to each row.

    Yes and no. There already is a physical pointer, just not consistently among different structures. I'm thinking more of an internal locator that consistently exists and identifies a row. I'd say GUID, but I don't necessarily want to limit it to that structure. Something that can be a consistent handle to the row.

    Note that I'm not necessarily advocating exposing this by default (select *), but it might be available if needed.

  • Steve Jones - SSC Editor (7/27/2011)


    mtillman-921105 (7/27/2011)


    I like the way you think Steve, but you are much more informed than I am.

    Would a ROWID save us from those dastardly heaps? So a heap would no longer considered an oversight and would be respectable? Or, maybe SQL would, in effect, be heapless? 🙂

    I would think so

    I'm just thinking that, if the page size is 256K or 1MB, then how would that impact page splits and logical fragmentation within the table? More or less?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (7/27/2011)


    I'm just thinking that, if the page size is 256K or 1MB, then how would that impact page splits and logical fragmentation within the table? More or less?

    Good question. I'm not sure. I'd think fragmentation would go down, but maybe not. Full pages are full pages, and inserts/updates in the middle cause issues.

  • Steve Jones - SSC Editor (7/27/2011)


    Eric M Russell (7/27/2011)


    The row id would have to be something both static and transferrable to the replicated instance, so it couldn't be a matter of simply exposing the the physical location id that already exists internally. It sounds like what we're suggesting would be something like an indentity columns that gets added by default to every table. Celko would call it a retro 1960's attempt at adding a "record number" to each row.

    Yes and no. There already is a physical pointer, just not consistently among different structures. I'm thinking more of an internal locator that consistently exists and identifies a row. I'd say GUID, but I don't necessarily want to limit it to that structure. Something that can be a consistent handle to the row.

    Note that I'm not necessarily advocating exposing this by default (select *), but it might be available if needed.

    If the primary key is static, which ideally and most always it is, then that's the consistent identifier for the row.

    Are really just advocating that the coding technique for the existing internal row id be retrofitted so it's always static and sequantial?

    If the table is clustered on a unique key, would this static row id still exist?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Steve Jones - SSC Editor (7/27/2011)


    Eric M Russell (7/27/2011)


    I'm just thinking that, if the page size is 256K or 1MB, then how would that impact page splits and logical fragmentation within the table? More or less?

    Good question. I'm not sure. I'd think fragmentation would go down, but maybe not. Full pages are full pages, and inserts/updates in the middle cause issues.

    It gives you a lot more room in your fillfactor choices, particularly for tables/indexes with larger rows, which in turn impacts fragmentation and page splits.

    As was just mentioned, we don't create every table; sometimes we work with what we're given for the schema. Fillfactor and page size choice, though, is closer to our court, even for third party apps.

  • Eric M Russell (7/27/2011)


    If the table is clustered on a unique key, would this static row id still exist?

    Not smart enough to answer that. Potentially the unique key would be the PK, but not sure. Perhaps it would still exist, but would it matter if it did? My argument is the overhead might be small.

  • Steve Jones - SSC Editor (7/27/2011)


    I'd say GUID, but I don't necessarily want to limit it to that structure. Something that can be a consistent handle to the row.

    I don't know that using a GUID would be best since it's so large. And if you're putting in something that, in essence, can function as a surrogate key I would think you want to make it easy to use for that. If a GUID were used it would have to be sequential since otherwise the link between the GUID and the location on disk would need to be stored someplace adding to the overhead.

  • Steve Jones - SSC Editor (7/27/2011)


    Eric M Russell (7/27/2011)


    The row id would have to be something both static and transferrable to the replicated instance, so it couldn't be a matter of simply exposing the the physical location id that already exists internally. It sounds like what we're suggesting would be something like an indentity columns that gets added by default to every table. Celko would call it a retro 1960's attempt at adding a "record number" to each row.

    Yes and no. There already is a physical pointer, just not consistently among different structures. I'm thinking more of an internal locator that consistently exists and identifies a row. I'd say GUID, but I don't necessarily want to limit it to that structure. Something that can be a consistent handle to the row.

    Note that I'm not necessarily advocating exposing this by default (select *), but it might be available if needed.

    When a table is edited in SSMS, say to add a column or alter it's datatype, the operation will often times be scripted out in the form of a table drop and re-create followed by a re-insert from a temporary table. In order to retain the rowid through that process, it would have to be exposed as a column and the rowid would have to be insertable, much the same way that an identity can be inserted. For a rowid to be static and retained transparently through all the various database maintenace operations that can occur would perhaps require that it be something like a unique MD5 hash or binary checksum of the row's primary key.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 31 through 45 (of 67 total)

You must be logged in to reply to this topic. Login to reply