July 27, 2011 at 12:27 pm
Eric M Russell (7/27/2011)
When a table is edited in SSMS, say to add a column or alter its datatype, the operation will oftentimes be scripted out in the form of a table drop and re-create followed by a re-insert from a temporary table. In order to retain the rowid through that process, it would have to be exposed as a column and the rowid would have to be insertable, much the same way that an identity can be inserted. For a rowid to be static and retained transparently through all the various database maintenance operations that can occur would perhaps require that it be something like a unique MD5 hash or binary checksum of the row's primary key.
I would say that in a case like this you have a new table, so the rowid shouldn't be expected to be the same.
July 27, 2011 at 1:00 pm
The InnoDB engine has a default page size of 16K, which is global to the engine.
The MySQL web site says you can recompile the code to anything between 4K and 64K depending on your needs.
I don't have a copy of SQL 2000 to hand, but I seem to remember one of the system stored procs used to retrieve the page size from a system table. That isn't to say you could change it, but obviously M$ had considered it before.
I'm not sure about the wasted space thing. Don't small tables share SGAM pages?
As to the RowID thing, I seem to remember Joe Celko waxing lyrical about records in tables not being rows and fields not being columns. The whole point of SQL is that you are dealing with sets of rows.
I also seem to remember that there is a case where SQL Server will append a hidden unique identifier to a record. It wasn't anything to do with replication; it was something to do with compound primary keys. I think it was in connection with bulk parallel loads giving key violation messages, and the gist of it was that an internal 5-byte reference was added to the table, so if you were bulk loading something with a compound key, under the hood you could get a clash with that 5-byte reference.
July 27, 2011 at 1:01 pm
Eric M Russell (7/27/2011)
When a table is edited in SSMS, say to add a column or alter its datatype, the operation will oftentimes be scripted out in the form of a table drop and re-create followed by a re-insert from a temporary table. In order to retain the rowid through that process, it would have to be exposed as a column and the rowid would have to be insertable, much the same way that an identity can be inserted. For a rowid to be static and retained transparently through all the various database maintenance operations that can occur would perhaps require that it be something like a unique MD5 hash or binary checksum of the row's primary key.
1st trivial case: no primary key or other unique element at all.
2nd trivial case: All hashes with fewer unique possibilities than the data they're based on can have collisions (and will, given sufficient data + [bad] luck).
I'd say stick with the simple: something very like a BIGINT IDENTITY on the table, not preserved after a table drop or truncate. Preferably one that'll wrap around to negative in normal two's complement fashion, should we get that many rows, and/or use the leading bit as a signal of requiring a second BIGINT, in UTF-16 variable-size fashion.
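(Not part of the original post: a minimal T-SQL sketch of that idea, with a hypothetical table name. An IDENTITY does not actually wrap around on overflow, it raises an error, so seeding at the minimum BIGINT value is simply a way to use the full range.)

-- Hypothetical example: an exposed BIGINT surrogate that behaves much like a rowid.
-- Seeding at the bottom of the BIGINT range doubles the usable values compared to
-- starting at 1; on exhaustion the IDENTITY errors rather than wrapping.
CREATE TABLE dbo.SomeTable
(
    RowSurrogate BIGINT IDENTITY(-9223372036854775808, 1) NOT NULL,
    Payload      VARCHAR(100) NOT NULL,
    CONSTRAINT PK_SomeTable PRIMARY KEY CLUSTERED (RowSurrogate)
);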
July 27, 2011 at 1:21 pm
Steve Jones - SSC Editor (7/27/2011)
Ewald Cress (7/27/2011)
I wonder how committed Microsoft is to the 8K page size. A quick look in BOL shows that it is "officially documented", although DMV documentation tends to shy away from saying the number out loud.
Then again, maybe 8K is a "sweet spot" that is unlikely to move after all. By and large, even though data volumes are growing alongside processing power and storage, the growth is in number of rows, and not number/size of columns - (desired) rows per page aren't increasing. There never was an "8K barrier" that needed to be broken through in the past, and I don't imagine we're feeling anything like it now.
It's changed in the past. Used to be 2k.
It's not necessarily about data size; it's more that this could match up more efficiently with retrievals. Lots of disk reads are 64k already.
But isn't this just an argument for avoiding physical fragmentation? If the goal is to ensure that 64k of data is contiguous on disk because we anticipate wanting to read it all, it's a moot point whether this is organised as one large page or as eight contiguous smaller pages. Once it hits buffer cache, I suspect it doesn't make any difference (and we don't even need contiguous memory because of scatter-gather I/O).
I haven't seen the physical design of the old 2k pages, but it sounds likely that the ratio of header to data was rather high.
One argument against really large pages would be the impact on latching; if a "section" of a table is subject to a lot of inserts and updates, latch contention would increase as the page size and number of hot rows increases.
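(Not part of the original post: a hedged way to gauge whether page-latch contention is already a factor at 8K, using the documented wait-stats DMV; wait types starting with PAGELATCH cover latches on in-memory data pages.)

-- Aggregate page latch waits since the last restart (or stats clear).
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
ORDER BY wait_time_ms DESC;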
July 27, 2011 at 1:23 pm
Nadrek (7/27/2011)
Eric M Russell (7/27/2011)
When a table is edited in SSMS, say to add a column or alter its datatype, the operation will oftentimes be scripted out in the form of a table drop and re-create followed by a re-insert from a temporary table. In order to retain the rowid through that process, it would have to be exposed as a column and the rowid would have to be insertable, much the same way that an identity can be inserted. For a rowid to be static and retained transparently through all the various database maintenance operations that can occur would perhaps require that it be something like a unique MD5 hash or binary checksum of the row's primary key.
1st trivial case: no primary key or other unique element at all.
2nd trivial case: All hashes with fewer unique possibilities than the data they're based on can have collisions (and will, given sufficient data + [bad] luck).
I'd say stick with the simple: something very like a BIGINT IDENTITY on the table, not preserved after a table drop or truncate. Preferably one that'll wrap around to negative in normal two's complement fashion, should we get that many rows, and/or use the leading bit as a signal of requiring a second BIGINT, in UTF-16 variable-size fashion.
I too agree that an exposed surrogate rowid isn't really needed. An identity serves the same purpose and can be scripted out and re-inserted if needed. When the identity, or any other column(s), is defined as a unique clustered key, SQL Server will substitute that key in place of the internal rowid. For performance reasons, it can also be beneficial to have that same surrogate key serve as the row identifier in non-clustered indexes.
As mentioned earlier, other RDBMS engines like Oracle expose ROWID as a column, but it serves no common practical use for PL/SQL developers. I personally don't miss it in SQL Server.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
July 27, 2011 at 1:49 pm
Eric M Russell (7/27/2011)
Nadrek (7/27/2011)
I'd say stick with the simple: something very like a BIGINT IDENTITY on the table, not preserved after a table drop or truncate. Preferably one that'll wrap around to negative in normal two's complement fashion, should we get that many rows, and/or use the leading bit as a signal of requiring a second BIGINT, in UTF-16 variable-size fashion.
I too agree that an exposed surrogate rowid isn't really needed. An identity serves the same purpose and can be scripted out and re-inserted if needed. When the identity, or any other column(s), is defined as a unique clustered key, SQL Server will substitute that key in place of the internal rowid. For performance reasons, it can also be beneficial to have that same surrogate key serve as the row identifier in non-clustered indexes.
As mentioned earlier, other RDBMS engines like Oracle expose ROWID as a column, but it serves no common practical use for PL/SQL developers. I personally don't miss it in SQL Server.
Actually, I'd very much like it to be exposed; for certain cases (say, overlapping date range detection), having an efficient, guaranteed-unique per-row value is of great use. Even better if it's available on every table, even in third-party applications with bad designs.
July 27, 2011 at 1:58 pm
Ewald Cress (7/27/2011)
Back on the secondary subject of larger pages, I guess the obvious argument against them would be buffer cache wastage. If you have hundreds of often-used lookup/dimension tables with only a few narrow rows in each, quite a lot of buffer cache will essentially be useless, with the wasted amount being proportional to the page size. A similar argument will apply where there have been lots of page splits.
Huh? Does each buffer cache page hold records from only one table? Even if that's true, and even though I sometimes see narrow lookup tables, I rarely see any that are much smaller than an 8K page in size. If your lookup tables are entirely cached (because they are narrow and don't have too many rows), that might be a good thing for performance.
Still, I thought table data could share cache pages. But I could be wrong.
July 27, 2011 at 2:03 pm
Nadrek (7/27/2011)
Eric M Russell (7/27/2011)
Nadrek (7/27/2011)
Actually, I'd very much like it to be exposed; for certain cases (say, overlapping date range detection), having an efficient, guaranteed unique per row value is of great use. Even better if it's available on every table even on third party applications with bad designs.
I use rank() heavily for doing things like querying the most recent version of a row from audit tables as of a certain cutoff date, and also for things like "return the patient's primary phone number" when there are multiple phone numbers on file with nothing to indicate which is "primary". However, in all these cases there is a unique sequential ID or date/time column that can be leveraged for ranking.
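(Not part of the original post: a hedged sketch of that ranking pattern, with hypothetical table and column names.)

-- Most recent phone number per patient as of a cutoff date, assuming a
-- hypothetical dbo.PatientPhoneAudit table with a ModifiedDate column.
SELECT PatientID, PhoneNumber
FROM (
    SELECT PatientID, PhoneNumber,
           ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ModifiedDate DESC) AS rn
    FROM dbo.PatientPhoneAudit
    WHERE ModifiedDate <= '2011-07-01'
) AS latest
WHERE rn = 1;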
How would you leverage ROWID for detecting "overlapping date ranges"; doesn't that typically involve comparing two or more date columns?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
July 27, 2011 at 3:04 pm
David Walker-278941 (7/27/2011)
Ewald Cress (7/27/2011)
Back on the secondary subject of larger pages, I guess the obvious argument against them would be buffer cache wastage. If you have hundreds of often-used lookup/dimension tables with only a few narrow rows in each, quite a lot of buffer cache will essentially be useless, with the wasted amount being proportional to the page size. A similar argument will apply where there have been lots of page splits.
Huh? Does each buffer cache page hold records from only one table? Even if that's true, and even though I sometimes see narrow lookup tables, I rarely see any that are much smaller than an 8K page in size. If your lookup tables are entirely cached (because they are narrow and don't have too many rows), that might be a good thing for performance.
Still, I thought table data could share cache pages. But I could be wrong.
Pages are definitely never shared between tables. The closest thing we have to such sharing in SQL Server is mixed extents, where the eight pages in an extent can belong to eight different allocation units (read "tables").
Let's say that your existing lookup tables are all just under 8k, so very little space is wasted. Now we increase the page size to 64k, and suddenly 56k per page is wasted. This would only be an issue in practice if your design involves a really large number of such small tables, and they are all referenced often enough to want to stay in cache. Probably this is not really such a big deal for many designs.
However, let's say you have a lot of large transactional tables, and they happen to be clustered on a really inappropriate key, causing lots of page splits. Immediately after a page split of an 8k page into two, you'd have around 8k empty space between two pages (assuming the new row that caused the split is quite small). But if those pages were 64k to begin with, you've just created 64k of empty space. This is space that not only fills up the buffer cache, but also disk space that needs to be backed up.
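(Not part of the original post: a hedged way to see that empty space after splits, using the documented physical-stats DMF against a hypothetical table name; avg_page_space_used_in_percent is only populated in SAMPLED or DETAILED mode.)

-- Average fullness of the 8K pages in each index level of the table.
SELECT index_id, index_level,
       avg_page_space_used_in_percent,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeBigTable'), NULL, NULL, 'SAMPLED');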
July 27, 2011 at 4:15 pm
Ewald Cress (7/27/2011)
David Walker-278941 (7/27/2011)
Ewald Cress (7/27/2011)
Back on the secondary subject of larger pages, I guess the obvious argument against them would be buffer cache wastage. If you have hundreds of often-used lookup/dimension tables with only a few narrow rows in each, quite a lot of buffer cache will essentially be useless, with the wasted amount being proportional to the page size. A similar argument will apply where there have been lots of page splits.
Huh? Does each buffer cache page hold records from only one table? Even if that's true, and even though I sometimes see narrow lookup tables, I rarely see any that are much smaller than an 8K page in size. If your lookup tables are entirely cached (because they are narrow and don't have too many rows), that might be a good thing for performance.
Still, I thought table data could share cache pages. But I could be wrong.
Pages are definitely never shared between tables. The closest thing we have to such sharing in SQL Server is mixed extents, where the eight pages in an extent can belong to eight different allocation units (read "tables").
Let's say that your existing lookup tables are all just under 8k, so very little space is wasted. Now we increase the page size to 64k, and suddenly 56k per page is wasted. This would only be an issue in practice if your design involves a really large number of such small tables, and they are all referenced often enough to want to stay in cache. Probably this is not really such a big deal for many designs.
However, let's say you have a lot of large transactional tables, and they happen to be clustered on a really inappropriate key, causing lots of page splits. Immediately after a page split of an 8k page into two, you'd have around 8k empty space between two pages (assuming the new row that caused the split is quite small). But if those pages were 64k to begin with, you've just created 64k of empty space. This is space that not only fills up the buffer cache, but also disk space that needs to be backed up.
I agree with all of that. I didn't know that pages weren't shared between tables. 8K sounds like a good enough disk extent size to me; I don't personally think that size needs to be expanded, but I learned something about SQL buffers today.
Disk drives are just recently changing from 512-byte sectors to 4K. (File system allocation units are typically 4K, but the physical sector size is still 512 bytes until you get to Advanced Format disks.)
Now, I wonder when Internet Protocol and all of the routers in the world will go from ~1500 bytes per packet to 4K...
July 28, 2011 at 1:59 am
krowley (7/27/2011)
Forgive a relative newbie to SQL internals, but why can't the page size be variable per database or even per table? This would give the best of all worlds, as a small table, or one where selects usually pull only one record at a time, could have a very small page size, while larger tables, or ones where multiple rows are often pulled at the same time, could have a larger page size.
To some extent this sort of thing already happens. What has not been discussed so far is that pages are NOT allocated by themselves, but come in clusters of pages, so-called extents, each extent being 64KB to my knowledge. Large tables and indexes have whole extents dedicated to them; for tiny lookup tables and the like, extents contain pages for mixed objects. But a single page itself is always dedicated to a single object!
Now when it comes to reading and writing pages, I think this happens in amounts of 512 bytes, so-called sectors, though even that is not a term that applies to the database itself; it refers to the underlying file system. This size I can see going to 4KB in the not too distant future. As storage gets larger, more and more disks will use a 4KB cluster size by default as their optimum.
In short, I see no reason to go to 64KB pages; it has drawbacks (and advantages) other than those mentioned so far. Think of improved compression and the size of internal page references. The internal coding of a page, in short, would have to be changed for better or worse.
I see more benefits in different methods of storing rows. We now have heaps and indexes, and soon column stores. I have always missed plain old true clustered indexes that do not use a B-Tree. Indexing can be accomplished by a binary search, assisted by a histogram or some sort of external mini index. It won't be as fast for random searches as a B-Tree would be, I believe, but the data density would be much better (more rows per page), and certain types of queries benefit more from this sort of storage!
As to the issue of rownumbers, the window functions do work just fine. I only wish generated select fields, such as the window functions, could be accessed elsewhere in the query without having to use a CTE or derived table to act on their outcome.
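(Not part of the original post: a small illustration of that restriction, with a hypothetical table name. A window function cannot be referenced directly in WHERE; wrapping it in a derived table or CTE is what makes it filterable.)

-- Rejected by the parser: window functions are not allowed in the WHERE clause.
-- SELECT col1 FROM dbo.SomeTable WHERE ROW_NUMBER() OVER (ORDER BY col1) = 1;
-- The workaround is a derived table (or CTE) around the window function:
SELECT col1
FROM (
    SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS rn
    FROM dbo.SomeTable
) AS x
WHERE rn = 1;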
July 28, 2011 at 9:13 am
Eric M Russell (7/27/2011)
How would you leverage ROWID for detecting "overlapping date ranges"; doesn't that typically involve comparing two or more date columns?
It's the trivial case of "exclude each row from overlapping with itself" when you have no unique index/constraint, and even when you have 100% duplicate rows. Likewise, it would allow the trivial case of keeping one row (the MIN or MAX of the ROWID) and deleting the others.
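(Not part of the original post: a hedged sketch of that overlap query, using ROW_NUMBER() as a stand-in for the missing rowid and a hypothetical dbo.DateRanges table with no unique key.)

-- Each row gets a pseudo rowid so it can be excluded from overlapping with itself,
-- even when the table contains exact duplicate rows.
WITH Numbered AS
(
    SELECT StartDate, EndDate,
           ROW_NUMBER() OVER (ORDER BY StartDate, EndDate) AS pseudo_rowid
    FROM dbo.DateRanges
)
SELECT a.pseudo_rowid AS row_a, b.pseudo_rowid AS row_b
FROM Numbered AS a
JOIN Numbered AS b
  ON a.pseudo_rowid < b.pseudo_rowid      -- never compare a row with itself
 AND a.StartDate <= b.EndDate
 AND b.StartDate <= a.EndDate;            -- standard interval-overlap test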
For the page size discussion: no, some people see no value in changing page or extent size. That's fine; let those folks keep the default. For those who do see value in smaller or larger pages on a per-database or per-filegroup basis, give us the option within SQL Server, rather than having to go to DB2 or Oracle to optimize for our particular situation. Even PostgreSQL and MySQL with InnoDB give the option to select a page size, though it looks like only on a server-wide basis for those products.
July 28, 2011 at 10:10 am
peter-757102 (7/28/2011)
krowley (7/27/2011)
Forgive a relative newbie to SQL internals, but why can't the page size be variable per database or even per table? This would give the best of all worlds, as a small table, or one where selects usually pull only one record at a time, could have a very small page size, while larger tables, or ones where multiple rows are often pulled at the same time, could have a larger page size.
To some extent this sort of thing already happens. What has not been discussed so far is that pages are NOT allocated by themselves, but come in clusters of pages, so-called extents, each extent being 64KB to my knowledge. Large tables and indexes have whole extents dedicated to them; for tiny lookup tables and the like, extents contain pages for mixed objects. But a single page itself is always dedicated to a single object!
Now when it comes to reading and writing pages, I think this happens in amounts of 512 bytes, so-called sectors, though even that is not a term that applies to the database itself; it refers to the underlying file system. This size I can see going to 4KB in the not too distant future. As storage gets larger, more and more disks will use a 4KB cluster size by default as their optimum.
In short, I see no reason to go to 64KB pages; it has drawbacks (and advantages) other than those mentioned so far. Think of improved compression and the size of internal page references. The internal coding of a page, in short, would have to be changed for better or worse.
I see more benefits in different methods of storing rows. We now have heaps and indexes, and soon column stores. I have always missed plain old true clustered indexes that do not use a B-Tree. Indexing can be accomplished by a binary search, assisted by a histogram or some sort of external mini index. It won't be as fast for random searches as a B-Tree would be, I believe, but the data density would be much better (more rows per page), and certain types of queries benefit more from this sort of storage!
As to the issue of rownumbers, the window functions do work just fine. I only wish generated select fields, such as the window functions, could be accessed elsewhere in the query without having to use a CTE or derived table to act on their outcome.
SQL Server data is allocated on disk in 8-kilobyte pages. The 512-byte sector size of the underlying physical disk does not affect this *for writing SQL data*.
July 28, 2011 at 10:38 am
Nadrek (7/28/2011)
Eric M Russell (7/27/2011)
How would you leverage ROWID for detecting "overlapping date ranges"; doesn't that typically involve comparing two or more date columns?
It's the trivial case of "exclude each row from overlapping with itself" when you have no unique index/constraint, and even when you have 100% duplicate rows. Likewise, it would allow the trivial case of keeping one row (the MIN or MAX of the ROWID) and deleting the others.
The rank() or row_number() functions can be used for things like de-duplicating a resultset or selecting top X, even if there is no proper ID or exposed ROWID. For deleting duplicates from a table, it would be best to create a new table containing a proper ID or unique date/time column, and then select into that table a ranked and filtered resultset from the original junk table.
select col1, row_number() over (partition by col1 order by col1) col1_row_number
from
(
select '2011-05-01' col1 union all
select '2011-05-02' col1 union all
select '2011-05-03' col1 union all
select '2011-05-03' col1
) as x;
col1 col1_row_number
---------- --------------------
2011-05-01 1
2011-05-02 1
2011-05-03 1
2011-05-03 2
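(Not part of the original post: a hedged sketch of a related in-place variant, assuming a hypothetical dbo.JunkTable; the same ranking can drive a DELETE through a CTE instead of selecting into a new table.)

-- Keeps one copy of each duplicate set and deletes the rest, in place.
WITH Ranked AS
(
    SELECT col1,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS rn
    FROM dbo.JunkTable
)
DELETE FROM Ranked
WHERE rn > 1;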
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
July 28, 2011 at 5:30 pm
For deduping a heap table, I would just delete all offending rows and then re-insert one record. Since a heap with no unique key cannot have any foreign keys pointing to it, this will succeed, and when done within a transaction the temporary non-existence of the row is never noticed by the application.
If you have a bad design containing heaps and an even worse designed application inserting duplicates, well, I feel sorry for you having to use this dirty cleanup technique, but it works. (It is the very reason why I NEVER create heaps and ALL my tables have at least a surrogate primary key; having the engine force the existence of a ROWID as the PK would be fine with me.)
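(Not part of the original post: a hedged sketch of that delete-then-reinsert approach, with a hypothetical dbo.JunkHeap table and column names, assuming no IDENTITY column and the default isolation level.)

BEGIN TRANSACTION;

-- Capture exactly one distinct image of the offending row in a temp table.
SELECT DISTINCT *
INTO #survivors
FROM dbo.JunkHeap
WHERE ColA = 'x' AND ColB = 42;

-- Delete every copy of that row.
DELETE FROM dbo.JunkHeap
WHERE ColA = 'x' AND ColB = 42;

-- Put a single copy back; inside the transaction, other readers are blocked
-- rather than seeing the row as missing.
INSERT INTO dbo.JunkHeap
SELECT * FROM #survivors;

COMMIT TRANSACTION;

DROP TABLE #survivors;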