July 29, 2020 at 12:36 pm
I noticed that for tables with non-unique clustered key columns, SQL Server adds an internal unique identifier. Is there a way to retrieve a single row using this internal identifier, which appears to be an Int32 value?
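A minimal sketch of the scenario being described (table and column names here are hypothetical, not from the original post) -- a clustered index on a non-unique column, which is what causes SQL Server to add the hidden uniquifier:

```sql
-- Hypothetical table: the clustered index key is NOT unique,
-- so SQL Server adds a hidden 4-byte uniquifier to duplicate keys.
CREATE TABLE dbo.Orders
(
    CustomerID int NOT NULL,
    OrderDate  date NOT NULL,
    Amount     decimal(10, 2) NOT NULL
);

CREATE CLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);   -- non-unique clustered index

-- Duplicate key values: the first row with a given key stores no
-- uniquifier (0 implied); later duplicates get 1, 2, ... internally.
INSERT INTO dbo.Orders VALUES (1, '2020-07-01', 10.00);
INSERT INTO dbo.Orders VALUES (1, '2020-07-02', 20.00);
INSERT INTO dbo.Orders VALUES (1, '2020-07-03', 30.00);
```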
July 29, 2020 at 1:35 pm
Can you provide an example of such a table, using DDL and some INSERT statements? Also, why would you not add your own such column to such a table?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 29, 2020 at 5:00 pm
No, you cannot access nor specify the "uniquifier" value added by SQL Server.
Often it's best just to use an identity column or some other value to ensure uniqueness yourself; you would then, of course, also know that value and be able to specify it.
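A sketch of that suggestion, using hypothetical names: an identity column makes the clustered key unique, so no hidden uniquifier is needed and every row can be addressed by a value you control:

```sql
-- Sketch: supply your own uniquifying column instead of relying
-- on SQL Server's hidden one. Table/column names are hypothetical.
CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1, 1) NOT NULL,
    CustomerID int NOT NULL,
    OrderDate  date NOT NULL
);

-- Clustering on (CustomerID, OrderID) keeps the CustomerID ordering,
-- but the composite key is now unique -- no uniquifier is added.
CREATE UNIQUE CLUSTERED INDEX IX_Orders_CustomerID_OrderID
    ON dbo.Orders (CustomerID, OrderID);

-- Any single row can now be retrieved by a known value:
-- SELECT * FROM dbo.Orders WHERE CustomerID = 1 AND OrderID = 42;
```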
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 29, 2020 at 5:29 pm
As far as I know, if a row has no guaranteed way to be unique, SQL Server adds a hidden column to the end of the key to make it unique. Since you didn't create that column and didn't tell SQL Server you needed it, it is marked as entirely internal and there is no way to query it.
It is just about how the data is stored on disk. SQL Server needs some way to know that a row is unique; if you don't provide one, it does so in the background.
If you REALLY need to query it, you would need to read the data on the page, store it into a table (so you have the uniquifier column), and then you could query it. But if you are going to that much trouble, you could get the same benefit by just adding your own unique column, something like ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC ORDER BY columnA, columnB, columnC). It may not give you the SAME ordering as what is stored at the page level, but it will make each row unique.
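A sketch of that ROW_NUMBER() approach; the table and column names (dbo.Orders, columnA/columnB/columnC) are placeholders, not from the original post:

```sql
-- Sketch: number the duplicates within each distinct key yourself.
SELECT columnA,
       columnB,
       columnC,
       ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC
                          ORDER BY columnA, columnB, columnC) AS RowSeq
FROM dbo.Orders;

-- RowSeq restarts at 1 for each distinct (columnA, columnB, columnC),
-- mirroring how the hidden uniquifier counts duplicates of a key --
-- though not necessarily in the same order SQL Server assigned them.
```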
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
July 29, 2020 at 5:51 pm
I noticed that for tables with non-unique clustered key columns, SQL Server adds an internal unique identifier. Is there a way to retrieve a single row using this internal identifier, which appears to be an Int32 value?
What practical reason do you have for wanting to do this?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 29, 2020 at 6:24 pm
Nice find, Brian. An excellent article that demonstrates that the uniquifier value is unique within a given set of otherwise duplicated key values, from the point of view of the index involved. Thus you can have just over 2 billion (int32) duplicates of any given key value in the index. It would be handy to be able to access that value, but it's not worth the trouble when ROW_NUMBER() could have provided similar functionality as a column in the table to begin with.
July 29, 2020 at 7:20 pm
Hi all, thanks for all the interesting replies. The practical reason is that I am working at the transaction log level. For updates, the transaction log carries this uniquifier number to identify a row, and I need to retrieve the corresponding row, which has specific use cases. Maybe pulling the data by page is the only option.
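For reference, the undocumented fn_dblog() function is one way to look at those log records. Since the function is undocumented and unsupported, treat this as an exploratory sketch only; the table filter is a hypothetical example:

```sql
-- Exploratory sketch using the undocumented fn_dblog() function.
-- Undocumented = unsupported and subject to change between versions.
SELECT [Current LSN],
       Operation,             -- e.g. LOP_INSERT_ROWS, LOP_MODIFY_ROW
       Context,
       AllocUnitName,         -- which index/heap the record touches
       [Transaction ID]
FROM   sys.fn_dblog(NULL, NULL)   -- NULL, NULL = entire active log
WHERE  Operation IN (N'LOP_INSERT_ROWS', N'LOP_MODIFY_ROW', N'LOP_DELETE_ROWS');

-- The uniquifier itself is buried inside the binary
-- [RowLog Contents 0]/[RowLog Contents 1] columns and would have
-- to be decoded by hand from the on-disk row format.
```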
July 29, 2020 at 7:53 pm
Hi all, thanks for all the interesting replies. The practical reason is that I am working at the transaction log level. For updates, the transaction log carries this uniquifier number to identify a row, and I need to retrieve the corresponding row, which has specific use cases. Maybe pulling the data by page is the only option.
Have to wonder why you would want to use the transaction log when you could just pull the data? Or is the database design the reason you need to use the log, on account of the number of dupes being potentially large for a given set of values at the index level? I'd much sooner fix the database design than resort to using the tlog.
July 29, 2020 at 7:56 pm
This is an entirely different use case. I need to generate some events when an insert or delete occurs and this system needs to be pluggable to any SQL database. So I can't really fix the database and I don't intend to do so.
July 29, 2020 at 8:01 pm
"I need to generate some events when an insert or delete occurs and this system needs to be pluggable to any SQL database."
Presumably you would want to do that with a (well-written!) trigger. A trigger would have access to the entire row (not the uniquifier value, but every data column in the row).
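A minimal sketch of such a trigger; the table and event-log names (dbo.Orders, dbo.OrderEvents) are hypothetical. The inserted and deleted pseudo-tables are what give the trigger access to every data column of the affected rows:

```sql
-- Sketch: an AFTER trigger that records insert/delete events.
-- dbo.Orders and dbo.OrderEvents are hypothetical names.
CREATE TRIGGER trg_Orders_Events
ON dbo.Orders
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "inserted" holds new rows, "deleted" holds removed rows.
    INSERT INTO dbo.OrderEvents (EventType, CustomerID, OrderDate)
    SELECT 'INSERT', i.CustomerID, i.OrderDate FROM inserted AS i;

    INSERT INTO dbo.OrderEvents (EventType, CustomerID, OrderDate)
    SELECT 'DELETE', d.CustomerID, d.OrderDate FROM deleted AS d;
END;
```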
July 29, 2020 at 8:10 pm
Okay, so let me get this straight. Your data is insufficiently unique, but rather than fixing the real problem, you'll resort to the transaction log and DBCC PAGE commands, and then you expect me to accept that as if it were a normal thing to do?
And Scott's suggestion of using a trigger might be a good alternative until the design can be corrected.
July 29, 2020 at 8:15 pm
One of the requirements is that there is no altering of the database (like adding a trigger as you suggested).
July 29, 2020 at 8:22 pm
Well good luck. I would hate to have to look at actual physical data pages to do data comparisons -- and I'm a long-time DBA!
It wasn't too bad having to look at actual physical pages back in the day to determine page usage before a backup, to decide whether a differential backup was worth it. But what you want to do goes far beyond that.
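For context, inspecting a page directly looks roughly like this. DBCC PAGE is undocumented and unsupported; the database name and page numbers below are placeholders:

```sql
-- Undocumented/unsupported: DBCC PAGE dumps a raw data page.
-- Trace flag 3604 routes DBCC output to the client session
-- instead of the error log.
DBCC TRACEON (3604);

-- First find a page belonging to the table, e.g. via the
-- (also undocumented) sys.dm_db_database_page_allocations:
-- SELECT allocated_page_file_id, allocated_page_page_id
-- FROM sys.dm_db_database_page_allocations(DB_ID(),
--          OBJECT_ID('dbo.Orders'), NULL, NULL, 'DETAILED');

-- DBCC PAGE (database, file_id, page_id, print_option)
-- print_option 3 shows each row's contents, including the
-- UNIQUIFIER field on duplicate clustered-key rows.
DBCC PAGE (MyDatabase, 1, 1234, 3);
```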
July 29, 2020 at 8:25 pm
Thank you for the good wishes. I have been successful up to this point, and I hope I can manage this issue by reading the pages directly. I understand the whole thing is a bit challenging, but that's the fun of it.
July 29, 2020 at 8:26 pm
You say that the system needs to be pluggable into "any SQL database", then talk about reading from the data pages and the transaction log files. I'm not sure I follow the logic here, as different SQL database systems (Oracle, PostgreSQL, MySQL, SQL Server, etc.) are going to handle this differently. And if you mean literally any SQL database, some may not even use a uniquifier and may allow rows that are 100% duplicates. It feels like a weird design, but there might be a SQL database system out there that works that way.
Plus, if I am not mistaken, permissions are going to factor into this as well. Reading the transaction log and data pages is not something just anyone who can connect to SQL Server is allowed to do.
It may not hurt to go back to the person who originally wrote these requirements and find out why they are in place. Adding a column to the table (ID INT IDENTITY(1,1) NOT NULL, for example) or the trigger would solve the problem for you without needing to dig through things at the page level or in the transaction log.