Lots of Key Lookups vs. UniqueIdentifier Clustered Index

Question

Lots of Key Lookups vs. UniqueIdentifier Clustered Index

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply

David Lafayette SSCommitted Points: 1559 More actions · Answer 1

opc.three:

Currently we have an integer identity, so I imagine I could draw a line in the sand and say all records with an identity of < X go to partition 1, otherwise partition 2.

Right now we do generate some of the UUIDs in the application layer. I imagine I'll have the application reach out to the dB for the UUID... not ideal, but I can't think of a better way to do this.

For generating sequential GUIDs I'm using a sequence and newID() and mashing them together. I do this instead of using newSequentialID() because I need some randomness in my UUID.

This is what I've come up with for getting the randomized, sequential UUID:

CREATE FUNCTION makeGUID

(

@GUID AS UNIQUEIDENTIFIER,

@number AS BIGINT

)

RETURNS UNIQUEIDENTIFIER

AS

BEGIN

RETURN CAST(LEFT(@guid, 24) + RIGHT(CONVERT(VARCHAR(16), CAST(@number AS VARBINARY(50)), 2), 12) AS UNIQUEIDENTIFIER)

END

GO

A couple of things I don't like about it. One, I'd rather use bitwise operators to mash these values up rather than string functions but the data is too wide (128 bits).

Also, I have to pass newID() and NEXT VALUE FOR in as parameters because I can't seem to find a tricky way of using them in a UDF.

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 2

Why do you need randomness in your UUID?

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

David Lafayette SSCommitted Points: 1559 More actions · Answer 3

Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.

After reading further on Books Online (thanks @jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.

I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:

RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 4

dave-L (11/21/2012)
Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.
After reading further on Books Online (thanks @jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.
I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:
RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)

Uggh, you have been dealt a tough hand. You are spot on with regard to the "prediction" security issue, and on the "restart issue" with NEWSEQUENTIALID() which was the motivation behind my question about randomness and which partitioning key you would choose. Even if you employed partitioning there is a chance that after a restart NEWSEQUENTIALID() could theoretically spill into your "frozen partition." I think you are on the right track developing your own sequential (yet random :-)) UUID generator.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

David Lafayette SSCommitted Points: 1559 More actions · Answer 5

Thanks for your help opc.three.

One last thing: I hate to leave less than optimal code out there for posterity so here is my last, optimized function with all string manipulation removed 🙂

CREATE FUNCTION makeGUID

(

@GUID AS UNIQUEIDENTIFIER,

@number AS BIGINT

)

RETURNS UNIQUEIDENTIFIER

AS

BEGIN

RETURN CAST(CAST(@guid AS BINARY(10)) + CAST(@Number AS BINARY(6)) AS UNIQUEIDENTIFIER)

END

GO

TomThomson SSC Guru Points: 104773 More actions · Answer 6

dave-L (11/21/2012)
Thanks for your help opc.three.
One last thing: I hate to leave less than optimal code out there for posterity so here is my last, optimized function with all string manipulation removed 🙂
CREATE FUNCTION makeGUID
(
@GUID AS UNIQUEIDENTIFIER,
@number AS BIGINT
)
RETURNS UNIQUEIDENTIFIER
AS
BEGIN
RETURN CAST(CAST(@guid AS BINARY(10)) + CAST(@Number AS BINARY(6)) AS UNIQUEIDENTIFIER)
END
GO

One query on that: is @GUID providing the randomness while number is just a counter? If so, I don't understand why you have them in that order in the returned value.

Tom

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 7

Now wait a minute here.... I thought you had to use the UUID! If you can get away creating your own number, then why are we even worried about this problem? Preserve the external GUID in a column and do everything internally with an IDENTITY value.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David Lafayette SSCommitted Points: 1559 More actions · Answer 8

Hi Tom.

It's counter intuitive, but that is how SQL orders UniqueIdentifiers.

David Lafayette SSCommitted Points: 1559 More actions · Answer 9

Jeff, there are zillions of lines of app code that have queries joining on and referencing the UUID.

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 10

dave-L (11/21/2012)
Jeff, there are zillions of lines of app code that have queries joining on and referencing the UUID.

So why are you talking about building a function? The Key Lookups just aren't going to be that bad. Certainly no worse than you maing a function.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TomThomson SSC Guru Points: 104773 More actions · Answer 11

dave-L (11/21/2012)
Hi Tom.
It's counter intuitive, but that is how SQL orders UniqueIdentifiers.

:blush: One of my thicker moments! :blush:

Tom

David Lafayette SSCommitted Points: 1559 More actions · Answer 12

Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

Dave

David Lafayette SSCommitted Points: 1559 More actions · Answer 13

CELKO,

I'm learning a lot though this process, thinking much more carefully about how I should organize my CIs.

Dave

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 14

CELKO (11/22/2012)

Most often non-RDBMS people use their generated physical locator as the cluster because this is how a magnetic tape or disk access modle of data would have done it.

That may be true for non-RDBMS people but others realize the value of it in preventing page splits of the data involved in the CI. Page splits are horribly expensive things both CPU and I/O wise on heavily inserted OLTP tables. Page splits are a frequent cause of massive GUI timeouts.

If a table is mostly static, then I agree... the clustered index should be used for something else. If the table suffers a lot of inserts, then a CI on a narrow, ever increasing, and unique column such as an IDENTITY column or a DATETIME column and an IDENTITY column as a uniquefier is generally the way to go. And, yes... it's very much like mag tape requirements. It worked for mag tape and it works here.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 15

dave-L (11/22/2012)
Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.
True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.
Dave

I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)