March 26, 2020 at 5:38 pm
Hi, hope everyone is well.
I need to create a unique nonclustered index, but the key length exceeds 1700 bytes.
I know the workaround is to move some of the columns into the index as INCLUDE columns.
My question is: would that index still enforce uniqueness? Will an insert fail if it violates the logic of the unique nonclustered index with INCLUDE columns?
Thanks
Astrid
March 26, 2020 at 6:53 pm
The maximum key length for a nonclustered index in SQL Server 2017 is 1700 bytes. Since you have not provided enough information, it is hard to say what your problem may be. Perhaps if you provided the full text of the error message and/or the DDL for the table and the index you are trying to create, we could give a better answer.
Edit: The 1700-byte limit applies only to the actual key columns. You can add additional columns that are not part of the key as included columns.
March 26, 2020 at 9:47 pm
My question is: would that index still enforce uniqueness? Will an insert fail if it violates the logic of the unique nonclustered index with INCLUDE columns?
No - included columns are not part of the index key or the unique constraint; only the key columns are checked for uniqueness. I am curious how you ended up with a table where the only way to identify a unique row is 1700 bytes long. How many columns are part of this index, and what are the data types of each column?
What does this table represent?
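To make that behavior concrete, here is a minimal throwaway sketch (the table, column, and index names are made up for illustration):

CREATE TABLE dbo.IncludeDemo (
    KeyCol  varchar(50)  NOT NULL,
    Payload varchar(900) NULL
);

-- Only KeyCol is part of the key; Payload just rides along at the leaf level.
CREATE UNIQUE NONCLUSTERED INDEX UX_IncludeDemo_KeyCol
    ON dbo.IncludeDemo (KeyCol)
    INCLUDE (Payload);

INSERT dbo.IncludeDemo VALUES ('A', 'first row');
INSERT dbo.IncludeDemo VALUES ('A', 'different payload');  -- fails with a duplicate key error
-- The second insert fails even though Payload differs, because included
-- columns never participate in the uniqueness check.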
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
March 27, 2020 at 1:14 pm
Thanks for the reply. This is the table I need to build - we follow the Data Vault methodology.
I am able to create the table, but I get a warning:
Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'UC_HubHistoryDataNew' has maximum length of 1802 bytes. For some combination of large values, the insert/update operation will fail.
This is the table code.
CREATE TABLE [Raw].[HubHistoryDataNew](
    [HistoryDataHashKey] [char](40) NOT NULL,
    [LoadDate] [datetime] NULL,
    [RecordSource] [nvarchar](200) NULL,
    [LastSeenDate] [datetime] NULL,
    [CreateDate] [datetime2](7) NULL,
    [UserID] [nvarchar](20) NULL,
    [TableName] [nvarchar](50) NULL,
    [ColumnName] [nvarchar](50) NULL,
    [PrimaryKey] [nvarchar](254) NULL,
    [Action] [nvarchar](15) NULL,
    [OldValue] [nvarchar](254) NULL,
    [NewValue] [nvarchar](254) NULL,
    CONSTRAINT [HubHistoryDataNew_PK] PRIMARY KEY NONCLUSTERED
    (
        [HistoryDataHashKey] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [INDEX],
    CONSTRAINT [UC_HubHistoryDataNew] UNIQUE NONCLUSTERED
    (
        [CreateDate] ASC,
        [UserID] ASC,
        [TableName] ASC,
        [ColumnName] ASC,
        [PrimaryKey] ASC,
        [Action] ASC,
        [OldValue] ASC,
        [NewValue] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [INDEX]
) ON [DATA]
GO
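For reference, the 1802 bytes in the warning are just the maximum widths of the eight key columns added up (nvarchar stores 2 bytes per character; datetime2(7) stores 8 bytes):

CreateDate   datetime2(7)   =    8 bytes
UserID       nvarchar(20)   =   40 bytes
TableName    nvarchar(50)   =  100 bytes
ColumnName   nvarchar(50)   =  100 bytes
PrimaryKey   nvarchar(254)  =  508 bytes
Action       nvarchar(15)   =   30 bytes
OldValue     nvarchar(254)  =  508 bytes
NewValue     nvarchar(254)  =  508 bytes
                      Total = 1802 bytes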
March 27, 2020 at 1:39 pm
If you are following Data Vault design, then you have too many columns on that hub. Most of the columns there look like attributes of your business key and are not, as far as I can tell, business keys in their own right.
Also, if you can, don't define the columns as nvarchar - unless your data is Unicode-based, stick with varchar types, as it will halve the storage for those columns.
Edit: looking at the columns, this looks like it should be a link table (split into a link and a satellite) rather than a hub.
March 27, 2020 at 1:42 pm
This is a special table; the columns are the composition of that primary hash key. That is why they are all there.
I could look into switching from nvarchar to varchar.
Thanks
March 27, 2020 at 2:20 pm
Speaking of hashes, you could create a hash column covering all the columns and make that the clustered PK. That would certainly be within the key-size limits of a clustered index (CI).
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2020 at 2:23 pm
I did not choose the type of table it is - I don't even know what the table is for. I was just asked to create a hub.
March 27, 2020 at 2:24 pm
Speaking of hashes, you could create a hash column covering all the columns and make that the clustered PK. That would certainly be within the key-size limits of a clustered index (CI).
What do you mean? We have already hashed all the columns to create a hash key.
March 27, 2020 at 3:02 pm
Speaking of hashes, you could create a hash column covering all the columns and make that the clustered PK. That would certainly be within the key-size limits of a clustered index (CI).
What do you mean? We have already hashed all the columns to create a hash key.
Ah... I see. I missed the fact that you posted the code for the table.
There's no way to make an index with 1700 bytes of keys without the possibility of a future failure, period, because everyone forgets about the row headers within the indexes. Even on 2017 it's likely to come up a bit short because of that (although I've not tested it specifically, because a 1700-byte keyed index isn't practical, IMHO). You'll need to identify the minimum set of columns that makes a row unique and use INCLUDE for the rest if you want to come even close. Certainly the old- and new-value columns don't need to be part of the keys.
The real key here (no pun intended) is that this is a column-based AUDIT/HISTORY table and, ostensibly, it's being fed only by triggers (if you're using CLR triggers, that's a HUGE performance mistake... obviously, I've been there and done that, and it needs to be fixed if you have those or anything dynamic within your triggers - there's a way around that). That being said, you really shouldn't need a UNIQUE index on all the columns you're trying to use as keys. At best, you only need TableName, ColumnName, PrimaryKey, and CreateDate as keys and, with the understanding that it's a huge and wasteful duplication of data, you can use INCLUDE for the other columns.
Since we also have this form of auditing at work, I can tell you that your trigger is also being bogged down by converting old and new values to NVARCHAR(). This is one of the very few places in the world where the SQL_Variant datatype works a treat (there are some minor caveats, but it works perfectly for everything I need such an audit table for).
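A sketch of what that narrower index might look like - assuming (TableName, ColumnName, PrimaryKey, CreateDate) really is unique in your data, which you must verify first:

-- Replaces the 1802-byte UC_HubHistoryDataNew constraint with a narrower key.
CREATE UNIQUE NONCLUSTERED INDEX UX_HubHistoryDataNew_MinKey
    ON [Raw].[HubHistoryDataNew] (TableName, ColumnName, [PrimaryKey], CreateDate)
    INCLUDE (UserID, [Action], OldValue, NewValue);
-- Maximum key size: 100 + 100 + 508 + 8 = 716 bytes, well under the 1700-byte limit.
-- The INCLUDE columns are stored at the leaf level but do not count
-- against the key limit and do not participate in the uniqueness check.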
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2020 at 3:42 pm
Looking at the table you are creating, I have to agree: this is not a hub table in the Data Vault methodology. A hub table will have the natural business key; if using Data Vault 2.0 it will also have an MD5 hash of the business key, which will be used as the primary key in the hub, satellite, and link tables; the source system; the date first seen; and the date deleted or last seen. It will have no other data. All other data will be in one or more satellite tables and possibly in link tables.
Also, using the Data Vault methodology there would be no reason to have columns to hold old and new values. The Data Vault methodology is insert-only into the raw data vault, even when a physical delete of data occurs in any of the source systems.
If you need to ensure uniqueness across all the columns you have listed as index columns, you can use CHECKSUM, BINARY_CHECKSUM, or HASHBYTES over all those columns, store the result as a persisted computed column, and index that column. Just remember that you have to consider the data, especially the character data. For example, HASHBYTES treats character data as case sensitive when it hashes. This means "SEPTEMBER", "September", and "september" will generate three different hashes.
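A hedged sketch of that persisted-hash approach against the posted table (the column and index names here are made up; UPPER() is included only if you want case-insensitive uniqueness - drop it to keep HASHBYTES's byte-for-byte behavior):

-- '|' delimiters keep ('ab','c') and ('a','bc') from hashing identically;
-- CONVERT style 126 is deterministic, which a PERSISTED column requires.
-- Note: CONCAT treats NULLs as empty strings, so NULL and '' hash the same.
ALTER TABLE [Raw].[HubHistoryDataNew] ADD RowHash AS
    CAST(HASHBYTES('SHA2_256', UPPER(CONCAT(
        CONVERT(varchar(27), CreateDate, 126), N'|', UserID, N'|',
        TableName, N'|', ColumnName, N'|', [PrimaryKey], N'|',
        [Action], N'|', OldValue, N'|', NewValue))) AS binary(32)) PERSISTED;

-- binary(32) is the fixed SHA2_256 output size - trivially within the key limit.
CREATE UNIQUE NONCLUSTERED INDEX UX_HubHistoryDataNew_RowHash
    ON [Raw].[HubHistoryDataNew] (RowHash);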
March 27, 2020 at 6:55 pm
I did not choose the type of table it is - I don't even know what the table is for. I was just asked to create a hub.
Then go back to whoever asked you to create it and get the full and correct definition. Under the Data Vault methodology you do need to follow the rules - otherwise you are just creating a mess.
Without doing a full and proper analysis, and based just on the table definition you supplied, the following would be a possible representation of the data under the DV method.
And I'm still not happy with the link representation - TableEntry most likely could be split into further tables (a hub for table, a hub for column, a link between them - potentially another hub for PrimaryKey and yet another link to join them all together).
Hub - User
    UserHash
    BusinessKey
    -- UserID
Hub - TableEntry
    TableEntryHash
    BusinessKey
    -- TableName
    -- ColumnName
    -- PrimaryKey
Link - TableEntryChanges
    LinkHash
    UserHash
    TableEntryHash
Link Sat
    LinkHash
    LinkAttributes
    -- CreateDate
    -- Action
    -- OldValue
    -- NewValue
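A rough DDL sketch of that split, assuming char(40) hash keys like the original table uses (the names and metadata columns are illustrative only, not a complete Data Vault 2.0 design):

CREATE TABLE [Raw].[HubUser](
    UserHash     char(40)     NOT NULL PRIMARY KEY,
    UserID       varchar(20)  NOT NULL,   -- business key
    LoadDate     datetime     NOT NULL,
    RecordSource varchar(200) NOT NULL
);

CREATE TABLE [Raw].[HubTableEntry](
    TableEntryHash char(40)     NOT NULL PRIMARY KEY,
    TableName      varchar(50)  NOT NULL,  -- composite business key
    ColumnName     varchar(50)  NOT NULL,
    [PrimaryKey]   varchar(254) NOT NULL,
    LoadDate       datetime     NOT NULL,
    RecordSource   varchar(200) NOT NULL
);

CREATE TABLE [Raw].[LinkTableEntryChanges](
    LinkHash       char(40) NOT NULL PRIMARY KEY,
    UserHash       char(40) NOT NULL REFERENCES [Raw].[HubUser],
    TableEntryHash char(40) NOT NULL REFERENCES [Raw].[HubTableEntry],
    LoadDate       datetime NOT NULL,
    RecordSource   varchar(200) NOT NULL
);

CREATE TABLE [Raw].[SatTableEntryChanges](
    LinkHash   char(40)     NOT NULL REFERENCES [Raw].[LinkTableEntryChanges],
    LoadDate   datetime     NOT NULL,
    CreateDate datetime2(7) NULL,
    [Action]   varchar(15)  NULL,
    OldValue   varchar(254) NULL,
    NewValue   varchar(254) NULL,
    CONSTRAINT SatTableEntryChanges_PK PRIMARY KEY (LinkHash, LoadDate)
);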
March 27, 2020 at 7:00 pm
And on a different note - there is no need for that extra unique index. If the hash is done correctly and is based on those columns, then the row is already unique by definition, since you set the hash as the primary key.
And when accessing the table to see if a record already exists, you should ALWAYS go in with the hash, never the individual columns, so that index would not really be that useful.
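In other words, the existence check should be shaped like this hypothetical snippet, where @HashKey holds the hash already computed by the ETL:

DECLARE @HashKey char(40) = '<hash computed in the ETL>';  -- placeholder value

-- This seeks on the HistoryDataHashKey primary key index;
-- the wide unique index is never needed for the check.
IF NOT EXISTS (SELECT 1
               FROM [Raw].[HubHistoryDataNew]
               WHERE HistoryDataHashKey = @HashKey)
BEGIN
    PRINT 'Row is new - insert it here.';
END;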
March 27, 2020 at 10:37 pm
The other constraint on the size of unique keys is the number of columns, which I believe is 16. A bigint is only 8 bytes, so to the extent that all of the OP's nvarchar columns could be substituted with surrogate keys... it's nowhere near the limit of what SQL Server can handle.
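A hypothetical illustration of that substitution - each wide nvarchar value is registered once in a narrow lookup table, and the unique constraint is declared over the 8-byte surrogates:

-- One lookup per repeated string domain (TableName, ColumnName, values, ...).
CREATE TABLE dbo.ValueLookup(
    ValueID bigint IDENTITY(1,1) PRIMARY KEY,
    Value   nvarchar(254) NOT NULL UNIQUE   -- 508 bytes max, still under 1700
);

-- The history table then stores bigint IDs instead of strings, so a
-- unique constraint like the OP's becomes roughly:
--   UNIQUE (CreateDate, UserID, TableNameID, ColumnNameID,
--           PrimaryKeyID, ActionID, OldValueID, NewValueID)
-- Eight key columns at about 8 bytes each: ~64 bytes of key in total.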
No one shall be able to expel us from the paradise that Cantor has created for us.