March 2, 2016 at 4:34 am
We have several lookup tables where the Id column is an auto-increment PK.
Let's say a BIGINT column with records 1, 2, 3, 4, and so on. Due to a migration, a developer added one row with the maximum Id for the data type, to mark a standard row used by the migration process: Id 9223372036854775807. The Id was then reseeded back to the number the system had last generated.
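In T-SQL terms, the step looked something like this (table name, columns, and reseed value are illustrative only):

CREATE TABLE dbo.Lookup
(
    Id          bigint IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_Lookup PRIMARY KEY CLUSTERED,
    Description nvarchar(100) NOT NULL
);

-- Insert the sentinel row at the very top of the bigint range
SET IDENTITY_INSERT dbo.Lookup ON;
INSERT INTO dbo.Lookup (Id, Description)
VALUES (9223372036854775807, N'Standard migration row');
SET IDENTITY_INSERT dbo.Lookup OFF;

-- Reseed so normal inserts continue from the last system-generated value
-- (here 4, so the next identity will be 5)
DBCC CHECKIDENT ('dbo.Lookup', RESEED, 4);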
My question is: would this be OK, or will we face a performance issue each time we add a record to this table, such as page splits, I/O, or waits?
Kindly explain, or share any article if applicable, so that I can explain this to the developers.
Shamshad
March 2, 2016 at 4:41 am
No... I don't see any performance impact; it's just a valid piece of data for the column's datatype.
That's my view anyway.
March 2, 2016 at 5:06 am
shamshad.ali (3/2/2016)
We have several lookup tables where the Id column is an auto-increment PK. Let's say a BIGINT column with records 1, 2, 3, 4, and so on. Due to a migration, a developer added one row with the maximum Id for the data type, to mark a standard row used by the migration process: Id 9223372036854775807. The Id was then reseeded back to the number the system had last generated.
My question is: would this be OK, or will we face a performance issue each time we add a record to this table, such as page splits, I/O, or waits?
Kindly explain, or share any article if applicable, so that I can explain this to the developers.
Shamshad
This is quite an interesting issue, and one that I wish I had the time today to benchmark and blog about. Of course that would require me to actually HAVE a blog though. :hehe: It would be trivial to capture page split activity while inserting to this table.
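For instance, one quick way to count splits (a sketch using the undocumented fn_dblog function, so test systems only) is to look for split operations in the active transaction log:

-- Count page splits recorded in the active log, per allocation unit
-- (fn_dblog is undocumented; use on test systems only)
SELECT AllocUnitName, COUNT(*) AS page_splits
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_DELETE_SPLIT'
GROUP BY AllocUnitName;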
Anyway, I am leaning towards this causing splits, because the inserted value is no longer monotonically increasing. Normally the inserted identity is ALWAYS higher than every value on the tail page of the clustered index, so inserts simply append; with the max value parked there, that no longer holds. Assuming this is the case, and that you, like 99.9998% of the clients and users I have come across, started with IDENTITY(1,1), I would move the sentinel to the OPPOSITE end of the value range (or maybe even -1 or -99, which are also common values chosen for this scenario).
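Something like this, reusing the hypothetical table from the first post (sketch only):

-- Put the sentinel at the low end instead, so it never sits on the tail page
SET IDENTITY_INSERT dbo.Lookup ON;
INSERT INTO dbo.Lookup (Id, Description)
VALUES (-1, N'Standard migration row');
SET IDENTITY_INSERT dbo.Lookup OFF;
-- No reseed needed: normal identity values keep ascending past the sentinel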
Oh, the above holds whether your PK is nonclustered or, as virtually everyone makes it, clustered. The clustered case is far worse for page splits, though, because it means fragmenting the actual table itself.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 2, 2016 at 5:28 am
Is it just the one row? Or are you regularly going to be inserting rows at the end of the chain along with rows at the start? If the former, I wouldn't worry about it at all. If the latter, I'm with Kevin, it's going to lead to some odd splits in the index. This is only an issue if you see lots of scans against the table though.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 2, 2016 at 5:53 am
As mentioned, rows will keep being added from the auto-increment value onward:
1
2
3
4
9223372036854775807
reseed to 5
then it will start adding
5
6
7
The large number is a one-time addition to the PK due to the migration activity, but it will always be indexed at the tail. When users add records 5, 6, and so on, won't the big number have to be moved in the index each time? Is that true, or is it OK? Keep in mind the developers used BIGINT, which means lots of inserts will be going into this table.
Please guide!!
Shamshad
March 2, 2016 at 6:39 am
The big value will get added to the last page, and then everything else will be inserted behind it. Without testing I couldn't say to just what degree that one row will affect page splits, but I suspect the effect will be extremely small. All other inserts will always fall behind it; that one value will always be an outlier. If that's all that ever happens, I think the effects are negligible to invisible.
The one interesting note is in the statistics. Since you've got the bottom and top of the range running the entire gamut of the data type, no inserts will ever fall outside the range of the statistics. I wonder if this is a solution to the ascending-key statistics issue with identity inserts? That could be a really interesting test.
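If anyone wants to try it, the histogram is easy to inspect (a sketch; the table and statistics names are the hypothetical ones from earlier):

-- Show the statistics histogram for the PK on the hypothetical table
DBCC SHOW_STATISTICS ('dbo.Lookup', 'PK_Lookup') WITH HISTOGRAM;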
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 2, 2016 at 8:19 am
If this was a fact table with hundreds or thousands of inserts per second, I would be concerned about the impact of the additional page splits.
For a lookup table, though, I cannot imagine this ever being a problem.
March 2, 2016 at 8:23 am
Grant Fritchey (3/2/2016)
The one interesting note is in the statistics. Since you've got the bottom and top of the range running the entire gamut of the data type, no inserts will ever fall outside the range of the statistics. I wonder if this is a solution to the ascending-key statistics issue with identity inserts? That could be a really interesting test.
I didn't test it, but I do know that the update statistics logic is fairly good at detecting outliers. I would expect one histogram entry for just the high value and another for the second-highest value, as the last value of the "increase by one" range. In fact, I would not even be surprised to see just three histogram entries at any time: the first value, the last "normal" value, and the outlier value.
Newly added values will fall in the range belonging to the outlier's histogram step, which probably has AVG_RANGE_ROWS of 0. So I doubt that this will really solve the ascending key problem. It might even make it worse, because the workarounds MS built could be crippled by this high value.
All the above is speculation, please test to confirm or deny.
March 3, 2016 at 12:02 am
Can you please let me know how I can test it and share the stats? Then we can decide whether it is feasible to go ahead.
Shamshad
March 3, 2016 at 4:21 am
shamshad.ali (3/3/2016)
Can you please let me know how I can test it and share the stats? Then we can decide whether it is feasible to go ahead. Shamshad
Set up the situation. Insert 10 or 15 rows, then add the row way at the end. Update the statistics, check the index fragmentation, and reseed the column. Then insert another 10 or 15 rows, check the existing statistics (don't update them), and check the index fragmentation. Then really test it: insert 1,000 rows and recheck; insert 1,000,000 rows and recheck.
Then, compare it to a regular table. Do the same tests and measures again, but this time without the wild variance of having a row at the top of the range.
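For the "recheck" steps, something along these lines would do (a sketch against the hypothetical table from earlier in the thread):

-- Check index fragmentation and page counts at each checkpoint
SELECT index_level, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID('dbo.Lookup'), NULL, NULL, 'DETAILED');

-- Inspect the current histogram without updating the statistics
DBCC SHOW_STATISTICS ('dbo.Lookup', 'PK_Lookup') WITH HISTOGRAM;

-- Bulk-insert pass, e.g. 1,000 rows in one go
INSERT INTO dbo.Lookup (Description)
SELECT TOP (1000) N'Test row'
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;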
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning