Performance of clustered index and clustered primary key

  • I have a table with millions of rows that stores measurement data.  Data get inserted in batches of thousand of records every 5 minutes as we received web service messages.  I have no control over the data submitter, so it is possible that I have duplicate or multiple data sets for the same widget.  I'm storing the data in a table that structurally looks like:

    CREATE TABLE MyTable (
    DataInsertTime DATETIME2(2),
    WidgetID INT,
    Value INT
    )

    Because of the possibility of duplicate data, I can't create a PK, unless I add an IDENTITY column or some other value that guarantees uniqueness. I always query the data for a given widget and point in time, so my queries always include DataInsertTime.  My question is what would perform better? The table with no PK and a clustered index in DataInsertTime and WidgetID, or add the IDENTITY column and a clustered PK on DataInsertTime, WidgetID, ID (IDENTITY COLUMN)?

    I have always stayed away from creating the PK if I didn't have to.  The PK gives me nothing and I always thought that there would be an adverse impact on performance if SQL Server needs to assure uniqueness and manage an index with a additional column. Someone told me that adding the PK would always improve performance, even if I retrieve data from non-unique portion of the PK, such as DataInsertTime and WidgetID.

    I would appreciate your comments.  Thanks

  • You don't need a PK per se, but you should definitely create a clustered index headed by DateInsertTime.  I think in this case you'll also need an IDENTITY column to make the clustering key unique yourself, otherwise I fear you'll end up with too many ghost rows if/when you delete data from the table; like you, I'd rather avoid the overhead of that, but it's probably best to do it in this case.  

    /*If data compression isn't available / you don't use it, naturally remove that part.*/
    CREATE UNIQUE CLUSTERED INDEX MyTable__CL ON dbo.MyTable ( DateInsertTime, WidgetID, ID ) WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 99 ) ON [PRIMARY]; /*change filegroup name if needed*/

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Even if you don't add an explicit unique value to the clustered index, SQL Server will add a uniquifier to the clustered index if the values of the clustered index aren't unique.  This is a requirement of the clustered index.
    If you decide to a identity column make it the trailing column of the clustered index.  You then have two choices, make the clustered index your primary key (yes, a primary key may have more than one column) or use the identity column as the primary key but be sure to make it a nonclustered index.

  • Thanks for the response.  Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.

  • Again on data compression - eventually the data will have to be archived in a database on a different server.  If both the operational and archived tables are compressed, how will SQL Server transfer the data from operational to archival?  Will it decompress during the select and re-compress on the insert, or is it smart enough to know that both the source and destination tables on the insert/select are compressed and it doesn't decompress/re-compress?  Again, the archive database is on a different server and connected through linked server.  The archival is triggered from the archive server.

    Any additional info is appreciated.

  • N_Muller - Thursday, March 30, 2017 12:25 PM

    Thanks for the response.  Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.

    Definitely on data retrieval.  And honestly, in many cases for us, it has actually sped up inserts.  I'm certain that's because of (1) reduced I/O for writes, and (2) our CPUs have spare capacity most of the time, so we don't see what for many could be a hit to CPU responsiveness.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • N_Muller - Thursday, March 30, 2017 12:39 PM

    Again on data compression - eventually the data will have to be archived in a database on a different server.  If both the operational and archived tables are compressed, how will SQL Server transfer the data from operational to archival?  Will it decompress during the select and re-compress on the insert, or is it smart enough to know that both the source and destination tables on the insert/select are compressed and it doesn't decompress/re-compress?  Again, the archive database is on a different server and connected through linked server.  The archival is triggered from the archive server.

    Any additional info is appreciated.

    I believe SQL decompresses and re-compresses, as I believe it copies rows rather than pages, but I can't 100% say for sure.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply