March 30, 2017 at 10:37 am
I have a table with millions of rows that stores measurement data. Data gets inserted in batches of thousands of records every 5 minutes as we receive web service messages. I have no control over the data submitter, so it is possible that I have duplicate or multiple data sets for the same widget. I'm storing the data in a table that structurally looks like:
CREATE TABLE MyTable (
    DataInsertTime DATETIME2(2),
    WidgetID INT,
    Value INT
);
Because of the possibility of duplicate data, I can't create a PK unless I add an IDENTITY column or some other value that guarantees uniqueness. I always query the data for a given widget and point in time, so my queries always include DataInsertTime. My question is: which would perform better, the table with no PK and a non-unique clustered index on DataInsertTime and WidgetID, or the table with an added IDENTITY column and a clustered PK on DataInsertTime, WidgetID, ID (the IDENTITY column)?
I have always stayed away from creating a PK if I didn't have to. The PK gives me nothing, and I always thought there would be an adverse impact on performance if SQL Server needs to assure uniqueness and manage an index with an additional column. Someone told me that adding the PK would always improve performance, even if I retrieve data using only the non-unique portion of the PK, such as DataInsertTime and WidgetID.
I would appreciate your comments. Thanks.
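For reference, the two designs I'm comparing would look something like this (the index and constraint names are just illustrative):

-- Option 1: no PK, just a non-unique clustered index on the columns I query by
CREATE CLUSTERED INDEX IX_MyTable_Time_Widget
    ON dbo.MyTable ( DataInsertTime, WidgetID );

-- Option 2: add an IDENTITY column and make the clustering key unique via a PK
ALTER TABLE dbo.MyTable ADD ID INT IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable
    PRIMARY KEY CLUSTERED ( DataInsertTime, WidgetID, ID );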
March 30, 2017 at 11:32 am
You don't need a PK per se, but you should definitely create a clustered index headed by DataInsertTime. I think in this case you'll also need an IDENTITY column to make the clustering key unique yourself; otherwise I fear you'll end up with too many ghost rows if/when you delete data from the table. Like you, I'd rather avoid the overhead of that, but it's probably best to do it in this case.
/*Assumes the ID IDENTITY column has been added. If data compression isn't available / you don't use it, naturally remove that part.*/
CREATE UNIQUE CLUSTERED INDEX MyTable__CL
    ON dbo.MyTable ( DataInsertTime, WidgetID, ID )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 99 )
    ON [PRIMARY]; /*change filegroup name if needed*/
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 30, 2017 at 12:24 pm
Even if you don't add an explicit unique value to the clustered index, SQL Server will add a hidden uniquifier to any rows whose clustered key values aren't unique. Internally, the clustered index key is required to be unique.
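You can see this with a quick throwaway test (table and index names are made up for the demo):

-- A non-unique clustered index happily accepts duplicate key values;
-- SQL Server adds a hidden 4-byte uniquifier to the duplicates internally.
CREATE TABLE dbo.UniquifierDemo ( DataInsertTime DATETIME2(2), WidgetID INT );
CREATE CLUSTERED INDEX IX_UniquifierDemo
    ON dbo.UniquifierDemo ( DataInsertTime, WidgetID );
INSERT INTO dbo.UniquifierDemo
VALUES ('2017-03-30 10:00', 1), ('2017-03-30 10:00', 1); -- both rows succeed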
If you decide to add an identity column, make it the trailing column of the clustered index. You then have two choices: make the clustered index your primary key (yes, a primary key may have more than one column), or use the identity column as the primary key, but be sure to make that primary key a nonclustered index.
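The first choice matches the PRIMARY KEY CLUSTERED example earlier in the thread. A sketch of the second, assuming the ID IDENTITY column from the posts above (constraint and index names are illustrative):

-- Nonclustered PK on the identity column...
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable_ID
    PRIMARY KEY NONCLUSTERED ( ID );
-- ...leaving the clustered index on the columns actually used in queries
CREATE UNIQUE CLUSTERED INDEX IX_MyTable_Time_Widget
    ON dbo.MyTable ( DataInsertTime, WidgetID, ID );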
March 30, 2017 at 12:25 pm
Thanks for the response. Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.
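For anyone curious, the built-in sp_estimate_data_compression_savings procedure can estimate the effect before compressing; something like this against the table above:

EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'MyTable',
    @index_id = NULL,            -- all indexes
    @partition_number = NULL,    -- all partitions
    @data_compression = 'PAGE';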
March 30, 2017 at 12:39 pm
Again on data compression: eventually the data will have to be archived in a database on a different server. If both the operational and archive tables are compressed, how will SQL Server transfer the data from operational to archive? Will it decompress during the SELECT and re-compress on the INSERT, or is it smart enough to know that both the source and destination tables of the INSERT...SELECT are compressed, so that it doesn't decompress/re-compress? Again, the archive database is on a different server and connected through a linked server. The archival is triggered from the archive server.
Any additional info is appreciated.
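For context, the archival is just an INSERT...SELECT across the linked server, roughly like this (the server/database names and the 12-month cutoff are placeholders):

-- Runs from the archive server and pulls old rows through the linked server
INSERT INTO ArchiveDB.dbo.MyTable ( DataInsertTime, WidgetID, Value )
SELECT DataInsertTime, WidgetID, Value
FROM [OperationalServer].OpsDB.dbo.MyTable
WHERE DataInsertTime < DATEADD(MONTH, -12, SYSDATETIME());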
March 30, 2017 at 12:42 pm
N_Muller - Thursday, March 30, 2017 12:25 PM
Thanks for the response. Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.
Definitely on data retrieval. And honestly, in many cases for us, it has actually sped up inserts. I'm certain that's because of (1) reduced I/O for writes, and (2) our CPUs have spare capacity most of the time, so we don't see what for many could be a hit to CPU responsiveness.
March 30, 2017 at 12:45 pm
N_Muller - Thursday, March 30, 2017 12:39 PM
Again on data compression: eventually the data will have to be archived in a database on a different server. If both the operational and archive tables are compressed, how will SQL Server transfer the data from operational to archive? Will it decompress during the SELECT and re-compress on the INSERT, or is it smart enough to know that both the source and destination tables of the INSERT...SELECT are compressed, so that it doesn't decompress/re-compress? Again, the archive database is on a different server and connected through a linked server. The archival is triggered from the archive server.
Any additional info is appreciated.
I believe SQL Server decompresses and re-compresses, as it copies rows rather than pages over the linked server, but I can't say that 100% for sure.