August 16, 2022 at 2:01 pm
I recommend using another db -- let's call it the "history_db" -- to hold the prior months' data. As each month passes, you move its table to the history_db. You can also do this transparently to the app by using a synonym to point each older table name to the history_db; as far as the application "knows", it still references all tables in the current db. I'd use delayed transaction durability on the history_db to get maximum performance. There's no risk there, since you don't delete the original table in the current db until you've verified that the new copy is present and fully populated in the history_db.
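A minimal sketch of the mechanics, run from the current db (db and table names are made up, obviously, and a real move would also script out indexes, constraints, and permissions):

-- One-time setup: force delayed transaction durability on the archive db
-- so the bulk copies don't wait on log flushes.
ALTER DATABASE history_db SET DELAYED_DURABILITY = FORCED;

-- Copy the month's table over (SELECT ... INTO shown for brevity).
SELECT *
INTO   history_db.dbo.Sales_202207
FROM   dbo.Sales_202207;

-- Verify row counts / checksums here before touching the original.

-- Swap the original for a synonym; the app keeps using the same name.
BEGIN TRANSACTION;
    DROP TABLE dbo.Sales_202207;
    CREATE SYNONYM dbo.Sales_202207 FOR history_db.dbo.Sales_202207;
COMMIT;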
Since the history_db changes only when a new monthly table gets added to it, you only need to take a full backup of the history_db at that point. At other times, you can just do differential backups, "just in case". Since nothing is changing on the history_db, the diff backups should be extremely fast.
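Roughly, the backup cadence could look like this (paths are hypothetical):

-- Right after a new monthly table lands in history_db:
BACKUP DATABASE history_db
TO DISK = N'X:\Backups\history_db_full.bak'
WITH INIT, CHECKSUM;

-- On the routine schedule in between, "just in case"; with nothing
-- changed, this backs up almost no extents and finishes quickly:
BACKUP DATABASE history_db
TO DISK = N'X:\Backups\history_db_diff.bak'
WITH DIFFERENTIAL, INIT, CHECKSUM;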
You should also evaluate converting the table to columnstore format when you move it from the current db to the history_db. If you decide not to go with columnstore, then definitely page compress the data and use COMPRESS on the LOB data. A trigger can be used to make this transparent to the app as well; that is, the app never needs to "know" about or call the COMPRESS/DECOMPRESS functions itself -- that all happens automatically behind the scenes. You could even selectively decide which LOB column(s) to COMPRESS, if for some reason you don't want to compress specific ones, all without ever needing to change the app code using the tables.
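A rough sketch of the transparent-compression idea, assuming a made-up archive table with one LOB column; the app reads and writes a view under the original name while a trigger handles COMPRESS behind the scenes:

-- Storage table: page compressed, LOB held as GZIP varbinary via COMPRESS.
CREATE TABLE dbo.Orders_data
(
    OrderID   int            NOT NULL PRIMARY KEY,
    OrderDate datetime2      NOT NULL,
    Notes     varbinary(max) NULL    -- COMPRESSed nvarchar(max)
)
WITH (DATA_COMPRESSION = PAGE);
GO
-- The view decompresses on read, under the name the app already uses.
CREATE VIEW dbo.Orders
AS
SELECT OrderID,
       OrderDate,
       CAST(DECOMPRESS(Notes) AS nvarchar(max)) AS Notes
FROM   dbo.Orders_data;
GO
-- The trigger compresses on write, so the app never calls COMPRESS itself.
CREATE TRIGGER dbo.Orders_ins ON dbo.Orders
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.Orders_data (OrderID, OrderDate, Notes)
    SELECT OrderID, OrderDate, COMPRESS(Notes)
    FROM   inserted;
END;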
SQL DBA, SQL Server MVP (07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 16, 2022 at 2:32 pm
To summarize, do you believe that columnstore would be worthwhile not only for compression but also for lookup performance?
If your tables are there for occasional "what happened?" queries, I think the performance would probably be adequate, but if you need many rapid queries, performance could suffer because the data required to assemble a single row may be scattered across the table. I don't claim to understand it very well, but the data is stored in row groups of roughly a million rows each. Each row group keeps min/max metadata for its column segments, but the optimizer doesn't know which data are in which row groups unless you partition the table.
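To illustrate the partitioning point, here's a rough sketch (all names and dates made up) of a monthly-partitioned columnstore archive, so whole partitions can be skipped instead of the engine checking every row group:

CREATE PARTITION FUNCTION pf_month (datetime2)
    AS RANGE RIGHT FOR VALUES ('2022-07-01', '2022-08-01', '2022-09-01');

CREATE PARTITION SCHEME ps_month
    AS PARTITION pf_month ALL TO ([PRIMARY]);

CREATE TABLE dbo.History
(
    RowID     bigint        NOT NULL,
    EventDate datetime2     NOT NULL,
    Payload   nvarchar(400) NULL,
    INDEX cci_History CLUSTERED COLUMNSTORE
)
ON ps_month (EventDate);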
Ideally you could order the data by a key column so that the rows needed for a specific query would sit close together in row groups, and the engine would know where they are, but ordered columnstore indexes aren't supported until SQL Server 2022, and even then the sorting doesn't work very well (according to Brent Ozar: https://www.brentozar.com/archive/2022/07/columnstore-indexes-are-finally-sorted-in-sql-server-2022/).
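If you are on SQL Server 2022, the ORDER clause that post discusses looks like this (index and table names made up; per that post the sort is best-effort only, and MAXDOP 1 reportedly produces a cleaner order):

CREATE CLUSTERED COLUMNSTORE INDEX cci_History
ON dbo.History
ORDER (EventDate)
WITH (DROP_EXISTING = ON, MAXDOP = 1);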
I would test it out to see if you get the performance you need. I have not used nonclustered indexes on columnstore tables; maybe an index on the row identifier would help, but I don't know how that would impact loading the table.
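For reference, nonclustered B-tree indexes on top of a clustered columnstore table have been allowed since SQL Server 2016, so a point-lookup index like the one below is at least syntactically straightforward (names made up); whether it helps, and what it does to load times, is exactly what would need testing:

CREATE NONCLUSTERED INDEX ix_History_RowID
ON dbo.History (RowID);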
August 16, 2022 at 5:08 pm
What sort of overhead does it put on the server while running, and any idea of the duration on a 2.1 billion row table?
There have been a lot of ideas suggested... what is "IT" to you in this case?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 16, 2022 at 6:37 pm
I'm going with page compression; a simulated run on my test server took around 1 hour to complete.
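For anyone finding this later, the rebuild itself is a one-liner along these lines (table name made up; ONLINE = ON requires Enterprise edition):

ALTER TABLE dbo.BigTable
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);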
Thanks for all the suggestions and replies.