June 14, 2019 at 7:10 pm
Hi all,
I inherited a table that I am thinking to change some columns data types. Table has 100+ mln records and 96 columns and serves as a fact table in our data warehouse.
Thanks
June 14, 2019 at 7:29 pm
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 14, 2019 at 7:50 pm
Thanks, Scott.
For #3: They use as_of_month not just for display but a lot in where clauses and join conditions, like a.as_of_month = b.as_of_month
And about money data type, aren't it deprecated?
June 14, 2019 at 8:26 pm
Don't know if it's officially deprecated, but it has lots of issues, so, yeah, probably better to stick to decimal.
As to as_of_month, you'd be better off converting that to go against as_of_date, but, if you can't do that, then you might indeed have to store it separately so that you can index it separately.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 14, 2019 at 11:04 pm
You first need to find out what the code that uses the table looks like. DW tables are NOT usually "proper normalized tables". For example, removing the "month" column or converting it into a date will surely gum up a shedload of reporting code. Even something as seemingly innocuous as changing multiple "flag" columns to the BIT datatype could really put the screws to reporting code if they're doing aggregates on those columns because you can't do things like SUM() on a BIT column.
You also need to be really careful when changing things like FLOAT to something else. For example, if you change them to Decimal(p,s) without know how they are used, you could end up with some really unfortunate silent rounding problems that can be as bad as the imprecision of the FLOAT datatype and are sometimes worse. Please see the following MS documentation on that.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2019 at 8:57 pm
You can also use a computed column for as_of_month, there's no need to physically store it again.
as_of_date date NOT NULL,
as_of_month AS CONVERT(varchar(6), as_of_date, 112),
That column is fully usable by all queries and other code, it's just not physically stored, it's generated on the fly whenever code actually needs the value.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 19, 2019 at 6:31 pm
Money is deprecated, but never going away. I wouldn't worry about it.
June 19, 2019 at 7:14 pm
You can also use a computed column for as_of_month, there's no need to physically store it again. as_of_date date NOT NULL, as_of_month AS CONVERT(varchar(6), as_of_date, 112), That column is fully usable by all queries and other code, it's just not physically stored, it's generated on the fly whenever code actually needs the value.
You can, however, increase performance a fair bit if it is materialized as a PERSISTED computed column and use CHAR(6) instead of VARCHAR(6). It can also be indexed quite nicely.
Of course, I probably not do either. I'd use the tried and true >= and < method to do the lookups and do the grouping in the code.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2019 at 8:44 pm
ScottPletcher wrote:You can also use a computed column for as_of_month, there's no need to physically store it again. as_of_date date NOT NULL, as_of_month AS CONVERT(varchar(6), as_of_date, 112), That column is fully usable by all queries and other code, it's just not physically stored, it's generated on the fly whenever code actually needs the value.
You can, however, increase performance a fair bit if it is materialized as a PERSISTED computed column and use CHAR(6) instead of VARCHAR(6). It can also be indexed quite nicely. Of course, I probably not do either. I'd use the tried and true >= and < method to do the lookups and do the grouping in the code.
The easiest of them all...
SELECT CONVERT(int, (YEAR(GETDATE()) * 100) + MONTH(GETDATE()));
June 19, 2019 at 9:07 pm
Jeff Moden wrote:ScottPletcher wrote:You can also use a computed column for as_of_month, there's no need to physically store it again. as_of_date date NOT NULL, as_of_month AS CONVERT(varchar(6), as_of_date, 112), That column is fully usable by all queries and other code, it's just not physically stored, it's generated on the fly whenever code actually needs the value.
You can, however, increase performance a fair bit if it is materialized as a PERSISTED computed column and use CHAR(6) instead of VARCHAR(6). It can also be indexed quite nicely. Of course, I probably not do either. I'd use the tried and true >= and < method to do the lookups and do the grouping in the code.
The easiest of them all...
SELECT CONVERT(int, (YEAR(GETDATE()) * 100) + MONTH(GETDATE()));
Fewest number of bytes so far, as well. If you want to go real short, convert to the first of the month and store it as a DATE.
Just to be sure, though... I know it's just an example but have to say it outloud... GETDATE() will make an indeterminate formula that cannot be persisted.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2019 at 9:09 pm
As regards the original "1. Key column ...", it should at least be demoted to a secondary key (to make the clustering key unique).
This table almost certainly should be clustered first on as_of_date, particularly if it's (almost) always specified in WHERE conditions. That's one reason having as_of_month in the table is such a terrible idea.
Now, yes, the user should be able to conveniently enter whole months, such as 2019[-]04 and 2019[-]05. But the app should produce a query like this:
WHERE ((as_of_date >= '20190401' AND as_of_date < '201900501') OR (as_of_date >= '20190501' AND as_of_date < '20190601')
rather than this:
WHERE as_of_month IN ('201904', '201905')
The query doesn't have to directly reflect the user input.
With as_of_month present in the table, it naturally gets used, and thus it gets indexed, as noted above. Then, you query a month and it is way too slow (since it hit the "tipping point" and caused a table scan). So, to avoid that, you end up having to build multiple covering indexes headed by as_of_month, to cover all queries that use as_of_month. Lots of overhead that overall causes query delays and wastes resources vs clustering by as_of_date and querying by it.
Identity keys have their uses, but not for every table, or even the majority of them. Review the situation carefully before making a final decision on the oh-so-critical clustering key.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 20, 2019 at 4:53 am
Fewest number of bytes so far, as well. If you want to go real short, convert to the first of the month and store it as a DATE. Just to be sure, though... I know it's just an example but have to say it outloud... GETDATE() will make an indeterminate formula that cannot be persisted.
I do know that the old school DATETIME & SMALLDATETIME are fundamentally different in their byte patterns than the newer DATE, TIME & DATETIME2 datatypes... So it does stand to reason that some sort of implicit conversion would need to take place if you using the original date/time functions like GETDATE(). That said, it seems they have that mapping figured out fairly well... To the point that converting an predicate column from datetime to date doesn't prevent SARGability.
Out of curiosity, do you see the same behavior if you swap out GETDATE() for SYSDATETIME?
Also, if you want to get really crafty any really shave bytes, just copy the datetime paradigm... but use "months from zero" instead of days and save it as a smallint. That'll take you all the way out to 4630-08... or... if you don't need to go negative... use a binary(2) like an unsigned small int and go all the way to 7361-04
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply