Hi all,
I want to get the basics right, and I am in the middle of pulling 1 million rows approx into a table below.
For effective querying in the future can someone please advise the data types I should use for the below columns when creating initial table.
I have displayed an example of what the field looks like for each column. And the precision & scale as well please. There are 32 columns in total. Attached Excel for reference if filling that out helps.
Much appreciated guys! 🙂
Column Name
Example
Data Type
[Financial Year]
2018
[Financial year/period]
M1
[Business Day]
01.08.2017
[A vs B vs L vs F]
Actual
[Distribution]
Qsr
[D Group]
Distributor
[Sub Channel]
Distributor
[Key Account Number]
123456
[Key Account Name]
Piggy Bank
[Product Category]
Hair Care
[Product Sub-Category]
Shampoo
[Product Brand]
Head & Shoulders
[Promo Group]
Head & Shoulders 250 ml
[Product Number]
123456
[Product Name]
Head & Shoulders 12X250 ml
[Customer Number]
789456
[Customer Name]
Guiness
[Quantity in Litres]
3.024
[Quantity in Eaches]
6.048
[Volume]
3.023999929
[CostPrice]
23.28494
[Total Expense]
4.32439
[Non Commercial Expense]
2.93769
[Commercial Spend]
1.3867
[COGS]
16.4595
[Adjustments]
-1.09953
[Storage]
1.15773
[Distribution]
-0.14466
[Advertisement]
0
[Admin Costs]
1.54397
[NonOPEX]
-0.07118
[Foreign Transactions]
-0.69121
July 2, 2020 at 12:03 am
And the precision & scale as well please! Thanks
To be able to say what is the correct data type, one needs to know the business domain, which no one here does. What I can give you are general guidelines.
Dates - always store date-only value as date, never anything else.
Date and time - always use the datetime2 data type, never the old datetime. Make a conscious use of scale, since the default is 7 which is 100 ns. For values captured inside SQL Server with sysdatetime() use datetime2(3). For values from the outside second precision may be enough, that is datetime2(0).
Strings - For strings that are codes use varchar, since they typically are ASCII only. (And it would make sense to always use a binary collation for these for efficiency, but it may not be worth the hassle.) For strings that are names of things, use nvarchar, so that you can handle data in any script of any language. This is an expensive to fix as an afterthought. As for the length, only use MAX when there is a direct requirement. MAX types cost in performance. Determining whether you should have nvarchar(50), nvarchar(100) or whatever can be difficult. You will need to ask for the business rules, but don't expect to always get a good answer.
For numbers that are ids or counts of thing, use an integer data type. For ids you need to have a feeling for whether you need bigint. Selecting int and having to change later when you are close to hit the ceiling will be expensive! Since space equates to time in a large table, there is reason to use smallint and tinyint where applicable.
Note: I see that you have a ProductNumber in your list. This is a kind of data where you need to check the business rules to get the right data type. Just a single sample is not enough. They may be true numbers, but they be formatted or have characters in them. Or leading zeroes.
Non-integer numbers - this is the most difficult. decimal numbers are exact, but if you pick a to narrow precision and scale you may not be able to store data. If you use float, you evade that issue, but then you may get rounding errors which can disturb users. For things that are technical measurements, float is usually not a problem, whereas it can be problematic for financial data. If you go for decimal, you will need talk with the business side and see what the requirements they have. Expect blank stares when you ask, and sinister looks when they get an error because you in lieu of information picked something to narrow.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 6, 2020 at 5:56 pm
What Erland said
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply