November 1, 2017 at 9:04 am
I'm curious what everyone thinks is the best way to store data where the history of aggregated data for an ID is stored all in one table. Our current way of doing so is something like this create table ID_Dates
(
ID int null
,[2017_01] int null
,[2017_02] int null
,[2017_03] int null
,[2017_04] int null
,[2017_05] int null
)
insert into ID_Dates
values (2324, 1, 0, 4, 0, 1)
,(3593, 56, 3, 7, 21, 8)
,(5497, 0, 0, 0, 34, 0)
,(1568, 34, 5, 6, 22, 1)
,(9841, 6, 11, 4, 0, 34)
,(8737, 0, 99, 8, 3, 0)
,(4391, 2, 5, 18, 0, 3)
where each ID is unique and there is a column for each year/month combo containing the sum or count of the aggregate data. In this case a new column is added for every month that needs reported on so the table can grow quite wide if you need history back to 2000.
I thought about a couple different options but I'm not sure if they are better as far as performance and maintainability go.
Option 1 :create table ID_Dates2
(
ID int null
,[Year] int null
,[Month] int null
,Counts int null
)
insert into ID_Dates2
values (2324, 1, 2017, 1)
,(2324, 2, 2017, 0)
,(2324, 3, 2017, 4)
,(2324, 4, 2017, 0)
,(2324, 5, 2017, 1)
,(3593, 1, 2017, 56)
,(3593, 2, 2017, 3)
,(3593, 3, 2017, 7)
,(3593, 4, 2017, 21)
,(3593, 5, 2017, 8)
,(5497, 1, 2017, 0)
,(5497, 2, 2017, 0)
,(5497, 3, 2017, 0)
,(5497, 4, 2017, 34)
,(5497, 5, 2017, 0)
,(1568, 1, 2017, 34)
,(1568, 2, 2017, 5)
,(1568, 3, 2017, 6)
,(1568, 4, 2017, 22)
,(1568, 5, 2017, 1)
,(9841, 1, 2017, 6)
,(9841, 2, 2017, 11)
,(9841, 3, 2017, 4)
,(9841, 4, 2017, 0)
,(9841, 5, 2017, 34)
,(8737, 1, 2017, 0)
,(8737, 2, 2017, 99)
,(8737, 3, 2017, 8)
,(8737, 4, 2017, 3)
,(8737, 5, 2017, 0)
,(4391, 1, 2017, 2)
,(4391, 2, 2017, 5)
,(4391, 3, 2017, 18)
,(4391, 4, 2017, 0)
,(4391, 5, 2017, 3)
Have 12 entries per ID per year and use pivot/unpivot to display the data the same way we currently do. The downfalls I see here are that the IDs are no longer unique and it results in many more rows.
Option 2 :create table ID_Dates3
(
ID int null
,[Year] int null
,Jan int null
,Feb int null
,Mar int null
,Apr int null
,May int null
)
insert into ID_Dates3
values (2324, 2017, 1, 0, 4, 0, 1)
,(3593, 2017, 56, 3, 7, 21, 8)
,(5497, 2017, 0, 0, 0, 34, 0)
,(1568, 2017, 34, 5, 6, 22, 1)
,(9841, 2017, 6, 11, 4, 0, 34)
,(8737, 2017, 0, 99, 8, 3, 0)
,(4391, 2017, 2, 5, 18, 0, 3)
Break out the months and only have one entry per year per ID. Again the downfall I see is that over time the IDs won't be unique if an ID spans a couple of years.
Can anyone provide their input on which method is best or if there is a different method than what I'm thinking? Thanks in advance.
November 1, 2017 at 9:12 am
Use 3NF (3rd Normal Form). You should never be adding extra columns for repeating items, you should be adding extra rows. If you require a unique ID, then add one too. There's nothing wrong with having a composite key.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 1, 2017 at 10:25 am
I would change things a little. Your datatypes can be improved to take less space. Make the first three columns non-nullable so that you can create a PK on them. Don't forget to specify your desired schema.
CREATE TABLE dbo.Aggregates
(
Id INT NOT NULL,
Year SMALLINT NOT NULL,
Month TINYINT NOT NULL,
Counts INT NULL --Maybe smallint would work here too?
);
ALTER TABLE dbo.Aggregates
ADD CONSTRAINT PK_Aggregates
PRIMARY KEY CLUSTERED
(
Id,
Year,
Month
);
INSERT dbo.Aggregates
(
Id,
Month,
Year,
Counts
)
VALUES
(2324, 1, 2017, 1),
(2324, 2, 2017, 0),
(2324, 3, 2017, 4),
(2324, 4, 2017, 0),
(2324, 5, 2017, 1),
(3593, 1, 2017, 56),
(3593, 2, 2017, 3),
(3593, 3, 2017, 7),
(3593, 4, 2017, 21),
(3593, 5, 2017, 8),
(5497, 1, 2017, 0),
(5497, 2, 2017, 0),
(5497, 3, 2017, 0),
(5497, 4, 2017, 34),
(5497, 5, 2017, 0),
(1568, 1, 2017, 34),
(1568, 2, 2017, 5),
(1568, 3, 2017, 6),
(1568, 4, 2017, 22),
(1568, 5, 2017, 1),
(9841, 1, 2017, 6),
(9841, 2, 2017, 11),
(9841, 3, 2017, 4),
(9841, 4, 2017, 0),
(9841, 5, 2017, 34),
(8737, 1, 2017, 0),
(8737, 2, 2017, 99),
(8737, 3, 2017, 8),
(8737, 4, 2017, 3),
(8737, 5, 2017, 0),
(4391, 1, 2017, 2),
(4391, 2, 2017, 5),
(4391, 3, 2017, 18),
(4391, 4, 2017, 0),
(4391, 5, 2017, 3);
SELECT *
FROM dbo.Aggregates a;
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 1, 2017 at 11:53 am
Thanks for the feedback. It's been great help!
November 8, 2017 at 3:23 am
+1 for the suggestions so far.
One thing we might need to consider is whether this is an OLTP (transactional) or OLAP (reporting) database. For transactional systems (EPOS, ticket booking, Hospital administration etc) we definitely want 3NF. If we are in a reporting environment then we are likely to be looking for speed of aggregation of large datasets in which case some level of de-normalisation may be justified.
If this is an OLAP solution you may want to consider having a DATE dimension table and rather than use Year/Month in this table, use the ID from the DIM_DATE with a FK constraint.
It is clear from the original question and suggestions that you are not overly familiar with 3NF but that you have a good gut-feel that there was some code-smell in the proposed designs so kudos there. Get comfortable with 3NF (or even higher forms); It will be a career-saver for you and may help you prevent others from building poor design in the future
November 8, 2017 at 6:09 am
aaron.reese - Wednesday, November 8, 2017 3:23 AM+1 for the suggestions so far.One thing we might need to consider is whether this is an OLTP (transactional) or OLAP (reporting) database. For transactional systems (EPOS, ticket booking, Hospital administration etc) we definitely want 3NF. If we are in a reporting environment then we are likely to be looking for speed of aggregation of large datasets in which case some level of de-normalisation may be justified.
The use of a new column for every month will severely limit the types of reporting you can easily do and, if you write the code correctly, there should be little to no speed disadvantage in reporting from a properly normalized table.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2017 at 10:30 am
You don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design. Rid your mind of the egregious notion that "every table should have an identity column".
You should be able to cluster on ( year, month ). That looks unique. If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 9, 2017 at 11:08 am
ScottPletcher - Thursday, November 9, 2017 10:30 AMYou don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design. Rid your mind of the egregious notion that "every table should have an identity column".You should be able to cluster on ( year, month ). That looks unique. If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.
This is not an ID as you and I might know it. If you look back at the initial post, you'll see this:
... where the history of aggregated data for an ID is stored all in one table
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 9, 2017 at 3:16 pm
Phil Parkin - Thursday, November 9, 2017 11:08 AMScottPletcher - Thursday, November 9, 2017 10:30 AMYou don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design. Rid your mind of the egregious notion that "every table should have an identity column".You should be able to cluster on ( year, month ). That looks unique. If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.
This is not an ID as you and I might know it. If you look back at the initial post, you'll see this:
... where the history of aggregated data for an ID is stored all in one table
Interesting. I saw this part in the initial q:
"where each ID is unique and there is a column for each year/month combo containing the sum or count of the aggregate data."
and therefore figured it was an identity or equivalent.
But, I admit, I'm not really sure what that ID is.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply