February 1, 2006 at 7:36 am
I work primarily with financial applications that deal with transactional data with an accounting period as part of the key (i.e. January 2006) with no regard to the day portion of the accounting period. Over the years I have used several schemes for the format of the accounting period:
1) Use datetime datatype:
AccountPerioddatetime
- Disadvantage with datetime in my mind is using unneeded space and needed to always force a '1' for day portion for housekeeping purposes (e.g. '1/1/2006').
2) Separate year and month columns: AccountYear unljmne
maddog
February 1, 2006 at 8:00 am
You could use an INT (4 bytes) and record February as 200602
You could use SMALLINT (2 bytes) and record February as 602. Of course your would face the a problem in 327 years.
February 1, 2006 at 8:12 am
Didn't finish my post before it magically fired off but you get the drift...anyway thanks for the reply.
I thought of the integer (February 2006 = 200602) approach but I was concerned that a lot of the practical queries will be seeking the year or month portions separately (e.g. SELECT * WHERE SUBSTRING(AccountPeriod, 1, 4) = 2006), but this may not be a big deal since the account period will always be part of an index and I would hope that the parsing script would take advantage of the value in the index and not the table data directly. Another concern was that I would need to always left pad the month portion so that the scale of the numerics was always the same for comparing two periods. Any thoughts?
Best Regards,
Dennis M.
maddog
February 1, 2006 at 8:33 am
I used to build a separate table for accounting periods. We always had weird ones that would sometimes end a day early or late. So I build a table that has period, start date, end date. Join any queries to this table to get the period.
For us the period was 200601, 200602, etc. up to 200613 because we sometimes had 13 periods.
You could do a smallint for the year and a tiny int for the period instead of one field as well.
February 1, 2006 at 10:33 am
Thanks all for your replies.
I ran some statistics, albeit with limited data, on my own using execution plans to see what might the difference be between defining an accounting period as an int datatype versus a char(6). The following is some DDL for the two versions and resulting stats:
Version 1 - AccountPeriod using int datatype
============================================
CREATE TABLE dbo.ForecastTest1
(
ForecastID int IDENTITY NOT NULL,
VersionID int NOT NULL DEFAULT 0,
AcctPeriod int NULL,
Amount decimal(14,2) NULL
)
GO
ALTER TABLE dbo.ForecastTest1
ADD CONSTRAINT PK_ForecastTest1 PRIMARY KEY (ForecastID)
CREATE INDEX XK_Version_AcctPeriod
ON dbo.ForecastTest1 (VersionID, AcctPeriod)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200601, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200602, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200603, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200604, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200605, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200606, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200607, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200608, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200609, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200610, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200611, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200612, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200701, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200702, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200703, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200704, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200705, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200706, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200707, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200708, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200709, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200710, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200711, 10)
INSERT INTO dbo.ForecastTest1 VALUES (1, 200712, 10)
QUERY 1:
SELECT * FROM dbo.ForecastTest1 WHERE LEFT(AcctPeriod, 4) = 2006
EXECUTION PLAN RESULTS 1:
- FILTER: Convert(Substring(Convert([ForecastTest1].[CFPeriod]), 1, 4)) = 2006 (Cost=0% I/O=0 CPU=0.000004)
- CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)
QUERY 2:
SELECT * FROM dbo.ForecastTest1 WHERE AcctPeriod BETWEEN 200606 AND 200706
EXECUTION PLAN RESULTS 2:
- CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)
Version 2 - AccountPeriod using char(6) datatype
============================================
CREATE TABLE dbo.ForecastTest2
(
ForecastID int IDENTITY NOT NULL,
VersionID int NOT NULL DEFAULT 0,
AcctPeriod char(6) NULL,
Amount decimal(14,2) NULL
)
GO
ALTER TABLE dbo.ForecastTest2
ADD CONSTRAINT PK_ForecastTest2 PRIMARY KEY (ForecastID)
CREATE INDEX XK_Version_AcctPeriod
ON dbo.ForecastTest2 (VersionID, AcctPeriod)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200601', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200602', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200603', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200604', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200605', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200606', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200607', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200608', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200609', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200610', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200611', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200612', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200701', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200702', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200703', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200704', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200705', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200706', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200707', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200708', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200709', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200710', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200711', 10)
INSERT INTO dbo.ForecastTest2 VALUES (1, '200712', 10)
QUERY 1:
SELECT * FROM dbo.ForecastTest2 WHERE SUBSTRING(AcctPeriod, 1, 4) = '2006'
EXECUTION PLAN RESULTS 1:
- CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)
QUERY 2:
SELECT * FROM dbo.ForecastTest2 WHERE AcctPeriod BETWEEN '200606' AND '200706'
EXECUTION PLAN RESULTS 2:
- CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)
SUMMARY:
The primary difference it seems between defining the period as int versus char datatype is that the int version must be filtered first to convert the int datatype back to a char to do a substring operation for the first query as shown in execution plan results #1. Defining the period as a char(6) allowed the direct usage of a substring operation which eliminated the filter step in the execution plan. The second query to locate periods using BETWEEN did not have any differences in the execution plan that I could see.
From what I can see I will most likely use the char(6) representation for the accounting period, but I am defintely open to the experiences and advice of others for potential pitfalls that I am not seeing yet.
Best Regards,
Dennis M.
maddog
February 1, 2006 at 12:06 pm
The INT method does not require any padding.
The user wants to query for a range of dates then
CREATE PROC dbo.MyQuery
@StartYear INT ,
@StartMonth INT ,
@EndYear INT ,
@EndMonth INT
AS
DECLARE @StartPeriod INT ,@EndPeriod INT
SET @StartPeriod = @StartYear * 100 + @StartMonth
SET @EndPeriod = @EndYear * 100 + @EndMonth
SELECT *
FROM dbo.YourTable
WHERE ReportPeriod BETWEEN @StartPeriod AND @EndPeriod
If you just want to query for a year then what you are really asking is ReportPeriod BETWEEN 200601 AND 200612
If your index is clustered then this range query will be very quick.
February 1, 2006 at 1:05 pm
I like the numeric handling you proposed versus the char padding - didn't think of that trick. I can see that the BETWEEN operator will be used heavily as it would work for nearly all types of queries including the year alone as you mentioned. I'm shifting my thinking towards using an int versus a char(6) thanks to your input David.
Best Regards,
Dennis M.
maddog
February 2, 2006 at 6:39 pm
I dunno... 6 bytes, 4 bytes, 8 bytes... for me, I'd still be tempted to use DateTime just because of all the things you can do with DateTime functions without any conversions required... and a modified version of Steve's idea for a "period" table would certainly prove useful. Just in case the original requester didn't know it, it's very easy to group by month and year using DateDiff(mm,0,somedate).
Then, there was that time when the SEC did an audit on my old company and they had to explain why their records weren't timestamped my year, month, day, hour, minute, and second.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 2, 2006 at 11:05 pm
I Haven't seen the DateDiff usage for grouping before, have typically used an approach like 'SELECT SUM(Amount) FROM Forecast WHERE DatePart(yy, acctperiod) BETWEEN @StartYear AND @EndYear' with the acctperiod column being a datetime. A similar usage of DatePart within a GROUP BY expression to get grouped results by month or year. It's late and I'm not thinking clearly, do you have an example of usage of the DateDiff for both filtering and grouping? Thanks Again.
Dennis M.
maddog
February 3, 2006 at 8:11 pm
Sure, Dennis... having a bit of a lack of sleep going on myself, here's an example using the Northwind database...
USE NorthWind
DECLARE @Year CHAR(4)
SET @Year = '1997'
SELECT DATENAME(mm,DATEADD(mm,d.MonthNum,0)) AS SpelledMonth,
DATENAME(yy,DATEADD(mm,d.MonthNum,0)) AS [Year],
d.FreightValue
FROM (
SELECT DATEDIFF(mm,0,ShippedDate) AS MonthNum,
SUM(Freight) AS FreightValue
FROM dbo.Orders
GROUP BY DATEDIFF(mm,0,ShippedDate)
)d
WHERE d.MonthNum >= DATEDIFF(mm,0,'01/01/'+@Year)
AND d.MonthNum < DATEDIFF(mm,0,'01/01/'+@Year)+12
ORDER BY d.MonthNum
--Jeff Moden
Change is inevitable... Change for the better is not.
February 4, 2006 at 3:48 am
If you do WHERE {Function(field)} = {value} then you will probably do a TABLE SCAN because the value isn't an indexed value.
Probably won't matter in a small-ish database but in a larger one this will punish your system.
February 4, 2006 at 9:05 am
Yep, true enough... I've made the same recommendation to others, myself. It's a great example of why you need a primary key, though...
-------------------------------------------------------------------------------------------------------------------------------
|--Compute Scalar(DEFINE: ([Expr1004]=datename(month, dateadd(month, [Expr1002], 'Jan 1 1900 12:00AM')), [Expr1005]=datename(year, dateadd(month, [Expr1002], 'Jan 1 1900 12:00AM'))))
|--Compute Scalar(DEFINE: ([Expr1003]=If ([Expr1012]=0) then NULL else [Expr1013]))
|--Stream Aggregate(GROUP BY: ([Expr1002]) DEFINE: ([Expr1012]=COUNT_BIG([Orders].[Freight]), [Expr1013]=SUM([Orders].[Freight])))
|--Sort(ORDER BY: ([Expr1002] ASC))
|--Filter(WHERE: ([Expr1002]>=datediff(month, 'Jan 1 1900 12:00AM', Convert('01/01/'+[@Year])) AND [Expr1002]<datediff(month, 'Jan 1 1900 12:00AM', Convert('01/01/'+[@Year]))+12))
|--Compute Scalar(DEFINE: ([Expr1002]=datediff(month, 'Jan 1 1900 12:00AM', [Orders].[ShippedDate])))
|--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders]))
(7 row(s) affected)
...It's not the coveted "Clustered Index Seek" but it's no table scan, either.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 4, 2006 at 11:15 pm
Thanks Gents for your responses. Ill take some time to see how the DateDiff approach might be used in my situation now that I think I know what's going on with it.
Oh and..Go Seahawks!
(I'm not a football junkie but I can't escape the fact that my wife's parents live a mile from Seahawk stadium in Seattle and drilled the Seahawk mantra into her from a young age. Just another excuse for me to drink more beer, as if I needed one.)
maddog
February 5, 2006 at 8:12 am
Jeff,
Have you seen the trick where you do a completely superfluous WHERE {ClusterIndex} BETWEEN {ClusterIndex}.MinValue AND {ClusterIndex}.MaxValue
In SQL2000 this can force the clustered index seek even though the clause does nothing in the practical sense. You do gain a performance boost.
In SQL2005 it still forces a clustered index seek but it adds in a filter step into the execution plan so you don't get any performance benefits. SQL2005 execution plans seem to have half the cost of their 2000 equivalents.
February 6, 2006 at 6:09 am
Absolutely... and that, in fact, your suggestion would definitely be the best way to solve the problem, but someone wanted me to demo using DATEDIFF as both a filter and a join. That was the reason for my original post
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply