Seek Table Design Advice\Best Practice: Accounting Period

Question

Post reply

Seek Table Design Advice\Best Practice: Accounting Period

maddogs

SSCrazy

Points: 2474
More actions
February 1, 2006 at 7:36 am

#96578

I work primarily with financial applications that deal with transactional data with an accounting period as part of the key (i.e. January 2006) with no regard to the day portion of the accounting period.   Over the years I have used several schemes for the format of the accounting period:
1) Use datetime datatype:
    AccountPerioddatetime
   - Disadvantage with datetime in my mind is using unneeded space and needed to always force a '1' for day portion for housekeeping purposes (e.g. '1/1/2006').
2) Separate year and month columns: AccountYear unljmne


maddog

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

David.Poole SSC Guru Points: 76001 More actions · Answer 1

You could use an INT (4 bytes) and record February as 200602

You could use SMALLINT (2 bytes) and record February as 602. Of course your would face the a problem in 327 years.

LinkedIn Profile

maddogs SSCrazy Points: 2474 More actions · Answer 2

Didn't finish my post before it magically fired off but you get the drift...anyway thanks for the reply.

I thought of the integer (February 2006 = 200602) approach but I was concerned that a lot of the practical queries will be seeking the year or month portions separately (e.g. SELECT * WHERE SUBSTRING(AccountPeriod, 1, 4) = 2006), but this may not be a big deal since the account period will always be part of an index and I would hope that the parsing script would take advantage of the value in the index and not the table data directly. Another concern was that I would need to always left pad the month portion so that the scale of the numerics was always the same for comparing two periods. Any thoughts?

Best Regards,

Dennis M.

maddog

Steve Jones - SSC Editor SSC Guru Points: 736255 More actions · Answer 3

I used to build a separate table for accounting periods. We always had weird ones that would sometimes end a day early or late. So I build a table that has period, start date, end date. Join any queries to this table to get the period.

For us the period was 200601, 200602, etc. up to 200613 because we sometimes had 13 periods.

You could do a smallint for the year and a tiny int for the period instead of one field as well.

maddogs SSCrazy Points: 2474 More actions · Answer 4

Thanks all for your replies.

I ran some statistics, albeit with limited data, on my own using execution plans to see what might the difference be between defining an accounting period as an int datatype versus a char(6). The following is some DDL for the two versions and resulting stats:

Version 1 - AccountPeriod using int datatype

============================================

CREATE TABLE dbo.ForecastTest1

(

ForecastID int IDENTITY NOT NULL,

VersionID int NOT NULL DEFAULT 0,

AcctPeriod int NULL,

Amount decimal(14,2) NULL

)

GO

ALTER TABLE dbo.ForecastTest1

ADD CONSTRAINT PK_ForecastTest1 PRIMARY KEY (ForecastID)

CREATE INDEX XK_Version_AcctPeriod

ON dbo.ForecastTest1 (VersionID, AcctPeriod)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200601, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200602, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200603, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200604, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200605, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200606, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200607, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200608, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200609, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200610, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200611, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200612, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200701, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200702, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200703, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200704, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200705, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200706, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200707, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200708, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200709, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200710, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200711, 10)

INSERT INTO dbo.ForecastTest1 VALUES (1, 200712, 10)

QUERY 1:

SELECT * FROM dbo.ForecastTest1 WHERE LEFT(AcctPeriod, 4) = 2006

EXECUTION PLAN RESULTS 1:

- FILTER: Convert(Substring(Convert([ForecastTest1].[CFPeriod]), 1, 4)) = 2006 (Cost=0% I/O=0 CPU=0.000004)

- CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)

QUERY 2:

SELECT * FROM dbo.ForecastTest1 WHERE AcctPeriod BETWEEN 200606 AND 200706

EXECUTION PLAN RESULTS 2:

- CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)

Version 2 - AccountPeriod using char(6) datatype

============================================

CREATE TABLE dbo.ForecastTest2

(

ForecastID int IDENTITY NOT NULL,

VersionID int NOT NULL DEFAULT 0,

AcctPeriod char(6) NULL,

Amount decimal(14,2) NULL

)

GO

ALTER TABLE dbo.ForecastTest2

ADD CONSTRAINT PK_ForecastTest2 PRIMARY KEY (ForecastID)

CREATE INDEX XK_Version_AcctPeriod

ON dbo.ForecastTest2 (VersionID, AcctPeriod)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200601', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200602', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200603', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200604', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200605', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200606', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200607', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200608', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200609', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200610', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200611', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200612', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200701', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200702', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200703', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200704', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200705', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200706', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200707', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200708', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200709', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200710', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200711', 10)

INSERT INTO dbo.ForecastTest2 VALUES (1, '200712', 10)

QUERY 1:

SELECT * FROM dbo.ForecastTest2 WHERE SUBSTRING(AcctPeriod, 1, 4) = '2006'

EXECUTION PLAN RESULTS 1:

- CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)

QUERY 2:

SELECT * FROM dbo.ForecastTest2 WHERE AcctPeriod BETWEEN '200606' AND '200706'

EXECUTION PLAN RESULTS 2:

- CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)

SUMMARY:

The primary difference it seems between defining the period as int versus char datatype is that the int version must be filtered first to convert the int datatype back to a char to do a substring operation for the first query as shown in execution plan results #1. Defining the period as a char(6) allowed the direct usage of a substring operation which eliminated the filter step in the execution plan. The second query to locate periods using BETWEEN did not have any differences in the execution plan that I could see.

From what I can see I will most likely use the char(6) representation for the accounting period, but I am defintely open to the experiences and advice of others for potential pitfalls that I am not seeing yet.

Best Regards,

Dennis M.

maddog

David.Poole SSC Guru Points: 76001 More actions · Answer 5

The INT method does not require any padding.

The user wants to query for a range of dates then

CREATE PROC dbo.MyQuery

@StartYear INT ,

@StartMonth INT ,

@EndYear INT ,

@EndMonth INT

AS

DECLARE @StartPeriod INT ,@EndPeriod INT

SET @StartPeriod = @StartYear * 100 + @StartMonth

SET @EndPeriod = @EndYear * 100 + @EndMonth

SELECT *

FROM dbo.YourTable

WHERE ReportPeriod BETWEEN @StartPeriod AND @EndPeriod

If you just want to query for a year then what you are really asking is ReportPeriod BETWEEN 200601 AND 200612

If your index is clustered then this range query will be very quick.

LinkedIn Profile

maddogs SSCrazy Points: 2474 More actions · Answer 6

I like the numeric handling you proposed versus the char padding - didn't think of that trick. I can see that the BETWEEN operator will be used heavily as it would work for nearly all types of queries including the year alone as you mentioned. I'm shifting my thinking towards using an int versus a char(6) thanks to your input David.

Best Regards,

Dennis M.

maddog

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 7

I dunno... 6 bytes, 4 bytes, 8 bytes... for me, I'd still be tempted to use DateTime just because of all the things you can do with DateTime functions without any conversions required... and a modified version of Steve's idea for a "period" table would certainly prove useful. Just in case the original requester didn't know it, it's very easy to group by month and year using DateDiff(mm,0,somedate).

Then, there was that time when the SEC did an audit on my old company and they had to explain why their records weren't timestamped my year, month, day, hour, minute, and second.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

maddogs SSCrazy Points: 2474 More actions · Answer 8

I Haven't seen the DateDiff usage for grouping before, have typically used an approach like 'SELECT SUM(Amount) FROM Forecast WHERE DatePart(yy, acctperiod) BETWEEN @StartYear AND @EndYear' with the acctperiod column being a datetime. A similar usage of DatePart within a GROUP BY expression to get grouped results by month or year. It's late and I'm not thinking clearly, do you have an example of usage of the DateDiff for both filtering and grouping? Thanks Again.

Dennis M.

maddog

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 9

Sure, Dennis... having a bit of a lack of sleep going on myself, here's an example using the Northwind database...

USE NorthWind

DECLARE @Year CHAR(4)

SET @Year = '1997'

SELECT DATENAME(mm,DATEADD(mm,d.MonthNum,0)) AS SpelledMonth,

DATENAME(yy,DATEADD(mm,d.MonthNum,0)) AS [Year],

d.FreightValue

FROM (

SELECT DATEDIFF(mm,0,ShippedDate) AS MonthNum,

SUM(Freight) AS FreightValue

FROM dbo.Orders

GROUP BY DATEDIFF(mm,0,ShippedDate)

)d

WHERE d.MonthNum >= DATEDIFF(mm,0,'01/01/'+@Year)

AND d.MonthNum < DATEDIFF(mm,0,'01/01/'+@Year)+12

ORDER BY d.MonthNum

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David.Poole SSC Guru Points: 76001 More actions · Answer 10

If you do WHERE {Function(field)} = {value} then you will probably do a TABLE SCAN because the value isn't an indexed value.

Probably won't matter in a small-ish database but in a larger one this will punish your system.

LinkedIn Profile

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 11

Yep, true enough... I've made the same recommendation to others, myself. It's a great example of why you need a primary key, though...

-------------------------------------------------------------------------------------------------------------------------------

|--Compute Scalar(DEFINE: ([Expr1004]=datename(month, dateadd(month, [Expr1002], 'Jan 1 1900 12:00AM')), [Expr1005]=datename(year, dateadd(month, [Expr1002], 'Jan 1 1900 12:00AM'))))

|--Compute Scalar(DEFINE: ([Expr1003]=If ([Expr1012]=0) then NULL else [Expr1013]))

|--Stream Aggregate(GROUP BY: ([Expr1002]) DEFINE: ([Expr1012]=COUNT_BIG([Orders].[Freight]), [Expr1013]=SUM([Orders].[Freight])))

|--Sort(ORDER BY: ([Expr1002] ASC))

|--Filter(WHERE: ([Expr1002]>=datediff(month, 'Jan 1 1900 12:00AM', Convert('01/01/'+[@Year])) AND [Expr1002]<datediff(month, 'Jan 1 1900 12:00AM', Convert('01/01/'+[@Year]))+12))

|--Compute Scalar(DEFINE: ([Expr1002]=datediff(month, 'Jan 1 1900 12:00AM', [Orders].[ShippedDate])))

|--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders]))

(7 row(s) affected)

...It's not the coveted "Clustered Index Seek" but it's no table scan, either.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

maddogs SSCrazy Points: 2474 More actions · Answer 12

Thanks Gents for your responses. Ill take some time to see how the DateDiff approach might be used in my situation now that I think I know what's going on with it.

Oh and..Go Seahawks!

(I'm not a football junkie but I can't escape the fact that my wife's parents live a mile from Seahawk stadium in Seattle and drilled the Seahawk mantra into her from a young age. Just another excuse for me to drink more beer, as if I needed one.)

maddog

David.Poole SSC Guru Points: 76001 More actions · Answer 13

Jeff,

Have you seen the trick where you do a completely superfluous WHERE {ClusterIndex} BETWEEN {ClusterIndex}.MinValue AND {ClusterIndex}.MaxValue

In SQL2000 this can force the clustered index seek even though the clause does nothing in the practical sense. You do gain a performance boost.

In SQL2005 it still forces a clustered index seek but it adds in a filter step into the execution plan so you don't get any performance benefits. SQL2005 execution plans seem to have half the cost of their 2000 equivalents.

LinkedIn Profile

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 14

Absolutely... and that, in fact, your suggestion would definitely be the best way to solve the problem, but someone wanted me to demo using DATEDIFF as both a filter and a join. That was the reason for my original post

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)