April 3, 2003 at 4:06 am
Anyone have any ideas/best practices on partitioning large volumes of timeline based data using a DATETIME field.
I am estimating more than 40 million rows in a single table over a period of 8 months. I have read elsewhere that 40 million rows is considered the theoretical maximum number of rows that should be present in a single table. Hence my thinking of splitting the data up based on DATETIME. My idea was to split it on the month number. ie. into quarters. So in effect I would have data_q1, data_q2, data_q3 and data_q4. Then bringing it all together in a partitioned view called data which would mask the base tables.
Now my problem so far involves defining the CHECK constraint for the base tables. I don't see why DATEPART(MONTH, date) IN (1, 2, 3) is not valid. BOL says only certain operators are permitted for an updatable partitioned view and they don't mention anything about deterministic functions.
Anyone have any ideas on this that could help me?
April 3, 2003 at 7:56 am
Can you post your script?
April 3, 2003 at 9:35 am
Here is an example I quickly threw together to illustrate the problem:
CREATE TABLE dbo.data_q1(
[date] DATETIMENOT NULL CHECK (MONTH([date]) IN (1, 2, 3)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q1 PRIMARY KEY CLUSTERED ([date], refid)
)
GO
CREATE TABLE dbo.data_q2(
[date] DATETIMENOT NULL CHECK (MONTH([date]) IN (4, 5, 6)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q2 PRIMARY KEY CLUSTERED ([date], refid)
)
GO
CREATE TABLE dbo.data_q3(
[date] DATETIMENOT NULL CHECK (MONTH([date]) IN (7, 8, 9)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q3 PRIMARY KEY CLUSTERED ([date], refid)
)
GO
CREATE TABLE dbo.data_q4(
[date] DATETIMENOT NULL CHECK (MONTH([date]) IN (10, 11, 12)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q4 PRIMARY KEY CLUSTERED ([date], refid)
)
GO
CREATE VIEW dbo.data WITH SCHEMABINDING
AS
SELECT[date],
refid,
data
FROMdbo.data_q1
UNION ALL
SELECT[date],
refid,
data
FROMdbo.data_q2
UNION ALL
SELECT[date],
refid,
data
FROMdbo.data_q3
UNION ALL
SELECT[date],
refid,
data
FROMdbo.data_q4
GO
INSERT INTO dbo.data VALUES ('01 Jan 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Feb 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Mar 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Apr 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 May 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Jun 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Jul 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Aug 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Sep 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Oct 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Nov 2003 08:30:00', 0, 0)
INSERT INTO dbo.data VALUES ('01 Dec 2003 08:30:00', 0, 0)
April 3, 2003 at 12:25 pm
you're using the IN operator in your check constrainst which apparently is not allowed for partitioned views. from BOL:
"The CHECK constraints can only use these operators: BETWEEN, AND, OR, <, <=, >, >=, ="
if you want to partition by quarter, how bout writing your constraints something like:
CHECK([DATE] >= '01-JAN-2003' and [DATE] < '01-APR-2003')
etc.
April 3, 2003 at 2:02 pm
The problem in my mind is not that I am using the IN operator, but rather the MONTH() or DATEPART() functions. It seems that functions are not eligible in this case.
Sure, I could make it compare to actual datetime values, but that implies that I need to define a good few years worth to prevent the system falling over.
What I suppose I need is a more mathematical approach that satisfies SQL Server's internal desires. 🙂 You got any more ideas?
April 3, 2003 at 2:25 pm
I modified your script and add a new partitioning column for partition data into different tables. It seems not the operator issue because I used "IN" in check constraint. I would suspect problem comes from datetime datatype.
create TABLE dbo.data_q1(
[date] DATETIMENOT NULL ,
logmonth int CHECK (logmonth IN (1, 2, 3)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q1 PRIMARY KEY CLUSTERED (logmonth, refid ))
GO
create TABLE dbo.data_q2(
[date] DATETIMENOT NULL ,
logmonth int CHECK (logmonth IN (4, 5, 6)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q2 PRIMARY KEY CLUSTERED (logmonth, refid ))
GO
create TABLE dbo.data_q3(
[date] DATETIMENOT NULL ,
logmonth int CHECK (logmonth IN (7,8,9)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q3 PRIMARY KEY CLUSTERED (logmonth, refid ))
GO
create TABLE dbo.data_q4(
[date] DATETIMENOT NULL ,
logmonth int CHECK (logmonth IN (10, 11, 12)),
refid INTEGER NOT NULL,
data INTEGERNOT NULL
CONSTRAINT PK_data_q4 PRIMARY KEY CLUSTERED (logmonth, refid ))
GO
create VIEW dbo.data
AS
SELECT*
FROMdbo.data_q1
UNION ALL
SELECT*
FROMdbo.data_q2
UNION ALL
SELECT*
FROMdbo.data_q3
UNION ALL
SELECT*
FROMdbo.data_q4
GO
INSERT INTO dbo.data VALUES ('01 Jan 2003 08:30:00', DATEPART(mm, '01 Jan 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Feb 2003 08:30:00', DATEPART(mm, '01 Feb 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Mar 2003 08:30:00', DATEPART(mm, '01 Mar 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Apr 2003 08:30:00', DATEPART(mm, '01 Apr 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 May 2003 08:30:00', DATEPART(mm, '01 May 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Jun 2003 08:30:00', DATEPART(mm, '01 Jun 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Jul 2003 08:30:00', DATEPART(mm, '01 Jul 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Aug 2003 08:30:00', DATEPART(mm, '01 Aug 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Sep 2003 08:30:00', DATEPART(mm, '01 Sep 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Oct 2003 08:30:00', DATEPART(mm, '01 Oct 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Nov 2003 08:30:00', DATEPART(mm, '01 Nov 2003 08:30:00'),0, 0)
INSERT INTO dbo.data VALUES ('01 Dec 2003 08:30:00', DATEPART(mm, '01 Dec 2003 08:30:00'),0, 0)
April 3, 2003 at 4:09 pm
i guess i misunderstood what you were after. i assumed that your date was a date entered, date created, or some such, where the range of allowable values is bounded. in such cases you can create the required partitions before they're needed (either manually or preferably through a recurring scheduled job that creates and/or drops tables and then recompiles the view(s)).
sounds like you just wanna hash dates into one of four partitions; though it seems to me this only delays the problem you set out to avoid (hitting 40M rows/table).
and, yeah, you're right 🙂 it appears that IN is perfectly acceptable in updatable partitioned views. this worked fine:
create table t1(
col1 int not null primary key check([col1] in (1)))
create table t2(
col1 int not null primary key check([col1] in (2)))
create view v
as
select col1
from t1
union all
select col1
from t2
insert v values(1)
insert v values(2)
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply