April 11, 2011 at 8:38 am
Hi guys,
I'm about to create a table that will load close to 1 million rows every hour and hold data for 30 days.
The existing table has no good storage model, so I'm looking to redesign it.
Existing Table:
CREATE TABLE [dbo].[Parameter_hourly](
[DateID] [int] NULL, -- created as an int column, e.g. 04/03/2011 is converted to 812
[Hour] [tinyint] NULL,
[UID] [int] NULL,
[Lat] [float] NULL,
[Long] [float] NULL,
[UID_C] [int] NULL,
[UID_R] [int] NULL,
[Occurrence] [int] NULL,
[Count_None] [int] NULL,
[Count_Soft] [int] NULL )
The existing table holds data for only 5 days, and only 3 hours each day.
We also have daily tables built from the hourly one. The data is selected only for certain days and hours, e.g. Mon - Thurs, 2PM - 6PM.
New Table:
We are now expanding to 24 hours a day and 30 days.
The selection would be non-contiguous, e.g. Mon 2PM-3PM, Wed 1PM-2PM, Thurs 4PM-5PM.
What would be the best storage model for this kind of criteria?
Any ideas?
Thanks and regards
April 11, 2011 at 9:50 am
First: ALWAYS treat datetime as datetime. Converting it to int, varchar, or anything else just means you have to re-convert it back again later, so your DateID should be a CreatedDate column with the datetime datatype instead. That way the column has the full time (hours, minutes, seconds, milliseconds, etc.), which you might need in the future but didn't plan for right now.
You were planning to pull the hour out separately... don't. It's in the CreatedDate column, and you can calculate it at any time.
If you really need it, then use a computed column based on the CreatedDate from above:
[Hour] AS DATEDIFF(hh, DATEADD(dd, DATEDIFF(dd, 0, CreatedDate), 0), CreatedDate) PERSISTED
Put the clustered index on that CreatedDate column, since the data is very date-centric. That will also make it easier for you to delete date ranges to trim the table.
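Something like this rough sketch, based on your existing columns (the new table and index names here are just suggestions):
CREATE TABLE [dbo].[Parameter_hourly_new](
[CreatedDate] [datetime] NOT NULL, -- replaces DateID + Hour; keep the real datetime
[Hour] AS DATEDIFF(hh, DATEADD(dd, DATEDIFF(dd, 0, CreatedDate), 0), CreatedDate) PERSISTED,
[UID] [int] NULL,
[Lat] [float] NULL,
[Long] [float] NULL,
[UID_C] [int] NULL,
[UID_R] [int] NULL,
[Occurrence] [int] NULL,
[Count_None] [int] NULL,
[Count_Soft] [int] NULL )
-- clustered index on the datetime column, since the data is date-centric
CREATE CLUSTERED INDEX [IX_Parameter_hourly_new_CreatedDate]
ON [dbo].[Parameter_hourly_new] ([CreatedDate])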
Lowell
April 12, 2011 at 7:23 am
You should definitely look into 2 areas:
1. Table partitioning. This will let you import the data into a partition (by date) and then combine it with the existing data.
2. Bulk insert - minimal logging will speed up the insert.
Also consider dropping the indexes, importing the data sorted by the clustered index key, then adding the indexes back (clustered first) after the insert.
Table partitioning will also let you remove old data.
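For example, a date-based partition setup could look roughly like this (the boundary dates, names, and the single PRIMARY filegroup are just placeholders):
-- one partition per day; split/merge boundaries as the 30-day window rolls forward
CREATE PARTITION FUNCTION pf_ParameterDaily (datetime)
AS RANGE RIGHT FOR VALUES ('2011-04-01', '2011-04-02', '2011-04-03')
CREATE PARTITION SCHEME ps_ParameterDaily
AS PARTITION pf_ParameterDaily ALL TO ([PRIMARY])
-- build the clustered index (and therefore the table) on the partition scheme
CREATE CLUSTERED INDEX IX_Parameter_hourly_CreatedDate
ON dbo.Parameter_hourly (CreatedDate)
ON ps_ParameterDaily (CreatedDate)
-- old data can then be removed quickly, e.g. by switching a partition out to a matching staging table:
-- ALTER TABLE dbo.Parameter_hourly SWITCH PARTITION 1 TO dbo.Parameter_hourly_staging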
Thanks,
Thomas
Thomas LeBlanc, MVP Data Platform Consultant
April 12, 2011 at 10:07 pm
Thanks a lot for your replies.
1. I have tried partitioning the table by date, with a nonclustered index on the UID column, and the performance was much better compared to the non-partitioned table.
But here is the problem I am facing (Case 2):
Case 1: when the query runs on a single date with a range of hours - very high performance.
Case 2: when the query runs on multiple dates with a range of hours - the time taken is almost four times that of the regular table.
Now how can I increase my query performance when selecting 2 or more date ranges?
Most of the time we query by date ranges.
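For reference, the two query shapes look roughly like this (column names are from the table above; the DateID values and predicates are just examples):
-- Case 1: single date, range of hours (fast)
SELECT [UID], [Lat], [Long], [Occurrence]
FROM [dbo].[Parameter_hourly]
WHERE [DateID] = 812
AND [Hour] BETWEEN 14 AND 18
-- Case 2: multiple dates, same range of hours (almost 4 times slower)
SELECT [UID], [Lat], [Long], [Occurrence]
FROM [dbo].[Parameter_hourly]
WHERE [DateID] IN (812, 813, 814)
AND [Hour] BETWEEN 14 AND 18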
Thanks and regards
April 13, 2011 at 6:43 am
Can you post the CREATE TABLE for the table(s) you are talking about?
And the 2 different queries?
Try to capture the execution plans (Estimated and Actual) and look for a Missing Index suggestion and the operators with the largest cost %.
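As a quick starting point, the missing index DMVs can also be checked directly (rough example; review any suggestions carefully before creating indexes):
SELECT TOP 10
mid.statement AS table_name,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns,
migs.user_seeks,
migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC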
Thomas
Thomas LeBlanc, MVP Data Platform Consultant
April 13, 2011 at 6:44 am
Do you have a clustered index on the table?
Thomas
Thomas LeBlanc, MVP Data Platform Consultant