Design a table

  • HI guys,

    Am about create a table that will load data every hour almost close to 1 million rows and holds data for 30 days.

    The existing table has no good storage model and hence looking for redesign.

    Existing Table:

    CREATE TABLE [dbo].[Parameter_hourly](

    [DateID] [int] NULL, ---created as int column Ex: 04/03/2011 is converted as 812

    [Hour] [tinyint] NULL,

    [UID] [int] NULL,

    [Lat] [float] NULL,

    [Long] [float] NULL,

    [UID_C] [int] NULL,

    [UID_R] [int] NULL,

    [Occurrence] [int] NULL,

    [Count_None] [int] NULL,

    [Count_Soft] [int] NULL )

    The existing table hold data only for 5 days and 3 hours each day.

    we also have daily tables based on hourly. The data is selected only for certain days and hours Ex: Mon - thurs 2PM - 6PM

    New Table:

    We are now expanding to 24 hours a day and 30 days.

    The selection would be non contagious Ex: Mon - 2Pm-3Pm,wed - 1Pm-2Pm, Thurs - 4Pm-5Pm

    what would be the best storage model for this kind of criteria

    Any ideas

    Thanks and regards

  • first: ALWAYS treat datetime as datetime . converting them to int, varchar or anything else just makes you have to re-convert them back again later, so your DateId should be DateCreated , with the datetime datatype instead. that way the column has the full time..hours ,seconds, milliseconds, etc. which you might need in the future but didn't plan for right now.

    you were planning to pull the hour out separately...don't. it's in the CreatedDate column, and you can calculate it at any time.

    if you really need that, then use a calculated column, based on the CreatedDate from above:

    Hour AS DATEDIFF(hh,DATEADD(dd, DATEDIFF(dd,0,CreatedDate ), 0),CreatedDate ) PERSISTED

    put the clustered index on that CreatedDate column, since the data is very date-centric. that will also make it easier for you to delete date ranges to trim the table .

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You should definitely look into 2 areas:

    1. Table Partioning. This will let you import the data into a partion (by date), then combined with existing data.

    2. Bulk-Insert - minimum logging will speed the insert.

    Also, dropping indexes, import data sorted by clustered index, then adding the indexes back (clustered first) after the insert.

    The table partioning will let you removed old data.

    Thanks,

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Thanks a lot for ur replies.

    1. I have tried Partitions on Table based on date and had an Non clustered index on UID column the performance was high when compared to the non partitioned table.

    But here is the problem am facing(case2)

    Case1: When query runs on single date(very high performance) with range of hours

    Case2: when Query on multiple dates(time taken is almost four times the regular table) with range of hours

    Now how can i increase my query performance for selecting 2 or more date ranges.

    Since many of the times we use date ranges

    Thanks and regards

  • Can you post the Creaste Table for the tale(s) you are talking about?

    And the 2 different queries?

    Try to get a Show Execution Plan (Estimated and Actual) and look for Missing Index and largest %

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Do you have a clustered index on the table?

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply