Indexes on large tables

  • I have a large table (70M records), currently on a SQL2K development box that we're moving over to SQL2005, that doesn't have any indexes or even a PK - I know, I need them!

    Our system admin hasn't given us the ability to use the Query Tuning Wizard so I need to do by trial and error which is scary on such a large table and for someone who has little experience with SQLServer indexes (I do better with Teradata).

    2 Questions:

    1. I've seen some good articles on building indexes, and one in particular that says a Clustered Index is a must. With my 70M row table, there is one field - the Date - that has only 7 distinct values i.e. I keep only 7 running days of details. Does that make date a good candidate for a clustered index, so that at least everything for a day is together - I build daily summary tables off this big table?

    2. Recommendations for getting up to speed on when to use various kinds of indexes - books, references, etc. As I said, I use SQL2K for development & SQL2005 for production

    Appreciate any help you can provide!

    Regards

  • this date field would not be a good candidate for the clustered index, or any other index, because it is not very selective, with only 7 distinct dates each value can return 10 million rows, so the optimizer will probably table scan anyway.

    Best choice for the clustered index would be any column on which you would want your data ordered, or a column you do range searches on. Try a choose a small column as well as the clustered column is held in all non-clustered indexes

    See which columns are used as filters in your queries (appear in the where clause) or are used a joins to other tables, these are going to be the best candidates for indexes.

    Test in query analyzer using show estimated execution plan and set statistics io on to see the difference adding the index makes.

    see if you get your DBA to put a profiler trace thru the tuning wizard for you.

    ---------------------------------------------------------------------

  • Also....could be useful to post your table definition.

    Is the date column defined as a datetime and are you entering a time as well as a date, or just 00:00:00 for the time? If the time is held this would be a good candidate for the clustered index.

    ---------------------------------------------------------------------

  • Great suggestions. I'll review the majot queries against it and what columns they're joined on and/or are used in where clause.

    The 'date' column is actually a CHAR(10) 'yyyy-mm-dd'. I was thinking that since we usually join 1 day at a time, that having all the data ordered by date would help.

    I really don't understand indexes that well - obviously :unsure:

  • Even in this case creating a Non Cluster index on date (ASC or DESC) will help a lot. As you mentioned that you probably required one day to fetch the record.

  • Since you're doing daily summaries, I might suggest the date combined with a second column as the clustered index. Or I might not. It's hard to say without more info.

    Could you please post the table definition, some of the more frequently run queries on the table (including your daily summary) and a short description of what the data looks like (selectivity, typical values, etc)?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I can't recommend enough that you develop in 2005. Work you do on a 2000 box might not work in 2005 and there are a ton of things you can do in 2005 that you can't do in 2000. You need to be developing in 2005 too.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Gail - here's the table structure for the big 70M record table (TABLE1), which has 10M records per day and one of the summaries (TABLE2). Also, the query that summarizes TBL1 into TBL2.

    TABLE1

    COLUMN_NAMEDATA_TYPE

    ProgramDatechar

    half_hour_idtinyint

    vho_cdchar

    local_channelint

    titlevarchar

    languagechar

    HD_SD_Indchar

    Banchar

    Rcvr_Idchar

    Live_indchar

    Total_MMint

    TABLE2

    ProgramStartDatechar

    half_hour_idtinyint

    vho_cdchar

    local_channelint

    prog_titlevarchar

    prog_langchar

    HD_SD_Indchar

    Live_indchar

    Num_of_Receiversint

    Num_Of_Bansint

    TotalMMint

    INSERT INTO TABLE2 (ProgramStartDate,half_hour_id,vho_cd,local_channel,prog_title,prog_lang,HD_SD_Ind,live_ind,Num_of_Receivers, Num_Of_Bans, TotalMM)

    SELECT a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind, COUNT(DISTINCT a.rcvr_id), COUNT(DISTINCT a.ban), SUM(Total_MM)

    FROM TABLE1 AS a

    GROUP BY a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind

  • Are there any other queries running against table1? Inserts? Updates? Deletes?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • After summarizing the data, most of the queries run against the summary table, but here's one that uses both:

    DECLARE @CurrentDate smalldatetime

    SET @CurrentDate = (SELECT CAST(MAX(ProgramStartDate)AS SMALLDATETIME) FROM TABLE2)

    DECLARE @seq_no SMALLINT

    SET @seq_no = (SELECT MIN(seq_no) FROM dbo.IPTV_TimeSlot WHERE day_part = @DayPart

    AND dayOfWeek = DATEPART(dw, @currentDate))

    TRUNCATE TABLE TEMP_TOP10;

    INSERT INTO TEMP_TOP10(ProgramStartDate, prog_title, Day_Part, Total_Stb, Total_Ban)

    SELECT TOP 10 a.ProgramStartDate,a.prog_title,b.day_part,

    COUNT(DISTINCT a.rcvr_id) "Total_STB",COUNT(DISTINCT a.ban)"Total_ban"

    FROM TABLE1 AS a

    JOIN TABLE4 AS c

    ON a.vho_cd = c.vho_cd

    AND a.local_channel = c.local_channel

    JOIN TABLE6 AS b

    ON a.half_hour_id = b.half_hour_id

    AND DATEPART(dw,@currentDate) = b.dayOfWeek

    WHERE c.CHANNEL_GROUP1 <> 'MUSIC'

    AND b.day_part = @DayPart

    AND a.ProgramStartDate = CONVERT(CHAR(10),@CurrentDate,121)

    AND a.prog_title <> 'Paid Programs'

    GROUP BY a.ProgramStartDate, a.prog_title,b.day_part

  • A clustered index on ProgramDate and half_hour_id looks like it might be a good candidate for the clustered index. You use both in Where and Join clauses, and they'll probably help keep the table from fragmenting too badly on inserts, from what you're saying.

    If you then add an index with vho_cd, local_channel, and prog_title, that will give you what's in the Where and Join clauses for Table1 in that particular query. If that index uses the Include clause to add rcvr_id, and ban, it'll be a covering index for that particular query. (I don't think I missed anything.)

    Try those, see if they get you what you need.

    create clustered index CID_Table1 on dbo.Table1 (ProgramDate, half_hour_id)

    create index IDX_Table1_DailyQuery on dbo.Table1 (vho_cd, local_channel, prog_title)

    include (rcvr_id, ban)

    (You'll have to put in your real table name, of course. Assuming it's not actually Table1.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks! I'll it give it a try now.

    Much appreciated

  • Just keep in mind that indexes on large tables take up large amounts of disk space.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • YUp - just came up against that wall: Primary filegroup for my dB is full. :angry:

  • Can you expand the filegroup, or perhaps create a separate filegroup for the indexes?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply