Indexes on large tables

Question

Indexes on large tables

Paul McCurdy

Ten Centuries

Points: 1377
More actions
July 27, 2008 at 2:00 pm

#210584

I have a large table (70M records), currently on a SQL2K development box that we're moving over to SQL2005, that doesn't have any indexes or even a PK - I know, I need them!
Our system admin hasn't given us the ability to use the Query Tuning Wizard so I need to do by trial and error which is scary on such a large table and for someone who has little experience with SQLServer indexes (I do better with Teradata).
2 Questions:
1. I've seen some good articles on building indexes, and one in particular that says a Clustered Index is a must. With my 70M row table, there is one field - the Date - that has only 7 distinct values i.e. I keep only 7 running days of details. Does that make date a good candidate for a clustered index, so that at least everything for a day is together - I build daily summary tables off this big table?
2. Recommendations for getting up to speed on when to use various kinds of indexes - books, references, etc. As I said, I use SQL2K for development & SQL2005 for production
Appreciate any help you can provide!
Regards

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

george sibbald SSC Guru Points: 104210 More actions · Answer 1

this date field would not be a good candidate for the clustered index, or any other index, because it is not very selective, with only 7 distinct dates each value can return 10 million rows, so the optimizer will probably table scan anyway.

Best choice for the clustered index would be any column on which you would want your data ordered, or a column you do range searches on. Try a choose a small column as well as the clustered column is held in all non-clustered indexes

See which columns are used as filters in your queries (appear in the where clause) or are used a joins to other tables, these are going to be the best candidates for indexes.

Test in query analyzer using show estimated execution plan and set statistics io on to see the difference adding the index makes.

see if you get your DBA to put a profiler trace thru the tuning wizard for you.

---------------------------------------------------------------------

george sibbald SSC Guru Points: 104210 More actions · Answer 2

Also....could be useful to post your table definition.

Is the date column defined as a datetime and are you entering a time as well as a date, or just 00:00:00 for the time? If the time is held this would be a good candidate for the clustered index.

---------------------------------------------------------------------

Paul McCurdy Ten Centuries Points: 1377 More actions · Answer 3

Great suggestions. I'll review the majot queries against it and what columns they're joined on and/or are used in where clause.

The 'date' column is actually a CHAR(10) 'yyyy-mm-dd'. I was thinking that since we usually join 1 day at a time, that having all the data ordered by date would help.

I really don't understand indexes that well - obviously :unsure:

Harish-396803 SSC Journeyman Points: 83 More actions · Answer 4

Even in this case creating a Non Cluster index on date (ASC or DESC) will help a lot. As you mentioned that you probably required one day to fetch the record.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Since you're doing daily summaries, I might suggest the date combined with a second column as the clustered index. Or I might not. It's hard to say without more info.

Could you please post the table definition, some of the more frequently run queries on the table (including your daily summary) and a short description of what the data looks like (selectivity, typical values, etc)?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398911 More actions · Answer 6

I can't recommend enough that you develop in 2005. Work you do on a 2000 box might not work in 2005 and there are a ton of things you can do in 2005 that you can't do in 2000. You need to be developing in 2005 too.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Paul McCurdy Ten Centuries Points: 1377 More actions · Answer 7

Gail - here's the table structure for the big 70M record table (TABLE1), which has 10M records per day and one of the summaries (TABLE2). Also, the query that summarizes TBL1 into TBL2.

TABLE1

COLUMN_NAMEDATA_TYPE

ProgramDatechar

half_hour_idtinyint

vho_cdchar

local_channelint

titlevarchar

languagechar

HD_SD_Indchar

Banchar

Rcvr_Idchar

Live_indchar

Total_MMint

TABLE2

ProgramStartDatechar

half_hour_idtinyint

vho_cdchar

local_channelint

prog_titlevarchar

prog_langchar

HD_SD_Indchar

Live_indchar

Num_of_Receiversint

Num_Of_Bansint

TotalMMint

INSERT INTO TABLE2 (ProgramStartDate,half_hour_id,vho_cd,local_channel,prog_title,prog_lang,HD_SD_Ind,live_ind,Num_of_Receivers, Num_Of_Bans, TotalMM)

SELECT a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind, COUNT(DISTINCT a.rcvr_id), COUNT(DISTINCT a.ban), SUM(Total_MM)

FROM TABLE1 AS a

GROUP BY a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Are there any other queries running against table1? Inserts? Updates? Deletes?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Paul McCurdy Ten Centuries Points: 1377 More actions · Answer 9

After summarizing the data, most of the queries run against the summary table, but here's one that uses both:

DECLARE @CurrentDate smalldatetime

SET @CurrentDate = (SELECT CAST(MAX(ProgramStartDate)AS SMALLDATETIME) FROM TABLE2)

DECLARE @seq_no SMALLINT

SET @seq_no = (SELECT MIN(seq_no) FROM dbo.IPTV_TimeSlot WHERE day_part = @DayPart

AND dayOfWeek = DATEPART(dw, @currentDate))

TRUNCATE TABLE TEMP_TOP10;

INSERT INTO TEMP_TOP10(ProgramStartDate, prog_title, Day_Part, Total_Stb, Total_Ban)

SELECT TOP 10 a.ProgramStartDate,a.prog_title,b.day_part,

COUNT(DISTINCT a.rcvr_id) "Total_STB",COUNT(DISTINCT a.ban)"Total_ban"

FROM TABLE1 AS a

JOIN TABLE4 AS c

ON a.vho_cd = c.vho_cd

AND a.local_channel = c.local_channel

JOIN TABLE6 AS b

ON a.half_hour_id = b.half_hour_id

AND DATEPART(dw,@currentDate) = b.dayOfWeek

WHERE c.CHANNEL_GROUP1 <> 'MUSIC'

AND b.day_part = @DayPart

AND a.ProgramStartDate = CONVERT(CHAR(10),@CurrentDate,121)

AND a.prog_title <> 'Paid Programs'

GROUP BY a.ProgramStartDate, a.prog_title,b.day_part

GSquared SSC Guru Points: 260824 More actions · Answer 10

A clustered index on ProgramDate and half_hour_id looks like it might be a good candidate for the clustered index. You use both in Where and Join clauses, and they'll probably help keep the table from fragmenting too badly on inserts, from what you're saying.

If you then add an index with vho_cd, local_channel, and prog_title, that will give you what's in the Where and Join clauses for Table1 in that particular query. If that index uses the Include clause to add rcvr_id, and ban, it'll be a covering index for that particular query. (I don't think I missed anything.)

Try those, see if they get you what you need.

create clustered index CID_Table1 on dbo.Table1 (ProgramDate, half_hour_id)

create index IDX_Table1_DailyQuery on dbo.Table1 (vho_cd, local_channel, prog_title)

include (rcvr_id, ban)

(You'll have to put in your real table name, of course. Assuming it's not actually Table1.)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Paul McCurdy Ten Centuries Points: 1377 More actions · Answer 11

Paul McCurdy

Ten Centuries

Points: 1377

July 28, 2008 at 1:59 pm

#849411

Thanks! I'll it give it a try now.

Much appreciated

GSquared SSC Guru Points: 260824 More actions · Answer 12

Just keep in mind that indexes on large tables take up large amounts of disk space.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Paul McCurdy Ten Centuries Points: 1377 More actions · Answer 13

YUp - just came up against that wall: Primary filegroup for my dB is full. :angry:

GSquared SSC Guru Points: 260824 More actions · Answer 14

Can you expand the filegroup, or perhaps create a separate filegroup for the indexes?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon