July 27, 2008 at 2:00 pm
I have a large table (70M records), currently on a SQL2K development box that we're moving over to SQL2005, that doesn't have any indexes or even a PK - I know, I need them!
Our system admin hasn't given us the ability to use the Query Tuning Wizard so I need to do by trial and error which is scary on such a large table and for someone who has little experience with SQLServer indexes (I do better with Teradata).
2 Questions:
1. I've seen some good articles on building indexes, and one in particular that says a Clustered Index is a must. With my 70M row table, there is one field - the Date - that has only 7 distinct values i.e. I keep only 7 running days of details. Does that make date a good candidate for a clustered index, so that at least everything for a day is together - I build daily summary tables off this big table?
2. Recommendations for getting up to speed on when to use various kinds of indexes - books, references, etc. As I said, I use SQL2K for development & SQL2005 for production
Appreciate any help you can provide!
Regards
July 27, 2008 at 3:00 pm
this date field would not be a good candidate for the clustered index, or any other index, because it is not very selective, with only 7 distinct dates each value can return 10 million rows, so the optimizer will probably table scan anyway.
Best choice for the clustered index would be any column on which you would want your data ordered, or a column you do range searches on. Try a choose a small column as well as the clustered column is held in all non-clustered indexes
See which columns are used as filters in your queries (appear in the where clause) or are used a joins to other tables, these are going to be the best candidates for indexes.
Test in query analyzer using show estimated execution plan and set statistics io on to see the difference adding the index makes.
see if you get your DBA to put a profiler trace thru the tuning wizard for you.
---------------------------------------------------------------------
July 27, 2008 at 3:19 pm
Also....could be useful to post your table definition.
Is the date column defined as a datetime and are you entering a time as well as a date, or just 00:00:00 for the time? If the time is held this would be a good candidate for the clustered index.
---------------------------------------------------------------------
July 27, 2008 at 6:23 pm
Great suggestions. I'll review the majot queries against it and what columns they're joined on and/or are used in where clause.
The 'date' column is actually a CHAR(10) 'yyyy-mm-dd'. I was thinking that since we usually join 1 day at a time, that having all the data ordered by date would help.
I really don't understand indexes that well - obviously :unsure:
July 28, 2008 at 3:12 am
Even in this case creating a Non Cluster index on date (ASC or DESC) will help a lot. As you mentioned that you probably required one day to fetch the record.
July 28, 2008 at 3:29 am
Since you're doing daily summaries, I might suggest the date combined with a second column as the clustered index. Or I might not. It's hard to say without more info.
Could you please post the table definition, some of the more frequently run queries on the table (including your daily summary) and a short description of what the data looks like (selectivity, typical values, etc)?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 28, 2008 at 7:00 am
I can't recommend enough that you develop in 2005. Work you do on a 2000 box might not work in 2005 and there are a ton of things you can do in 2005 that you can't do in 2000. You need to be developing in 2005 too.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 28, 2008 at 8:56 am
Gail - here's the table structure for the big 70M record table (TABLE1), which has 10M records per day and one of the summaries (TABLE2). Also, the query that summarizes TBL1 into TBL2.
TABLE1
COLUMN_NAMEDATA_TYPE
ProgramDatechar
half_hour_idtinyint
vho_cdchar
local_channelint
titlevarchar
languagechar
HD_SD_Indchar
Banchar
Rcvr_Idchar
Live_indchar
Total_MMint
TABLE2
ProgramStartDatechar
half_hour_idtinyint
vho_cdchar
local_channelint
prog_titlevarchar
prog_langchar
HD_SD_Indchar
Live_indchar
Num_of_Receiversint
Num_Of_Bansint
TotalMMint
INSERT INTO TABLE2 (ProgramStartDate,half_hour_id,vho_cd,local_channel,prog_title,prog_lang,HD_SD_Ind,live_ind,Num_of_Receivers, Num_Of_Bans, TotalMM)
SELECT a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind, COUNT(DISTINCT a.rcvr_id), COUNT(DISTINCT a.ban), SUM(Total_MM)
FROM TABLE1 AS a
GROUP BY a.ProgramStartDate,a.half_hour_id,a.vho_cd,a.local_channel,COALESCE(a.prog_title,'unk'),a.prog_lang,a.HD_SD_Ind,a.live_ind
July 28, 2008 at 10:01 am
Are there any other queries running against table1? Inserts? Updates? Deletes?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 28, 2008 at 11:48 am
After summarizing the data, most of the queries run against the summary table, but here's one that uses both:
DECLARE @CurrentDate smalldatetime
SET @CurrentDate = (SELECT CAST(MAX(ProgramStartDate)AS SMALLDATETIME) FROM TABLE2)
DECLARE @seq_no SMALLINT
SET @seq_no = (SELECT MIN(seq_no) FROM dbo.IPTV_TimeSlot WHERE day_part = @DayPart
AND dayOfWeek = DATEPART(dw, @currentDate))
TRUNCATE TABLE TEMP_TOP10;
INSERT INTO TEMP_TOP10(ProgramStartDate, prog_title, Day_Part, Total_Stb, Total_Ban)
SELECT TOP 10 a.ProgramStartDate,a.prog_title,b.day_part,
COUNT(DISTINCT a.rcvr_id) "Total_STB",COUNT(DISTINCT a.ban)"Total_ban"
FROM TABLE1 AS a
JOIN TABLE4 AS c
ON a.vho_cd = c.vho_cd
AND a.local_channel = c.local_channel
JOIN TABLE6 AS b
ON a.half_hour_id = b.half_hour_id
AND DATEPART(dw,@currentDate) = b.dayOfWeek
WHERE c.CHANNEL_GROUP1 <> 'MUSIC'
AND b.day_part = @DayPart
AND a.ProgramStartDate = CONVERT(CHAR(10),@CurrentDate,121)
AND a.prog_title <> 'Paid Programs'
GROUP BY a.ProgramStartDate, a.prog_title,b.day_part
July 28, 2008 at 12:15 pm
A clustered index on ProgramDate and half_hour_id looks like it might be a good candidate for the clustered index. You use both in Where and Join clauses, and they'll probably help keep the table from fragmenting too badly on inserts, from what you're saying.
If you then add an index with vho_cd, local_channel, and prog_title, that will give you what's in the Where and Join clauses for Table1 in that particular query. If that index uses the Include clause to add rcvr_id, and ban, it'll be a covering index for that particular query. (I don't think I missed anything.)
Try those, see if they get you what you need.
create clustered index CID_Table1 on dbo.Table1 (ProgramDate, half_hour_id)
create index IDX_Table1_DailyQuery on dbo.Table1 (vho_cd, local_channel, prog_title)
include (rcvr_id, ban)
(You'll have to put in your real table name, of course. Assuming it's not actually Table1.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 28, 2008 at 1:59 pm
Thanks! I'll it give it a try now.
Much appreciated
July 28, 2008 at 2:10 pm
Just keep in mind that indexes on large tables take up large amounts of disk space.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 28, 2008 at 2:20 pm
YUp - just came up against that wall: Primary filegroup for my dB is full. :angry:
July 28, 2008 at 2:31 pm
Can you expand the filegroup, or perhaps create a separate filegroup for the indexes?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply