November 27, 2012 at 12:16 am
Question - Is it better to have more SARGABLE Clauses in a WHERE Statement OR less to find a record(s).
Is it better to let the 2008/2012 optimiser choose the best plan from the available indexes or should you put in the least amount of clauses necessary to do the job. I remember in the old days you had to order your clauses with the most selective first and so on and I know that is not the case now and the optimiser is clever enough to know what to do however is there a rule of thumb or is it purely a case of experimentation with speed tests etc.
An example - thinking of a horse racing database - and I want to find last years Grand National at Aintree, which is a National Hunt race not a Flat race
Should I use WHERE Clause A or B? knowing that I have lots of combinations of indexes on my tables and that the record could be found any number of ways without a table / index scan.
In these examples I am using words in some places where I would use numeric IDs for ease of explanation.
a) less
WHERE RaceDate BETWEEN @StartDate AND @EndDate AND RaceName='Grand National'
b) more
WHERE RaceDate BETWEEN @StartDate AND @EndDate AND RaceName='Grand National' AND CourseName = 'Aintree' AND RaceType = 'National Hunt'
The record can be identified by the first WHERE clause alone with an index of race titles AND date ranges BUT it can be narrowed down (in my head at least) quicker by thinking of just looking at National Hunt races at Aintree called the Grand National and then the date range.
By the way I am sure I used to know the answer to this but I've been very ill this year and all the meds I'm on are making my head very very fuzzy so I cannot recall for sure. I think a big chunk of my memory has been killed off by the National Health Service. Any help much appreciated.
November 27, 2012 at 12:53 am
Rob Reid-246754 (11/27/2012)
Should I use WHERE Clause A or B?
obviously both are totally different as they contain different no and type of column's filer so i am 100 % sure that you wont get records or result EQUAL whenever u use them (if the table is haing DML operation) so there is not point to use A instead of B atleast for performance perspective BUT yes u might get same data at any time when any filter or condition is not returing the data.
Rob Reid-246754 (11/27/2012)
I have lots of combinations of indexes on my tables and that the record could be found any number of ways without a table / index scan.
So what first thing a lot of indexes DOESNT mean good performance as they required intensive maintenece practice and also increase resource usage.Another thing sql optimizer will decide whether there will be seek or scan will to get benefit for particular query.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 27, 2012 at 12:54 am
Soon any expert will be here (i would expect gail ) to give you clear and better picture
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 27, 2012 at 1:40 am
Write your queries to fetch just the data that you need. Fetching data to the front end that you don't need and discard is a waste of time, resources, network bandwidth, etc.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 27, 2012 at 2:26 am
HI,
just to make sure I understand:
You could essentially retrieve that unique record by using just 1 or 2 columns in your WHERE clause.
E.g. SELECT EMP_ID, NAME FROM STAFF_LIST WHERE EMP_ID = 123
But the question is then: Would the performance of the query (i.e. the time it takes to return data) improve if you would add a bunch of columns to the WHERE clause to retrieve the same row, e.g.:
SELECT EMP_ID, NAME FROM STAFF_LIST WHERE EMP_ID = 123 AND FIRST_NAME = 'Eric' AND SURNAME = 'Mackin' AND EMP_ROLE = 'Analyst'
Although looking at the query execution plan and the associated stats would provide you with the answer, I would suggest that the more columns are referenced in your SELECT clause or WHERE clause, then the more data has to be transported over the network, the mroe data fetched from the disk, and the more CPU cycles to processes the extended clauses.
I think that the second query would be equivalent (performance/execution plan wise) to:
SELECT EMP_ID, NAME, FIRST_NAME, SURNAME, EMP_ROLE FROM STAFF_LIST WHERE EMP_ID = 123
When it comes to the number of indexes on your table(s) - don't necessarily believe that the more indexes you have, the more your performance will suffer - although there is surely a happy mid point somewhere.
As it goes, the right answer is "it depends" - is your table heavily written to and read relatively infrequently? then small number of indexes is what you want as every write to the table involves a write to the indexes too. Is your table read from very often but not written to very often? then load the table up with indexes.
HTH,
B
November 27, 2012 at 2:57 am
bleroy (11/27/2012)
HI,just to make sure I understand:
You could essentially retrieve that unique record by using just 1 or 2 columns in your WHERE clause.
E.g.
SELECT EMP_ID, NAME FROM STAFF_LIST WHERE EMP_ID = 123
But the question is then: Would the performance of the query (i.e. the time it takes to return data) improve if you would add a bunch of columns to the WHERE clause to retrieve the same row, e.g.:
SELECT EMP_ID, NAME FROM STAFF_LIST WHERE EMP_ID = 123 AND FIRST_NAME = 'Eric' AND SURNAME = 'Mackin' AND EMP_ROLE = 'Analyst'
I think that the second query would be equivalent (performance/execution plan wise) to:
SELECT EMP_ID, NAME, FIRST_NAME, SURNAME, EMP_ROLE FROM STAFF_LIST WHERE EMP_ID = 123
i dont think , it only depends if the any one Non clustrered indexe has all columns from SELECT part plus where clause , other wise there could be key or RID lookup
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 27, 2012 at 3:02 am
I know about having lots of indexes being heavy on the writes but this is a read intensive process and I want to get a report for stats as quickly as possible.
The SELECTS are all in a stored proc writing to a table in that DB - no passing queries from a website to the DB or results back. It's a nightly job to fill a stats table that is then heavily read after from the DB itself.
I know about indexes but as this stats table has a lot of columns there are an appropriate number of indexes to cover the queries that are most heavily used as read time needs to be fast and far outweighs the cost of updating an index when inserting records which happens once a night.
A common query is by racename and date range and other common queries are by race type (NH or Flat) or by Course + race date. Therefore indexes exist on those columns and the data can be found by any mix of these indexes.
Therefore the question is not whether I am passing data too much data over the network or using too many indexes but whether the optimiser can decide for itself that too many clauses have been used and ignore the extra ones.
E.G an index on racename could be very big as you can get long textual strings up to 200+ chars although I could just use that clause I am guessing searching a larger/wider index takes longer than searching narrower ones.
Therefore by passing in courseID and RaceType would that help narrow down the results quicker before having to resort to the racename index OR should I just choose the quickest route even if it means using the biggest index to get there.
I hope I am being clear the big C is screwing me up.
Thanks for your help
November 27, 2012 at 3:08 am
Bhuvnesh
i dont think , it only depends if the any one Non clustrered indexe has all columns from SELECT part plus where clause , other wise there could be key or RID lookup
I think that regardless if the index has the additional columns included or not (using INCLUDE or otherwise), additional columns will generate additional data that has to be piped through one way or another - it doesn't really matter if that additional data comes from the index or from the source table.
Again, the rule really is: the less columns, the better - and I think (but can't confirm just now) that it doesn't matter if these columns are referenced in the WHERE clause or the SELECT statement in terms of performance.
B
November 27, 2012 at 3:11 am
Rob Reid-246754 (11/27/2012)
whether the optimiser can decide for itself that too many clauses have been used and ignore the extra ones.
sql optimizer defintely choose only that index which is covering the most of the columns (which includes select + where + group by + order by columns ) and further more which index is less resource intensive too.
Rob Reid-246754 (11/27/2012)
Therefore by passing in courseID and RaceType would that help narrow down the results quicker before having to resort to the racename index OR should I just choose the quickest route even if it means using the biggest index to get there.
sql optimizer will choose the index not you (even if your not using any index HINT then) and for this if your post query + index definition then you can get more help.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 27, 2012 at 3:20 am
Rob
I'm really sorry to hear about your health problems, and I hope it all works out for you.
It would really help if you could post the DDL for your table, including indexes and constraints. My first thought was that if the race is uniquely identified by its name ("Grand National") then you will have a unique constraint on it and therefore an index as well (assuming the database is properly designed), and it would be sufficient to search on that. Then it occurred to me that your table might store individual instances of the race (Grand National 2012, Grand National 2011 etc). So I think that the answer is indeed "it depends", but we can help a lot more if we know how your table is structured.
John
November 27, 2012 at 4:38 am
Cheers John, genes are genes and my families are full of Big C all the way down.
Here is the main RACES table DDL with indexes.
I use the original Auto Index Recommendation Script (written some time back when 2005 came out with DMV's by some MS bods) to find missing indexes and I DONT always add an index due to one recommendation. I try and cover as many possibilities with as few a indexes as possible as very often you get a Create Missing Index Recommendation one day and a DROP INDEX the next!
However as I stated before this is a READ intensive system so the number of indexes is not a problem as there is only one insert per day and very few updates e.g if a race gets abandoned then that column might change once from FALSE to TRUE but apart from that there is no need for speed inserting into the table or indexes as it happens so rarely. However speed is required obtaining data and this table is the basis of a stats table that I am trying to populate. There is obviously a RUNNERS table that joins to RACES with a Foreign Key of RUNNERS.RaceFK=RACES.RacePK but that isn't needed for this example.
CREATE TABLE [dbo].[RACES](
[RacePK] [int] IDENTITY(1,1) NOT NULL,
[RaceName] [nvarchar](400) NOT NULL,
[CourseFK] [int] NOT NULL,
[RaceDatetime] [datetime] NOT NULL,
[RaceDate] [datetime] NOT NULL,
[RaceTime] [varchar](20) NOT NULL,
[IsHandicap] [bit] NOT NULL,
[Going] [varchar](100) NOT NULL,
[GoingCode] [varchar](50) NOT NULL,
[Distance] [varchar](50) NOT NULL,
[Yards] [int] NOT NULL,
[Surface] [varchar](50) NOT NULL,
[Runners] [int] NOT NULL,
[WinningTime] [varchar](20) NOT NULL,
[WonByFavourite] [bit] NOT NULL,
[WonByMaxForm] [bit] NOT NULL,
[WonByMaxForm3] [bit] NOT NULL,
[EachwayPosPayout] [smallint] NOT NULL,
[EachwayPrize] [decimal](10, 2) NOT NULL,
[Class] [varchar](50) NOT NULL,
[MyClass] [varchar](50) NULL,
[RaceType] [varchar](2) NOT NULL,
[Createdate] [datetime] NOT NULL,
[AllowInPlayBets] [bit] NOT NULL,
[Abandoned] [bit] NOT NULL,
[RunnerURL] [nvarchar](500) NOT NULL,
CONSTRAINT [PK_RACES] PRIMARY KEY CLUSTERED
(
[RacePK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_CourseFK] DEFAULT ((0)) FOR [CourseFK]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_IsHandicap] DEFAULT ((0)) FOR [IsHandicap]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Going] DEFAULT ('') FOR [Going]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_GoingCode] DEFAULT ('') FOR [GoingCode]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Distance] DEFAULT ('') FOR [Distance]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Yards] DEFAULT ((0)) FOR [Yards]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Surface] DEFAULT ('') FOR [Surface]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Runners] DEFAULT ((0)) FOR [Runners]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_WinningTime] DEFAULT ('') FOR [WinningTime]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_WonByFavourite] DEFAULT ((0)) FOR [WonByFavourite]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_WonByMaxForm] DEFAULT ((0)) FOR [WonByMaxForm]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_WonByMaxForm3] DEFAULT ((0)) FOR [WonByMaxForm3]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Eachwaypospayout] DEFAULT ((0)) FOR [Eachwaypospayout]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_EachwayPrize] DEFAULT ((0)) FOR [EachwayPrize]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Class] DEFAULT ('') FOR [Class]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_MyClass] DEFAULT ('') FOR [MyClass]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_RaceType] DEFAULT ('NA') FOR [RaceType]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Createdate] DEFAULT (getdate()) FOR [Createdate]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_AllowInPlayBets] DEFAULT ((0)) FOR [AllowInPlayBets]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_Abandoned] DEFAULT ((0)) FOR [Abandoned]
GO
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [DF_RACES_RunnerURL] DEFAULT ('') FOR [RunnerURL]
GO
/* INDEXES */
/****** Object: Index [nclidx_RACES_Abandoned] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_Abandoned] ON [dbo].[RACES]
(
[Abandoned] ASC
)
INCLUDE ( [RacePK],
[CourseFK],
[RaceDatetime],
[GoingCode],
[Yards],
[Runners]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Abandoned_RaceDatetime] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_Abandoned_RaceDatetime] ON [dbo].[RACES]
(
[Abandoned] ASC,
[RaceDatetime] ASC
)
INCLUDE ( [RacePK],
[CourseFK],
[Yards],
[MyClass],
[RaceType]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Class_RacePK_Eachwaypospayout] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_Class_RacePK_Eachwaypospayout] ON [dbo].[RACES]
(
[Class] ASC
)
INCLUDE ( [RacePK],
[Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_CourseFK_Abandoned] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_CourseFK_Abandoned] ON [dbo].[RACES]
(
[CourseFK] ASC,
[RaceTime] ASC,
[Abandoned] ASC
)
INCLUDE ( [RacePK],
[RaceDatetime],
[Yards],
[MyClass],
[RaceType],
[RaceName],
[Createdate],
[RunnerURL]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_CourseFK_Abandoned_RacePK_RaceDatetime] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_CourseFK_Abandoned_RacePK_RaceDatetime] ON [dbo].[RACES]
(
[CourseFK] ASC,
[Abandoned] ASC,
[RacePK] ASC,
[RaceDatetime] ASC
)
INCLUDE ( [Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_CourseFK_Abandoned_Yards] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_CourseFK_Abandoned_Yards] ON [dbo].[RACES]
(
[CourseFK] ASC,
[Abandoned] ASC,
[Yards] ASC
)
INCLUDE ( [RacePK],
[RaceDatetime]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_CourseFK_RaceFK_EachwayPosPayout] Script Date: 11/27/2012 11:15:02 ******/
CREATE UNIQUE NONCLUSTERED INDEX [nclidx_RACES_CourseFK_RaceFK_EachwayPosPayout] ON [dbo].[RACES]
(
[CourseFK] ASC,
[RacePK] ASC,
[Eachwaypospayout] ASC
)
INCLUDE ( [GoingCode],
[Yards],
[Runners],
[EachwayPrize]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_CourseFK_Wonbyfavourite_ishandicap_going_goingcode] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_CourseFK_Wonbyfavourite_ishandicap_going_goingcode] ON [dbo].[RACES]
(
[CourseFK] ASC,
[WonByFavourite] ASC,
[IsHandicap] ASC,
[Going] ASC,
[GoingCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_GoingCode] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_GoingCode] ON [dbo].[RACES]
(
[GoingCode] ASC
)
INCLUDE ( [RacePK],
[CourseFK],
[Yards],
[Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_GoingCode_RacePK_Eachwaypospayout] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_GoingCode_RacePK_Eachwaypospayout] ON [dbo].[RACES]
(
[GoingCode] ASC
)
INCLUDE ( [RacePK],
[Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_MyClass] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_MyClass] ON [dbo].[RACES]
(
[MyClass] ASC
)
INCLUDE ( [RacePK],
[Eachwaypospayout],
[EachwayPrize],
[AllowInPlayBets],
[Abandoned]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Racedatetime_CourseFK_RacePK] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_Racedatetime_CourseFK_RacePK] ON [dbo].[RACES]
(
[RaceDatetime] ASC,
[CourseFK] ASC,
[RaceDate] ASC,
[RaceTime] ASC,
[RacePK] ASC
)
INCLUDE ( [RaceName],
[IsHandicap],
[Going],
[GoingCode],
[Distance],
[Surface],
[Runners],
[WinningTime],
[WonByFavourite],
[Class],
[MyClass]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Racedatetime_RacePK] Script Date: 11/27/2012 11:15:02 ******/
CREATE UNIQUE NONCLUSTERED INDEX [nclidx_RACES_Racedatetime_RacePK] ON [dbo].[RACES]
(
[RaceDatetime] ASC,
[RacePK] ASC
)
INCLUDE ( [RaceName],
[CourseFK],
[RaceTime],
[IsHandicap],
[GoingCode],
[Yards],
[Runners],
[Eachwaypospayout],
[EachwayPrize],
[Class],
[MyClass],
[RaceType]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RaceName_CourseFK_RaceTime] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_RaceName_CourseFK_RaceTime] ON [dbo].[RACES]
(
[RaceName] ASC,
[CourseFK] ASC,
[RaceTime] ASC
)
INCLUDE ( [IsHandicap],
[Yards],
[Runners],
[AllowInPlayBets],
[Abandoned]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RaceName_RaceTime] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_RaceName_RaceTime] ON [dbo].[RACES]
(
[RaceName] ASC,
[RaceTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RacePK_EachwayPosPayout] Script Date: 11/27/2012 11:15:02 ******/
CREATE UNIQUE NONCLUSTERED INDEX [nclidx_RACES_RacePK_EachwayPosPayout] ON [dbo].[RACES]
(
[RacePK] ASC,
[Eachwaypospayout] ASC
)
INCLUDE ( [CourseFK],
[RaceDatetime],
[GoingCode],
[Runners],
[EachwayPrize]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RaceType_Class] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_RaceType_Class] ON [dbo].[RACES]
(
[RaceType] ASC,
[Class] ASC
)
INCLUDE ( [RacePK]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RaceType_Class_RacePK_Eachwaypospayout] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_RaceType_Class_RacePK_Eachwaypospayout] ON [dbo].[RACES]
(
[RaceType] ASC,
[Class] ASC
)
INCLUDE ( [RacePK],
[Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_RaceType_MyClass] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_RaceType_MyClass] ON [dbo].[RACES]
(
[RaceType] ASC,
[MyClass] ASC
)
INCLUDE ( [RacePK]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Runners_RaceType] Script Date: 11/27/2012 11:15:02 ******/
CREATE NONCLUSTERED INDEX [nclidx_RACES_Runners_RaceType] ON [dbo].[RACES]
(
[Runners] ASC,
[RaceType] ASC
)
INCLUDE ( [RacePK],
[Eachwaypospayout]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [nclidx_RACES_Yards_RacePK] Script Date: 11/27/2012 11:15:02 ******/
CREATE UNIQUE NONCLUSTERED INDEX [nclidx_RACES_Yards_RacePK] ON [dbo].[RACES]
(
[Yards] ASC,
[RacePK] ASC
)
INCLUDE ( [CourseFK],
[RaceDatetime],
[RaceDate],
[RaceTime],
[GoingCode]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [PK_RACES] Script Date: 11/27/2012 11:15:02 ******/
ALTER TABLE [dbo].[RACES] ADD CONSTRAINT [PK_RACES] PRIMARY KEY CLUSTERED
(
[RacePK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
November 27, 2012 at 6:02 am
Wow - that's a lot of indexes, and some of them looking very similar to others. I think I'd look at finding out which of them aren't used, and get rid of them.
That's an aside, though. I think half your problem is that your database design isn't normalised. For instance, you have Going and GoingCode, and you have RaceDateTime and RaceDate and RaceTime. What is the natural key in this table? In other words, what combination of columns uniquely identifies a row. You need to put a unique constraint on those columns to stop the same race being entered twice in the table.
I appreciate that none of this directly answers your question, but you're likely to gain more by looking at the design of your database than you are by tweaking the WHERE clause of a particular query.
John
November 28, 2012 at 5:26 am
Hi
Pretty please don't start a debate about natural keys versus surrogate keys! It will go on forever as I've witnessed before.
I am using a surrogate key - RacePK (a unique identifier) and the Primary Key which is used to join to other tables (int to int) If I wanted to use natural keys it would be a combination of lots of columns.
The database is normalised but over time I have slowly denormalised certain fields to speed it up and please remember a race gets entered once into the table and then read thousands of times so its read not write intensive and needs to be fast.
Normalising and then de-normalising for read intensive systems is a very common technique for improving speed and the table didn't start out that way.
E.G I added Yards in as an integer once it slowed down converting the display distance value of "1m 4f 23y" to a yards value of 2663.
OR extracting the date and time from racedatetime OR wrapping the Going column in a function to return the GoingCode.
It is far quicker to calculate these values ON INPUT which happens once rather than on every OUTPUT which happens thousands of times a day.
The columns you are talking about are like this because its on a 2005 DB server so there are no date or time columns and I store the time of the race as its recorded/displayed e.g 15.10 NOT 15:10.
For example:
Racedatetime = 2010-03-07 14:30:00.000 (used for looking up ranges)
Racedate = 2010-03-15 00:00:00.000 (used for when I need to group by date alone ignoring the time and removing the need for functions to format it in the SELECT statement)
RaceTime = 14.30 (the time stored as it is displayed note the . instead of :) again to stop me having to wrap columns in functions on selects.
Going = "Good (Good to Soft in places;Chase;Good to Soft)" (used for display)
GoingCode = Good to Soft (used for quick analyses of the general type of going e.g Firm, Firm to Good, Good, Good to Soft, Soft, Soft to Heavy, Heavy)
The query was at the start of the thread and I cannot show you all the queries as there are literally thousands which is why there are lots of indexes.
Remember this table is used to find out stats such as but not limited to:
the number of races on a going code at course X or Y,
the no of times a jockey, trainer, owner, horse (in the joined race_runners table RACE_RUNNERS.RaceFk=RACES.RacePK ) has won over a yard distance of between X and Y (lot of combos)
or a Jockey/Trainer/Horse/Owner has won with a FormRating or FormRating3 (again in RACE_RUNNERS) >X or <Y (lots of queries)
how many times the favourite has won, lost, placed at course X with Jockey A or B etc.
There are thousands of stats to collect by horse, jockey, course, trainer, distance, going, no of runners and so on AND combinations of.
There really is too many queries for me to show you which is why I asked a general question with a general query at the beginning.
November 28, 2012 at 7:13 am
Rob
My intention was not to try to persuade you to use a natural key instead of a surrogate key as your primary key and/or clustered index. I do have opinions on that, but they're not relevant to this discussion. What I will say is that if you do use a surrogate key then you should have a unique index on the columns that make up your natural key in order to guarantee the integrity of your data. The reason I made that point in answer to your question is this: suppose that RaceType is your natural key. If you have a unique index on that column, then the optimizer will be aware of its uniqueness and therefore it won't need to look at any other columns or indexes.
I take your point about denormalising for performance. What is the source of the data in your table - does it come from a normalised system or is it entered by an operator? If a human being goes anywhere near the data on its way into your table, how do you guarantee, for example, that you don't end up with a 1760-yard race over 10 furlongs?
Finally, I stand by what I said about indexes. At least one of them is the same as another but with one extra column added. It may be that your queries are such that all your indexes are being used. If that's the case, fine, but if it were me I'd spend a bit of time seeing whether that is indeed the case.
John
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply