Index Selection- Query Optimizer

  • Hi There,

    I have a table Vessel that has got a clustered index Vessel (Name,DwtHigh). I would expect the query below

    to make use of the clustered index. However, this query is making use of a nonclustered index.

    SELECT Vessel.Name [A14Name], Vessel.DwtHigh [A18DwtHigh]

    from vessel

    where (Vessel.Name like 'p%')

    and (Vessel.DwtHigh >= 0)

    and (Vessel.DwtHigh <= 60000)[/center] If i then execute a similar query, [center]select *
    from vessel
    where (Vessel.Name like 'p%')
    and (Vessel.DwtHigh >= 0)

    and (Vessel.DwtHigh <= 60000)[/center]
    the query optimizer starts using the clustered index. What could be the reason for this? In this case what would the
    query optimizer check first before coming up with the execution plan?

    [center]

    Table:

    CREATE TABLE [dbo].[Vessel](
    [ID] [dbo].[dmVessel] NOT NULL,
    [Name] [dbo].[dmName] NOT NULL,
    [SrchName] [dbo].[dmName] NOT NULL,
    [NameType] [dbo].[dmNameType] NOT NULL,
    [ExName1] [dbo].[dmName] NULL,
    [ExName1Srch] [dbo].[dmName] NULL,
    [LloydsNumber] [dbo].[dmLRNo] NULL,
    [MktStatus] [dbo].[dmMktStatus] NULL,
    [TradingStatus] [dbo].[dmTradingStatus] NOT NULL,
    [TradingCategory] [dbo].[dmTradingCategory] NOT NULL,
    [StatusDate] [datetime] NOT NULL,
    [VesselType] [dbo].[dmVesselTypeCode] NOT NULL,
    [DisplayType] [dbo].[dmVesselTypeCode] NOT NULL,
    [BPullDisp] [char](5) NULL,
    [SubTypeDisp] [dbo].[dmDisp] NULL,
    [BuiltMonth] [dbo].[dmDateMonth] NULL,
    [BuiltYear] [dbo].[dmDateYear] NULL,
    [YardCtry] [dbo].[dmCountryCode] NULL,
    [ShipBuilder] [dbo].[dmBldrCode] NULL,
    [Yard] [dbo].[dmYardCode] NULL,
    [DoubleHull] [dbo].[dmFlag] NULL,
    [DoubleBottom] [dbo].[dmFlag] NULL,
    [DoubleSide] [dbo].[dmFlag] NULL,
    [BulkHeads] [tinyint] NULL,
    [Commitment] [dbo].[dmOrgName] NULL,
    [CommitDate] [datetime] NULL,
    [CommitDirect] [dbo].[dmFlag] NULL,
    [CommitmadeBy] [dbo].[dmUsrCode] NULL,
    [CommitNote] [dbo].[dmRemark] NULL,
    [ParentOwn] [dbo].[dmOwnerCode] NULL,
    [ManageOwn] [dbo].[dmOwnerCode] NULL,
    [RegistOwn] [dbo].[dmOwnerCode] NULL,
    [FullDescr] [dbo].[dmBitFlag] NOT NULL,
    [TCUpdatedAt] [datetime] NULL,
    [TCUpdatedBy] [dbo].[dmUsrCode] NULL,
    [Flag] [dbo].[dmCountryCode] NULL,
    [DwtHigh] [int] NULL,
    [GTHigh] [int] NULL,
    [NTHigh] [int] NULL,
    [Ldt] [int] NULL,
    [LdtMetric] [dbo].[dmFlag] NULL,
    [Loa] [real] NULL,
    [Lbp] [real] NULL,
    [Beam] [real] NULL,
    [Tpc] [real] NULL,
    [DraftHigh] [real] NULL,
    [Engines] [tinyint] NULL,
    [EngineMake] [dbo].[dmName] NULL,
    [EngineDes] [dbo].[dmName] NULL,
    [EngBuilder] [dbo].[dmEngBldrCode] NULL,
    [McoPowerUnit] [dbo].[dmPwrUnitCode] NULL,
    [McoPower] [int] NULL,
    [McoRpm] [smallint] NULL,
    [Propellors] [tinyint] NULL,
    [BowThrusters] [dbo].[dmFlag] NULL,
    [SternThrusters] [dbo].[dmFlag] NULL,
    [SpdConsDisp] [dbo].[dmDisp] NULL,
    [SpeedMax] [real] NULL,
    [FuelGrade] [char](30) NULL,
    [AuxOilSeaCons] [real] NULL,
    [AuxOilSeaType] [dbo].[dmFuelTypeCode] NULL,
    [Class1] [dbo].[dmClassSocCode] NULL,
    [Class2] [dbo].[dmClassSocCode] NULL,
    [ClassNote] [dbo].[dmRemark] NULL,
    [SurveyDates] [varchar](60) NULL,
    [DryDockDates] [varchar](30) NULL,
    [Imo1] [dbo].[dmFlag] NULL,
    [Imo2] [dbo].[dmFlag] NULL,
    [Imo3] [dbo].[dmFlag] NULL,
    [Imo4] [dbo].[dmFlag] NULL,
    [Decks] [tinyint] NULL,
    [DeckArea] [int] NULL,
    [DeckAreaNote] [dbo].[dmRemark] NULL,
    [MainDkStrNote] [dbo].[dmRemark] NULL,
    [Osd] [dbo].[dmFlag] NULL,
    [Csd] [dbo].[dmFlag] NULL,
    [Sd] [dbo].[dmFlag] NULL,
    [Grain] [int] NULL,
    [Bale] [int] NULL,
    [Reefer] [int] NULL,
    [Holds] [tinyint] NULL,
    [Hatches] [tinyint] NULL,
    [HaDims] [dbo].[dmDisp] NULL,
    [HaCover] [dbo].[dmRemark] NULL,
    [OreStr] [dbo].[dmFlag] NULL,
    [HvyStr] [dbo].[dmFlag] NULL,
    [Teu] [int] NULL,
    [ReefPoints] [int] NULL,
    [ReefPtsType] [dbo].[dmReefPtsType] NULL,
    [Homog14T] [int] NULL,
    [Feu] [int] NULL,
    [TeuSplit] [int] NULL,
    [Cellular] [dbo].[dmFlag] NULL,
    [Trailers] [smallint] NULL,
    [LaneMeters] [int] NULL,
    [Cars] [int] NULL,
    [SternDoor] [dbo].[dmFlag] NULL,
    [BowDoor] [dbo].[dmFlag] NULL,
    [SideDoor] [dbo].[dmFlag] NULL,
    [QtrRamp] [dbo].[dmFlag] NULL,
    [SternRamp] [dbo].[dmFlag] NULL,
    [BowRamp] [dbo].[dmFlag] NULL,
    [SideRamp] [dbo].[dmFlag] NULL,
    [SlewRamp] [dbo].[dmFlag] NULL,
    [IntRamps] [dbo].[dmFlag] NULL,
    [IntLifts] [dbo].[dmFlag] NULL,
    [DeckHgts] [dbo].[dmRemark] NULL,
    [GearDisp] [dbo].[dmDisp] NULL,
    [GearMaxSwl] [real] NULL,
    [Gearless] [dbo].[dmFlag] NULL,
    [PassTotal] [int] NULL,
    [Berths] [int] NULL,
    [Cabins] [int] NULL,
    [Crew] [int] NULL,
    [Drivers] [int] NULL,
    [ReefTempLow] [real] NULL,
    [ReefTempHigh] [real] NULL,
    [ReefTempCelcius] [dbo].[dmFlag] NULL,
    [ReefTempNote] [dbo].[dmRemark] NULL,
    [Ventilation] [dbo].[dmBitFlag] NOT NULL,
    [VentType] [dbo].[dmVentTypeCode] NULL,
    [AirChanges] [tinyint] NULL,
    [AirCirculations] [smallint] NULL,
    [TempZones] [tinyint] NULL,
    [Compartments] [tinyint] NULL,
    [PalletCap] [int] NULL,
    [LiquidCubic] [int] NULL,
    [SSteelCubic] [int] NULL,
    [Tanks] [tinyint] NULL,
    [Grades] [tinyint] NULL,
    [Coated] [dbo].[dmFlag] NULL,
    [CoatedNote] [dbo].[dmRemark] NULL,
    [Coiled] [dbo].[dmFlag] NULL,
    [CoiledNote] [dbo].[dmRemark] NULL,
    [Igs] [dbo].[dmFlag] NULL,
    [Cow] [dbo].[dmFlag] NULL,
    [Cls] [dbo].[dmFlag] NULL,
    [Sbt] [dbo].[dmFlag] NULL,
    [PumpsDisp] [dbo].[dmDisp] NULL,
    [BowLoadDisch] [dbo].[dmFlag] NULL,
    [SternLoadDisch] [dbo].[dmFlag] NULL,
    [GasCubic] [int] NULL,
    [GasTemp] [smallint] NULL,
    [GasPressure] [real] NULL,
    [Ethylene] [dbo].[dmFlag] NULL,
    [Vcm] [dbo].[dmFlag] NULL,
    [Ammonia] [dbo].[dmFlag] NULL,
    [Chemical] [dbo].[dmFlag] NULL,
    [SpotPosition] [dbo].[dmPosition] NULL,
    [SpotZone] [dbo].[dmZoneCode] NULL,
    [SpotOpenDisp] [dbo].[dmOpenDateText] NULL,
    [SpotOpenFrom] [datetime] NULL,
    [SpotOpenTo] [datetime] NULL,
    [SpotEntryDate] [datetime] NULL,
    [SpotEnteredBy] [dbo].[dmUsrCode] NULL,
    [SpotSource] [dbo].[dmOrgName] NULL,
    [LongPosition] [dbo].[dmPosition] NULL,
    [LongZone] [dbo].[dmZoneCode] NULL,
    [LongOpenDisp] [dbo].[dmOpenDateText] NULL,
    [LongOpenFrom] [datetime] NULL,
    [LongOpenTo] [datetime] NULL,
    [LongEntryDate] [datetime] NULL,
    [LongEnteredBy] [dbo].[dmUsrCode] NULL,
    [LongSource] [dbo].[dmOrgName] NULL,
    [PositionNote] [dbo].[dmRemark] NULL,
    [Contact1] [dbo].[dmOrgName] NULL,
    [Contact2] [dbo].[dmOrgName] NULL,
    [Contact3] [dbo].[dmOrgName] NULL,
    [Direct] [dbo].[dmFlag] NULL
    ) ON [PRIMARY]
    ALTER TABLE [dbo].[Vessel] ADD [DispOwner] [varchar](60) NULL
    ALTER TABLE [dbo].[Vessel] ADD [DispOwnerContact] [dbo].[dmOrgName] NULL
    ALTER TABLE [dbo].[Vessel] ADD [ContactToDisp] [tinyint] NULL
    ALTER TABLE [dbo].[Vessel] ADD [Note] [dbo].[dmMemo] NULL
    ALTER TABLE [dbo].[Vessel] ADD [FeatureToDisp] [dbo].[dmDisp] NULL
    ALTER TABLE [dbo].[Vessel] ADD [DisplayContact] [dbo].[dmOrgName] NULL
    ALTER TABLE [dbo].[Vessel] ADD [SnpDisplayType] [dbo].[dmVesselTypeCode] NOT NULL
    ALTER TABLE [dbo].[Vessel] ADD [Checked] [tinyint] NULL
    ALTER TABLE [dbo].[Vessel] ADD [CheckedBy] [dbo].[dmUsrCode] NULL
    ALTER TABLE [dbo].[Vessel] ADD [CheckedAt] [datetime] NULL
    ALTER TABLE [dbo].[Vessel] ADD [SizeClass] [varchar](100) NULL
    ALTER TABLE [dbo].[Vessel] ADD [BallastCapacity] [varchar](100) NULL
    ALTER TABLE [dbo].[Vessel] ADD [CO2Fitted] [tinyint] NULL
    ALTER TABLE [dbo].[Vessel] ADD CONSTRAINT [pkVessel] PRIMARY KEY NONCLUSTERED

    Indexing:

    iskVesselCommitmentnonclustered, unique located on PRIMARYCommitment, ID
    iskVesselDispContactnonclustered located on PRIMARYDisplayContact
    iskVesselDwtclustered located on PRIMARYDwtHigh, Name
    iskVesselDwtSnpDisplayTypenonclustered, unique located on PRIMARYDwtHigh, SnpDisplayType, BuiltYear, Flag, ID
    iskVesselExname1nonclustered located on PRIMARYExName1Srch
    iskVesselIDYardShipbuildernonclustered, unique located on PRIMARYID, YardCtry, ShipBuilder, Yard
    iskVesselLloydsNumbernonclustered, unique located on PRIMARYLloydsNumber
    iskVesselLongSourcenonclustered, unique located on PRIMARYID, YardCtry, ShipBuilder, Yard
    iskVesselOwnersnonclustered located on PRIMARYParentOwn, ManageOwn, RegistOwn
    iskVesselShipbuildernonclustered, unique located on PRIMARYShipBuilder, ID, Name, TradingCategory, VesselType, Yard
    iskVesselSource2nonclustered located on PRIMARYContact2
    iskVesselSource3nonclustered located on PRIMARYContact3
    iskVesselSource4nonclustered located on PRIMARYDispOwner
    iskVesselSource5nonclustered located on PRIMARYDispOwnerContact
    iskVesselSourcesnonclustered located on PRIMARYContact1, Contact2, Contact3, DispOwner, DispOwnerContact
    iskVesselSpotSourcenonclustered, unique located on PRIMARYSpotSource, ID
    iskVesselSrchNamenonclustered located on PRIMARYSrchName
    iskVesselSuzTestnonclustered, unique located on PRIMARYID, DwtHigh, VesselType, BuiltYear, TradingCategory, Name, Yard
    iskVesselWellyTestnonclustered located on PRIMARYName
    iskVesselZonenonclustered located on PRIMARYSpotZone, SpotOpenFrom, SpotOpenTo, VesselType, TradingCategory, DisplayType, DwtHigh, Flag, ID
    pkVesselnonclustered, unique, primary key located on PRIMARYID

    CREATE CLUSTERED INDEX iskVesselDwt ON Vessel(DwtHigh, DisplayType, Name, BuiltYear, Flag)
    GO
    CREATE INDEX iskVesselCbm ON Vessel(GasCubic, DisplayType, BuiltYear, Flag, ID)
    GO
    CREATE INDEX iskVesselCommitment ON Vessel(Commitment, ID)
    GO
    CREATE INDEX iskVesselDwtSnpDisplayType ON Vessel(DwtHigh, SnpDisplayType, BuiltYear, Flag, ID)
    GO
    CREATE INDEX iskVesselIDYardShipbuilder ON Vessel(ID, YardCtry, ShipBuilder, Yard)
    GO
    CREATE INDEX iskVesselLongSource ON Vessel(ID, YardCtry, ShipBuilder, Yard)
    GO
    CREATE INDEX iskVesselShipbuilder ON Vessel(ShipBuilder, ID, Name, TradingCategory, VesselType, Yard)
    GO
    CREATE INDEX iskVesselSpotSource ON Vessel(SpotSource, ID)
    GO
    [/center]

    Thank you.

  • I would guess that for the first query the index is covering and in the second (because it's a select *) it's not. With a covering index just the NC needs to be read. If the index is not covering, SQL has to read the NC index and then do lookups back to the cluster. That's expensive for lots of rows (and by lots, I mean more than about 1% of the table)

    See - http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/

    p.s. why is half your post centred?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi

    Thanks for the response. I was trying to organize my code so that it can be easier to read, unfortunately the center tag did not work out as i wanted. I am sorry about that.

    With regards to your response, i do agree that in the first query the index is covering. This being the case i would expect the clustered index to satisfy the query more efficiently by itself instead of doing a further scan via the NC since the two columns in the select are both part of the clustered index. Also considering the fact that we are doing range searches, i wouldn't expect the optimizer to use a NC in the first query?

    I might be confusing the way the optimizer works. May you please clarify a little bit.

  • wellygee (5/27/2009)


    This being the case i would expect the clustered index to satisfy the query more efficiently by itself instead of doing a further scan via the NC since the two columns in the select are both part of the clustered index.

    Seeking on a nonclustered index is more efficient and faster than seeking or scanning on a cluster (since a NC index is usually smaller). SQL will use NC indexes if it can and will fall back onto the cluster if it can't.

    I guess it's using the iskVesselDwtSnpDisplayType index?

    With the inequalities it'll only be able to seek on a single column anyway, it will have to read through all the rows matched by that and filter out rows that don't match the second condition. There are two indexes that it could use for that since nothing has name as the leading column - the cluster and iskVesselDwtSnpDisplayType. The second one will be smaller, since clustered indexes have the data page at the leaf level, so using the NC to seek and then filtering out the name will take less IOs than using the cluster to do the same.

    That said, you have a very poor clustered index there. Guidelines for a cluster are narrow, unchanging, unique and ever-increasing. The clustering key is part of all nonclustered indexes, so a large clustering key firstly makes the clustered index very deep, making it less optimal, then it increases the size of all the NC indexes, making them deeper and less optimal.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • This definitely has given me more insight. I will review my indexes and seeork. I had a misconception. I thought that the clustered index is always the first choice for range searches.

    You have rightfully mentioned that it's using the NC index iskVesselDwtSnpDisplayType.

    One more question. Does it make a difference creating an index as a UNIQUE index when it already has the primary key (ID in this case) as a part of it?

  • As far as I'm aware, yes. If an index is unique, mark it as unique.

    You may find this useful - http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hey wellygee,

    For a range-search (range scan/seek), the lowest-cost path is often over the clustered index - so you are right about that. However, if a non-clustered index exists which covers the query completely, the optimizer will choose that - since the entire query can be resolved simply from the NC index, which is also logically (and perhaps physically) ordered on the index keys.

    If that sounds counter-intuitive, consider that the NC index row size will generally be significantly less than a full row. Such an index will be denser than the clustered index (more rows fit on an 8KB page) so fewer logical I/O operations will be required to satisfy the query. Remember that the clustered index key is physically in the leaf level of the data - so the same number of clustered keys fit on an 8KB page as do full rows.

    To take a very rough example, if a row of data occupies 2000 bytes, and the NC index is by contrast just 20 bytes wide, you can see that 100 times as many index entries fit on a page compared to the data rows.

    Of course, if the NC index does not cover the query the situation quickly changes.

    Generally, looking up data in the row using the clustering key embedded in the NC index, results in random I/O, rather than the sequential I/O generally produced by a full or partial scan of the clustered or nonclustered index alone. In addition, the query must traverse the levels of the clustered index to locate the row, which can add several logical I/Os to each 'lookup'.

    The costs of using the NC index in this way are such that the optimizer will very quickly turn to a clustered index scan rather than NC index seek + key lookup, in an effort to reduce the overall cost of the query (which is dominated by I/O costs).

    I should also mention that there are other considerations when choosing a clustered index aside from range-scanning. Minimizing fragmentation of the data rows and choosing a key which never changes are frequently more important considerations. As usual, it depends on your data, your priorities, and how the data will be accessed.

    As far as an UNIQUE index is concerned, marking the index as UNIQUE does add overhead to operations which modify indexed data (since SQL Server has to check uniqueness as well as the usual tasks), but it does give the optimizer more information about the data, so it may choose a better plan. In addition, some optimizations are only available for UNIQUE indexes, so that may help too. If the fact that the index is UNIQUE should be enforced, or the information is likely to be of use to the optimizer, you should consider making it explicitly unique. Of course, if the data is read-only or rarely changes, the argument swings further in favour.

    Paul

  • Paul White (5/27/2009)


    However, if a non-clustered index exists which covers the query completely, the optimizer will choose that - since the entire query can be resolved simply from the NC index, which is also logically (and perhaps physically) ordered on the index keys.

    Even if the NC is not covering it may be used, providing the inequality affects only a very small portion of the table and the optimiser can figure that out at compile time.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/27/2009)


    Paul White (5/27/2009)


    However, if a non-clustered index exists which covers the query completely, the optimizer will choose that - since the entire query can be resolved simply from the NC index, which is also logically (and perhaps physically) ordered on the index keys.

    Even if the NC is not covering it may be used, providing the inequality affects only a very small portion of the table and the optimiser can figure that out at compile time.

    Yes, that was what I was getting at with reference to the key lookups, thanks for making that clearer.

    I should say that there were a good number of generalizations in my post - necessary to avoid posting a complete book 🙂

  • This subject needs a good book.

    There are pieces in lots of places, but nothing that I've found that tells the complete indexing story in one place.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Paul for the detailed explanation. Like Gail said, a book with everything in one place will be good.

    I will apply the recommended changes and update you.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply