Partial Clustered Index Seek then Merge Join...

Question

Post reply

Partial Clustered Index Seek then Merge Join...

J-F Bergeron

SSCrazy Eights

Points: 8593
More actions
May 11, 2010 at 7:19 am

#234836

Hi all,
I've been trying to figure this one out for quite a while, but the query was always much too complex to post here. Now I have a lighter query that may explain my problem a bit better, see the plan attached.
Now, I'm curious, when I look at the plan, I join GICSPFVendorToMfr to GICSPFProduct on the index, and it does an index seek on the product table, then a key lookup, to go look in the clustered index of the product which is fine.
The problem comes in the GICSPFProductToZone, where there are as much rows as in the product table (Approx 700K). It does do a seek on the Clustered index, so at first, I thought, Great, the query works correctly, but when you look more closely, it does a seek on the 3 "Company fields", which are generic for a PC, but not on the ProductID!!!. The productID is the same as in the product table, and it has just done a SEEK on the Product Table, so why not try to reduce the number of reads in the ProductToZone table from 700K to about 36K... This seems to cause delay in the query, and is giving me a hard time.
The query itself, takes about 1.5 secs CPU to execute, which isn't bad, but I commented out a big part of the query, I need it to take max 1.5 secs... since the client wants to access his products at the glimpse of an eye... Can anyone explain why it only does a partial Seek?
Thanks In advance,
For the sake of clarity, here are the main indexes of the problem:
This table holds the "Vendor To Mfr Combination"
CREATE TABLE [dbo].[GICSPFVendorToMFR](
[CompanyID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DivisionID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DepartmentID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[MfrVendorID] [int] NOT NULL IDENTITY (1,1),
[VendorID] [int] NOT NULL,
[MFRID] [int] NULL,
[LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LockTS] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[GICSPFVendorToMFR] ADD CONSTRAINT [PK_GICSPFVendorToMFR_CoDivDepMfrVendorID] PRIMARY KEY
CLUSTERED
(
[CompanyID] ASC
,[DivisionID] ASC
,[DepartmentID] ASC
,[MfrVendorID] ASC
)
-- This table has the product information
CREATE TABLE [dbo].[GICSPFProduct](
[CompanyID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DivisionID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DepartmentID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ProductID] [int] NOT NULL IDENTITY (1,1),
[MFRVendorID] [int] NOT NULL,
[SPFCategoryID] [int] NULL,
[SPFCatalog] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SPFDescEn] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
--................ Etc.....
[LockedBy] [varchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LockTS] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[GICSPFProduct] ADD CONSTRAINT [IX_GICSPFProduct_CoDivDepProductID] PRIMARY KEY
CLUSTERED
(
[CompanyID] ASC
,[DivisionID] ASC
,[DepartmentID] ASC
,[ProductID] ASC
)WITH
(
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_GICSPFProduct_CoDivDepMfrVendorIDProductID] ON [dbo].[GICSPFProduct]
(
[CompanyID] ASC
,[DivisionID] ASC
,[DepartmentID] ASC
,[MFRVendorID] ASC
,[ProductID] ASC
)ON [PRIMARY]
GO
--And this table, all the prices and costs of the said product:
CREATE TABLE [dbo].[GICSPFProductToZone](
[CompanyID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DivisionID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DepartmentID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[MFRZoneID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ListPrice] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_ListPrice] DEFAULT ((0)),
[Col1] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col1] DEFAULT ((0)),
[Col2] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col2] DEFAULT ((0)),
[Col3] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col3] DEFAULT ((0)),
[DistrCost] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_DistrCost] DEFAULT ((0)),
[calcSPFMatrixCost] [money] null default null,
[calcPCSalesCost] [money] null default null,
[calcPCPurCost] [money] null default null,
[SPFMatrixDetailID] int null default null,
[PCSalesMatrixDetailID] int null default null,
[PCPurMatrixDetailID] int null default null,
[LastUpdateDate] [datetime] NULL,
[LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LockTS] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[GICSPFProductToZone] ADD CONSTRAINT [PK_GICSPFProductToZone_CoDivDepProductID] PRIMARY KEY
CLUSTERED
(
[CompanyID] ASC
,[DivisionID] ASC
,[DepartmentID] ASC
,[ProductID] ASC
,[MFRZoneID] ASC
)WITH
(
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
Cheers,
J-F

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 1

I'm including the Time / IO Information to demonstrate the reads:

(36278 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0

Table 'GICSPFProductToZone'. Scan count 1, logical reads 10441, physical reads 0, read-ahead reads 7

Table 'GICSPFProduct'. Scan count 5, logical reads 111265, physical reads 0, read-ahead reads 0

Table 'GICSPFMFR'. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0

Table 'GICSPFVendorToMFR'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0

Table 'GICSPFVendor'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0

Table 'Product'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0

Table 'GICSPFVendorCategoryDetail'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 1516 ms, elapsed time = 4047 ms.

Edit : Removed LOB Reads

Cheers,

J-F

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 2

I'm stumped that it needs to scan the whole GICSPFProductToZone table, where a seek could lower the reads drastically, and probably decrease the time of the execution by a fair amount... Any insight on why it needs to do this?

Cheers,

J-F

chrisfradenburg SSCrazy Eights Points: 9592 More actions · Answer 3

Can you post the SQL you're using? I'm seeing part of it in the plan but it cuts off.

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 4

J-F Bergeron

SSCrazy Eights

Points: 8593

May 11, 2010 at 11:24 am

#1165718

Edit: My Error

Cheers,

J-F

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 5

Sure, here's the full SQL. Thanks for taking a look at it.

DECLARE @CompanyID VARCHAR(4) ,

@DivisionID VARCHAR(4) ,

@DepartmentID VARCHAR(4) ,

@VendorCode VARCHAR(5) ,

@MfrCode VARCHAR(5) ,

@StartID INT ,

@EndID INT

SELECT @COmpanyID = 'GIC' ,

@DivisionID = '9830' ,

@DepartmentID = '3041' ,

@VendorCode = 'BRA' ,

@MFRCODE = 'SDD' ,

@StartID = 1 ,

@EndID = 151 ;

SET STATISTICS IO ON ;

SET STATISTICS TIME ON ;

WITH cte

AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY CompanyID, DivisionID, DepartmentID, ProductID ORDER BY CompanyID, DivisionID, DepartmentID, ProductID ) AS UniqID ,

*

FROM ( SELECT p.CompanyID ,

p.DivisionID ,

p.DepartmentID ,

p.ProductID ,

p.MFRVendorID ,

v.VendorCode ,

0 AS nbMatrices ,

m.MfrCode ,

mpc.MFRCode AS MfrCodePC ,

p.SPFCatalog ,

pPC.Catalog AS CatalogPC ,

p.SPFDescEn ,

p.SPFDescFr ,

p.SPFUPC ,

p.VndPrdCode ,

0 AS SPFUnitQty ,

p.VndCatalog ,

p.SPFPOA ,

p.SPFLastUpdDate ,

p.SPFLastUpdEffDate ,

p.SPFStatus ,

/*ParamStatus.ParamDescEN AS SPFStatusEn ,

ParamStatus.ParamDescFr AS SPFStatusFr ,*/

p.SPFStatusEffDate ,

p.VndCategory ,

dc.CategoryName ,

CASE WHEN v.VendorKeyColumn = 'UPC' THEN p.SPFUPC

WHEN v.VendorKeyColumn = 'VndPrdCode' THEN p.VndPrdCode

WHEN v.VendorKeyColumn = 'VndCatalog' THEN p.VndCatalog

END AS SPFKeyFieldValue ,

pz.DistrCost ,

/*COALESCE(costs.ReplCost, pz.calcPCPurCost, pz.calcSPFMatrixCost, pz.DistrCost) AS ReplCost ,

CASE WHEN costs.ReplCostUOM IS NOT NULL THEN costs.ReplCostUOM

WHEN pz.calcPCPurCost IS NOT NULL THEN COALESCE(mdPur.PurUOM, p.SPFPurUOM)

WHEN pz.calcSPFMatrixCost IS NOT NULL THEN COALESCE(mdSPF.PurUOM, p.SPFPurUOM)

ELSE p.SPFPurUOM

END AS ReplCostUOM ,

CASE WHEN costs.ReplCostSource = 'R' THEN 'CDN'

ELSE v.VendorCurrency

END AS ReplCostCurrency ,

ParCur.ParamDescEn AS ReplCostCurrencyEnCol ,

ParCur.ParamDescFr AS ReplCostCurrencyFrCol ,

COALESCE(costs.SlsViewCost, pz.calcPCSalesCost, pz.calcPCPurCost,

pz.calcSPFMatrixCost, pz.DistrCost) AS SlsViewCost ,

CASE WHEN costs.SlsViewCostUOM IS NOT NULL THEN costs.SlsViewCostUOM

WHEN pz.calcPCSalesCost IS NOT NULL THEN COALESCE(mdSls.PurUOM, p.SPFPurUOM)

WHEN pz.calcPCPurCost IS NOT NULL THEN COALESCE(mdPur.PurUOM, p.SPFPurUOM)

WHEN pz.calcSPFMatrixCost IS NOT NULL THEN COALESCE(mdSPF.PurUOM, p.SPFPurUOM)

ELSE p.SPFPurUOM

END AS SlsViewCostUOM ,

COALESCE(pz.calcSPFMatrixCost, pz.DistrCost) AS SPFCost ,

COALESCE(mdSPF.PurUOM, p.SPFPurUOM) AS SPFCostUOM ,*/

p.LockedBy ,

p.LockTS

FROM GICSPF.dbo.GICSPFProductToZone pz

INNER JOIN GICSPF.dbo.GICSPFProduct p ON pz.CompanyID = p.CompanyID

AND pz.DivisionID = p.DivisionID

AND pz.DepartmentID = p.DepartmentID

AND pz.ProductID = p.ProductID

INNER JOIN GICSPF.dbo.GICSPFVendorToMFR vm ON p.CompanyID = vm.CompanyID

AND p.DivisionID = vm.DivisionID

AND p.DepartmentID = vm.DepartmentID

AND p.MFRVendorID = vm.MFRVendorID

INNER JOIN GICSPF.dbo.GICSPFMFR m ON vm.CompanyID = m.CompanyID

AND vm.DivisionID = m.DivisionID

AND vm.DepartmentID = m.DepartmentID

AND vm.MfrID = m.MfrID

INNER JOIN GICSPF.dbo.GICSPFVendor v ON vm.CompanyID = v.CompanyID

AND vm.DivisionID = v.DivisionID

AND vm.DepartmentID = v.DepartmentID

AND vm.VendorID = v.VendorID

LEFT JOIN GICSPF.dbo.GICSPFMatrixDetail mdSPF ON pz.CompanyID = mdSPf.CompanyID

AND pz.DivisionID = mdSPF.DivisionID

AND pz.DepartmentID = mdSPF.DepartmentID

AND pz.SPFMatrixDetailID = mdSPF.MatrixDetailID

LEFT JOIN Enterprise.dbo.GICVendorMatrixDetail mdPur ON pz.CompanyID = mdPur.CompanyID

AND pz.DivisionID = mdPur.DivisionID

AND pz.DepartmentID = mdPur.DepartmentID

AND pz.PCPurMatrixDetailID = mdPur.MatrixDetailID

LEFT JOIN Enterprise.dbo.GICVendorMatrixDetail mdSls ON pz.CompanyID = mdSls.CompanyID

AND pz.DivisionID = mdSls.DivisionID

AND pz.DepartmentID = mdSls.DepartmentID

AND pz.PCSalesMatrixDetailID = mdSls.MatrixDetailID

LEFT JOIN Enterprise.Prd.Product pPC ON p.CompanyID = pPC.CompanyID

AND p.DivisionID = pPC.DivisionID

AND p.DepartmentID = pPC.DepartmentID

AND p.ProductID = pPC.SPFProductID

LEFT JOIN Enterprise.Prd.Manufacturer mPC ON pPC.CompanyID = mpc.CompanyID

AND pPC.DivisionID = mpc.DivisionID

AND pPC.DepartmentID = mpc.DepartmentID

AND pPC.MFRID = mPC.MFRID

/*LEFT JOIN Enterprise.Prd.vwProductCosts costs ON pPC.CompanyID = costs.CompanyID

AND pPC.DivisionID = costs.DivisionID

AND pPC.DepartmentID = costs.DepartmentID

AND pPC.ProductID = costs.ProductID*/

LEFT JOIN GICSPF.dbo.GICSPFVendorCategoryDetail dc ON p.CompanyID = dc.CompanyID

AND p.DivisionID = dc.DivisionID

AND p.DepartmentID = dc.DepartmentID

AND p.VndCategory = dc.CategoryID

LEFT JOIN Enterprise.dbo.GICParam ParamStatus ON p.SPFStatus = ParamStatus.ParamValue

AND ParamStatus.ParamName = 'SPFProductStatus'

/*LEFT JOIN Enterprise.dbo.GICParam ParCur ON CASE WHEN costs.ReplCostSource = 'R'

THEN 'CDN'

ELSE v.VendorCurrency

END = ParCur.ParamValue

AND ParCur.ParamName = 'APEBatchCurrency'*/

) AS a

WHERE CompanyID = @CompanyID

AND DivisionID = @DivisionID

AND DepartmentID = @DepartmentID

AND VendorCode = @VendorCode

--AND MfrCode = @MfrCode --@@@

)

SELECT --TOP 150

*

FROM cte

WHERE UniqID = 1

/*AND RowNum >= @StartID

AND RowNum < @EndID*/

SET STATISTICS TIME OFF ;

SET STATISTICS IO OFF ;

Cheers,

J-F

chrisfradenburg SSCrazy Eights Points: 9592 More actions · Answer 6

I can't really explain why this would happen but I do have a recommendation. We've solved similar problems in the past by loading information from some of the tables into a table variable or temp table and then join onto that in the main query. For example, load all the columns you need from GICSPFProduct into a table variable using the where clause you have and then join onto that. You may need to join onto GICSPFMFR for the initial load if you want to use MfrCode since that's the only selection criteria that's in a different table. And, depending on how selective the four columns in GICSPFProduct are it may make sense to not check the MfrCode until the main query.

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 7

cfradenburg (5/11/2010)
I can't really explain why this would happen but I do have a recommendation. We've solved similar problems in the past by loading information from some of the tables into a table variable or temp table and then join onto that in the main query. For example, load all the columns you need from GICSPFProduct into a table variable using the where clause you have and then join onto that. You may need to join onto GICSPFMFR for the initial load if you want to use MfrCode since that's the only selection criteria that's in a different table. And, depending on how selective the four columns in GICSPFProduct are it may make sense to not check the MfrCode until the main query.

Thanks a lot for the input, it's really appreciated, as I don't know how to solve this "issue" anymore.

Therefore, I do not think creating a table variable that will hold part of the data is really the key, since this is a query that runs with different parameters all the time. Also, I do not think using a table variable will improve the performance, from my understanding, it only creates a "In memory" representation of my data, and anywayz, everything is actually in memory, when I execute the query a second time.

Maybe the functional requirement is not correct in this statement, people want to have the whole 40K items returned to the client, at the glimpse of an eye, so they can use the page down, and scroll through the entire dataset.

This is why I tried returning the first 150 rows, but then again, it does not help, since the Where criteria and the ordering of the Dataset is chosen by the client. Even when filtering to the first 150 rows, it still takes the same time to run, about 1.5 to 2.5 secs, depending on the query. And that is when the query is fully in memory, so when ran twice in a row... Anywayz, thanks for your help.

Waiting on any other input.

Thanks again,

Cheers,

J-F

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

For some reason I cannot view the plan, but you are joining 12 tables (hopefully none of those are views), so any aberation in the rowcount estimation by the optimizer will get compounded into ugliness.

1) try dynamic sql, which will give the optimizer hard-coded values to use for the estimates. This has some downsides, primary one for your scenario being the compile time for such a complex query. Be sure to guard against sql injection if you go this route.

2) decompiling this query into smaller chunks and using temporary tables can be a winner in a lot of cases, especially where the optimal plan can vary widely based on input parameters.

3) Consider using a professional to help you with this one. There are other tricks to be had here, and perhaps you can learn enough to tackle both this one and the other beast you mentioned.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Stefan_G SSCertifiable Points: 6609 More actions · Answer 9

The reason the SQL server is using a merge join for joining with ProductToZone is that the cost for reading 653191 rows with sequential IO from the index and then performing a merge join is lower than performing 36278 index seeks.

This is not so strange at all.

I dont think this is your main problem. It is only reading 12MB of data with a single sequential IO. This should be really fast.

You should focus on the key lookup into the product table instead. Try creating a covering index for that table by including all the columns needed for this query.

That will probably speed up this query a lot.

After testing the effect of such a large covering index you can decide if it is worth keeping or not.

If you need to speed up the query even more you would need to eliminate the join between product and ProductToZone - perhaps by denormalizing your table to store the zone informaiton with the product. This could be done by creating an indexed view, or using triggers to maintain the denormalized value.

But try just creating the covering index for product first. I think it will solve your problems.

/SG

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 10

Stefan_G (5/17/2010)
The reason the SQL server is using a merge join for joining with ProductToZone is that the cost for reading 653191 rows with sequential IO from the index and then performing a merge join is lower than performing 36278 index seeks.
This is not so strange at all.
I dont think this is your main problem. It is only reading 12MB of data with a single sequential IO. This should be really fast.
You should focus on the key lookup into the product table instead. Try creating a covering index for that table by including all the columns needed for this query.
That will probably speed up this query a lot.
After testing the effect of such a large covering index you can decide if it is worth keeping or not.
If you need to speed up the query even more you would need to eliminate the join between product and ProductToZone - perhaps by denormalizing your table to store the zone informaiton with the product. This could be done by creating an indexed view, or using triggers to maintain the denormalized value.
But try just creating the covering index for product first. I think it will solve your problems.
/SG

Stefan, first, thanks for your input, those are the kind of comments I was looking for.

I will try the covering index. I quite eliminated this possibility since there were many fields to add as included, and I did not see the point, but you're right, I will try it.

Now about the denormalizing, you are also right, I tried this possibility, but the key lookup of the product was even bigger, so it did not save me much time, well, it was not really worth it for a refond of those tables, since they are the main tables of the application.

I did not realize the key lookup was that long to do, I don't know the internals of a key lookup, but I'm surprised it's longer to do 36K seeks compared to 600K Scans on the ProductToZone.

I will try the covering index, thanks for your input again.

Cheers,

J-F

Stefan_G SSCertifiable Points: 6609 More actions · Answer 11

J-F Bergeron (5/17/2010)
I did not realize the key lookup was that long to do, I don't know the internals of a key lookup, but I'm surprised it's longer to do 36K seeks compared to 600K Scans on the ProductToZone.

It is performing 36k random seeks to do the key lookup. Each seek might need to read data from disk, so worst case you are performing 36K random reads from disk. A normal disk system can support about 100 random reads/sec which means this could take up to 360 seconds.

The scan is reading 600k rows in a single operation. These 600k only takes 12MB (according to your posted plan). Reading 12MB from disk in one operation probably only takes 50ms or so.

The lesson here is that if you have a performance problem with a query it is almost always a good idea to try to avoid key lookups by creating covering indexes.

/SG