Partial Clustered Index Seek then Merge Join...

  • Hi all,

    I've been trying to figure this one out for quite a while, but the query was always much too complex to post here. Now I have a lighter query that may explain my problem a bit better, see the plan attached.

    Now, I'm curious, when I look at the plan, I join GICSPFVendorToMfr to GICSPFProduct on the index, and it does an index seek on the product table, then a key lookup, to go look in the clustered index of the product which is fine.

    The problem comes in the GICSPFProductToZone, where there are as much rows as in the product table (Approx 700K). It does do a seek on the Clustered index, so at first, I thought, Great, the query works correctly, but when you look more closely, it does a seek on the 3 "Company fields", which are generic for a PC, but not on the ProductID!!!. The productID is the same as in the product table, and it has just done a SEEK on the Product Table, so why not try to reduce the number of reads in the ProductToZone table from 700K to about 36K... This seems to cause delay in the query, and is giving me a hard time.

    The query itself, takes about 1.5 secs CPU to execute, which isn't bad, but I commented out a big part of the query, I need it to take max 1.5 secs... since the client wants to access his products at the glimpse of an eye... Can anyone explain why it only does a partial Seek?

    Thanks In advance,

    For the sake of clarity, here are the main indexes of the problem:

    This table holds the "Vendor To Mfr Combination"

    CREATE TABLE [dbo].[GICSPFVendorToMFR](

    [CompanyID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DivisionID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DepartmentID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [MfrVendorID] [int] NOT NULL IDENTITY (1,1),

    [VendorID] [int] NOT NULL,

    [MFRID] [int] NULL,

    [LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [LockTS] [datetime] NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[GICSPFVendorToMFR] ADD CONSTRAINT [PK_GICSPFVendorToMFR_CoDivDepMfrVendorID] PRIMARY KEY

    CLUSTERED

    (

    [CompanyID] ASC

    ,[DivisionID] ASC

    ,[DepartmentID] ASC

    ,[MfrVendorID] ASC

    )

    -- This table has the product information

    CREATE TABLE [dbo].[GICSPFProduct](

    [CompanyID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DivisionID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DepartmentID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [ProductID] [int] NOT NULL IDENTITY (1,1),

    [MFRVendorID] [int] NOT NULL,

    [SPFCategoryID] [int] NULL,

    [SPFCatalog] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SPFDescEn] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    --................ Etc.....

    [LockedBy] [varchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [LockTS] [datetime] NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[GICSPFProduct] ADD CONSTRAINT [IX_GICSPFProduct_CoDivDepProductID] PRIMARY KEY

    CLUSTERED

    (

    [CompanyID] ASC

    ,[DivisionID] ASC

    ,[DepartmentID] ASC

    ,[ProductID] ASC

    )WITH

    (

    PAD_INDEX = OFF

    ,STATISTICS_NORECOMPUTE = OFF

    ,IGNORE_DUP_KEY = OFF

    ,ALLOW_ROW_LOCKS = ON

    ,ALLOW_PAGE_LOCKS = ON

    ) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_GICSPFProduct_CoDivDepMfrVendorIDProductID] ON [dbo].[GICSPFProduct]

    (

    [CompanyID] ASC

    ,[DivisionID] ASC

    ,[DepartmentID] ASC

    ,[MFRVendorID] ASC

    ,[ProductID] ASC

    )ON [PRIMARY]

    GO

    --And this table, all the prices and costs of the said product:

    CREATE TABLE [dbo].[GICSPFProductToZone](

    [CompanyID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DivisionID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [DepartmentID] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [MFRZoneID] [int] NOT NULL,

    [ProductID] [int] NOT NULL,

    [ListPrice] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_ListPrice] DEFAULT ((0)),

    [Col1] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col1] DEFAULT ((0)),

    [Col2] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col2] DEFAULT ((0)),

    [Col3] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_Col3] DEFAULT ((0)),

    [DistrCost] [money] NOT NULL CONSTRAINT [DF_GICSPFProductToZone_DistrCost] DEFAULT ((0)),

    [calcSPFMatrixCost] [money] null default null,

    [calcPCSalesCost] [money] null default null,

    [calcPCPurCost] [money] null default null,

    [SPFMatrixDetailID] int null default null,

    [PCSalesMatrixDetailID] int null default null,

    [PCPurMatrixDetailID] int null default null,

    [LastUpdateDate] [datetime] NULL,

    [LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [LockTS] [datetime] NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[GICSPFProductToZone] ADD CONSTRAINT [PK_GICSPFProductToZone_CoDivDepProductID] PRIMARY KEY

    CLUSTERED

    (

    [CompanyID] ASC

    ,[DivisionID] ASC

    ,[DepartmentID] ASC

    ,[ProductID] ASC

    ,[MFRZoneID] ASC

    )WITH

    (

    PAD_INDEX = OFF

    ,STATISTICS_NORECOMPUTE = OFF

    ,IGNORE_DUP_KEY = OFF

    ,ALLOW_ROW_LOCKS = ON

    ,ALLOW_PAGE_LOCKS = ON

    ) ON [PRIMARY]

    Cheers,

    J-F

  • I'm including the Time / IO Information to demonstrate the reads:

    (36278 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0

    Table 'GICSPFProductToZone'. Scan count 1, logical reads 10441, physical reads 0, read-ahead reads 7

    Table 'GICSPFProduct'. Scan count 5, logical reads 111265, physical reads 0, read-ahead reads 0

    Table 'GICSPFMFR'. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0

    Table 'GICSPFVendorToMFR'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0

    Table 'GICSPFVendor'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0

    Table 'Product'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0

    Table 'GICSPFVendorCategoryDetail'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1516 ms, elapsed time = 4047 ms.

    Edit : Removed LOB Reads

    Cheers,

    J-F

  • I'm stumped that it needs to scan the whole GICSPFProductToZone table, where a seek could lower the reads drastically, and probably decrease the time of the execution by a fair amount... Any insight on why it needs to do this?

    Cheers,

    J-F

  • Can you post the SQL you're using? I'm seeing part of it in the plan but it cuts off.

  • Edit: My Error

    Cheers,

    J-F

  • Sure, here's the full SQL. Thanks for taking a look at it.

    DECLARE @CompanyID VARCHAR(4) ,

    @DivisionID VARCHAR(4) ,

    @DepartmentID VARCHAR(4) ,

    @VendorCode VARCHAR(5) ,

    @MfrCode VARCHAR(5) ,

    @StartID INT ,

    @EndID INT

    SELECT @COmpanyID = 'GIC' ,

    @DivisionID = '9830' ,

    @DepartmentID = '3041' ,

    @VendorCode = 'BRA' ,

    @MFRCODE = 'SDD' ,

    @StartID = 1 ,

    @EndID = 151 ;

    SET STATISTICS IO ON ;

    SET STATISTICS TIME ON ;

    WITH cte

    AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY CompanyID, DivisionID, DepartmentID, ProductID ORDER BY CompanyID, DivisionID, DepartmentID, ProductID ) AS UniqID ,

    *

    FROM ( SELECT p.CompanyID ,

    p.DivisionID ,

    p.DepartmentID ,

    p.ProductID ,

    p.MFRVendorID ,

    v.VendorCode ,

    0 AS nbMatrices ,

    m.MfrCode ,

    mpc.MFRCode AS MfrCodePC ,

    p.SPFCatalog ,

    pPC.Catalog AS CatalogPC ,

    p.SPFDescEn ,

    p.SPFDescFr ,

    p.SPFUPC ,

    p.VndPrdCode ,

    0 AS SPFUnitQty ,

    p.VndCatalog ,

    p.SPFPOA ,

    p.SPFLastUpdDate ,

    p.SPFLastUpdEffDate ,

    p.SPFStatus ,

    /*ParamStatus.ParamDescEN AS SPFStatusEn ,

    ParamStatus.ParamDescFr AS SPFStatusFr ,*/

    p.SPFStatusEffDate ,

    p.VndCategory ,

    dc.CategoryName ,

    CASE WHEN v.VendorKeyColumn = 'UPC' THEN p.SPFUPC

    WHEN v.VendorKeyColumn = 'VndPrdCode' THEN p.VndPrdCode

    WHEN v.VendorKeyColumn = 'VndCatalog' THEN p.VndCatalog

    END AS SPFKeyFieldValue ,

    pz.DistrCost ,

    /*COALESCE(costs.ReplCost, pz.calcPCPurCost, pz.calcSPFMatrixCost, pz.DistrCost) AS ReplCost ,

    CASE WHEN costs.ReplCostUOM IS NOT NULL THEN costs.ReplCostUOM

    WHEN pz.calcPCPurCost IS NOT NULL THEN COALESCE(mdPur.PurUOM, p.SPFPurUOM)

    WHEN pz.calcSPFMatrixCost IS NOT NULL THEN COALESCE(mdSPF.PurUOM, p.SPFPurUOM)

    ELSE p.SPFPurUOM

    END AS ReplCostUOM ,

    CASE WHEN costs.ReplCostSource = 'R' THEN 'CDN'

    ELSE v.VendorCurrency

    END AS ReplCostCurrency ,

    ParCur.ParamDescEn AS ReplCostCurrencyEnCol ,

    ParCur.ParamDescFr AS ReplCostCurrencyFrCol ,

    COALESCE(costs.SlsViewCost, pz.calcPCSalesCost, pz.calcPCPurCost,

    pz.calcSPFMatrixCost, pz.DistrCost) AS SlsViewCost ,

    CASE WHEN costs.SlsViewCostUOM IS NOT NULL THEN costs.SlsViewCostUOM

    WHEN pz.calcPCSalesCost IS NOT NULL THEN COALESCE(mdSls.PurUOM, p.SPFPurUOM)

    WHEN pz.calcPCPurCost IS NOT NULL THEN COALESCE(mdPur.PurUOM, p.SPFPurUOM)

    WHEN pz.calcSPFMatrixCost IS NOT NULL THEN COALESCE(mdSPF.PurUOM, p.SPFPurUOM)

    ELSE p.SPFPurUOM

    END AS SlsViewCostUOM ,

    COALESCE(pz.calcSPFMatrixCost, pz.DistrCost) AS SPFCost ,

    COALESCE(mdSPF.PurUOM, p.SPFPurUOM) AS SPFCostUOM ,*/

    p.LockedBy ,

    p.LockTS

    FROM GICSPF.dbo.GICSPFProductToZone pz

    INNER JOIN GICSPF.dbo.GICSPFProduct p ON pz.CompanyID = p.CompanyID

    AND pz.DivisionID = p.DivisionID

    AND pz.DepartmentID = p.DepartmentID

    AND pz.ProductID = p.ProductID

    INNER JOIN GICSPF.dbo.GICSPFVendorToMFR vm ON p.CompanyID = vm.CompanyID

    AND p.DivisionID = vm.DivisionID

    AND p.DepartmentID = vm.DepartmentID

    AND p.MFRVendorID = vm.MFRVendorID

    INNER JOIN GICSPF.dbo.GICSPFMFR m ON vm.CompanyID = m.CompanyID

    AND vm.DivisionID = m.DivisionID

    AND vm.DepartmentID = m.DepartmentID

    AND vm.MfrID = m.MfrID

    INNER JOIN GICSPF.dbo.GICSPFVendor v ON vm.CompanyID = v.CompanyID

    AND vm.DivisionID = v.DivisionID

    AND vm.DepartmentID = v.DepartmentID

    AND vm.VendorID = v.VendorID

    LEFT JOIN GICSPF.dbo.GICSPFMatrixDetail mdSPF ON pz.CompanyID = mdSPf.CompanyID

    AND pz.DivisionID = mdSPF.DivisionID

    AND pz.DepartmentID = mdSPF.DepartmentID

    AND pz.SPFMatrixDetailID = mdSPF.MatrixDetailID

    LEFT JOIN Enterprise.dbo.GICVendorMatrixDetail mdPur ON pz.CompanyID = mdPur.CompanyID

    AND pz.DivisionID = mdPur.DivisionID

    AND pz.DepartmentID = mdPur.DepartmentID

    AND pz.PCPurMatrixDetailID = mdPur.MatrixDetailID

    LEFT JOIN Enterprise.dbo.GICVendorMatrixDetail mdSls ON pz.CompanyID = mdSls.CompanyID

    AND pz.DivisionID = mdSls.DivisionID

    AND pz.DepartmentID = mdSls.DepartmentID

    AND pz.PCSalesMatrixDetailID = mdSls.MatrixDetailID

    LEFT JOIN Enterprise.Prd.Product pPC ON p.CompanyID = pPC.CompanyID

    AND p.DivisionID = pPC.DivisionID

    AND p.DepartmentID = pPC.DepartmentID

    AND p.ProductID = pPC.SPFProductID

    LEFT JOIN Enterprise.Prd.Manufacturer mPC ON pPC.CompanyID = mpc.CompanyID

    AND pPC.DivisionID = mpc.DivisionID

    AND pPC.DepartmentID = mpc.DepartmentID

    AND pPC.MFRID = mPC.MFRID

    /*LEFT JOIN Enterprise.Prd.vwProductCosts costs ON pPC.CompanyID = costs.CompanyID

    AND pPC.DivisionID = costs.DivisionID

    AND pPC.DepartmentID = costs.DepartmentID

    AND pPC.ProductID = costs.ProductID*/

    LEFT JOIN GICSPF.dbo.GICSPFVendorCategoryDetail dc ON p.CompanyID = dc.CompanyID

    AND p.DivisionID = dc.DivisionID

    AND p.DepartmentID = dc.DepartmentID

    AND p.VndCategory = dc.CategoryID

    LEFT JOIN Enterprise.dbo.GICParam ParamStatus ON p.SPFStatus = ParamStatus.ParamValue

    AND ParamStatus.ParamName = 'SPFProductStatus'

    /*LEFT JOIN Enterprise.dbo.GICParam ParCur ON CASE WHEN costs.ReplCostSource = 'R'

    THEN 'CDN'

    ELSE v.VendorCurrency

    END = ParCur.ParamValue

    AND ParCur.ParamName = 'APEBatchCurrency'*/

    ) AS a

    WHERE CompanyID = @CompanyID

    AND DivisionID = @DivisionID

    AND DepartmentID = @DepartmentID

    AND VendorCode = @VendorCode

    --AND MfrCode = @MfrCode --@@@

    )

    SELECT --TOP 150

    *

    FROM cte

    WHERE UniqID = 1

    /*AND RowNum >= @StartID

    AND RowNum < @EndID*/

    SET STATISTICS TIME OFF ;

    SET STATISTICS IO OFF ;

    Cheers,

    J-F

  • I can't really explain why this would happen but I do have a recommendation. We've solved similar problems in the past by loading information from some of the tables into a table variable or temp table and then join onto that in the main query. For example, load all the columns you need from GICSPFProduct into a table variable using the where clause you have and then join onto that. You may need to join onto GICSPFMFR for the initial load if you want to use MfrCode since that's the only selection criteria that's in a different table. And, depending on how selective the four columns in GICSPFProduct are it may make sense to not check the MfrCode until the main query.

  • cfradenburg (5/11/2010)


    I can't really explain why this would happen but I do have a recommendation. We've solved similar problems in the past by loading information from some of the tables into a table variable or temp table and then join onto that in the main query. For example, load all the columns you need from GICSPFProduct into a table variable using the where clause you have and then join onto that. You may need to join onto GICSPFMFR for the initial load if you want to use MfrCode since that's the only selection criteria that's in a different table. And, depending on how selective the four columns in GICSPFProduct are it may make sense to not check the MfrCode until the main query.

    Thanks a lot for the input, it's really appreciated, as I don't know how to solve this "issue" anymore.

    Therefore, I do not think creating a table variable that will hold part of the data is really the key, since this is a query that runs with different parameters all the time. Also, I do not think using a table variable will improve the performance, from my understanding, it only creates a "In memory" representation of my data, and anywayz, everything is actually in memory, when I execute the query a second time.

    Maybe the functional requirement is not correct in this statement, people want to have the whole 40K items returned to the client, at the glimpse of an eye, so they can use the page down, and scroll through the entire dataset.

    This is why I tried returning the first 150 rows, but then again, it does not help, since the Where criteria and the ordering of the Dataset is chosen by the client. Even when filtering to the first 150 rows, it still takes the same time to run, about 1.5 to 2.5 secs, depending on the query. And that is when the query is fully in memory, so when ran twice in a row... Anywayz, thanks for your help.

    Waiting on any other input.

    Thanks again,

    Cheers,

    J-F

  • For some reason I cannot view the plan, but you are joining 12 tables (hopefully none of those are views), so any aberation in the rowcount estimation by the optimizer will get compounded into ugliness.

    1) try dynamic sql, which will give the optimizer hard-coded values to use for the estimates. This has some downsides, primary one for your scenario being the compile time for such a complex query. Be sure to guard against sql injection if you go this route.

    2) decompiling this query into smaller chunks and using temporary tables can be a winner in a lot of cases, especially where the optimal plan can vary widely based on input parameters.

    3) Consider using a professional to help you with this one. There are other tricks to be had here, and perhaps you can learn enough to tackle both this one and the other beast you mentioned.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The reason the SQL server is using a merge join for joining with ProductToZone is that the cost for reading 653191 rows with sequential IO from the index and then performing a merge join is lower than performing 36278 index seeks.

    This is not so strange at all.

    I dont think this is your main problem. It is only reading 12MB of data with a single sequential IO. This should be really fast.

    You should focus on the key lookup into the product table instead. Try creating a covering index for that table by including all the columns needed for this query.

    That will probably speed up this query a lot.

    After testing the effect of such a large covering index you can decide if it is worth keeping or not.

    If you need to speed up the query even more you would need to eliminate the join between product and ProductToZone - perhaps by denormalizing your table to store the zone informaiton with the product. This could be done by creating an indexed view, or using triggers to maintain the denormalized value.

    But try just creating the covering index for product first. I think it will solve your problems.

    /SG

  • Stefan_G (5/17/2010)


    The reason the SQL server is using a merge join for joining with ProductToZone is that the cost for reading 653191 rows with sequential IO from the index and then performing a merge join is lower than performing 36278 index seeks.

    This is not so strange at all.

    I dont think this is your main problem. It is only reading 12MB of data with a single sequential IO. This should be really fast.

    You should focus on the key lookup into the product table instead. Try creating a covering index for that table by including all the columns needed for this query.

    That will probably speed up this query a lot.

    After testing the effect of such a large covering index you can decide if it is worth keeping or not.

    If you need to speed up the query even more you would need to eliminate the join between product and ProductToZone - perhaps by denormalizing your table to store the zone informaiton with the product. This could be done by creating an indexed view, or using triggers to maintain the denormalized value.

    But try just creating the covering index for product first. I think it will solve your problems.

    /SG

    Stefan, first, thanks for your input, those are the kind of comments I was looking for.

    I will try the covering index. I quite eliminated this possibility since there were many fields to add as included, and I did not see the point, but you're right, I will try it.

    Now about the denormalizing, you are also right, I tried this possibility, but the key lookup of the product was even bigger, so it did not save me much time, well, it was not really worth it for a refond of those tables, since they are the main tables of the application.

    I did not realize the key lookup was that long to do, I don't know the internals of a key lookup, but I'm surprised it's longer to do 36K seeks compared to 600K Scans on the ProductToZone.

    I will try the covering index, thanks for your input again.

    Cheers,

    J-F

  • J-F Bergeron (5/17/2010)


    I did not realize the key lookup was that long to do, I don't know the internals of a key lookup, but I'm surprised it's longer to do 36K seeks compared to 600K Scans on the ProductToZone.

    It is performing 36k random seeks to do the key lookup. Each seek might need to read data from disk, so worst case you are performing 36K random reads from disk. A normal disk system can support about 100 random reads/sec which means this could take up to 360 seconds.

    The scan is reading 600k rows in a single operation. These 600k only takes 12MB (according to your posted plan). Reading 12MB from disk in one operation probably only takes 50ms or so.

    The lesson here is that if you have a performance problem with a query it is almost always a good idea to try to avoid key lookups by creating covering indexes.

    /SG

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply