Why would the use of COALESCE on an INNER JOIN result in faster execution?

  • Hi,

    I have a single table which I am joining to a TVF through an inner join on the table's primary key. We have been seeing poor performance with this query under certain circumstances but I found today that if I use the COALESCE keyword on the result of the TVF performance is improved dramatically.

    Why would this be? I can provide schema information if required but I wondered if there was a more general reason for this?

    Thanks

    James

  • If it's eliminating NULL values you might see an increase in performance, but I'm not sure. Can you post the execution plans along with the queries?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Actually, I'm not so sure now that the coalesce operator does improve things. I've included the query and table information below but the issue is that the ([t0].[ProductType] = 1) statement in the where clause dramatically reduces performance. From the execution plans it looks like it's performing a nested loop which is taking the majority of the time.

    Any advice would be greatly appreciated

    Thanks

    James

    Sample Query

    SELECT TOP (10) * FROM [dbo].[CAT_Product] AS [t0]

    INNER JOIN [dbo].[CAT_fctSearchProducts](' FORMSOF (INFLECTIONAL, tube) ') AS [t2] ON [t0].[ProductID] = [t2].[ProductID]

    WHERE ([t0].[ProductType] = 1) AND ([t0].[National] = 1)

    ORDER BY [t0].[ProductType], [t0].[NPC]

    TVF

    CREATE FUNCTION [dbo].[CAT_fctSearchProducts]

    (

    @Clause As VARCHAR(8000)

    )

    RETURNS TABLE

    AS

    RETURN

    (

    SELECT [Key] As ProductID From ContainsTable(Catalogue.dbo.CAT_Product, Keywords, @Clause)

    )

    Table

    CREATE TABLE [dbo].[CAT_Product]

    (

    [ProductID] [int] NOT NULL,

    [NPC] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [EAN] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [GTIN] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [MPC] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [EClassID] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [Brand] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SecondaryDescription] [varchar] (512) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [UOIID] [int] NOT NULL,

    [Units] [int] NULL,

    [VATCode] [int] NULL CONSTRAINT [DF_CAT_Product_VATCode] DEFAULT ((0)),

    [VLTDays] [int] NULL,

    [ProductType] [int] NOT NULL,

    [ContractID] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SupplierID] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Suspended] [bit] NOT NULL CONSTRAINT [DF_CAT_Product_Susoended] DEFAULT ((0)),

    [SuspendedTo] [datetime] NULL,

    [SuspendedReasonID] [int] NULL,

    [StatusID] [int] NULL,

    [AttributeFlags] [varbinary] (1000) NULL CONSTRAINT [DF_CAT_Product_CommitDisc] DEFAULT ((0)),

    [National] [bit] NOT NULL CONSTRAINT [DF_CAT_Product_NationallyAvailable] DEFAULT ((0)),

    [ImageCount] [int] NOT NULL CONSTRAINT [DF_CAT_Product_ImageCount] DEFAULT ((0)),

    [AttachmentCount] [int] NOT NULL CONSTRAINT [DF_CAT_Product_AttachmentCount] DEFAULT ((0)),

    [Keywords] [varchar] (max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [DateFirstAvailable] [datetime] NOT NULL,

    [DateCreated] [datetime] NOT NULL,

    [DateUpdated] [datetime] NOT NULL

    )

    GO

    -- Constraints and Indexes

    ALTER TABLE [dbo].[CAT_Product] ADD CONSTRAINT [PK_CAT_Product] PRIMARY KEY CLUSTERED ([ProductID])

    GO

    CREATE NONCLUSTERED INDEX [IDX_Nationally_Available] ON [dbo].[CAT_Product] ([National])

    GO

    CREATE NONCLUSTERED INDEX [_dta_index_CAT_Product_17_1844201620__K21_K1_K13] ON [dbo].[CAT_Product] ([National], [ProductID], [ProductType])

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [NPC] ON [dbo].[CAT_Product] ([NPC])

    GO

    CREATE NONCLUSTERED INDEX [NPC_EClass_UOI_Supplier] ON [dbo].[CAT_Product] ([ProductID], [EClassID], [UOIID], [SupplierID], [National])

    GO

    CREATE NONCLUSTERED INDEX [ProdIDNPC] ON [dbo].[CAT_Product] ([ProductID], [NPC])

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [IDX_ProdType_NPC] ON [dbo].[CAT_Product] ([ProductType], [NPC])

    GO

    CREATE STATISTICS [_dta_stat_1844201620_13_2_21] ON [dbo].[CAT_Product] ([ProductType], [NPC], [National])

    GO

    -- Full Text Information

    CREATE FULLTEXT INDEX ON [dbo].[CAT_Product] KEY INDEX [PK_CAT_Product] ON [Catalogue] WITH CHANGE_TRACKING OFF

    GO

    ALTER FULLTEXT INDEX ON [dbo].[CAT_Product] ADD (NPC LANGUAGE 1033)

    GO

    ALTER FULLTEXT INDEX ON [dbo].[CAT_Product] ADD (MPC LANGUAGE 1033)

    GO

    ALTER FULLTEXT INDEX ON [dbo].[CAT_Product] ADD (Brand LANGUAGE 1033)

    GO

    ALTER FULLTEXT INDEX ON [dbo].[CAT_Product] ADD (SecondaryDescription LANGUAGE 1033)

    GO

    ALTER FULLTEXT INDEX ON [dbo].[CAT_Product] ADD (Keywords LANGUAGE 1033)

    GO

  • COALESCE can be used as a general query hint. (ie Non tsql specific.)

    You should look a the query plans to see what is going on.

    In this case putting a function on the table valued function column may push the optimizer to use it as the main table doing lookups to the index of your table. This could be more efficient than lookups on a table valued function which does not have an index.

  • Can you post the actual execution plan? Just save it as a .sqlplan file, zip it and attach it.

    I see that your TVF is also calling another function. Just so you know, that's usually a red flag for performance issues. I'm not saying it's the cause of this one, but it is the cause of many.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ken McKelvey (3/31/2009)


    COALESCE can be used as a general query hint. (ie Non tsql specific.)

    You should look a the query plans to see what is going on.

    In this case putting a function on the table valued function column may push the optimizer to use it as the main table doing lookups to the index of your table. This could be more efficient than lookups on a table valued function which does not have an index.

    Hi Ken,

    Can you explain what you mean by this? Do you define an index using the name of the function?

  • Can you post the actual execution plan? Just save it as a .sqlplan file, zip it and attach it.

    I see that your TVF is also calling another function. Just so you know, that's usually a red flag for performance issues. I'm not saying it's the cause of this one, but it is the cause of many.

    Thanks, execution plan is attached. The TVF is calling ContainsTable but that's a SQL Server Full Text function so I assumed that would be OK?

  • My understanding is that most table valued functions do not have a usable index and the optimizer might make an inaccurate estimation of the number of rows.

    (I think the only table valued functions which could have a usable index are inline functions.)

    Putting a function around a join column effectively forces a nested loop join which in some circumstances may be more efficient than a merge or hash join.

    When there are a lot of tables involved, putting COALESCE in some of the joins seems to allow the optimizer to concentrate on getting the best out of the other joins.

  • I've never heard about using COALESCE as a tuning mechanism before. What do you do with it and how do you format it? Is there a set of tests you can reference for that? Just curious.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I have seen it on some sites where query performance tuning programs are used.

    I think COALESCE(joinColumn, joinColumn) just acts as a function and stops an index being used.

  • Yeah, that's exactly what I was thinking. It would prevent the use of the index. That's a concern usually, not a goal. That's why I'm a bit confused.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • There's this article I was going through this morning from MSSQL tips, it did describe few good examples of Coalesce and how it can be used to avoid cursors.

    http://www.mssqltips.com/tip.asp?tip=1521

  • james.spibey (3/31/2009)


    Can you post the actual execution plan? Just save it as a .sqlplan file, zip it and attach it.

    I see that your TVF is also calling another function. Just so you know, that's usually a red flag for performance issues. I'm not saying it's the cause of this one, but it is the cause of many.

    Thanks, execution plan is attached. The TVF is calling ContainsTable but that's a SQL Server Full Text function so I assumed that would be OK?

    Hmmm. That makes me want to see the one with the COALESCE function. First off, you're getting a key lookup. I'm not sure how useable the IDX_ProdType_NPC index is. I'm not positive it will help, but you might be better off moving the ProductType column out of it's own index and making it a part of the clustered index (although I see that's the PK). But that's a big part of the cost of the execution here.

    The CONTAINS operation doesn't limit the data too much does it. It looks like the Nested Loop operation is the right one since you've got 23000 rows from one side and 6 million from the other.

    I'll bet you're getting a table scan instead of the lookup operation when you put the coalesce into the query. Can you post that execution plan too?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant,

    I'm reading it as 600 million rows (627,966,462) from the Full Text Engine.

    I'd also be interested in the execution plan from the Coalesce function query.

    You might also get better performance if you used a temporary table to hold the results of the function and join on that.

    I know virtually nothing about Full Text, but could you change the function so that you pass the Product ID to the function as well and then use CROSS APPLY? This would limit the results returned by the function and may improve your performance. Again, I've never used Full Text so I don't event know if that would be possible.

  • Grant Fritchey (3/31/2009)


    james.spibey (3/31/2009)


    Can you post the actual execution plan? Just save it as a .sqlplan file, zip it and attach it.

    I see that your TVF is also calling another function. Just so you know, that's usually a red flag for performance issues. I'm not saying it's the cause of this one, but it is the cause of many.

    Thanks, execution plan is attached. The TVF is calling ContainsTable but that's a SQL Server Full Text function so I assumed that would be OK?

    Hmmm. That makes me want to see the one with the COALESCE function. First off, you're getting a key lookup. I'm not sure how useable the IDX_ProdType_NPC index is. I'm not positive it will help, but you might be better off moving the ProductType column out of it's own index and making it a part of the clustered index (although I see that's the PK). But that's a big part of the cost of the execution here.

    The CONTAINS operation doesn't limit the data too much does it. It looks like the Nested Loop operation is the right one since you've got 23000 rows from one side and 6 million from the other.

    I'll bet you're getting a table scan instead of the lookup operation when you put the coalesce into the query. Can you post that execution plan too?

    OK, attached is a zip with 3 query plans

    FullQuery.sqlplan

    SELECT TOP (10) * FROM [dbo].[CAT_Product] AS [t0]

    INNER JOIN [dbo].[CAT_fctSearchProducts](' FORMSOF (INFLECTIONAL, tube) ') AS [t2] ON [t0].[ProductID] = [t2].[ProductID]

    WHERE ([t0].[ProductType] = 1) AND ([t0].[National] = 1)

    ORDER BY [t0].[ProductType], [t0].[NPC]

    Coalesce.sqlplan

    SELECT TOP (10) * FROM [dbo].[CAT_Product] AS [t0]

    INNER JOIN [dbo].[CAT_fctSearchProducts](' FORMSOF (INFLECTIONAL, tube) ') AS [t2] ON [t0].[ProductID] = COALESCE([t2].[ProductID], [t2].[ProductID])

    WHERE ([t0].[ProductType] = 1) AND ([t0].[National] = 1)

    ORDER BY [t0].[ProductType], [t0].[NPC]

    NoOrderBy.sqlplan

    SELECT TOP (10) * FROM [dbo].[CAT_Product] AS [t0]

    INNER JOIN [dbo].[CAT_fctSearchProducts](' FORMSOF (INFLECTIONAL, tube) ') AS [t2] ON [t0].[ProductID] = [t2].[ProductID]

    WHERE ([t0].[ProductType] = 1) AND ([t0].[National] = 1)

    I wonder whether then index I've created on (ProductType,NPC) is causing issues? I added it to improve the sort performance but it looks like it's causing issues with this query

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply