JOIN with TVP ignoring Index

  • I have a table with 35 million rows.  I am querying this table by joining it to a single column TVP on an indexed column.

    The TVP contains a single row.

    SQL chooses to do an index scan on the whole table and it takes about 15 mins to run.

    If I add a FORCESEEK, the query runs instantly.

    If I replace the TVP with a temp table, the query runs instantly.

    Original Query. Takes 15mins

    declare @p3 dbo.IdList
    insert into @p3 values(39425783)
    select c.* from dbo.mytable c join @p3 i on i.Id = c.IndexedColumn

    with a ForceSeek, runs instantly.

    declare @p3 dbo.IdList
    insert into @p3 values(39425783)
    select c.* from dbo.mytable c with(forceseek, index (IX_MyIndex)) join @p3 i on i.Id = c.IndexedColumn

    The single column in the TVP is an INT.  The Indexed Column in MyTable is an INT.

    Adding OPTION (RECOMPILE) does nothing, neither does enabling Trace Flag 2453.

    I've read that SQL struggles with row estimates with TVPs and can generate bad plans. However, the TVP row estimate (1) is the same as the actual rows (1) in the execution plan so I'm struggling to understand this behavior and why SQL Server refuses to use the index to do a seek.

    • This topic was modified 6 months ago by  planetmatt.
  • Personally, I'd try using a temp table and see if it works well before I spent lots of time trying other work-arounds:

    declare @p3 dbo.IdList

    select * into #p3 from @p3

    create unique clustered index p3__cl on #p3 ( id ); /* optional, of course */

    insert into #p3 values(39425783) /* or, perhaps: select distinct <id_column> into #p3 */

    select c.* from dbo.mytable c inner join #p3 i on i.Id = c.IndexedColumn

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:

    UPDATE STATISTICS dbo.mytable WITH FULLSCAN;

    Is this query fast or slow?

    select c.* from dbo.mytable c where c.IndexedColumn = 39425783
  • The temp table works even without an index.

    However, it is not my code. This is code executed via DynamicSQL (being dynamic or static doesnt change performance) from a 3rd party web application and I am trying to troubleshoot as parts of it time out due to our large data volumes which are not present in the vendor's smaller development database.  While the vendor may change their queries or indexes, they are unlikely to stop using the TVPs which are universally used throughout the codebase to pass in various parameters.

    Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity.  If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek?  I could understand it if the row estimates were off but they're not.

  • Jonathan AC Roberts wrote:

    I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:

    UPDATE STATISTICS dbo.mytable WITH FULLSCAN;

    Is this query fast or slow?

    select c.* from dbo.mytable c where c.IndexedColumn = 39425783

    That query runs instantly. Does an Index Seek then a Key Lookup.

    I have updated Statistics but I have not done a Full scan because it's 33m rows.   Might be worth a go though.

  • planetmatt wrote:

    Jonathan AC Roberts wrote:

    I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:

    UPDATE STATISTICS dbo.mytable WITH FULLSCAN;

    Is this query fast or slow?

    select c.* from dbo.mytable c where c.IndexedColumn = 39425783

    That query runs instantly. Does an Index Seek then a Key Lookup.

    I have updated Statistics but I have not done a Full scan because it's 33m rows.   Might be worth a go though.

    Why don't you just leave the FORCESEEK hint in the query?

  • planetmatt wrote:

    Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity.  If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek?  I could understand it if the row estimates were off but they're not.

    I would think the index on the main table is not unique.  Therefore, even though there is only 1 row value to be looked up, in theory at least that could match to many, many rows on the main table.  Since you did a SELECT * on the main table, all columns must be listed.  Rather than risk having to do many, many row key lookups, SQL just scans the main table instead.  I'm not saying that's the right thing for the optimizer to do, just that I can see why it might do that.  And a future lookup could be for many keys, and, again, each of those keys could match multiple rows in the main table.

    It'd be interesting to see the histo-stats for that nonclus index on the main table and see how many dup values there tend to be in those keys.  For example, say for one lookup value, there are 50K rows (out of 33M) in the main table.  50K lookups would be vast overhead, so SQL's almost certain to just scan the main table instead, considering that would be more efficient.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • A TVP is a table variable.  Variables do not have any statistics.  Historically SQL always assumed that a table variable contained exactly 1 record.  In SQL 2019, this was changed to 100 records.  The lack of stats will impact the query optimiser's ability to generate a good plan.  Also, if the data types of the join columns are not 100% the same between the TVP and your main table, there will be implicit conversions which will also impact the query optimiser.

  • Duplicate post removed

  • ScottPletcher wrote:

    planetmatt wrote:

    Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity.  If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek?  I could understand it if the row estimates were off but they're not.

    I would think the index on the main table is not unique.  Therefore, even though there is only 1 row value to be looked up, in theory at least that could match to many, many rows on the main table.  Since you did a SELECT * on the main table, all columns must be listed.  Rather than risk having to do many, many row key lookups, SQL just scans the main table instead.  I'm not saying that's the right thing for the optimizer to do, just that I can see why it might do that.  And a future lookup could be for many keys, and, again, each of those keys could match multiple rows in the main table.

    It'd be interesting to see the histo-stats for that nonclus index on the main table and see how many dup values there tend to be in those keys.  For example, say for one lookup value, there are 50K rows (out of 33M) in the main table.  50K lookups would be vast overhead, so SQL's almost certain to just scan the main table instead, considering that would be more efficient.

     

    I believe you are correct.  The IndexedColumn in my example is a new column recently added by the software vendor. In all 33 million rows, this value is set to 0.

    I suspect SQL expects the join to return so many matches (despite none matching) that it is quicker to do a scan of all records than to first do a seek and then a key lookup on its expected many matches to fullfil the SELECT *.

    If I alter the query to SELECT ID instead of SELECT *, SQL does an Index seek and the query completes immediately.

    So it appears this is a data issue. I have fed this back to the software vendor, but I now understand what is going on.  Thank you.

    • This reply was modified 6 months ago by  planetmatt.
  • DesNorton wrote:

    This article from Brent Ozar might help

    https://www.brentozar.com/archive/2018/09/sql-server-2019-faster-table-variables-and-new-parameter-sniffing-issues/%5B/quote%5D

     

    I had read that prior to posting but I don't think that's what is happening here. The TVP row estimates were good in the plan. This is an issue with the data in the large table and the statistics on the indexed column along with the SELECT * leading SQL to do a scan instead of a seek.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply