I have a table with 35 million rows. I am querying this table by joining it to a single column TVP on an indexed column.
The TVP contains a single row.
SQL chooses to do an index scan on the whole table and it takes about 15 mins to run.
If I add a FORCESEEK, the query runs instantly.
If I replace the TVP with a temp table, the query runs instantly.
Original Query. Takes 15mins
declare @p3 dbo.IdList
insert into @p3 values(39425783)
select c.* from dbo.mytable c join @p3 i on i.Id = c.IndexedColumn
with a ForceSeek, runs instantly.
declare @p3 dbo.IdList
insert into @p3 values(39425783)
select c.* from dbo.mytable c with(forceseek, index (IX_MyIndex)) join @p3 i on i.Id = c.IndexedColumn
The single column in the TVP is an INT. The Indexed Column in MyTable is an INT.
Adding OPTION (RECOMPILE) does nothing, neither does enabling Trace Flag 2453.
I've read that SQL struggles with row estimates with TVPs and can generate bad plans. However, the TVP row estimate (1) is the same as the actual rows (1) in the execution plan so I'm struggling to understand this behavior and why SQL Server refuses to use the index to do a seek.
June 19, 2024 at 3:17 pm
Personally, I'd try using a temp table and see if it works well before I spent lots of time trying other work-arounds:
declare @p3 dbo.IdList
select * into #p3 from @p3
create unique clustered index p3__cl on #p3 ( id ); /* optional, of course */
insert into #p3 values(39425783) /* or, perhaps: select distinct <id_column> into #p3 */
select c.* from dbo.mytable c inner join #p3 i on i.Id = c.IndexedColumn
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 19, 2024 at 3:29 pm
I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:
UPDATE STATISTICS dbo.mytable WITH FULLSCAN;
Is this query fast or slow?
select c.* from dbo.mytable c where c.IndexedColumn = 39425783
June 19, 2024 at 3:29 pm
The temp table works even without an index.
However, it is not my code. This is code executed via DynamicSQL (being dynamic or static doesnt change performance) from a 3rd party web application and I am trying to troubleshoot as parts of it time out due to our large data volumes which are not present in the vendor's smaller development database. While the vendor may change their queries or indexes, they are unlikely to stop using the TVPs which are universally used throughout the codebase to pass in various parameters.
Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity. If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek? I could understand it if the row estimates were off but they're not.
June 19, 2024 at 3:56 pm
I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:
UPDATE STATISTICS dbo.mytable WITH FULLSCAN;Is this query fast or slow?
select c.* from dbo.mytable c where c.IndexedColumn = 39425783
That query runs instantly. Does an Index Seek then a Key Lookup.
I have updated Statistics but I have not done a Full scan because it's 33m rows. Might be worth a go though.
June 19, 2024 at 4:29 pm
Jonathan AC Roberts wrote:I'm not sure if this will fix it but the first thing I would try is to make sure the statistics have been updated:
UPDATE STATISTICS dbo.mytable WITH FULLSCAN;Is this query fast or slow?
select c.* from dbo.mytable c where c.IndexedColumn = 39425783That query runs instantly. Does an Index Seek then a Key Lookup.
I have updated Statistics but I have not done a Full scan because it's 33m rows. Might be worth a go though.
Why don't you just leave the FORCESEEK hint in the query?
June 19, 2024 at 5:32 pm
Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity. If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek? I could understand it if the row estimates were off but they're not.
I would think the index on the main table is not unique. Therefore, even though there is only 1 row value to be looked up, in theory at least that could match to many, many rows on the main table. Since you did a SELECT * on the main table, all columns must be listed. Rather than risk having to do many, many row key lookups, SQL just scans the main table instead. I'm not saying that's the right thing for the optimizer to do, just that I can see why it might do that. And a future lookup could be for many keys, and, again, each of those keys could match multiple rows in the main table.
It'd be interesting to see the histo-stats for that nonclus index on the main table and see how many dup values there tend to be in those keys. For example, say for one lookup value, there are 50K rows (out of 33M) in the main table. 50K lookups would be vast overhead, so SQL's almost certain to just scan the main table instead, considering that would be more efficient.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 20, 2024 at 2:35 am
A TVP is a table variable. Variables do not have any statistics. Historically SQL always assumed that a table variable contained exactly 1 record. In SQL 2019, this was changed to 100 records. The lack of stats will impact the query optimiser's ability to generate a good plan. Also, if the data types of the join columns are not 100% the same between the TVP and your main table, there will be implicit conversions which will also impact the query optimiser.
June 20, 2024 at 2:41 am
Duplicate post removed
June 20, 2024 at 2:53 am
planetmatt wrote:Realistically at this point, I'm just trying to gather a better understanding of what is going on here for my own sanity. If the Execution plan shows SQL expects a single row in the TVP, why does it not choose to do a seek? I could understand it if the row estimates were off but they're not.
I would think the index on the main table is not unique. Therefore, even though there is only 1 row value to be looked up, in theory at least that could match to many, many rows on the main table. Since you did a SELECT * on the main table, all columns must be listed. Rather than risk having to do many, many row key lookups, SQL just scans the main table instead. I'm not saying that's the right thing for the optimizer to do, just that I can see why it might do that. And a future lookup could be for many keys, and, again, each of those keys could match multiple rows in the main table.
It'd be interesting to see the histo-stats for that nonclus index on the main table and see how many dup values there tend to be in those keys. For example, say for one lookup value, there are 50K rows (out of 33M) in the main table. 50K lookups would be vast overhead, so SQL's almost certain to just scan the main table instead, considering that would be more efficient.
I believe you are correct. The IndexedColumn in my example is a new column recently added by the software vendor. In all 33 million rows, this value is set to 0.
I suspect SQL expects the join to return so many matches (despite none matching) that it is quicker to do a scan of all records than to first do a seek and then a key lookup on its expected many matches to fullfil the SELECT *.
If I alter the query to SELECT ID instead of SELECT *, SQL does an Index seek and the query completes immediately.
So it appears this is a data issue. I have fed this back to the software vendor, but I now understand what is going on. Thank you.
June 20, 2024 at 12:38 pm
This article from Brent Ozar might help
I had read that prior to posting but I don't think that's what is happening here. The TVP row estimates were good in the plan. This is an issue with the data in the large table and the statistics on the indexed column along with the SELECT * leading SQL to do a scan instead of a seek.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply