November 23, 2015 at 11:38 am
A vendor script is doing a query like this:
use databaseABC
select column5 from table10
where '_' + cast(pk as varchar) = '_10000'
pk is an integer primary key, so it has a clustered index. Due to the WHERE clause, the query does a clustered index scan instead of a seek.
I can modify the database but not the query. Is there any kind of index or other modification that would help here?
Thanks
November 23, 2015 at 12:02 pm
PHXHoward (11/23/2015)
A vendor script is doing a query like this:
use databaseABC
select column5 from table10
where '_' + cast(pk as varchar) = '_10000'
pk is an integer primary key, so it has a clustered index. Due to the WHERE clause, the query does a clustered index scan instead of a seek.
I can modify the database but not the query. Is there any kind of index or other modification that would help here?
Thanks
My question would be: why is the vendor adding an underscore to the numeric value? If you want to improve the query, strip the leading underscore and convert the value to an int.
Not knowing the details of the table (number of columns, data lengths of the columns), there really isn't much you can do without modifying the query itself. Leaving the query alone, creating a persisted computed column that converts the pk to varchar and prepends an underscore, and then creating a nonclustered index on that column, would simply take the query from a clustered index scan to a nonclustered index scan with a lookup, IF SQL Server thought that would be faster than the clustered index scan.
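Something like this is what I have in mind (untested; the column and index names are arbitrary, and the computed column's expression needs to match the expression in the query exactly for the optimizer to match them up):
alter table table10 add pk_label as '_' + cast(pk as varchar) persisted
create nonclustered index IX_table10_pk_label on table10 (pk_label)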
November 23, 2015 at 12:16 pm
They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.
There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.
The DTA suggests creating a new nonclustered index on the pk column. With that in place, the query does a nonclustered index scan and a key lookup, and the query cost drops from 36 to 16.
Why is it faster to scan a nonclustered index on the same column as the clustered index and then do a lookup, instead of just scanning the clustered index?
November 23, 2015 at 12:32 pm
I created a persisted computed column to match the WHERE line:
alter table table10 add merged as '_' + cast(pk as varchar) persisted
create index test2 on table10 (merged asc)
The query plan does not use it.
November 23, 2015 at 12:40 pm
That's extremely bizarre. They can just do:
WHERE pk = '10000'
and if pk is a numeric type, SQL will convert the 10000 to that type and be able to do a seek. Query labels/naming are 100% irrelevant to this WHERE condition(!).
November 23, 2015 at 12:43 pm
The problem here is the query. Other than adding hardware, there's no other viable solution.
If they're labeling the file name with an underscore, that's fine, but they don't need to include the underscore in the WHERE clause.
Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?
Because the nonclustered index contains only the pk column, while the clustered index is the whole table. In other words, it needs to read fewer pages. If you include the columns used in the query as included (covering) columns, it might avoid the lookup, but it will also take more space in the leaf pages.
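For example, building on the computed column ("merged") created earlier in the thread, a covering version might look like this (untested; the index name is arbitrary). The INCLUDE clause puts column5 in the leaf pages, so no lookup is needed:
create nonclustered index IX_table10_merged_covering on table10 (merged) include (column5)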
November 23, 2015 at 1:13 pm
Did you still have the other nonclustered index? If so, then yes, it would still use that index, and that's fine. The reason it is faster is that there is more data per page than in the clustered index (the table itself). Still, looking at that query, there are better ways to accomplish what they want without the hassle they're going through.
November 23, 2015 at 4:58 pm
This is very interesting (to me).
I deleted the other NCI so that I only have the CI and the NCI based on the persisted computed column. If I run the query as-is, it picks the slow clustered index scan. If I run the query with OPTION (RECOMPILE), it does a very fast NCI seek and key lookup. If I take OPTION (RECOMPILE) off, it unfortunately reverts to the slow plan.
Any idea why, and whether I can force it to recompile each time or otherwise use the better plan? I can't change the query that the export script runs.
Thanks for the help!
Howard
November 24, 2015 at 12:43 am
I suspect that the engine is reverting to a cached plan. Reading BOL on the RECOMPILE query hint:
Instructs the SQL Server Database Engine to discard the plan generated for the query after it executes, forcing the query optimizer to recompile a query plan the next time the same query is executed. Without specifying RECOMPILE, the Database Engine caches query plans and reuses them. When compiling query plans, the RECOMPILE query hint uses the current values of any local variables in the query and, if the query is inside a stored procedure, the current values passed to any parameters
it seems to back this hypothesis up.
Can anyone with more knowledge of the query engine internals confirm that my understanding is correct?
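One way to check whether a cached plan is being reused (a sketch; the LIKE filter is just one way to find the statement):
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%table10%';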
November 24, 2015 at 3:16 am
PHXHoward (11/23/2015)
They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.
There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.
The DTA suggests creating a new non clustered index with PK as the column. After that, running the query does a non clustered index scan and a key lookup. It drops the query cost from 36 to 16.
Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?
What are they using to facilitate the export? SSIS?
"The _ + primary key is how they want to label the file name that is created" Only one row per file?
What is the association between the WHERE clause and the file name?
Why can't you rewrite their awful script?
November 24, 2015 at 9:10 am
ChrisM@Work (11/24/2015)
PHXHoward (11/23/2015)
They are using it to create an export of the column to a text file. The _ + primary key is how they want to label the file name that is created.
There are 5.6 million rows in the table and it is taking over 1 second per row. Query cost is 36.
The DTA suggests creating a new non clustered index with PK as the column. After that, running the query does a non clustered index scan and a key lookup. It drops the query cost from 36 to 16.
Why is it faster to make a non clustered index of the same column as the clustered index and have it scan that and then do a lookup instead of just scanning the clustered index?
What are they using to facilitate the export? SSIS?
"The _ + primary key is how they want to label the file name that is created" Only one row per file?
What is the association between the WHERE clause and the file name?
Why can't you rewrite their awful script?
I agree. If you can make changes to the database tables, why can't you fix their queries?
November 24, 2015 at 10:26 am
Ok, I'll explain.
We have a product that the company has been using since the late 1980s. They are switching to another product to replace it. The new product uses Oracle, and the vendor for the new system tells us that they have no SQL Servers in their data center, so I cannot just send them a .bak file and let them do the export/import.
The developer wrote a process to extract the tables in the SQL Server database to csv that I can zip up and send to them.
The process that the developer created runs a script in SQL Server. I save the output to a text file that I run as a batch file. The batch file uses bcp to dump each table to a csv file. It is very quick.
The catch is that data in very large "text" fields is skipped in the initial run. The second part of the script does a bcp per row to put the text field into its own file. They have created a directory for each table, and they use bcp to extract those very large fields to be merged back into the Oracle database later.
For example
file 1 D:\Export\table1\<text column name>_<pk name>
file 2 D:\Export\table1\<text column name>_<pk name>
file 3 D:\Export\table1\<text column name>_<pk name>
etc...
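Each of those files comes from a bcp call roughly like this (simplified; the server name and option flags are placeholders for what the generated batch file actually contains):
bcp "select column5 from databaseABC.dbo.table10 where '_' + cast(pk as varchar) = '_10000'" queryout "D:\Export\table1\column5_10000.txt" -T -c -S MyServer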
The table has 5.7 million rows in it. The batch file has 5.7 million pre-created lines using the query shown at the top of this thread, and due to its construction, each one does a very slow scan.
The text file is almost 2GB, so I can't open it and modify the query 5.7 million times.
Anyway, that's what I'm stuck with. I was hoping to speed it up. The suggestion that Lynn gave about a persisted computed column does speed it up a lot, but the better plan is only used if I add OPTION (RECOMPILE). I was hoping there was a plan guide or something that would get it to use the better plan.
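Something like this is the kind of thing I was imagining (untested, and I don't know whether RECOMPILE is even accepted as a plan guide hint; also, a SQL plan guide matches on exact statement text, so with a different literal on every line it might only ever cover one of them):
EXEC sp_create_plan_guide
    @name = N'PG_table10_recompile',
    @stmt = N'select column5 from table10
where ''_'' + cast(pk as varchar) = ''_10000''',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (RECOMPILE)';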
I should not have to re-engineer their export process, which is why I can't change the query. Instead, I'm just trying to get it done.
Thanks for all the help.
November 24, 2015 at 10:36 am
That doesn't explain why the query would use the underscore in the WHERE clause.
If this is an export for a migration and not a permanent process, I suggest that you change the query to make it SARGable, or demand that they do it.
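That is, something like this, with the underscore handled in the file name instead of the predicate:
select column5 from table10
where pk = 10000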
November 24, 2015 at 10:44 am
Absolutely, the query should be rewritten, but it is Thanksgiving week and the person is out of the office all week.
It would be useful for me to learn something about working around an unchangeable query.
November 25, 2015 at 1:31 am
PHXHoward (11/24/2015)
Absolutely, the query should be rewritten, but it is Thanksgiving week and the person is out of the office all week.
It would be useful for me to learn something about working around an unchangeable query.
Thanks for the explanation.
What proportion of rows have a non-empty value in the text column?
"The text file is almost 2GB so I can't open it and modify the query 5.7 million times."
What are you proposing to do with it? Have you tested to ensure it will work?
Wouldn't it be more efficient to write one small batch to run against the 5.7 million rows, exporting each non-empty text column value to a file?
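For example, something along these lines could generate the whole batch file in a single pass, with a SARGable per-row query (untested; the server name, path, and the non-empty test are guesses at your setup):
SELECT 'bcp "select column5 from databaseABC.dbo.table10 where pk = '
     + CAST(pk AS varchar(30))
     + '" queryout "D:\Export\table1\column5_' + CAST(pk AS varchar(30))
     + '.txt" -T -c -S MyServer'
FROM databaseABC.dbo.table10
WHERE DATALENGTH(column5) > 0;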