November 30, 2010 at 5:07 pm
I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:
UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?
December 1, 2010 at 4:58 am
Ken Sturgeon (11/30/2010)
I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?
Hi ken, i had typed out an answer for you and lost the whole thing when my session timed out. Anyway, here we go again.
I assume, there is a clustered index, as part of the primary key. A clustered index helps if your query is like -
---
-----
where col1 = 'value1' and col2 = 'value2'
In your case, your search predicate is based on col1 + col2 and there is no index on it. So, the query would cause a clustered index scan, which is not efficient. Please check the execution plan to confirm this. To speed up the query, you have to add a covering index. You can do it in two ways.
1. Create a computed column -
create a computed column coln = col1 + col2.
create a non-clustered index on it.
query becomes faster.
2. If you have constraints that prevent you from adding a computed column, go for this approach. For this to work, the table should have a unique column - UC, preferable integer.
-----------------------------------------------------------------
if (object_id('tempdb..#test') is not null) drop table #test
go
create table #test(UC int, coln varchar(50))
go
insert into #test
select UC, col1 + col2
from table1
create nonclustered index IX_Test_Coln on #test(coln) include (UC)
-----------------------------------------------------------------
update t1
set col3 = 'value', col4 = 'value'
from
#test t
join table1 t1
on t1.UC = t.UC
where t.coln in ('col1col2', 'col1col2', . . . 'n')
This will use the index IX_Test_Coln in #test table. So, your query will work faster.
Please let me know if this does not work.
- arjun
https://sqlroadie.com/
December 1, 2010 at 11:18 am
Thx for your response! Learned a couple of things from you today and thx for that as well. One of the things I did not mention in my first post was that I am dealing with a non-normalized database and with no unique fields in this table1. Also, I don't have the capability of adding the UC column you recommended. I set the query up basically as you recommended but used a composite of 4 columns to create unique querying criteria in place of the UC and the query plan did not improve much.
I created:
CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))
go
INSERT INTO #tmpOrd
SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder
FROM table1
CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)
UPDATE table1
SET col5 = '40401A ',
col6 = 110301
FROM #tmpOrd t
JOIN table1 t1
ON ltrim(rtrim(str(t1.col1))) + rtrim(t1.col2) = t.salesorder
WHERE t.salesorder
IN ('value1', 'value2', . . . 'n')
Should setting up the 4-column composite work the same as using your UC column?
December 1, 2010 at 12:39 pm
Hi ken, i would need more info to give you a good explanation. I am replying from my phone now, so will keep it short.
It's not surprising that you didn't get an improvement when you used 4 columns to uniquely identify the rows. More columns in the index would slow down queries as the number of levels of the b tree increases.
1. Can't you use computed column approach?
2. How about creating a identity column and using that as the unique column?
3. In the query you have posted, you do the join on col1 + col2, which does not use the index and is not correct.
4. If you cannot add a computed column and don't have a unique column, you may not be able to improve the query's performance by a sizeable margin as you can't use the benefit of indexes.
I suggest that you read Gail Shaw's article on indexes in SSC. I don't have the URL right now. Will post that later.
https://sqlroadie.com/
December 1, 2010 at 1:25 pm
Could I set up the computed column in the temp table? Otherwise, no, I wouldn't be able to create a computed column on the actual table unless it can be done on the fly somehow. We are using a JD Edwards preconfigured prod database and it is being replicated so mgmt is not keen on adding columns or even indexes on the prod copy. I did give you the wrong syntax from what I am using. It should be:
CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))
GO
INSERT INTO #tmpOrd
SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder
FROM table1
CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)
UPDATE table1
SET col7 = '40401A ',
col8 = 110301
FROM #tmpOrd t
JOIN table1 t1
ON t1.salesorder = t.salesorder
WHERE t.salesorder
IN ('value1', 'value2', . . . 'n')
You may be correct that the query is doomed due to business constraints but let me approach it from a different angle. I tried the following using a CTE as well and did see some improvement if I could get your input on how to identify the min and max salesorder numbers on the fly. The salesorders are input using the full Sales Order + Sales Order Type concatenated so the input is arbitrary. When I limited the range of SalesOrders, of course, the query ran much faster. The query is:
SELECT ltrim(rtrim(str(col1))) + rtrim(col2) salesorders
INTO #tmpOrd
FROM table1
WHERE ltrim(rtrim(str(col1))) + rtrim(col2)
IN ('value','value', . . . 'n')
AND salesorders between (lower salesorder number) AND (upper salesorder number in the mix entered by the user)
--This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.
--CTE used here to cycle through records needing the update held in #tmpOrd table
WITH CAccts (col3, col4) AS
(
SELECTsh.col3,
sh.col4
FROM table1 sh
JOIN #tmpOrd t
ON sh.salesorders=t.salesorders
)
UPDATE CAccts
SET sh.col3 = '40404A ',
sh.col4 = 110301
GO
Does this look feasible?
December 1, 2010 at 2:46 pm
What about the obvious answer of changing the query to use the clustered index by removing the IN clause? The IN clause is treated just like a series of OR clauses; but using the contatenated values prevents the index from being used. Why not change the code to build out the query as OR statements instead?
Like this:
WHERE (col1 = 'value1' and col2 = 'value2')
OR (col1 = 'value3' and col2 = 'value4')
OR (col1 = 'value5' and col2 = 'value6')
December 1, 2010 at 3:06 pm
Ken Sturgeon (11/30/2010)
I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?
Ken,
Is it safe to assume that those 60 Sales Orders have the two separate columns also?
You might want to try:
UPDATE t1,
SET col3 = 'value',
col4 = 'value'
FROM table1 t1
JOIN OtherTable t2 -- the random assortment of some 60 Sales Orders
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
WHERE t2.somecolumn = criteria to select from (if needed)
Obviously, indexes on col1 and col2 on both tables will help
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 1, 2010 at 10:44 pm
Ken Sturgeon (12/1/2010)
Could I set up the computed column in the temp table?
I'm afraid that won't be of much help, as you want to do the update on the parent table and you do not have a simple unique key.
UPDATE table1
SET col7 = '40401A ',
col8 = 110301
FROM #tmpOrd t
JOIN table1 t1
ON t1.salesorder = t.salesorder
WHERE t.salesorder
IN ('value1', 'value2', . . . 'n')
JOIN table1 t1
ON t1.salesorder = t.salesorder
This part right here, how do you do this? If I am not wrong, there is no column called salesorder in table1 and you get that value by concatenating col1 and col2. If you are doing that, you are not using indexes again, in spite of adding one. You can try the 'AND - OR' approach suggested by fellow SSCians. That should help a lot.
--This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.
--CTE used here to cycle through records needing the update held in #tmpOrd table
WITH CAccts (col3, col4) AS
(
SELECTsh.col3,
sh.col4
FROM table1 sh
JOIN #tmpOrd t
ON sh.salesorders=t.salesorders
)
UPDATE CAccts
SET sh.col3 = '40404A ',
sh.col4 = 110301
GO
Does this look feasible?
Yes, mentioning a range helps. But at this point the best bet is to AND-OR the search criteria.
Other suggestions:
1. If table1 is not frequently updated, how about creating a materialized view? If you do not want to modify the table structure, this could be one way. You can add a computed column here; and you can go for an identity column as well.
2. It will help if you can post the table definition and your current query. People here can work on that. That's always an easier way.
Check out this article by Gail Shaw. It will help you demystify a lot of things.
http://www.sqlservercentral.com/articles/Indexing/68439/
- arjun
https://sqlroadie.com/
December 2, 2010 at 9:43 am
Thanks everyone for your feedback. I am looking into your suggestions and will get back to you on my results.
December 17, 2010 at 8:13 am
Sorry to take so long getting back but had some things come up. Finally decided, along with our app. developer, to go with the suggestion of breaking out the two key columns using the OR statments. I found out after talking with the developer that he could build a loop into his code (wasn't sure what his options were originally) that would separate out the Sales Order number from the Order Type concat. and plug them into the update statement. Problem solved. Thanks for the great assistance you all provided!
December 17, 2010 at 7:32 pm
Cool. Keep coming back to SSC. 🙂
https://sqlroadie.com/
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply