Update Records Based on Two Concatenated PK Columns

Question

Update Records Based on Two Concatenated PK Columns

Ken Sturgeon

SSC Enthusiast

Points: 120
More actions
November 30, 2010 at 5:07 pm

#241958

I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:
UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 1

Ken Sturgeon (11/30/2010)
I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:
UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?

Hi ken, i had typed out an answer for you and lost the whole thing when my session timed out. Anyway, here we go again.

I assume, there is a clustered index, as part of the primary key. A clustered index helps if your query is like -

---

-----

where col1 = 'value1' and col2 = 'value2'

In your case, your search predicate is based on col1 + col2 and there is no index on it. So, the query would cause a clustered index scan, which is not efficient. Please check the execution plan to confirm this. To speed up the query, you have to add a covering index. You can do it in two ways.

1. Create a computed column -

create a computed column coln = col1 + col2.

create a non-clustered index on it.

query becomes faster.

2. If you have constraints that prevent you from adding a computed column, go for this approach. For this to work, the table should have a unique column - UC, preferable integer.

-----------------------------------------------------------------

if (object_id('tempdb..#test') is not null) drop table #test

go

create table #test(UC int, coln varchar(50))

go

insert into #test

select UC, col1 + col2

from table1

create nonclustered index IX_Test_Coln on #test(coln) include (UC)

-----------------------------------------------------------------

update t1

set col3 = 'value', col4 = 'value'

from

#test t

join table1 t1

on t1.UC = t.UC

where t.coln in ('col1col2', 'col1col2', . . . 'n')

This will use the index IX_Test_Coln in #test table. So, your query will work faster.

Please let me know if this does not work.

- arjun

https://sqlroadie.com/

Ken Sturgeon SSC Enthusiast Points: 120 More actions · Answer 2

Thx for your response! Learned a couple of things from you today and thx for that as well. One of the things I did not mention in my first post was that I am dealing with a non-normalized database and with no unique fields in this table1. Also, I don't have the capability of adding the UC column you recommended. I set the query up basically as you recommended but used a composite of 4 columns to create unique querying criteria in place of the UC and the query plan did not improve much.

I created:

CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))

go

INSERT INTO #tmpOrd

SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder

FROM table1

CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)

UPDATE table1

SET col5 = '40401A ',

col6 = 110301

FROM #tmpOrd t

JOIN table1 t1

ON ltrim(rtrim(str(t1.col1))) + rtrim(t1.col2) = t.salesorder

WHERE t.salesorder

IN ('value1', 'value2', . . . 'n')

Should setting up the 4-column composite work the same as using your UC column?

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 3

Hi ken, i would need more info to give you a good explanation. I am replying from my phone now, so will keep it short.

It's not surprising that you didn't get an improvement when you used 4 columns to uniquely identify the rows. More columns in the index would slow down queries as the number of levels of the b tree increases.

1. Can't you use computed column approach?

2. How about creating a identity column and using that as the unique column?

3. In the query you have posted, you do the join on col1 + col2, which does not use the index and is not correct.

4. If you cannot add a computed column and don't have a unique column, you may not be able to improve the query's performance by a sizeable margin as you can't use the benefit of indexes.

I suggest that you read Gail Shaw's article on indexes in SSC. I don't have the URL right now. Will post that later.

https://sqlroadie.com/

Ken Sturgeon SSC Enthusiast Points: 120 More actions · Answer 4

Could I set up the computed column in the temp table? Otherwise, no, I wouldn't be able to create a computed column on the actual table unless it can be done on the fly somehow. We are using a JD Edwards preconfigured prod database and it is being replicated so mgmt is not keen on adding columns or even indexes on the prod copy. I did give you the wrong syntax from what I am using. It should be:

CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))

GO

INSERT INTO #tmpOrd

SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder

FROM table1

CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)

UPDATE table1

SET col7 = '40401A ',

col8 = 110301

FROM #tmpOrd t

JOIN table1 t1

ON t1.salesorder = t.salesorder

WHERE t.salesorder

IN ('value1', 'value2', . . . 'n')

You may be correct that the query is doomed due to business constraints but let me approach it from a different angle. I tried the following using a CTE as well and did see some improvement if I could get your input on how to identify the min and max salesorder numbers on the fly. The salesorders are input using the full Sales Order + Sales Order Type concatenated so the input is arbitrary. When I limited the range of SalesOrders, of course, the query ran much faster. The query is:

SELECT ltrim(rtrim(str(col1))) + rtrim(col2) salesorders

INTO #tmpOrd

FROM table1

WHERE ltrim(rtrim(str(col1))) + rtrim(col2)

IN ('value','value', . . . 'n')

AND salesorders between (lower salesorder number) AND (upper salesorder number in the mix entered by the user)

--This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.

--CTE used here to cycle through records needing the update held in #tmpOrd table

WITH CAccts (col3, col4) AS

(

SELECTsh.col3,

sh.col4

FROM table1 sh

JOIN #tmpOrd t

ON sh.salesorders=t.salesorders

)

UPDATE CAccts

SET sh.col3 = '40404A ',

sh.col4 = 110301

GO

Does this look feasible?

John Rowan SSC Guru Points: 56440 More actions · Answer 5

What about the obvious answer of changing the query to use the clustered index by removing the IN clause? The IN clause is treated just like a series of OR clauses; but using the contatenated values prevents the index from being used. Why not change the code to build out the query as OR statements instead?

Like this:

WHERE (col1 = 'value1' and col2 = 'value2')

OR (col1 = 'value3' and col2 = 'value4')

OR (col1 = 'value5' and col2 = 'value6')

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

WayneS SSC Guru Points: 95461 More actions · Answer 6

Ken Sturgeon (11/30/2010)
I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:
UPDATE table1,
SET col3 = 'value',
col4 = 'value'
WHERE col1 + col2
IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>
The question is, is there a more efficient way to search on these 2 columns than to concat them together?

Ken,

Is it safe to assume that those 60 Sales Orders have the two separate columns also?

You might want to try:

UPDATE t1,

SET col3 = 'value',

col4 = 'value'

FROM table1 t1

JOIN OtherTable t2 -- the random assortment of some 60 Sales Orders

ON t1.col1 = t2.col1

AND t1.col2 = t2.col2

WHERE t2.somecolumn = criteria to select from (if needed)

Obviously, indexes on col1 and col2 on both tables will help

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 7

Ken Sturgeon (12/1/2010)
Could I set up the computed column in the temp table?

I'm afraid that won't be of much help, as you want to do the update on the parent table and you do not have a simple unique key.

UPDATE table1
SET col7 = '40401A ',
col8 = 110301
FROM #tmpOrd t
JOIN table1 t1
ON t1.salesorder = t.salesorder
WHERE t.salesorder
IN ('value1', 'value2', . . . 'n')

JOIN table1 t1

ON t1.salesorder = t.salesorder

This part right here, how do you do this? If I am not wrong, there is no column called salesorder in table1 and you get that value by concatenating col1 and col2. If you are doing that, you are not using indexes again, in spite of adding one. You can try the 'AND - OR' approach suggested by fellow SSCians. That should help a lot.

--This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.
--CTE used here to cycle through records needing the update held in #tmpOrd table
WITH CAccts (col3, col4) AS
(
SELECTsh.col3,
sh.col4
FROM table1 sh
JOIN #tmpOrd t
ON sh.salesorders=t.salesorders
)
UPDATE CAccts
SET sh.col3 = '40404A ',
sh.col4 = 110301
GO
Does this look feasible?

Yes, mentioning a range helps. But at this point the best bet is to AND-OR the search criteria.

Other suggestions:

1. If table1 is not frequently updated, how about creating a materialized view? If you do not want to modify the table structure, this could be one way. You can add a computed column here; and you can go for an identity column as well.

2. It will help if you can post the table definition and your current query. People here can work on that. That's always an easier way.

Check out this article by Gail Shaw. It will help you demystify a lot of things.

http://www.sqlservercentral.com/articles/Indexing/68439/

- arjun

https://sqlroadie.com/

Ken Sturgeon SSC Enthusiast Points: 120 More actions · Answer 8

Thanks everyone for your feedback. I am looking into your suggestions and will get back to you on my results.

Ken Sturgeon SSC Enthusiast Points: 120 More actions · Answer 9

Sorry to take so long getting back but had some things come up. Finally decided, along with our app. developer, to go with the suggestion of breaking out the two key columns using the OR statments. I found out after talking with the developer that he could build a loop into his code (wasn't sure what his options were originally) that would separate out the Sales Order number from the Order Type concat. and plug them into the update statement. Problem solved. Thanks for the great assistance you all provided!

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 10

Cool. Keep coming back to SSC. 🙂

https://sqlroadie.com/