December 4, 2008 at 7:32 pm
Heh... that code never gets past the first 10,000... it keeps updating the same first 10,000 rows over and over and over.
Lemme build a test table, write a bit of code, and I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2008 at 8:55 am
There are some things you might want to change in your last snippet, even before Jeff gets back with his code.
First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're converting everything all over the place.
Second, you need a way to progress along the temp table, or (as Jeff said) you keep updating the same rows forever. Keep a counter and advance it each batch.
The following snippet is your code, slightly modified (and untested) to run with these changes (note also that I use SELECT to set the variables, as BOL indicates SET is deprecated):
CREATE TABLE #tmp1 (id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, claim_key INT);
CREATE NONCLUSTERED INDEX IX_TMP1_claimkey ON #tmp1 (claim_key ASC);
INSERT INTO #tmp1 SELECT claim_key FROM dbo.CLAIMS WHERE admission_date IS NOT NULL;
INSERT INTO #tmp1 SELECT T1.claim_key
FROM dbo.CLAIMS T1 LEFT OUTER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
WHERE T1.discharge_date IS NOT NULL AND T2.claim_key IS NULL;
DECLARE @BatchSize INT, @BatchCounter INT;
SELECT @BatchSize = 10000, @BatchCounter = 1;
WHILE EXISTS (SELECT 1 FROM dbo.CLAIMS WHERE admission_date IS NOT NULL)
BEGIN
    UPDATE T1
    SET admission_date = NULL, discharge_date = NULL
    FROM dbo.CLAIMS T1 INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
    WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize;
    SELECT @BatchCounter = @BatchCounter + @BatchSize;
END
That should help a bit.
December 5, 2008 at 12:26 pm
Jonathan Melo (12/5/2008)
There are some things you might want to change in your last snippet, even before Jeff gets back with his code. First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're converting everything all over the place.
Second, you need a way to progress along the temp table, or (as Jeff said) you keep updating the same rows forever. Keep a counter and advance it each batch.
The following snippet is your code, slightly modified (and untested) to run with these changes... (code snipped; see the post above)
Thanks Jonathan. I made your recommended changes and reran it. I killed it after it had run for 1:58 hrs; by then it had completed 186 batches of 10,000 rows.
I made two changes to your code above:
Added "-1" and changed T1.id to T2.id in this line:
WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter+@BatchSize-1;
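Putting those fixes together, the loop might look like this (untested sketch; note it also walks id up to MAX(id) from #tmp1 instead of re-checking CLAIMS in the WHILE, so the rows that only had a discharge_date still get processed after the admission_date rows run out):
DECLARE @BatchSize INT, @BatchCounter INT, @MaxId INT;
SELECT @BatchSize = 10000, @BatchCounter = 1;
SELECT @MaxId = MAX(id) FROM #tmp1; -- highest batch id we need to reach
WHILE @BatchCounter <= @MaxId
BEGIN
    UPDATE T1
    SET admission_date = NULL, discharge_date = NULL
    FROM dbo.CLAIMS T1 INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
    WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize - 1; -- the "-1" fix
    SELECT @BatchCounter = @BatchCounter + @BatchSize;
END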
-Doug
December 5, 2008 at 12:50 pm
Matt Miller (12/3/2008)
Here's an example of walking the clustered index:
--drop table #ranges
;with AlphanumericCTE as (
select
char(N) digit,
char(N+1) nextdigit
from tally where
n between 48 and 57 or --digits 0 through 9
n between 65 and 90) --A through Z
select row_number() over (order by a1.digit,a2.digit) rn,
a1.digit+a2.digit startrange,
a1.digit+a2.nextdigit endrange
into #ranges
from alphanumericCTE a1 cross join alphanumericCTE a2
order by startrange
declare @curr int
declare @lastrow int
set @curr=1;
select @lastrow=max(rn) from #ranges
--walk through the ranges based on the first two digits of the member_ID
--presumes that the spread is fairly even - if not- then customize at will
while (@curr<=@lastrow)
begin
UPDATE CLAIMS
SET
admission_date = NULL,
discharge_date = NULL
from claims join #ranges on member_id between startrange and endrange
where rn=@curr and (admission_date is not null or discharge_date is not null)
set @curr=@curr+1;
end
--create table claims(member_ID varchar(20), admission_date datetime, discharge_date datetime)
Assuming you don't wrap that loop in one giant transaction (left in autocommit mode, each UPDATE commits on its own), this should go a bit faster. Note that this is only one pass.
Matt, I don't have the table "tally". How can I get it?
--Nevermind: I see it is just a table of sequential integers that should cover at least 48-90 (a quick way to build one is sketched at the end of this post). This method of yours is very interesting. It appears to be working great, except I need #ranges to also include these startrange-endrange pairs:
09-10
19-20
29-30
39-40
49-50
59-60
69-70
...and so on.
Without these ranges, the update overlooks member_ids that begin with, e.g., 59 or 69.
Currently, it searches for
member_id between '58' and '59'
member_id between '59' and '5:'
but neither of these will pick up the member_id '5938489'. However, this will:
member_id between '59' and '60'
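A quick throwaway test of that string comparison (not part of the fix, just a sanity check):
SELECT CASE WHEN '5938489' BETWEEN '59' AND '60'
            THEN 'caught' ELSE 'missed' END AS range_check;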
I will manually enter these and test it.
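For anyone else missing a Tally table, here is a minimal sketch for building one (the dbo.Tally name and the 10,000-row size are just placeholders; it only needs sequential integers covering the values the CTEs read):
-- build a 0-9999 sequence via SELECT...INTO; master.sys.all_columns is just a handy big row source
SELECT TOP 10000 IDENTITY(INT, 0, 1) AS N
INTO dbo.Tally
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2;
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);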
December 5, 2008 at 2:45 pm
Are your member IDs actually made up of just numbers even though they are stored as varchar? If so, you could simply use the Tally table itself to generate the startrange and endrange. Something like:
;with AlphanumericCTE as (
select
right('0000'+cast(N as varchar(4)),4) digit,
right('0000'+cast(N+1 as varchar(4)),4) nextdigit
from tally where
n between 0 and 999)
select row_number() over (order by a1.digit) rn,
a1.digit startrange,
a1.nextdigit endrange
--into #ranges
from alphanumericCTE a1
order by startrange
You will still have to add one last update (for member_ID > '999').
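Something along these lines, say (untested, and the exact boundary depends on how your ids compare as strings):
UPDATE dbo.CLAIMS
SET admission_date = NULL,
    discharge_date = NULL
WHERE member_id > '999'
  AND (admission_date IS NOT NULL OR discharge_date IS NOT NULL);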
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 5, 2008 at 4:02 pm
The member ids sometimes do have alphanumeric characters but should never have any symbols, e.g. "[" or ":", so I like the first way you populated #ranges. I will add something to get in the other ranges I need.
Your script finished running in 25:12 mins. It searched 1,296 times; only 70 of the searches actually affected rows, and the rest found no match.
Is there a way to narrow #ranges to only the ranges the data in the CLAIMS table actually needs, instead of going through all 1,296 possibilities? That would speed it up tremendously. The member ids will vary by data source, and I will never know exactly which first-two-character combinations exist until the new data source is loaded.
December 5, 2008 at 5:01 pm
Here's an improved version for fixing the ranges:
--drop table #ranges
;with Alpha1 as (select
char(N) digit,
Row_number() over (order by N ) rowid
from tally where
n between 48 and 57 or --digits 0 through 9
n between 65 and 90
), --A through Z
AlphanumericCTE as (
select Row_number() over (order by a1.digit, a2.digit) rid,
a1.digit + a2.digit strtrange
from Alpha1 A1 cross join Alpha1 A2
)
select row_number() over (order by acte1.strtrange, acte2.strtrange) rn,
acte1.strtrange startrange,
acte2.strtrange endrange
into #ranges
from alphanumericCTE acte1 left join alphanumericCTE acte2
on acte1.rid = acte2.rid - 1
order by startrange
You will still need to run one last query after this loop using a > against the max endrange.
As to cutting down the number of ranges to run against - you could TRY adding an IF EXISTS before the update. Something like:
IF EXISTS (select NULL
from claims
join #ranges on member_id between startrange and endrange
where rn=@curr)
UPDATE CLAIMS
etc...
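Spelled out inside the loop, that might look something like this (untested):
IF EXISTS (select NULL
           from claims
           join #ranges on member_id between startrange and endrange
           where rn = @curr)
    UPDATE CLAIMS
    SET admission_date = NULL,
        discharge_date = NULL
    FROM claims
    JOIN #ranges ON member_id BETWEEN startrange AND endrange
    WHERE rn = @curr
      AND (admission_date IS NOT NULL OR discharge_date IS NOT NULL);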
Of course - this may actually slow things down. This may be a "TEST and see" scenario. OR - you can try finding all of the ranges FIRST (select Distinct left(member_ID,2) etc...), but again - I have a feeling that might be slower too.
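If you wanted to try the find-the-ranges-first route, one way might be a pruning pass before the loop (untested sketch; LEFT() isn't sargable, which is part of why I suspect it could be slower):
DELETE r
FROM #ranges r
WHERE NOT EXISTS (SELECT 1
                  FROM claims c
                  WHERE LEFT(c.member_id, 2) = r.startrange
                    AND (c.admission_date IS NOT NULL OR c.discharge_date IS NOT NULL));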
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 8, 2008 at 9:46 am
Matt, thank you for the range table fix. I implemented that code along with the IF EXISTS, and it ran in 28:43. It seems the fastest I can get right now is to remove the index and run the updates as-is, which gets me to 24:17 mins; next best is walking the clustered index, which runs in 25:12 mins. That is a nice improvement over the initial 1:13 hrs.
------------------
Thank you all, Matt, Jeff, Lynn, and Jonathan for the time you spent on this and for your suggestions.
Have a Merry Christmas!
December 8, 2008 at 10:23 am
Thanks for the feedback!
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 8, 2008 at 7:45 pm
Same here... thanks for the feedback. Sorry I got side tracked...
--Jeff Moden
Change is inevitable... Change for the better is not.