UPDATE SET =NULL running over an hour

• Heh... that code never gets past the first 10000... it keeps updating the same first 10000 rows over and over and over.

    Lemme build a test table, write a bit of code, and I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are some things you might want to change on your last snippet, even before Jeff gives his code back.

    First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're forcing conversions all over the place.

    Second, you need a way to progress along the temp table or (as Jeff said) you keep updating the same rows forever. Get a counter and update it.

    The following snippet is your code, slightly modified (and untested) to run with these changes (note also that I use SELECT to assign the variables, which sets both in a single statement):

    CREATE TABLE #tmp1 (id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, claim_key INT);

    CREATE NONCLUSTERED INDEX IX_TMP1_claimkey ON #tmp1 (claim_key ASC);

    INSERT INTO #tmp1
    SELECT claim_key FROM dbo.CLAIMS WHERE admission_date IS NOT NULL;

    INSERT INTO #tmp1
    SELECT T1.claim_key
    FROM dbo.CLAIMS T1
    LEFT OUTER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
    WHERE T1.discharge_date IS NOT NULL AND T2.claim_key IS NULL;

    DECLARE @BatchSize INT, @BatchCounter INT;

    SELECT @BatchSize = 10000, @BatchCounter = 1;

    WHILE EXISTS (SELECT 1 FROM dbo.CLAIMS WHERE admission_date IS NOT NULL)
    BEGIN
        UPDATE T1
        SET admission_date = NULL, discharge_date = NULL
        FROM dbo.CLAIMS T1
        INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
        WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize;

        SELECT @BatchCounter = @BatchCounter + @BatchSize;
    END
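
    One more optional tweak (an untested sketch, and it assumes every row you need to clear made it into #tmp1): bound the loop on the temp table's last id instead of re-scanning CLAIMS, so the loop can't spin forever if a batch ever fails to match anything:

    DECLARE @BatchSize INT, @BatchCounter INT, @MaxId INT;

    SELECT @BatchSize = 10000, @BatchCounter = 1;
    SELECT @MaxId = MAX(id) FROM #tmp1; -- last row we need to touch

    WHILE @BatchCounter <= @MaxId
    BEGIN
        UPDATE T1
        SET admission_date = NULL, discharge_date = NULL
        FROM dbo.CLAIMS T1
        INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
        -- the -1 keeps each batch at exactly @BatchSize ids (BETWEEN is inclusive)
        WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize - 1;

        SELECT @BatchCounter = @BatchCounter + @BatchSize;
    END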

    That should help a bit.

  • Jonathan Melo (12/5/2008)


    There are some things you might want to change on your last snippet, even before Jeff gives his code back.

    First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're forcing conversions all over the place.

    Second, you need a way to progress along the temp table or (as Jeff said) you keep updating the same rows forever. Get a counter and update it.

    Thanks Jonathan. I made your recommended changes and reran it. I killed it after it had run for 1:58 hrs; it got through 186 batches of 10,000 rows before I stopped it.

    I made two changes to your code above: I added "-1" and changed T1.id to T2.id in this line:

    WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize - 1;

    (BETWEEN is inclusive on both ends, so without the "-1" each batch covers 10,001 ids.)

    -Doug

  • Matt Miller (12/3/2008)


    Here's an example of walking the clustered index:

    drop table #ranges

    ;with AlphanumericCTE as (
        select
            char(N) digit,
            char(N+1) nextdigit
        from tally
        where n between 48 and 57 --digits 0 through 9
           or n between 65 and 90 --A through Z
    )
    select row_number() over (order by a1.digit, a2.digit) rn,
           a1.digit + a2.digit startrange,
           a1.digit + a2.nextdigit endrange
    into #ranges
    from AlphanumericCTE a1 cross join AlphanumericCTE a2
    order by startrange

    declare @curr int
    declare @lastrow int

    set @curr = 1;
    select @lastrow = max(rn) from #ranges

    --walk through the ranges based on the first two digits of the member_ID
    --presumes that the spread is fairly even - if not, then customize at will
    while (@curr <= @lastrow)
    begin
        UPDATE CLAIMS
        SET admission_date = NULL,
            discharge_date = NULL
        from claims join #ranges on member_id between startrange and endrange
        where rn = @curr
          and (admission_date is not null or discharge_date is not null)

        set @curr = @curr + 1;
    end

    --create table claims(member_ID varchar(20), admission_date datetime, discharge_date datetime)

    Assuming you don't make that into one giant transaction - this should go a bit faster. Note that this is only one pass.
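
    A quick untested peek at what lands in #ranges (the commented rows show what the first entries should look like, working from the CTE above):

    select top 12 rn, startrange, endrange
    from #ranges
    order by rn;
    -- rn  startrange  endrange
    -- 1   00          01
    -- 2   01          02
    -- ...
    -- 10  09          0:
    -- 11  0A          0B
    -- 12  0B          0C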

    Matt, I don't have the table "tally". How can I get it?

    --Nevermind: I see it is just a table of integers that should cover at least 48-90 (a minimal sketch of building one appears at the end of this post). This method of yours is very interesting. It appears to be working great, except I need the #ranges to include these "startrange"-"endrange" pairs:

    09-10
    19-20
    29-30
    39-40
    49-50
    59-60
    69-70
    ...and so on.

    Without these ranges, the update overlooks member_ids that begin with, e.g., 59 or 69. Currently, it searches for

    member_id between '58' and '59'
    member_id between '59' and '5:'

    but neither of these will pick up the member_id '5938489'. However, this will:

    member_id between '59' and '60'

    I will manually enter these and test it.
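
    Since the tally table came up: here is a minimal sketch of one common way to build one (untested; the name dbo.Tally, the 0 starting value, and the 10,000-row count are all assumptions, chosen to cover the 48-90 character codes used above with plenty of room):

    -- Build a small numbers ("tally") table: a single column N holding 0, 1, 2, ...
    -- Cross-joining a big system view is one common trick to get enough rows.
    SELECT TOP 10000
           IDENTITY(INT, 0, 1) AS N
    INTO   dbo.Tally
    FROM   master.sys.all_columns ac1
    CROSS JOIN master.sys.all_columns ac2;

    -- A clustered primary key keeps lookups on N fast.
    ALTER TABLE dbo.Tally
        ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);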

• Are your member IDs actually just made up of numbers even though they are stored as varchar? If so, you could simply use the Tally table itself to generate the startrange and endrange. Something like:

    ;with AlphanumericCTE as (
        select
            right('0000' + cast(N as varchar(4)), 4) digit,
            right('0000' + cast(N+1 as varchar(4)), 4) nextdigit
        from tally
        where n between 0 and 999
    )
    select row_number() over (order by a1.digit) rn,
           a1.digit startrange,
           a1.nextdigit endrange
    --into #ranges
    from AlphanumericCTE a1
    order by startrange

    You will still have to add one last update (for member_ID > '999').

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

• The member IDs sometimes do have alphabetic characters but should never have any symbols, e.g. "[" and ":", so I like the first way you populated #ranges. I will add something to get in the other ranges I need.

    Your script finished running in 25:12 mins. It searched 1296 times; only 70 of those searches actually affected rows. The others found no match.

    Is there a way to narrow #ranges to only the ranges that actually occur in the CLAIMS table, instead of going through all 1296 possibilities? That would speed it up tremendously. The member IDs vary by data source, and I will never know exactly which combinations of the first two characters exist until the new data source is loaded.

  • Here's an improved version for fixing the ranges:

    --drop table #ranges

    ;with Alpha1 as (
        select char(N) digit,
               row_number() over (order by N) rowid
        from tally
        where n between 48 and 57 --digits 0 through 9
           or n between 65 and 90 --A through Z
    ),
    AlphanumericCTE as (
        select row_number() over (order by a1.digit, a2.digit) rid,
               a1.digit + a2.digit strtrange
        from Alpha1 a1 cross join Alpha1 a2
    )
    select row_number() over (order by acte1.strtrange, acte2.strtrange) rn,
           acte1.strtrange startrange,
           acte2.strtrange endrange
    into #ranges
    from AlphanumericCTE acte1
    left join AlphanumericCTE acte2 on acte1.rid = acte2.rid - 1
    order by startrange

    You will still need to run one last update after this loop, with a > against the max endrange.
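
    For that last pass, something like this untested sketch (it leans on the fact that, with the left join above, the final row's endrange comes back NULL, so everything from the max startrange up still needs one catch-all update):

    -- Catch-all pass for member_IDs at or beyond the last generated range (untested sketch)
    UPDATE CLAIMS
    SET admission_date = NULL,
        discharge_date = NULL
    WHERE member_id >= (select max(startrange) from #ranges)
      and (admission_date is not null or discharge_date is not null);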

    As to cutting down the number of ranges to run against - you could TRY adding an IF EXISTS before the update. Something like:

    IF EXISTS (select NULL
               from claims
               join #ranges on member_id between startrange and endrange
               where rn = @curr)
        UPDATE CLAIMS
        etc...

    Of course - this may actually slow things down. This may be a "TEST and see" scenario. OR - you can try finding all of the ranges FIRST (select Distinct left(member_ID,2) etc...), but again - I have a feeling that might be slower too. A rough sketch of that idea follows.
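
    Here is an untested sketch of that "find the ranges first" idea - it keeps only the rows of #ranges whose two-character prefix actually occurs in CLAIMS, then you walk the smaller table instead (the #ranges_used name is made up, and it assumes member_IDs are at least two characters):

    -- Keep only ranges whose startrange prefix actually appears in CLAIMS (untested sketch)
    select row_number() over (order by r.rn) rn,
           r.startrange,
           r.endrange
    into #ranges_used
    from #ranges r
    join (select distinct left(member_ID, 2) prefix
          from CLAIMS
          where admission_date is not null
             or discharge_date is not null) d
      on d.prefix = r.startrange;
    -- Then run the same WHILE loop against #ranges_used; with only the ~70 live
    -- prefixes left, the empty-range iterations disappear.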

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

• Matt, thank you for the range table fix. I implemented that code and also the IF EXISTS, and it ran in 28:43. It seems the fastest I can get, as it is now, is to remove the index and run the updates as-is, which gets me to 24:17 mins; next is walking the clustered index, which runs in 25:12 mins. That is a nice improvement over the initial 1:13 hrs.

    ------------------

    Thank you all, Matt, Jeff, Lynn, and Jonathan for the time you spent on this and for your suggestions.

    Have a Merry Christmas!

  • Thanks for the feedback!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

• Same here... thanks for the feedback. Sorry I got sidetracked...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
