December 4, 2008 at 7:32 pm
Heh... that code never gets past the first 10,000... it keeps updating the same first 10,000 rows over and over and over.
Lemme build a test table, write a bit of code, and I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2008 at 8:55 am
There are some things you might want to change in your last snippet, even before Jeff gets back with his code.
First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're converting everything all over the place.
Second, you need a way to progress along the temp table, or (as Jeff said) you keep updating the same rows forever. Keep a counter and advance it each batch.
The following snippet is your code, slightly modified (and untested) to run with these changes (note also that I use SELECT to set the variables, as BOL indicates SET is deprecated):
CREATE TABLE #tmp1 (id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, claim_key INT);
CREATE NONCLUSTERED INDEX IX_TMP1_claimkey ON #tmp1 (claim_key ASC);
INSERT INTO #tmp1 SELECT claim_key FROM dbo.CLAIMS WHERE admission_date IS NOT NULL;
INSERT INTO #tmp1 SELECT T1.claim_key
FROM dbo.CLAIMS T1 LEFT OUTER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
WHERE T1.discharge_date IS NOT NULL AND T2.claim_key IS NULL;
DECLARE @BatchSize INT, @BatchCounter INT;
SELECT @BatchSize = 10000, @BatchCounter = 1;
WHILE EXISTS (SELECT 1 FROM dbo.CLAIMS WHERE admission_date IS NOT NULL)
BEGIN
    UPDATE T1
    SET admission_date = NULL, discharge_date = NULL
    FROM dbo.CLAIMS T1 INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
    WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize;
    SELECT @BatchCounter = @BatchCounter + @BatchSize;
END
That should help a bit.
December 5, 2008 at 12:26 pm
Jonathan Melo (12/5/2008)
There are some things you might want to change in your last snippet, even before Jeff gets back with his code. First off, your temp table shouldn't have the key as varchar(50), since in the original table the key is an int. That means you're converting everything all over the place.
Second, you need a way to progress along the temp table, or (as Jeff said) you keep updating the same rows forever. Keep a counter and advance it each batch.
The following snippet is your code, slightly modified (and untested) to run with these changes... (code snipped; see the post above)
Thanks Jonathan. I made your recommended changes and reran it. I killed it after it had run for 1:58 hrs; by then it had completed 186 batches of 10,000 rows.
I made two changes to your code above:
Added "-1" and changed T1.id to T2.id in this line:
WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter+@BatchSize-1;
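Putting those fixes together, the loop might look like this (untested sketch; note it also walks id up to MAX(id) from #tmp1 instead of re-checking CLAIMS in the WHILE, so the rows that only had a discharge_date still get processed after the admission_date rows run out):
DECLARE @BatchSize INT, @BatchCounter INT, @MaxId INT;
SELECT @BatchSize = 10000, @BatchCounter = 1;
SELECT @MaxId = MAX(id) FROM #tmp1; -- highest batch id we need to reach
WHILE @BatchCounter <= @MaxId
BEGIN
    UPDATE T1
    SET admission_date = NULL, discharge_date = NULL
    FROM dbo.CLAIMS T1 INNER JOIN #tmp1 T2 ON T1.claim_key = T2.claim_key
    WHERE T2.id BETWEEN @BatchCounter AND @BatchCounter + @BatchSize - 1; -- the "-1" fix
    SELECT @BatchCounter = @BatchCounter + @BatchSize;
END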
-Doug
December 5, 2008 at 12:50 pm
Matt Miller (12/3/2008)
Here's an example of walking the clustered index:
--drop table #ranges
;with AlphanumericCTE as (
select
char(N) digit,
char(N+1) nextdigit
from tally where
n between 48 and 57 or --digits 0 through 9
n between 65 and 90) --A through Z
select row_number() over (order by a1.digit,a2.digit) rn,
a1.digit+a2.digit startrange,
a1.digit+a2.nextdigit endrange
into #ranges
from alphanumericCTE a1 cross join alphanumericCTE a2
order by startrange
declare @curr int
declare @lastrow int
set @curr=1;
select @lastrow=max(rn) from #ranges
--walk through the ranges based on the first two digits of the member_ID
--presumes that the spread is fairly even - if not- then customize at will
while (@curr<=@lastrow)
begin
UPDATE CLAIMS
SET
admission_date = NULL,
discharge_date = NULL
from claims join #ranges on member_id between startrange and endrange
where rn=@curr and (admission_date is not null or discharge_date is not null)
set @curr=@curr+1;
end
--create table claims(member_ID varchar(20), admission_date datetime, discharge_date datetime)
Assuming you don't wrap that loop in one giant transaction (left in autocommit mode, each UPDATE commits on its own), this should go a bit faster. Note that this is only one pass.
Matt, I don't have the table "tally". How can I get it?
--Nevermind: I see it is just a table of sequential integers that should cover at least 48-90 (a quick way to build one is sketched at the end of this post). This method of yours is very interesting. It appears to be working great, except I need #ranges to also include these startrange-endrange pairs:
09-10
19-20
29-30
39-40
49-50
59-60
69-70
...and so on.
Without these ranges, the update overlooks member_ids that begin with, e.g., 59 or 69.
Currently, it searches for
member_id between '58' and '59'
member_id between '59' and '5:'
but neither of these will pick up the member_id '5938489'. However, this will:
member_id between '59' and '60'
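A quick throwaway test of that string comparison (not part of the fix, just a sanity check):
SELECT CASE WHEN '5938489' BETWEEN '59' AND '60'
            THEN 'caught' ELSE 'missed' END AS range_check;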
I will manually enter these and test it.
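For anyone else missing a Tally table, here is a minimal sketch for building one (the dbo.Tally name and the 10,000-row size are just placeholders; it only needs sequential integers covering the values the CTEs read):
-- build a 0-9999 sequence via SELECT...INTO; master.sys.all_columns is just a handy big row source
SELECT TOP 10000 IDENTITY(INT, 0, 1) AS N
INTO dbo.Tally
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2;
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);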
December 5, 2008 at 2:45 pm
Are your member IDs actually made up of just numbers even though they are stored as varchar? If so, you could simply use the Tally table itself to generate the startrange and endrange. Something like:
;with AlphanumericCTE as (
select
right('0000'+cast(N as varchar(4)),4) digit,
right('0000'+cast(N+1 as varchar(4)),4) nextdigit
from tally where
n between 0 and 999)
select row_number() over (order by a1.digit) rn,
a1.digit startrange,
a1.nextdigit endrange
--into #ranges
from alphanumericCTE a1
order by startrange
You will still have to add one last update (for member_ID > '999').
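Something along these lines, say (untested, and the exact boundary depends on how your ids compare as strings):
UPDATE dbo.CLAIMS
SET admission_date = NULL,
    discharge_date = NULL
WHERE member_id > '999'
  AND (admission_date IS NOT NULL OR discharge_date IS NOT NULL);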
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 5, 2008 at 4:02 pm
The member ids sometimes do have alphanumeric characters but should never have any symbols, e.g. "[" or ":", so I like the first way you populated #ranges. I will add something to get in the other ranges I need.
Your script finished running in 25:12 mins. It searched 1,296 times; only 70 of the searches actually affected rows, and the rest found no match.
Is there a way to narrow #ranges to only the ranges the data in the CLAIMS table actually needs, instead of going through all 1,296 possibilities? That would speed it up tremendously. The member ids will vary by data source, and I will never know exactly which first-two-character combinations exist until the new data source is loaded.
December 5, 2008 at 5:01 pm
Here's an improved version for fixing the ranges:
--drop table #ranges
;with Alpha1 as (select
char(N) digit,
Row_number() over (order by N ) rowid
from tally where
n between 48 and 57 or --digits 0 through 9
n between 65 and 90
), --A through Z
AlphanumericCTE as (
select Row_number() over (order by a1.digit, a2.digit) rid,
a1.digit + a2.digit strtrange
from Alpha1 A1 cross join Alpha1 A2
)
select row_number() over (order by acte1.strtrange, acte2.strtrange) rn,
acte1.strtrange startrange,
acte2.strtrange endrange
into #ranges
from alphanumericCTE acte1 left join alphanumericCTE acte2
on acte1.rid = acte2.rid - 1
order by startrange
You will still need to run one last query after this loop using a > against the max endrange.
As to cutting down the number of ranges to run against - you could TRY adding an IF EXISTS before the update. Something like:
IF EXISTS (select NULL
from claims
join #ranges on member_id between startrange and endrange
where rn=@curr)
UPDATE CLAIMS
etc...
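Spelled out inside the loop, that might look something like this (untested):
IF EXISTS (select NULL
           from claims
           join #ranges on member_id between startrange and endrange
           where rn = @curr)
    UPDATE CLAIMS
    SET admission_date = NULL,
        discharge_date = NULL
    FROM claims
    JOIN #ranges ON member_id BETWEEN startrange AND endrange
    WHERE rn = @curr
      AND (admission_date IS NOT NULL OR discharge_date IS NOT NULL);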
Of course - this may actually slow things down. This may be a "TEST and see" scenario. OR - you can try finding all of the ranges FIRST (select Distinct left(member_ID,2) etc...), but again - I have a feeling that might be slower too.
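If you wanted to try the find-the-ranges-first route, one way might be a pruning pass before the loop (untested sketch; LEFT() isn't sargable, which is part of why I suspect it could be slower):
DELETE r
FROM #ranges r
WHERE NOT EXISTS (SELECT 1
                  FROM claims c
                  WHERE LEFT(c.member_id, 2) = r.startrange
                    AND (c.admission_date IS NOT NULL OR c.discharge_date IS NOT NULL));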
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 8, 2008 at 9:46 am
Matt, thank you for the range table fix. I implemented that code along with the IF EXISTS, and it ran in 28:43. It seems the fastest I can get right now is to remove the index and run the updates as-is, which gets me to 24:17 mins; next best is walking the clustered index, which runs in 25:12 mins. That is a nice improvement over the initial 1:13 hrs.
------------------
Thank you all, Matt, Jeff, Lynn, and Jonathan for the time you spent on this and for your suggestions.
Have a Merry Christmas!
December 8, 2008 at 10:23 am
Thanks for the feedback!
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 8, 2008 at 7:45 pm
Same here... thanks for the feedback. Sorry I got side tracked...
--Jeff Moden
Change is inevitable... Change for the better is not.