July 26, 2010 at 3:02 pm
How to update millions of records quickly?
I have a table with an account number (acno) and an associated indicator, which is sometimes 000000.
The requirement is that when I get the full data set from the source, I have to update all the rows where indicator = 000000.
The update is done with the following lookup:
take the first 6 characters of every acno where indicator = 000000
and check whether there is any other row whose acno starts with the same 6 characters and whose indicator <> 000000
if YES, then compare the names of the two accounts found in the steps above; if the names match, replace the 000000 indicator with the non-zero one
and if you don't find any match, put the row in another table for cleansing later on.
Here is some test data for this:
Actually it's more than 50 million rows... any help?
Thanks
CREATE TABLE #t(
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL
)
INSERT INTO #t ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')
INSERT INTO #t ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')
INSERT INTO #t ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')
INSERT INTO #t ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')
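To illustrate the matching rule against this sample data (just a rough sketch of the lookup I am describing, not a tested solution for 50 million rows):
-- Sketch only: for each indicator = '000000' row, look for a candidate
-- non-zero indicator on another acno that shares the same first 6 characters
-- and the same name. A NULL candidate means the row would go to the
-- cleanse/error table. If several candidates exist, one row per candidate
-- is returned.
SELECT  z.acno,
        z.indicator,
        z.name,
        nz.indicator AS candidate_indicator
FROM    #t AS z
LEFT JOIN #t AS nz
       ON  LEFT(nz.acno, 6) = LEFT(z.acno, 6)
       AND nz.name = z.name
       AND nz.indicator <> '000000'
WHERE   z.indicator = '000000'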
July 26, 2010 at 4:48 pm
What would be your expected result based on the sample data?
July 27, 2010 at 6:10 am
I have to update every 000000 indicator with the matching indicator found on another acno (same first 6 digits), or else put the row in an error file.
Here are my expected results for the sample data.
These are all the rows that end up with indicator <> 000000; where a row started at 000000 and another acno with the same initial 6 digits existed, it takes that acno's indicator:
acno        indicator   name
12444501    000777      MYNEW
12444504    000777      MYNEW
133455      9876        OTHERDEP
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000012    123456      MYDEPT
29000013    123456      MYDEPT
But this row has 000000 and no matching acno (same initial 6 digits), so it will go to an error table:
126665      000000      NEWDEP
Actually that part is easy, as I can take everything still left at 000000 and send it to the error table, but before that I have to update the matching ones with a valid indicator.
Hope it's clear...
Thanks
July 27, 2010 at 6:22 am
ssis learner__ (7/26/2010)
How to update millions of records quickly?
There is no single precise approach for this.
The following are things you can try:
1. Use NOLOCK (with caution, and only if possible).
2. Do it in off-peak hours.
3. Try to avoid CAST/CONVERT on values.
4. Use a batch approach.
5. Try to avoid the LIKE operator in conditions.
And a lot more, but TEST, TEST, TEST...
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
July 27, 2010 at 7:51 am
Hi
Hopefully this is along the lines of what you want to do...
I strongly recommend you read the following article by Jeff Moden on this "quirky update", and follow all the rules if you intend to use it.
http://www.sqlservercentral.com/articles/T-SQL/68467/
Edit: correct link now posted.
regards Graham
USE [tempDB]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[SOURCEDATA]') AND type in (N'U'))
DROP TABLE SOURCEDATA
CREATE TABLE SOURCEDATA(
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL
)
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TEMPDATA]') AND type in (N'U'))
DROP TABLE TEMPDATA
CREATE TABLE [TEMPDATA](
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL,
[trim_acno] [varchar](15) NULL,
[ID] [int] IDENTITY(1,1) NOT NULL
)
INSERT INTO dbo.TEMPDATA
(acno, indicator, name, trim_acno)
SELECT acno, indicator, name, LEFT(acno, 6)
FROM dbo.SOURCEDATA
-- The clustered index drives the scan order for the quirky update:
-- within each (trim_acno, name) group the non-zero indicator sorts first.
ALTER TABLE [TEMPDATA] ADD CONSTRAINT [IX_TEMPDATA] UNIQUE CLUSTERED
(
[trim_acno] ASC,
[name] ASC,
[indicator] DESC,
[ID] ASC
)
DECLARE @TRIM_ACNO VARCHAR(15)
DECLARE @NAME VARCHAR(100)
DECLARE @INDICATOR VARCHAR(10)
-- "Quirky update": walks the table in clustered-index order and carries the
-- first (non-zero) indicator of each (trim_acno, name) group forward into
-- the remaining rows of that group.
UPDATE DBO.TEMPDATA
SET @INDICATOR = indicator = CASE WHEN trim_acno = @TRIM_ACNO AND [name] = @NAME
THEN @INDICATOR
ELSE indicator
END,
@TRIM_ACNO = trim_acno,
@NAME = [name]
FROM dbo.TEMPDATA WITH (TABLOCKX)
OPTION (MAXDOP 1)
GO
-- Full contents after the update
SELECT acno, indicator, [name]
FROM dbo.TEMPDATA
-- Rows still at '000000' (no match found): candidates for the error table
SELECT acno, indicator, name
FROM dbo.TEMPDATA
WHERE (indicator = '000000')
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 12:46 pm
You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough.
To the OP... you may get more pointed feedback if you post the query you're trying to run, how long it's taking, and how long it needs to take (and I can't remember if you already gave information about your table's indexes). That would give a good baseline for people to suggest performance improvements.
July 27, 2010 at 2:44 pm
bteraberry (7/27/2010)
You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough. To the OP... you may get more pointed feedback if you post the query you're trying to run, how long it's taking, and how long it needs to take (and I can't remember if you already gave information about your table's indexes). That would give a good baseline for people to suggest performance improvements.
I am not aware that the OP has posted any details of what has been tried so far or any details of indexes... only that a request for help was posted.
I have suggested a method... maybe not suitable at all, but I hope the results were as requested... if not, I am keen to learn why, and also to learn a better method.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 2:48 pm
Great code, Gah... works like a charm.
Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
July 27, 2010 at 3:01 pm
ssis learner__ (7/27/2010)
Great code, Gah... works like a charm. Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
Please read the linked articles and make sure you understand them fully.
I would also suggest that you research the view, held by some, that this "quirky update" is unsuitable for production... then decide.
Any credit is entirely due to Jeff Moden's original thoughts and code.
regards gah
PS... I would appreciate feedback on performance.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 4:24 pm
I wasn't trying to be overly critical... just pointing out that the quirky update method isn't used for its efficiency; it's used when a regular set-based approach won't work. The quirky update is an implicit cursor, and it's fantastic for really tricky problems, but only when it's needed.
In this case a simple join will work and perform much faster...
--for the "good" records ...
;WITH cteBase AS
(
SELECT acno,
min(indicator) value
FROM #t
WHERE indicator <> '000000'
GROUP BY acno
)
SELECT t.acno,
coalesce(c.value, t.indicator) indicator,
t.name
FROM #t t
JOIN cteBase c
ON t.acno = c.acno
ORDER BY t.acno
--for the "bad" records ...
;WITH cteBase AS
(
SELECT DISTINCT acno
FROM #t
WHERE indicator <> '000000'
)
SELECT t.acno,
t.indicator, --will always be '000000' for these rows
t.name
FROM #t t
LEFT JOIN cteBase c
ON t.acno = c.acno
WHERE c.acno IS NULL
July 28, 2010 at 3:28 am
bteraberry (7/27/2010)
I wasn't trying to be overly critical... just pointing out that the quirky update method isn't used for its efficiency; it's used when a regular set-based approach won't work. The quirky update is an implicit cursor, and it's fantastic for really tricky problems, but only when it's needed. In this case a simple join will work and perform much faster...
--for the "good" records ...
;WITH cteBase AS
(
SELECT acno,
min(indicator) value
FROM #t
WHERE indicator <> '000000'
GROUP BY acno
)
SELECT t.acno,
coalesce(c.value, t.indicator) indicator,
t.name
FROM #t t
JOIN cteBase c
ON t.acno = c.acno
ORDER BY t.acno
--for the "bad" records ...
;WITH cteBase AS
(
SELECT DISTINCT acno
FROM #t
WHERE indicator <> '000000'
)
SELECT t.acno,
t.indicator, --will always be null
t.name
FROM #t t
LEFT JOIN cteBase c
ON t.acno = c.acno
WHERE c.acno IS NULL
update statement required somewhere?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 28, 2010 at 10:57 am
gah (7/28/2010)
update statement required somewhere?
The OP said:
"The requirement is that when I get the full data set from the source, I have to update all the rows where indicator = 000000."
To me that sounds more like an ETL task than an actual update. The point was to show how to use basic joins to avoid an implicit cursor, not to match up exactly with vague requirements. Obviously the OP will need to adapt whatever code is given to the actual nature of the system and process.
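For what it's worth, here is one possible way (untested, illustrative only) to turn that join into an actual UPDATE plus an error-table step, matching on the first six characters of acno as the OP described; the #t_errors table name is made up:
-- Illustrative only: set each '000000' indicator from another account that
-- shares the same first six characters of acno and the same name.
-- MIN(indicator) simply picks one candidate if several exist.
UPDATE z
SET    z.indicator = c.indicator
FROM   #t AS z
JOIN  (SELECT LEFT(acno, 6) AS acno6, name, MIN(indicator) AS indicator
       FROM   #t
       WHERE  indicator <> '000000'
       GROUP BY LEFT(acno, 6), name) AS c
       ON  c.acno6 = LEFT(z.acno, 6)
       AND c.name  = z.name
WHERE  z.indicator = '000000'

-- Whatever is still '000000' has no match and goes to an error table
-- (#t_errors is a hypothetical name).
SELECT acno, indicator, name
INTO   #t_errors
FROM   #t
WHERE  indicator = '000000'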
July 28, 2010 at 10:56 pm
ssis learner__ (7/27/2010)
Great code, Gah... works like a charm. Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
Oh.... be real careful. 4 million rows may update in mere seconds. 50 million rows may take a day or two IF your system reaches the "Tipping Point" that every system has. My recommendation would be to update only 4 million rows at a time (because you already know THAT works) and, yeah, you'd need a loop for that.
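As a rough illustration only: this is a plain join-based batch loop rather than the quirky update itself, the dbo.TEMPDATA table and trim_acno column come from the script posted earlier, and the 4,000,000 batch size is just a placeholder.
-- Keep updating in chunks until no matching '000000' rows remain.
DECLARE @BatchSize INT, @Rows INT
SET @BatchSize = 4000000
SET @Rows = 1

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) t
    SET    t.indicator = nz.indicator
    FROM   dbo.TEMPDATA AS t
    JOIN   dbo.TEMPDATA AS nz
           ON  nz.trim_acno = t.trim_acno
           AND nz.name      = t.name
           AND nz.indicator <> '000000'
    WHERE  t.indicator = '000000'

    SET @Rows = @@ROWCOUNT
END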
--Jeff Moden
Change is inevitable... Change for the better is not.
July 28, 2010 at 11:19 pm
Jeff Moden (7/28/2010)
you'd need a loop for that.
Or maybe a batch approach can make the work smoother.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
July 29, 2010 at 10:29 pm
Bhuvnesh (7/28/2010)
Jeff Moden (7/28/2010)
you'd need a loop for that.
Or maybe a batch approach can make the work smoother.
The loop would control "batches". That being said, what do you mean by a "batch approach"?
--Jeff Moden
Change is inevitable... Change for the better is not.