July 26, 2010 at 3:02 pm
How to update millions of records quickly?
I have a table with an account number (acno) and an associated indicator, which is sometimes 000000.
The requirement is that when I get the full data set from the source, I have to update all the rows where indicator = 000000.
The update is done with the following lookup:
take the first 6 characters of every acno where indicator = 000000
and check whether there is any other row whose acno starts with the same 6 characters and whose indicator <> 000000
if YES, then compare the names of the two accounts found in the steps above; if the names match, replace the 000000 indicator with the non-zero one
and if you don't find any match, put the row in another table for cleansing later on.
Here is some test data for this:
Actually it's more than 50 million rows... any help?
Thanks
CREATE TABLE #t(
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL
)
INSERT INTO #t ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')
INSERT INTO #t ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')
INSERT INTO #t ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')
INSERT INTO #t ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')
INSERT INTO #t ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')
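To illustrate the matching rule against this sample data (just a rough sketch of the lookup I am describing, not a tested solution for 50 million rows):
-- Sketch only: for each indicator = '000000' row, look for a candidate
-- non-zero indicator on another acno that shares the same first 6 characters
-- and the same name. A NULL candidate means the row would go to the
-- cleanse/error table. If several candidates exist, one row per candidate
-- is returned.
SELECT  z.acno,
        z.indicator,
        z.name,
        nz.indicator AS candidate_indicator
FROM    #t AS z
LEFT JOIN #t AS nz
       ON  LEFT(nz.acno, 6) = LEFT(z.acno, 6)
       AND nz.name = z.name
       AND nz.indicator <> '000000'
WHERE   z.indicator = '000000'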
July 26, 2010 at 4:48 pm
What would be your expected result based on the sample data?
July 27, 2010 at 6:10 am
I have to update every 000000 indicator with the matching indicator found on another acno (same first 6 digits), or else put the row in an error file.
Here are my expected results for the sample data.
These are all the rows that end up with indicator <> 000000; where a row started at 000000 and another acno with the same initial 6 digits existed, it takes that acno's indicator:
acno        indicator   name
12444501    000777      MYNEW
12444504    000777      MYNEW
133455      9876        OTHERDEP
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000010    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000011    123456      MYDEPT
29000012    123456      MYDEPT
29000013    123456      MYDEPT
But this row has 000000 and no matching acno (same initial 6 digits), so it will go to an error table:
126665      000000      NEWDEP
Actually that part is easy, as I can take everything still left at 000000 and send it to the error table, but before that I have to update the matching ones with a valid indicator.
Hope it's clear...
Thanks
July 27, 2010 at 6:22 am
ssis learner__ (7/26/2010)
How to update millions of records quickly?
There is no single precise approach for this.
The following are things you can try:
1. Use NOLOCK (with caution, and only if possible).
2. Do it in off-peak hours.
3. Try to avoid CAST/CONVERT on values.
4. Use a batch approach.
5. Try to avoid the LIKE operator in conditions.
And a lot more, but TEST, TEST, TEST...
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
July 27, 2010 at 7:51 am
Hi
Hopefully this is along the lines of what you want to do...
I strongly recommend you read the following article by Jeff Moden on this "quirky update", and follow all the rules if you intend to use it.
http://www.sqlservercentral.com/articles/T-SQL/68467/
Edit: correct link now posted.
regards Graham
USE [tempDB]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[SOURCEDATA]') AND type in (N'U'))
DROP TABLE SOURCEDATA
CREATE TABLE SOURCEDATA(
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL
)
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')
INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TEMPDATA]') AND type in (N'U'))
DROP TABLE TEMPDATA
CREATE TABLE [TEMPDATA](
[acno] [varchar](15) NULL,
[indicator] [varchar](10) NULL,
[name] [varchar](100) NULL,
[trim_acno] [varchar](15) NULL,
[ID] [int] IDENTITY(1,1) NOT NULL
)
INSERT INTO dbo.TEMPDATA
(acno, indicator, name, trim_acno)
SELECT acno, indicator, name, LEFT(acno, 6)
FROM dbo.SOURCEDATA
-- The clustered index drives the scan order for the quirky update:
-- within each (trim_acno, name) group the non-zero indicator sorts first.
ALTER TABLE [TEMPDATA] ADD CONSTRAINT [IX_TEMPDATA] UNIQUE CLUSTERED
(
[trim_acno] ASC,
[name] ASC,
[indicator] DESC,
[ID] ASC
)
DECLARE @TRIM_ACNO VARCHAR(15)
DECLARE @NAME VARCHAR(100)
DECLARE @INDICATOR VARCHAR(10)
-- "Quirky update": walks the table in clustered-index order and carries the
-- first (non-zero) indicator of each (trim_acno, name) group forward into
-- the remaining rows of that group.
UPDATE DBO.TEMPDATA
SET @INDICATOR = indicator = CASE WHEN trim_acno = @TRIM_ACNO AND [name] = @NAME
THEN @INDICATOR
ELSE indicator
END,
@TRIM_ACNO = trim_acno,
@NAME = [name]
FROM dbo.TEMPDATA WITH (TABLOCKX)
OPTION (MAXDOP 1)
GO
-- Full contents after the update
SELECT acno, indicator, [name]
FROM dbo.TEMPDATA
-- Rows still at '000000' (no match found): candidates for the error table
SELECT acno, indicator, name
FROM dbo.TEMPDATA
WHERE (indicator = '000000')
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 12:46 pm
You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough.
To the OP... you may get more pointed feedback if you post the query you're trying to run, how long it's taking, and how long it needs to take (and I can't remember if you already gave information about your table's indexes). That would give a good baseline for people to suggest performance improvements.
July 27, 2010 at 2:44 pm
bteraberry (7/27/2010)
You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough. To the OP... you may get more pointed feedback if you post the query you're trying to run, how long it's taking, and how long it needs to take (and I can't remember if you already gave information about your table's indexes). That would give a good baseline for people to suggest performance improvements.
I am not aware that the OP has posted any details of what has been tried so far or any details of indexes... only that a request for help was posted.
I have suggested a method... maybe not suitable at all, but I hope the results were as requested... if not, I am keen to learn why, and also to learn a better method.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 2:48 pm
Great code, Gah... works like a charm.
Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
July 27, 2010 at 3:01 pm
ssis learner__ (7/27/2010)
Great code, Gah... works like a charm. Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
Please read the linked articles and make sure you understand them fully.
I would also suggest that you research the view, held by some, that this "quirky update" is unsuitable for production... then decide.
Any credit is entirely due to Jeff Moden's original thoughts and code.
regards gah
PS... I would appreciate feedback on performance.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 27, 2010 at 4:24 pm
I wasn't trying to be overly critical... just pointing out that the quirky update method isn't used for its efficiency; it's used when a regular set-based approach won't work. The quirky update is an implicit cursor, and it's fantastic for really tricky problems, but only when it's needed.
In this case a simple join will work and perform much faster...
--for the "good" records ...
;WITH cteBase AS
(
SELECT acno,
min(indicator) value
FROM #t
WHERE indicator <> '000000'
GROUP BY acno
)
SELECT t.acno,
coalesce(c.value, t.indicator) indicator,
t.name
FROM #t t
JOIN cteBase c
ON t.acno = c.acno
ORDER BY t.acno
--for the "bad" records ...
;WITH cteBase AS
(
SELECT DISTINCT acno
FROM #t
WHERE indicator <> '000000'
)
SELECT t.acno,
t.indicator, --will always be '000000' for these rows
t.name
FROM #t t
LEFT JOIN cteBase c
ON t.acno = c.acno
WHERE c.acno IS NULL
July 28, 2010 at 3:28 am
bteraberry (7/27/2010)
I wasn't trying to be overly critical... just pointing out that the quirky update method isn't used for its efficiency; it's used when a regular set-based approach won't work. The quirky update is an implicit cursor, and it's fantastic for really tricky problems, but only when it's needed. In this case a simple join will work and perform much faster...
--for the "good" records ...
;WITH cteBase AS
(
SELECT acno,
min(indicator) value
FROM #t
WHERE indicator <> '000000'
GROUP BY acno
)
SELECT t.acno,
coalesce(c.value, t.indicator) indicator,
t.name
FROM #t t
JOIN cteBase c
ON t.acno = c.acno
ORDER BY t.acno
--for the "bad" records ...
;WITH cteBase AS
(
SELECT DISTINCT acno
FROM #t
WHERE indicator <> '000000'
)
SELECT t.acno,
t.indicator, --will always be null
t.name
FROM #t t
LEFT JOIN cteBase c
ON t.acno = c.acno
WHERE c.acno IS NULL
update statement required somewhere?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 28, 2010 at 10:57 am
gah (7/28/2010)
update statement required somewhere?
The OP said:
"The requirement is that when I get the full data set from the source, I have to update all the rows where indicator = 000000."
To me that sounds more like an ETL task than an actual update. The point was to show how to use basic joins to avoid an implicit cursor, not to match up exactly with vague requirements. Obviously the OP will need to adapt whatever code is given to the actual nature of the system and process.
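For what it's worth, here is one possible way (untested, illustrative only) to turn that join into an actual UPDATE plus an error-table step, matching on the first six characters of acno as the OP described; the #t_errors table name is made up:
-- Illustrative only: set each '000000' indicator from another account that
-- shares the same first six characters of acno and the same name.
-- MIN(indicator) simply picks one candidate if several exist.
UPDATE z
SET    z.indicator = c.indicator
FROM   #t AS z
JOIN  (SELECT LEFT(acno, 6) AS acno6, name, MIN(indicator) AS indicator
       FROM   #t
       WHERE  indicator <> '000000'
       GROUP BY LEFT(acno, 6), name) AS c
       ON  c.acno6 = LEFT(z.acno, 6)
       AND c.name  = z.name
WHERE  z.indicator = '000000'

-- Whatever is still '000000' has no match and goes to an error table
-- (#t_errors is a hypothetical name).
SELECT acno, indicator, name
INTO   #t_errors
FROM   #t
WHERE  indicator = '000000'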
July 28, 2010 at 10:56 pm
ssis learner__ (7/27/2010)
Great code, Gah... works like a charm. Thanks a lot for your reply...
I just tried it on about 4 million rows and it worked fine...
I will try it on more rows after I get them from the source.
Thanks again
Oh.... be real careful. 4 million rows may update in mere seconds. 50 million rows may take a day or two IF your system reaches the "Tipping Point" that every system has. My recommendation would be to update only 4 million rows at a time (because you already know THAT works) and, yeah, you'd need a loop for that.
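As a rough illustration only: this is a plain join-based batch loop rather than the quirky update itself, the dbo.TEMPDATA table and trim_acno column come from the script posted earlier, and the 4,000,000 batch size is just a placeholder.
-- Keep updating in chunks until no matching '000000' rows remain.
DECLARE @BatchSize INT, @Rows INT
SET @BatchSize = 4000000
SET @Rows = 1

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) t
    SET    t.indicator = nz.indicator
    FROM   dbo.TEMPDATA AS t
    JOIN   dbo.TEMPDATA AS nz
           ON  nz.trim_acno = t.trim_acno
           AND nz.name      = t.name
           AND nz.indicator <> '000000'
    WHERE  t.indicator = '000000'

    SET @Rows = @@ROWCOUNT
END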
--Jeff Moden
Change is inevitable... Change for the better is not.
July 28, 2010 at 11:19 pm
Jeff Moden (7/28/2010)
you'd need a loop for that.
Or maybe a batch approach can make the work smoother.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
July 29, 2010 at 10:29 pm
Bhuvnesh (7/28/2010)
Jeff Moden (7/28/2010)
you'd need a loop for that.
Or maybe a batch approach can make the work smoother.
The loop would control "batches". That being said, what do you mean by a "batch approach"?
--Jeff Moden
Change is inevitable... Change for the better is not.