how to update millions of records quickly........

  • how to update millions of records quickly........

    I have a table that has an account no and an associated indicator , which is sometimes =00000

    Now the requirement is that when I get this whole data from Source, have to update all the indicator

    records where indicator =00000

    The update is done looking up following way:

    select the first 6 positions of every account where indicator =00000

    and find if you have any other row for the same account with indicator <>00000

    if YES then match the name for those 2 accounts found in above 2 steps, if the names matches then update the indicator = 0 with the indicator <>0

    and if you dont find any match put it in some other table for clease lateron....

    Here is some test data for this:

    actually its more than 50 million rows...any help?

    Thanks

    CREATE TABLE #t(

    [acno] [varchar](15) NULL,

    [indicator] [varchar](10) NULL,

    [name] [varchar](100) NULL

    )

    INSERT INTO #t ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')

    INSERT INTO #t ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • What would be your expected result based on the sample data?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I have to update all 00000 indicator with matching indicator found in other acno(6 digit) or I have to put it in an error file.

    here are my expected results for the sample data

    These are all updated rows having indicator <> 0000, and if it is 000 there is another acno(same initial 6 digits ), so they will take that accno's indicator...

    12444501000777MYNEW

    12444504000777MYNEW

    1334559876OTHERDEP

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000010123456MYDEPT

    29000011123456MYDEPT

    29000011123456MYDEPT

    29000011123456MYDEPT

    29000011123456MYDEPT

    29000011123456MYDEPT

    29000011123456MYDEPT

    29000012123456MYDEPT

    29000013123456MYDEPT

    but this row is having 0000 and no matching acno(6 initial digit) so will go to an error table...

    126665000000NEWDEP

    actually this part is easy as I can get all the remaining with 000 and send to error, but befroe that I have to update the matching ones with valid indicator..

    hope its clear...

    Thanks

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • ssis learner__ (7/26/2010)


    how to update millions of records quickly........

    there is no precise solution/approach for this.

    following are the things you can try.

    1. use nolock( Warning :if possible)

    2. do it in peak-off hours.

    3. try to avoid cast/convert in values

    4. use batch approach.

    5. try to avoid like operator in conditions.

    and a lot more . but TEST TEST TEST .....................................................

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Hi

    hopefully this along the lines of what you want to do.....

    I strongly recommend you read the following article by Jeff Moden on this "quirky update".....and to follow all the rules if you intend to use it.

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    edit. correct link now posted

    regards Graham

    USE [tempDB]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[SOURCEDATA]') AND type in (N'U'))

    DROP TABLE SOURCEDATA

    CREATE TABLE SOURCEDATA(

    [acno] [varchar](15) NULL,

    [indicator] [varchar](10) NULL,

    [name] [varchar](100) NULL

    )

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '126665', '000000', 'NEWDEP')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444501', '000000', 'MYNEW')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '12444504', '000777', 'MYNEW')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '133455', '9876', 'OTHERDEP')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000010', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000011', '000000', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000012', '123456', 'MYDEPT')

    INSERT INTO SOURCEDATA ( acno,indicator,name) VALUES ( '29000013', '123456', 'MYDEPT')

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TEMPDATA]') AND type in (N'U'))

    DROP TABLE TEMPDATA

    CREATE TABLE [TEMPDATA](

    [acno] [varchar](15) NULL,

    [indicator] [varchar](10) NULL,

    [name] [varchar](100) NULL,

    [trim_acno] [varchar](15) NULL,

    [ID] [int] IDENTITY(1,1) NOT NULL

    )

    INSERT INTO dbo.TEMPDATA

    (acno, indicator, name, trim_acno)

    SELECT acno, indicator, name, LEFT(acno, 6)

    FROM dbo.SOURCEDATA

    ALTER TABLE [TEMPDATA] ADD CONSTRAINT [IX_TEMDATA] UNIQUE CLUSTERED

    (

    [trim_acno] ASC,

    [name] ASC,

    [indicator] DESC,

    [ID] ASC

    )

    DECLARE @TRIM_ACNO VARCHAR(15)

    DECLARE @NAME VARCHAR (100)

    DECLARE @INDICATOR VARCHAR (10)

    UPDATE DBO.TEMPDATA

    SET @INDICATOR = indicator = CASE WHEN trim_acno = @TRIM_ACNO AND [name] = @NAME

    THEN @INDICATOR

    ELSE INDICATOR

    END,

    @TRIM_ACNO = trim_acno,

    @NAME = [name]

    FROM dbo.TEMPDATA WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    GO

    SELECT acno, indicator, [name]

    FROM dbo.TEMPDATA

    SELECT acno, indicator, name

    FROM dbo.TEMPDATA

    WHERE (indicator = '000000')

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough.

    To the OP ... you may get more poignant feedback if you post what query you're trying to run, how long it's taking and how long it needs to take (and I can't remember if you already gave information about your table's indexes.) That would give a good a good baseline for people to give performance improving suggestions.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • bteraberry (7/27/2010)


    You wouldn't want the quirky update method for this because it's not needed and therefore the performance hit would be unwarranted. It's a fairly simple join, but there are too many unanswered questions to know whether or not anything would be fast enough.

    To the OP ... you may get more poignant feedback if you post what query you're trying to run, how long it's taking and how long it needs to take (and I can't remember if you already gave information about your table's indexes.) That would give a good a good baseline for people to give performance improving suggestions.

    I am not aware that the OP has posted any details of what has been tried so far or any detail of indexes...only that a request of help was posted.

    I have suggested a method...may be not suitable at all, but I hope that the results were as requested...if not I am keen to learn why and also to learn a better method.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Great code Gah...works like a charm........

    Thanks a lot for your reply ...

    I tried just now for about 4 million and it worked fine...

    I will be trying on more rows after I get them from Source..

    Thanks again

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • ssis learner__ (7/27/2010)


    Great code Gah...works like a charm........

    Thanks a lot for your reply ...

    I tried just now for about 4 million and it worked fine...

    I will be trying on more rows after I get them from Source..

    Thanks again

    Please read the linked articles and make sure you understand them fully.

    I would also suggest that you research the views that this "quirky update" is unsuitable for production.....then decide

    Any credit due is entirely due to Jeff Moden's original thoughts and code

    regards gah

    ps ...would appreciate feedback on performance

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Wasn't trying to be overly critical ... just pointing out that the quirky update method isn't used for it's efficiency, it's used for when a regular set based approach won't work. The quirky update is an implicit cursor and it's fantastic for really tricky problems, but only when it's needed.

    In this case a simple join will work and perform much faster ...

    --for the "good" records ...

    ;WITH cteBase AS

    (

    SELECT acno,

    min(indicator) value

    FROM #t

    WHERE indicator <> '000000'

    GROUP BY acno

    )

    SELECT t.acno,

    coalesce(c.value, t.indicator) indicator,

    t.name

    FROM #t t

    JOIN cteBase c

    ON t.acno = c.acno

    ORDER BY t.acno

    --for the "bad" records ...

    ;WITH cteBase AS

    (

    SELECT DISTINCT acno

    FROM #t

    WHERE indicator <> '000000'

    )

    SELECT t.acno,

    t.indicator, --will always be null

    t.name

    FROM #t t

    LEFT JOIN cteBase c

    ON t.acno = c.acno

    WHERE c.acno IS NULL

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • bteraberry (7/27/2010)


    Wasn't trying to be overly critical ... just pointing out that the quirky update method isn't used for it's efficiency, it's used for when a regular set based approach won't work. The quirky update is an implicit cursor and it's fantastic for really tricky problems, but only when it's needed.

    In this case a simple join will work and perform much faster ...

    --for the "good" records ...

    ;WITH cteBase AS

    (

    SELECT acno,

    min(indicator) value

    FROM #t

    WHERE indicator <> '000000'

    GROUP BY acno

    )

    SELECT t.acno,

    coalesce(c.value, t.indicator) indicator,

    t.name

    FROM #t t

    JOIN cteBase c

    ON t.acno = c.acno

    ORDER BY t.acno

    --for the "bad" records ...

    ;WITH cteBase AS

    (

    SELECT DISTINCT acno

    FROM #t

    WHERE indicator <> '000000'

    )

    SELECT t.acno,

    t.indicator, --will always be null

    t.name

    FROM #t t

    LEFT JOIN cteBase c

    ON t.acno = c.acno

    WHERE c.acno IS NULL

    update statement required somewhere?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • gah (7/28/2010)


    update statement required somewhere?

    The OP said

    "Now the requirement is that when I get this whole data from Source, have to update all the indicator records where indicator =00000".

    To me that sounds more like an ETL than an actual update. The point was to show how to use basic joins to avoid an implicit cursor, not to match up exactly with vague requirements. Obviously the OP will need to modify whatever code is given to the actual nature of the system and process.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • ssis learner__ (7/27/2010)


    Great code Gah...works like a charm........

    Thanks a lot for your reply ...

    I tried just now for about 4 million and it worked fine...

    I will be trying on more rows after I get them from Source..

    Thanks again

    Oh.... be real careful. 4 million rows may update in mere seconds. 50 million rows may take a day or two IF your system reaches the "Tipping Point" that every system has. My recommendation would be to update only 4 million rows at a time (because you already know THAT works) and, yeah, you'd need a loop for that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/28/2010)


    you'd need a loop for that.

    or may be batch approach can make work smoother

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Bhuvnesh (7/28/2010)


    Jeff Moden (7/28/2010)


    you'd need a loop for that.

    or may be batch approach can make work smoother

    The loop would control "batches". That being said, what do you mean by a "batch approach"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply