Slow running Huge batches

  • I have a procedure that inserts and then updates roughly 2 million rows with roughly 8 million rows of resting data in the main table. The last run time was about 9 hours. The procedure currently runs in one batch, and execution plan looks good with clustered index scans and index seeks/scans.

    My two thoughts were either to partition the tables:

    table namerowcnt

    ListDataCASS30,342,222 rows

    ListData31,532,222

    Prospect6,664,661

    ProspectAddressVersion6,760,741

    Or slice the batch up. The last @ListFileID gave over 2 million records and this procedure call is wrapped in a transaction. I was thinking that the calling code could mult-thread 20 calls to this procedure with ~100,000 per call and maybe 4 of them symultaniously based on some criteria like id, rowcount, zip blocks or whatever.

    The disk is RAID10 for log/temp and RAID5 for data with 16 spindles for the RAID5, 2GB fiberchannel, EMC SAN. Indexes are maintained at between 85-95% fill, Stats are set to auto-update, clustered indexes are on ints, 4 dual core CPU's, max LOP

    I want feeback on my 2 ideas and any other ideas you might have.

    Thanks

    CREATE PROCEDURE [dbo].[UpdateProspectFromListData]

    (

    @ListFileID int

    )

    AS

    BEGIN

    SET NOCOUNT ON;

    create table #InsertedPAV(ProspectID int, ProspectAddressVersionID int);

    INSERT INTO ProspectAddressVersion

    (

    ProspectID,

    Address1,

    Address2,

    City,

    State,

    Zip,

    Zip4,

    AddressDuplicateCode,

    dpvstatus

    )

    OUTPUT inserted.ProspectID, inserted.ProspectAddressVersionID into #InsertedPAV

    SELECT

    p.ProspectID,

    ldc.Address1,

    ldc.Address2,

    ldc.City,

    ldc.State,

    ldc.Zip,

    ldc.Zip4,

    ldc.address_duplicate_code,

    ldc.dpvstatus

    FROM ListData ld

    --already backupdated listdata so this is valid

    INNER JOIN Prospect p on ld.ProspectID = p.ProspectID

    INNER JOIN ListDataCass ldc on ldc.SourceRecordID = ld.SourceRecordID

    INNER JOIN ListPriority lpp on lpp.ListCode = p.ListCode

    LEFT OUTER JOIN TitlePriority tpp on tpp.Title = p.Title

    INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID

    INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode

    LEFT OUTER JOIN TitlePriority tpld on tpld.Title = ld.Title

    WHERE ld.ListFileID = @ListFileID

    AND

    (

    (p.BusinessName IS NULL and ld.BusinessName IS NOT NULL) OR

    (p.FirstName IS NULL AND p.LastName IS NULL

    AND ld.FirstName IS NOT NULL AND ld.LastName IS NOT NULL) OR

    (ISNULL(tpp.Priority,10000) > ISNULL(tpld.Priority,10000)) OR

    --don't update if current prospect has title and incoming does not

    (tpp.Priority is null and (lpp.Priority > lpld.Priority)) OR

    datediff(m, p.ModifiedDate,getdate()) > 12

    );

    UPDATE p SET

    --FirstSourceRecordID = ld.SourceRecordID,

    ListCode = lf.ListCode,

    Priority = lpld.Priority,

    ModifiedDate = getdate(),

    Prefix = ld.Prefix,

    FirstName = ld.Firstname,

    MiddleName = ld.MiddleName,

    LastName = ld.LastName,

    Suffix = ld.Suffix,

    Title = ld.Title,

    Gender = ld.Gender,

    Email = ld.Email,

    Phone = ld.Phone,

    SicCode = ISNULL(ld.SicCode,p.SicCode),

    BusinessName = ISNULL(ld.BusinessName, p.BusinessName),

    AnnualSales = ISNULL(ld.AnnualSales,p.AnnualSales),

    YearsInBusiness = ISNULL(ld.YearsInBusiness, p.YearsInBusiness),

    NaicsCode = ISNULL(ld.NaicsCode, p.NaicsCode),

    TaxID = ISNULL(ld.TaxID, p.TaxID),

    EmployeeCount = ISNULL(ld.EmployeeCount, p.EmployeeCount),

    ListFileID = ld.ListFileID,

    ProspectAddressVersionID = ipav.ProspectAddressVersionID

    FROM #InsertedPAV ipav

    --already backupdated listdata so this is valid

    INNER JOIN Prospect p on ipav.ProspectID = p.ProspectID

    INNER JOIN ListData ld on ld.ProspectID = p.ProspectID

    INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID

    INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode;

    UPDATE p SET

    ModifiedDate = getdate(),

    SicCode = ISNULL(p.SicCode, ld.SicCode),

    BusinessName = ISNULL(p.BusinessName, ld.BusinessName),

    AnnualSales = ISNULL(p.AnnualSales, ld.AnnualSales),

    YearsInBusiness =ISNULL(p.YearsInBusiness, ld.YearsInBusiness),

    NaicsCode = ISNULL(p.NaicsCode, ld.NaicsCode),

    TaxID = ISNULL(p.TaxID, ld.TaxID),

    EmployeeCount = ISNULL(p.EmployeeCount, ld.EmployeeCount)--,

    FROM ListData ld

    INNER JOIN Prospect p on ld.ProspectID = p.ProspectID

    WHERE ld.ListFileID = @ListFileID

    AND NOT EXISTS (SELECT * FROM #InsertedPAV ipav WHERE ipav.ProspectID = ld.ProspectID)

    END

  • Can you post the query plan for the above so we can see what's happening?

    Also, what's the sp_configure value for 'cost threshold for parallelism' ? Usually the default of 5 seconds is too low. The great SQL Shamans here recommend 20 seconds, which I've tested and it works pretty well.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply