August 19, 2008 at 9:54 am
I have a procedure that inserts and then updates roughly 2 million rows with roughly 8 million rows of resting data in the main table. The last run time was about 9 hours. The procedure currently runs in one batch, and execution plan looks good with clustered index scans and index seeks/scans.
My two thoughts were either to partition the tables:
table namerowcnt
ListDataCASS30,342,222 rows
ListData31,532,222
Prospect6,664,661
ProspectAddressVersion6,760,741
Or slice the batch up. The last @ListFileID gave over 2 million records and this procedure call is wrapped in a transaction. I was thinking that the calling code could mult-thread 20 calls to this procedure with ~100,000 per call and maybe 4 of them symultaniously based on some criteria like id, rowcount, zip blocks or whatever.
The disk is RAID10 for log/temp and RAID5 for data with 16 spindles for the RAID5, 2GB fiberchannel, EMC SAN. Indexes are maintained at between 85-95% fill, Stats are set to auto-update, clustered indexes are on ints, 4 dual core CPU's, max LOP
I want feeback on my 2 ideas and any other ideas you might have.
Thanks
CREATE PROCEDURE [dbo].[UpdateProspectFromListData]
(
@ListFileID int
)
AS
BEGIN
SET NOCOUNT ON;
create table #InsertedPAV(ProspectID int, ProspectAddressVersionID int);
INSERT INTO ProspectAddressVersion
(
ProspectID,
Address1,
Address2,
City,
State,
Zip,
Zip4,
AddressDuplicateCode,
dpvstatus
)
OUTPUT inserted.ProspectID, inserted.ProspectAddressVersionID into #InsertedPAV
SELECT
p.ProspectID,
ldc.Address1,
ldc.Address2,
ldc.City,
ldc.State,
ldc.Zip,
ldc.Zip4,
ldc.address_duplicate_code,
ldc.dpvstatus
FROM ListData ld
--already backupdated listdata so this is valid
INNER JOIN Prospect p on ld.ProspectID = p.ProspectID
INNER JOIN ListDataCass ldc on ldc.SourceRecordID = ld.SourceRecordID
INNER JOIN ListPriority lpp on lpp.ListCode = p.ListCode
LEFT OUTER JOIN TitlePriority tpp on tpp.Title = p.Title
INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID
INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode
LEFT OUTER JOIN TitlePriority tpld on tpld.Title = ld.Title
WHERE ld.ListFileID = @ListFileID
AND
(
(p.BusinessName IS NULL and ld.BusinessName IS NOT NULL) OR
(p.FirstName IS NULL AND p.LastName IS NULL
AND ld.FirstName IS NOT NULL AND ld.LastName IS NOT NULL) OR
(ISNULL(tpp.Priority,10000) > ISNULL(tpld.Priority,10000)) OR
--don't update if current prospect has title and incoming does not
(tpp.Priority is null and (lpp.Priority > lpld.Priority)) OR
datediff(m, p.ModifiedDate,getdate()) > 12
);
UPDATE p SET
--FirstSourceRecordID = ld.SourceRecordID,
ListCode = lf.ListCode,
Priority = lpld.Priority,
ModifiedDate = getdate(),
Prefix = ld.Prefix,
FirstName = ld.Firstname,
MiddleName = ld.MiddleName,
LastName = ld.LastName,
Suffix = ld.Suffix,
Title = ld.Title,
Gender = ld.Gender,
Email = ld.Email,
Phone = ld.Phone,
SicCode = ISNULL(ld.SicCode,p.SicCode),
BusinessName = ISNULL(ld.BusinessName, p.BusinessName),
AnnualSales = ISNULL(ld.AnnualSales,p.AnnualSales),
YearsInBusiness = ISNULL(ld.YearsInBusiness, p.YearsInBusiness),
NaicsCode = ISNULL(ld.NaicsCode, p.NaicsCode),
TaxID = ISNULL(ld.TaxID, p.TaxID),
EmployeeCount = ISNULL(ld.EmployeeCount, p.EmployeeCount),
ListFileID = ld.ListFileID,
ProspectAddressVersionID = ipav.ProspectAddressVersionID
FROM #InsertedPAV ipav
--already backupdated listdata so this is valid
INNER JOIN Prospect p on ipav.ProspectID = p.ProspectID
INNER JOIN ListData ld on ld.ProspectID = p.ProspectID
INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID
INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode;
UPDATE p SET
ModifiedDate = getdate(),
SicCode = ISNULL(p.SicCode, ld.SicCode),
BusinessName = ISNULL(p.BusinessName, ld.BusinessName),
AnnualSales = ISNULL(p.AnnualSales, ld.AnnualSales),
YearsInBusiness =ISNULL(p.YearsInBusiness, ld.YearsInBusiness),
NaicsCode = ISNULL(p.NaicsCode, ld.NaicsCode),
TaxID = ISNULL(p.TaxID, ld.TaxID),
EmployeeCount = ISNULL(p.EmployeeCount, ld.EmployeeCount)--,
FROM ListData ld
INNER JOIN Prospect p on ld.ProspectID = p.ProspectID
WHERE ld.ListFileID = @ListFileID
AND NOT EXISTS (SELECT * FROM #InsertedPAV ipav WHERE ipav.ProspectID = ld.ProspectID)
END
August 19, 2008 at 10:03 am
Can you post the query plan for the above so we can see what's happening?
Also, what's the sp_configure value for 'cost threshold for parallelism' ? Usually the default of 5 seconds is too low. The great SQL Shamans here recommend 20 seconds, which I've tested and it works pretty well.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply