August 19, 2008 at 9:54 am
I have a procedure that inserts and then updates roughly 2 million rows with roughly 8 million rows of resting data in the main table. The last run time was about 9 hours. The procedure currently runs in one batch, and execution plan looks good with clustered index scans and index seeks/scans.
My two thoughts were either to partition the tables:
table namerowcnt
ListDataCASS30,342,222 rows
ListData31,532,222
Prospect6,664,661
ProspectAddressVersion6,760,741
Or slice the batch up. The last @ListFileID gave over 2 million records and this procedure call is wrapped in a transaction. I was thinking that the calling code could mult-thread 20 calls to this procedure with ~100,000 per call and maybe 4 of them symultaniously based on some criteria like id, rowcount, zip blocks or whatever.
The disk is RAID10 for log/temp and RAID5 for data with 16 spindles for the RAID5, 2GB fiberchannel, EMC SAN. Indexes are maintained at between 85-95% fill, Stats are set to auto-update, clustered indexes are on ints, 4 dual core CPU's, max LOP
I want feeback on my 2 ideas and any other ideas you might have.
Thanks
CREATE PROCEDURE [dbo].[UpdateProspectFromListData]
(
@ListFileID int
)
AS
BEGIN
SET NOCOUNT ON;
create table #InsertedPAV(ProspectID int, ProspectAddressVersionID int);
INSERT INTO ProspectAddressVersion
(
ProspectID,
Address1,
Address2,
City,
State,
Zip,
Zip4,
AddressDuplicateCode,
dpvstatus
)
OUTPUT inserted.ProspectID, inserted.ProspectAddressVersionID into #InsertedPAV
SELECT
p.ProspectID,
ldc.Address1,
ldc.Address2,
ldc.City,
ldc.State,
ldc.Zip,
ldc.Zip4,
ldc.address_duplicate_code,
ldc.dpvstatus
FROM ListData ld
--already backupdated listdata so this is valid
INNER JOIN Prospect p on ld.ProspectID = p.ProspectID
INNER JOIN ListDataCass ldc on ldc.SourceRecordID = ld.SourceRecordID
INNER JOIN ListPriority lpp on lpp.ListCode = p.ListCode
LEFT OUTER JOIN TitlePriority tpp on tpp.Title = p.Title
INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID
INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode
LEFT OUTER JOIN TitlePriority tpld on tpld.Title = ld.Title
WHERE ld.ListFileID = @ListFileID
AND
(
(p.BusinessName IS NULL and ld.BusinessName IS NOT NULL) OR
(p.FirstName IS NULL AND p.LastName IS NULL
AND ld.FirstName IS NOT NULL AND ld.LastName IS NOT NULL) OR
(ISNULL(tpp.Priority,10000) > ISNULL(tpld.Priority,10000)) OR
--don't update if current prospect has title and incoming does not
(tpp.Priority is null and (lpp.Priority > lpld.Priority)) OR
datediff(m, p.ModifiedDate,getdate()) > 12
);
UPDATE p SET
--FirstSourceRecordID = ld.SourceRecordID,
ListCode = lf.ListCode,
Priority = lpld.Priority,
ModifiedDate = getdate(),
Prefix = ld.Prefix,
FirstName = ld.Firstname,
MiddleName = ld.MiddleName,
LastName = ld.LastName,
Suffix = ld.Suffix,
Title = ld.Title,
Gender = ld.Gender,
Email = ld.Email,
Phone = ld.Phone,
SicCode = ISNULL(ld.SicCode,p.SicCode),
BusinessName = ISNULL(ld.BusinessName, p.BusinessName),
AnnualSales = ISNULL(ld.AnnualSales,p.AnnualSales),
YearsInBusiness = ISNULL(ld.YearsInBusiness, p.YearsInBusiness),
NaicsCode = ISNULL(ld.NaicsCode, p.NaicsCode),
TaxID = ISNULL(ld.TaxID, p.TaxID),
EmployeeCount = ISNULL(ld.EmployeeCount, p.EmployeeCount),
ListFileID = ld.ListFileID,
ProspectAddressVersionID = ipav.ProspectAddressVersionID
FROM #InsertedPAV ipav
--already backupdated listdata so this is valid
INNER JOIN Prospect p on ipav.ProspectID = p.ProspectID
INNER JOIN ListData ld on ld.ProspectID = p.ProspectID
INNER JOIN ListFile lf on lf.ListFileID = ld.ListFileID
INNER JOIN ListPriority lpld on lpld.ListCode = lf.ListCode;
UPDATE p SET
ModifiedDate = getdate(),
SicCode = ISNULL(p.SicCode, ld.SicCode),
BusinessName = ISNULL(p.BusinessName, ld.BusinessName),
AnnualSales = ISNULL(p.AnnualSales, ld.AnnualSales),
YearsInBusiness =ISNULL(p.YearsInBusiness, ld.YearsInBusiness),
NaicsCode = ISNULL(p.NaicsCode, ld.NaicsCode),
TaxID = ISNULL(p.TaxID, ld.TaxID),
EmployeeCount = ISNULL(p.EmployeeCount, ld.EmployeeCount)--,
FROM ListData ld
INNER JOIN Prospect p on ld.ProspectID = p.ProspectID
WHERE ld.ListFileID = @ListFileID
AND NOT EXISTS (SELECT * FROM #InsertedPAV ipav WHERE ipav.ProspectID = ld.ProspectID)
END
August 19, 2008 at 10:03 am
Can you post the query plan for the above so we can see what's happening?
Also, what's the sp_configure value for 'cost threshold for parallelism' ? Usually the default of 5 seconds is too low. The great SQL Shamans here recommend 20 seconds, which I've tested and it works pretty well.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy