Insert is taking hours to complete

Question

Insert is taking hours to complete

dndaughtery

SSChampion

Points: 11290
More actions
March 26, 2009 at 7:30 am

#130145

I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:
Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,
FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,
IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)
Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,
t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays
from dbo.temp_medpar2007_hosp_costedcsv t
inner join CASES p on t.ID = p.tmp_OldPrimeKey

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 1

dndaughtery (3/26/2009)
I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:
Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,
FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,
IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)
Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,
t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays
from dbo.temp_medpar2007_hosp_costedcsv t
inner join CASES p on t.ID = p.tmp_OldPrimeKey

Hi,

First, the tempdb select that is occuring is probably because the server does not have enough memory, so it had to swap to the tempDB to do the join. This happens when you do huge queries, with too low memory on the machine.

Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)

What are the indexes on these tables (please tell me both clustered indexes match for the join)

I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.

Cheers,

J-F

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 2

Edited : Added Florian's worship!

Try this, see what it does for you, tell me if it's any better.

Couldn't find my piece of code so I "Stole" Florian's, I hope he does not go mad, 😉

DECLARE @batch INT

DECLARE @all INT

DECLARE @current INT

SELECT @batch = 100,

@all = 0,

@current = 0

WHILE (1 = 1)

BEGIN

INSERT INTO MedParSpecial

(CaseID,

CoInsuranceAmt,

PPSIndicator,

DayAdmit,

DischargeDate,

DischargeDestination,

OutlierCodeID,

FiscalIntermediary,

AdmissionToDateOfDeath,

ProposedDRGForNextYear,

BloodDeductable,

CapitolTotalAmt,

DisproportionateShareAmt,

IndirectMedicalEducationAmt,

InpatientDeductable,

OutlierAmt,

PrimaryPayorAmount,

WardDays)

SELECT TOP ( @Batch ) p.CaseID,

t.coins_amt,

t.pps_ind,

t.admitdate,

t.dischdate,

t.dischdest,

t.outlier_co,

t.inter_nbr,

t.adm_dod,

t.prop_drg,

t.blood_ded,

t.cap_tot_am,

t.dis_sh_amt,

t.ime_amt,

t.inpat_ded,

t.outl_amt,

t.pri_pay_am,

t.warddays

FROM dbo.temp_medpar2007_hosp_costedcsv t

INNER JOIN CASES p

ON t.ID = p.tmp_OldPrimeKey

SELECT @current = @@ROWCOUNT,

@all = @all + @@ROWCOUNT

IF (@current = 0)

BREAK

END

PRINT 'Did: '

+ CONVERT(VARCHAR(10),@all) + ' Rows'

Cheers,

J-F

dndaughtery SSChampion Points: 11290 More actions · Answer 3

Well, the database file is sitting on a 1.5 terabyte drive all by itself and it hasn't taken up 10% of the space available.

Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)

I had to import a large txt file so I loaded it into a table that I will remove later.

What are the indexes on these tables (please tell me both clustered indexes match for the join)

Yes, the one for the table that I will remove has a clustered index but the other tabler I'm joining to has a nonclustered index on the old ID (Matches the other ID). I am of the understanding that I can only have one clustered index per table?

I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.

dndaughtery SSChampion Points: 11290 More actions · Answer 4

Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 5

dndaughtery (3/26/2009)
Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.

My bad, you are so right. Lemme get a coffee and fix this. I'll add a where not exists.

I shouldn't post stuff early in the morning...

Cheers,

J-F

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 6

DECLARE @batch INT

DECLARE @all INT

DECLARE @current INT

SELECT @batch = 100,

@all = 0,

@current = 0

WHILE (1 = 1)

BEGIN

INSERT INTO MedParSpecial

(CaseID,

CoInsuranceAmt,

PPSIndicator,

DayAdmit,

DischargeDate,

DischargeDestination,

OutlierCodeID,

FiscalIntermediary,

AdmissionToDateOfDeath,

ProposedDRGForNextYear,

BloodDeductable,

CapitolTotalAmt,

DisproportionateShareAmt,

IndirectMedicalEducationAmt,

InpatientDeductable,

OutlierAmt,

PrimaryPayorAmount,

WardDays)

SELECT TOP ( @Batch ) p.CaseID,

t.coins_amt,

t.pps_ind,

t.admitdate,

t.dischdate,

t.dischdest,

t.outlier_co,

t.inter_nbr,

t.adm_dod,

t.prop_drg,

t.blood_ded,

t.cap_tot_am,

t.dis_sh_amt,

t.ime_amt,

t.inpat_ded,

t.outl_amt,

t.pri_pay_am,

t.warddays

FROM dbo.temp_medpar2007_hosp_costedcsv t

INNER JOIN CASES p

ON t.ID = p.tmp_OldPrimeKey

LEFT JOIN MedParSpecial s on p.CaseID = s.CaseID

WHERE s.CaseID is null

SELECT @current = @@ROWCOUNT,

@all = @all + @@ROWCOUNT

IF (@current = 0)

BREAK

END

PRINT 'Did: '

+ CONVERT(VARCHAR(10),@all) + ' Rows'

What kind of performance do you get with this query? Run it just once (for let's say 1000 rows), and post the plan here, see if the left join did not ruin the performance of the query.

Cheers,

J-F