March 26, 2009 at 7:30 am
I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:
Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,
FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,
IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)
Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,
t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays
from dbo.temp_medpar2007_hosp_costedcsv t
inner join CASES p on t.ID = p.tmp_OldPrimeKey
March 26, 2009 at 7:43 am
dndaughtery (3/26/2009)
I am running an insert where I'm performing an inner join between two large tables (13 million rows in each). I looked in the Activity Monitor and the process shows my insert as suspended and there is a select on the tempdb table that is running. Is this normal? I'm guessing that the select on the temp table may have something to do with the join? I have indexes on both tables for the join fields. Below is my query:
Insert into MedParSpecial( CaseID, CoInsuranceAmt, PPSIndicator, DayAdmit, DischargeDate, DischargeDestination, OutlierCodeID,
FiscalIntermediary,AdmissionToDateOfDeath,ProposedDRGForNextYear,BloodDeductable,CapitolTotalAmt,DisproportionateShareAmt,
IndirectMedicalEducationAmt,InpatientDeductable,OutlierAmt,PrimaryPayorAmount,WardDays)
Select p.CaseID,t.coins_amt,t.pps_ind,t.admitdate,t.dischdate,t.dischdest,t.outlier_co,t.inter_nbr,t.adm_dod,t.prop_drg,t.blood_ded,
t.cap_tot_am,t.dis_sh_amt,t.ime_amt,t.inpat_ded, t.outl_amt, t.pri_pay_am, t.warddays
from dbo.temp_medpar2007_hosp_costedcsv t
inner join CASES p on t.ID = p.tmp_OldPrimeKey
Hi,
First, the tempdb select that is occuring is probably because the server does not have enough memory, so it had to swap to the tempDB to do the join. This happens when you do huge queries, with too low memory on the machine.
Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)
What are the indexes on these tables (please tell me both clustered indexes match for the join)
I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.
Cheers,
J-F
March 26, 2009 at 7:51 am
Edited : Added Florian's worship!
Try this, see what it does for you, tell me if it's any better.
Couldn't find my piece of code so I "Stole" Florian's, I hope he does not go mad, 😉
DECLARE @batch INT
DECLARE @all INT
DECLARE @current INT
SELECT @batch = 100,
@all = 0,
@current = 0
WHILE (1 = 1)
BEGIN
INSERT INTO MedParSpecial
(CaseID,
CoInsuranceAmt,
PPSIndicator,
DayAdmit,
DischargeDate,
DischargeDestination,
OutlierCodeID,
FiscalIntermediary,
AdmissionToDateOfDeath,
ProposedDRGForNextYear,
BloodDeductable,
CapitolTotalAmt,
DisproportionateShareAmt,
IndirectMedicalEducationAmt,
InpatientDeductable,
OutlierAmt,
PrimaryPayorAmount,
WardDays)
SELECT TOP ( @Batch ) p.CaseID,
t.coins_amt,
t.pps_ind,
t.admitdate,
t.dischdate,
t.dischdest,
t.outlier_co,
t.inter_nbr,
t.adm_dod,
t.prop_drg,
t.blood_ded,
t.cap_tot_am,
t.dis_sh_amt,
t.ime_amt,
t.inpat_ded,
t.outl_amt,
t.pri_pay_am,
t.warddays
FROM dbo.temp_medpar2007_hosp_costedcsv t
INNER JOIN CASES p
ON t.ID = p.tmp_OldPrimeKey
SELECT @current = @@ROWCOUNT,
@all = @all + @@ROWCOUNT
IF (@current = 0)
BREAK
END
PRINT 'Did: '
+ CONVERT(VARCHAR(10),@all) + ' Rows'
Cheers,
J-F
March 26, 2009 at 7:53 am
Well, the database file is sitting on a 1.5 terabyte drive all by itself and it hasn't taken up 10% of the space available.
Why do you need to insert 13 M rows in another table? (I consider the inner join will work between those 2 tables)
I had to import a large txt file so I loaded it into a table that I will remove later.
What are the indexes on these tables (please tell me both clustered indexes match for the join)
Yes, the one for the table that I will remove has a clustered index but the other tabler I'm joining to has a nonclustered index on the old ID (Matches the other ID). I am of the understanding that I can only have one clustered index per table?
I'll try to find a piece of code I saw to insert in batches in another table, it will prevent from swapping on the tempDB, and will increase the performance you are experiencing, but for now, try to post your index definitions of both tables joined.
March 26, 2009 at 8:03 am
Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.
March 26, 2009 at 8:08 am
dndaughtery (3/26/2009)
Thanks for the code but I'm curious as to how it works. Will it not just keep importing the top 100 everytime? I don't see how it will go past the first 100 rows.
My bad, you are so right. Lemme get a coffee and fix this. I'll add a where not exists.
I shouldn't post stuff early in the morning...
Cheers,
J-F
March 26, 2009 at 8:13 am
DECLARE @batch INT
DECLARE @all INT
DECLARE @current INT
SELECT @batch = 100,
@all = 0,
@current = 0
WHILE (1 = 1)
BEGIN
INSERT INTO MedParSpecial
(CaseID,
CoInsuranceAmt,
PPSIndicator,
DayAdmit,
DischargeDate,
DischargeDestination,
OutlierCodeID,
FiscalIntermediary,
AdmissionToDateOfDeath,
ProposedDRGForNextYear,
BloodDeductable,
CapitolTotalAmt,
DisproportionateShareAmt,
IndirectMedicalEducationAmt,
InpatientDeductable,
OutlierAmt,
PrimaryPayorAmount,
WardDays)
SELECT TOP ( @Batch ) p.CaseID,
t.coins_amt,
t.pps_ind,
t.admitdate,
t.dischdate,
t.dischdest,
t.outlier_co,
t.inter_nbr,
t.adm_dod,
t.prop_drg,
t.blood_ded,
t.cap_tot_am,
t.dis_sh_amt,
t.ime_amt,
t.inpat_ded,
t.outl_amt,
t.pri_pay_am,
t.warddays
FROM dbo.temp_medpar2007_hosp_costedcsv t
INNER JOIN CASES p
ON t.ID = p.tmp_OldPrimeKey
LEFT JOIN MedParSpecial s on p.CaseID = s.CaseID
WHERE s.CaseID is null
SELECT @current = @@ROWCOUNT,
@all = @all + @@ROWCOUNT
IF (@current = 0)
BREAK
END
PRINT 'Did: '
+ CONVERT(VARCHAR(10),@all) + ' Rows'
What kind of performance do you get with this query? Run it just once (for let's say 1000 rows), and post the plan here, see if the left join did not ruin the performance of the query.
Cheers,
J-F
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply