March 25, 2014 at 7:15 am
Hi,
I'm joining a few large tables to load historical data. Four of the tables are in the 10-15 million row range, and two are very large (30 million+ rows).
The tables are indexed and the query is using the index.
I have added filter conditions to load only 6 months of data, so the final result should be around 2 million rows.
Every time I run this query, tempdb fills up (using 175 GB).
Both tempdb and the database where I run this query are in the simple recovery model.
I have attached the query plan. Please let me know what I need to do when dealing with such large data sets.
March 25, 2014 at 12:20 pm
I can't see the query since it exceeds the size of what's stored in the plan, so suggestions from the plan alone are going to be a bit vague.
First off, you're moving 2.5 million rows out of a 20 million row table. That's just not filtered enough to justify an index seek, which is what I'm seeing. So, either you've got hints in there to force this, or you have some seriously wonky statistics. I can't quite tell from an estimated plan. But that operation doesn't make sense when you consider the data involved.
You're also trying to sort millions of rows after all those hash match joins; no wonder tempdb is getting killed.
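If you want to confirm that it's the internal objects (sorts and hash joins) filling tempdb rather than temp tables, a quick DMV check like this will show it while the query runs. This assumes nothing about your schema; the `* 8 / 1024` just converts 8 KB pages to MB:

```sql
-- Per-session tempdb usage: internal_objects = sorts, hash joins, spools
SELECT
    tsu.session_id,
    SUM(tsu.user_objects_alloc_page_count) * 8 / 1024 AS user_objects_MB,
    SUM(tsu.internal_objects_alloc_page_count) * 8 / 1024 AS internal_objects_MB
FROM sys.dm_db_task_space_usage AS tsu
GROUP BY tsu.session_id
ORDER BY internal_objects_MB DESC;
```

If the big numbers are in internal_objects_MB for your session, it's the sorts/hash joins, not temp tables.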
Considering the size of the query, the number of columns and the tables involved, I'd suggest rolling back and reconstructing this from scratch. Or, look to see if there are pre-aggregations you can build into a different structure, a cube or something, so that you can get all those max values ahead of time.
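As a rough sketch of the pre-aggregation idea (table and column names here are hypothetical, since I can't see the query): compute the MAX values once into a small, indexed summary table, then join to that instead of re-aggregating the 30-million-row table on every run:

```sql
-- Hypothetical names throughout; adjust to the real tables/columns.
SELECT SubOrderID,
       MAX(CreateDate) AS LastTransitionDate
INTO   dbo.SubOrderTrans_Summary
FROM   dbo.SubOrderTrans
GROUP BY SubOrderID;

-- Index the summary so the big query can seek/merge-join against it.
CREATE UNIQUE CLUSTERED INDEX CIX_SubOrderTrans_Summary
    ON dbo.SubOrderTrans_Summary (SubOrderID);
```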
If I could see the JOINs and WHERE clauses I might have more to say, but that seek is concerning. You shouldn't be seeing one when you're returning 1/10th of the entire table.
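If you want to settle the seek-versus-scan question empirically, table hints make the comparison cheap. Table and column names below are hypothetical, and FORCESCAN needs SQL Server 2008 R2 SP1 or later:

```sql
SET STATISTICS IO ON;

-- Run once with the seek the optimizer picked...
SELECT SubOrderID, CreateDate
FROM   dbo.SubOrderTrans WITH (FORCESEEK)
WHERE  CreateDate >= '20131001';

-- ...and once forcing a scan, then compare the logical reads.
SELECT SubOrderID, CreateDate
FROM   dbo.SubOrderTrans WITH (FORCESCAN)
WHERE  CreateDate >= '20131001';
```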
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 25, 2014 at 3:22 pm
Grant Fritchey (3/25/2014)
If I could see the JOINs and WHERE clauses I might have more to say, but, that seek is concerning. You shouldn't be seeing that for returning 1/10th of the entire table.
I don't see any problem with the SEEK; indeed, it should cut the overhead for the query, since that is a covering index. As I see it, that SEEK simply does a keyed lookup to avoid reading rows unnecessarily.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 25, 2014 at 3:25 pm
Hard to provide more details without the rest of the query and with no index info on the tables.
Edit: Looks like a covering index is needed on SubOrderTrans at least, but stats from below will still be extremely helpful.
From a very quick look I did, it seems all the indexes are in one db ("ODS").
Please run this query in that db and post the results --preferably in a spreadsheet so they are easy to load :-).
SET DEADLOCK_PRIORITY LOW
DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname
SET @list_missing_indexes = 0
SET @table_name_pattern = '%'
PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)
--SELECT create_date FROM sys.databases WITH (NOLOCK) WHERE name = N'tempdb'
IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.*,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID()
AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
i.name AS index_name,
OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ca1.sql_up_days AS days_in_use,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
) AS ca1
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
--EXISTS(SELECT 1 FROM sys.indexes i2 WHERE i2.object_id = i.object_id AND i2.index_id = 0) AND --$T
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name LIKE @table_name_pattern AND
o.name NOT LIKE 'dtprop%' AND
o.name NOT LIKE 'filestream[_]%' AND
o.name NOT LIKE 'MSpeer%' AND
o.name NOT LIKE 'MSpub%' AND
--o.name NOT LIKE 'queue[_]%' AND
o.name NOT LIKE 'sys%'
)
--AND OBJECT_NAME(i.object_id, DB_ID()) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans&|user_seeks
-- list clustered index first, if any, then other index(es)
db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name
PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)
March 25, 2014 at 3:31 pm
ScottPletcher (3/25/2014)
I don't see any problem with the SEEK; indeed, it should cut the overhead for the query, since that is a covering index. As I see it, that SEEK simply does a keyed lookup to avoid reading rows unnecessarily.
Possibly. But it's returning a 10th of the table. A scan for that isn't likely to be much worse, if it's worse at all.
March 25, 2014 at 3:39 pm
Grant Fritchey (3/25/2014)
Possibly. But it's returning a 10th of the table. A scan for that isn't likely to be much worse, if it's worse at all.
Huh?? It's doing a keyed read (SEEK predicate) to the exact start of the 1/10 of the index (yes, equivalent to table for this query's use of it), and all the rows that it needs to read are contiguous. How could that not reduce I/O by 90% vs scanning the entire index??
March 25, 2014 at 6:34 pm
ScottPletcher (3/25/2014)
Huh?? It's doing a keyed read (SEEK predicate) to the exact start of the 1/10 of the index (yes, equivalent to table for this query's use of it), and all the rows that it needs to read are contiguous. How could that not reduce I/O by 90% vs scanning the entire index??
I know the magic number is supposed to be around 20% of the table, where scan and seek shift in terms of performance, but that is a ballpark figure. We're moving into that range.
You'll note weasel words such as "if" and "much". Calm down, dude.
March 26, 2014 at 3:15 am
The query plan I attached covers 3 months of data. I have also attached the index usage and the actual query.
The table structure for SubOrderTrans and ShipmentSTL stores state transitions, like this:

ID  SubOrderID  FromState         ToState           CreateDate
1   123         Ordered           Payment Received  2013-10-01 11:00
2   123         Payment Received  SomeOtherState    2013-11-01 23:00
I'm just flattening this into a single table. I know the source tables are poorly designed.
March 26, 2014 at 10:20 am
Grant Fritchey (3/25/2014)
I know the magic number is supposed to be 20% of the table when the scan & seek shift in terms of performance, but that is a ballpark figure. We're moving into the range of it.
You'll note weasel words such as "if" and "much". Calm down dude.
You're confusing two different types of reads. We're not doing a key or RID lookup here, so even reading 98% of the rows that way would still save 2% of the I/O.
Therefore, the current seek is exactly what we want to see in the query. The problem lies in some other part of the query.
March 26, 2014 at 3:04 pm
I have attached the query and the index usage above. Hope that helps in identifying the issue.
March 26, 2014 at 5:31 pm
Based on the query plan, I think these two things definitely need to be done and should help:
1)
Table: SubOrderTrans
Index: idx_subordertransday
INCLUDE column SubOrderTransDeleted in that index
2)
Table: SUBORDER_ETA_CORRECTIONS
Index: IDX_NC_SubOrderID
Change it to a clustered index, that is, cluster this table on SUBORDER_ID.
This may or may not help, but it's definitely worth a try, since it will be a very small increase in the index size:
3)
Table: SubOrder
Index: idx_suborderday
INCLUDE column SubOrderStateID in that index
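Sketches of changes 1) and 2), with the caveat that the existing key columns are assumptions, since the full index definitions weren't posted:

```sql
-- 1) Rebuild idx_subordertransday with the extra INCLUDE column.
--    Keep the real key column list; the one shown is an assumption.
CREATE NONCLUSTERED INDEX idx_subordertransday
    ON dbo.SubOrderTrans (SubOrderTransDay)
    INCLUDE (SubOrderTransDeleted)
    WITH (DROP_EXISTING = ON);

-- 2) Recluster SUBORDER_ETA_CORRECTIONS on SUBORDER_ID.
--    If the table already has a different clustered index, drop that first.
DROP INDEX IDX_NC_SubOrderID ON dbo.SUBORDER_ETA_CORRECTIONS;
CREATE CLUSTERED INDEX IDX_SubOrderID
    ON dbo.SUBORDER_ETA_CORRECTIONS (SUBORDER_ID);
```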
Then try the query again. If possible, an actual query plan is much better than only an estimated plan, since actual row counts are available only in an actual plan.