March 25, 2014 at 7:15 am
Hi,
I'm joining a few large tables to load historical data. Four of the tables are in the 10-15 million row range, and two are very large (30 million+ rows).
The tables are indexed and the query is using the index.
I have added filter conditions to load only 6 months of data, so the final result should be around 2 million rows.
Every time I run this query, tempdb fills up (using 175 GB).
Both tempdb and the database where I run this query are in the simple recovery model.
I have attached the query plan. Please let me know what I need to do when dealing with such large data sets.
March 25, 2014 at 12:20 pm
I can't see the query since it exceeds the size of what's stored in the plan, so suggestions from the plan alone are going to be a bit vague.
First off, you're moving 2.5 million rows out of a 20 million row table. That's just not filtered enough to justify an index seek, which is what I'm seeing. So, either you've got hints in there to force this, or you have some seriously wonky statistics. I can't quite tell from an estimated plan. But that operation doesn't make sense when you consider the data involved.
You're also trying to sort millions of rows after all those hash match joins; no wonder tempdb is getting killed.
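If you want to confirm that it's the internal objects (sorts and hash joins) filling tempdb rather than temp tables, a quick DMV check like this will show it while the query runs. This assumes nothing about your schema; the `* 8 / 1024` just converts 8 KB pages to MB:

```sql
-- Per-session tempdb usage: internal_objects = sorts, hash joins, spools
SELECT
    tsu.session_id,
    SUM(tsu.user_objects_alloc_page_count) * 8 / 1024 AS user_objects_MB,
    SUM(tsu.internal_objects_alloc_page_count) * 8 / 1024 AS internal_objects_MB
FROM sys.dm_db_task_space_usage AS tsu
GROUP BY tsu.session_id
ORDER BY internal_objects_MB DESC;
```

If the big numbers are in internal_objects_MB for your session, it's the sorts/hash joins, not temp tables.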
Considering the size of the query, the number of columns and the tables involved, I'd suggest rolling back and reconstructing this from scratch. Or, look to see if there are pre-aggregations you can build into a different structure, a cube or something, so that you can get all those max values ahead of time.
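As a rough sketch of the pre-aggregation idea (table and column names here are hypothetical, since I can't see the query): compute the MAX values once into a small, indexed summary table, then join to that instead of re-aggregating the 30-million-row table on every run:

```sql
-- Hypothetical names throughout; adjust to the real tables/columns.
SELECT SubOrderID,
       MAX(CreateDate) AS LastTransitionDate
INTO   dbo.SubOrderTrans_Summary
FROM   dbo.SubOrderTrans
GROUP BY SubOrderID;

-- Index the summary so the big query can seek/merge-join against it.
CREATE UNIQUE CLUSTERED INDEX CIX_SubOrderTrans_Summary
    ON dbo.SubOrderTrans_Summary (SubOrderID);
```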
If I could see the JOINs and WHERE clauses I might have more to say, but that seek is concerning. You shouldn't be seeing one when you're returning 1/10th of the entire table.
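If you want to settle the seek-versus-scan question empirically, table hints make the comparison cheap. Table and column names below are hypothetical, and FORCESCAN needs SQL Server 2008 R2 SP1 or later:

```sql
SET STATISTICS IO ON;

-- Run once with the seek the optimizer picked...
SELECT SubOrderID, CreateDate
FROM   dbo.SubOrderTrans WITH (FORCESEEK)
WHERE  CreateDate >= '20131001';

-- ...and once forcing a scan, then compare the logical reads.
SELECT SubOrderID, CreateDate
FROM   dbo.SubOrderTrans WITH (FORCESCAN)
WHERE  CreateDate >= '20131001';
```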
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 25, 2014 at 3:22 pm
Grant Fritchey (3/25/2014)
If I could see the JOINs and WHERE clauses I might have more to say, but, that seek is concerning. You shouldn't be seeing that for returning 1/10th of the entire table.
I don't see any problem with the SEEK; indeed, it should cut the overhead for the query, since that is a covering index. As I see it, that SEEK simply does a keyed lookup to avoid reading rows unnecessarily.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 25, 2014 at 3:25 pm
Hard to provide more details without the rest of the query and with no index info on the tables.
Edit: Looks like a covering index is needed on SubOrderTrans at least, but stats from below will still be extremely helpful.
From a very quick look I did, it seems all the indexes are in one db ("ODS").
Please run this query in that db and post the results --preferably in a spreadsheet so they are easy to load :-).
SET DEADLOCK_PRIORITY LOW
DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname
SET @list_missing_indexes = 0
SET @table_name_pattern = '%'
PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)
--SELECT create_date FROM sys.databases WITH (NOLOCK) WHERE name = N'tempdb'
IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.*,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID()
AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
i.name AS index_name,
OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ca1.sql_up_days AS days_in_use,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
) AS ca1
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
--EXISTS(SELECT 1 FROM sys.indexes i2 WHERE i2.object_id = i.object_id AND i2.index_id = 0) AND --$T
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name LIKE @table_name_pattern AND
o.name NOT LIKE 'dtprop%' AND
o.name NOT LIKE 'filestream[_]%' AND
o.name NOT LIKE 'MSpeer%' AND
o.name NOT LIKE 'MSpub%' AND
--o.name NOT LIKE 'queue[_]%' AND
o.name NOT LIKE 'sys%'
)
--AND OBJECT_NAME(i.object_id, DB_ID()) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans&|user_seeks
-- list clustered index first, if any, then other index(es)
db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name
PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)
March 25, 2014 at 3:31 pm
ScottPletcher (3/25/2014)
I don't see any problem with the SEEK; indeed, it should cut the overhead for the query, since that is a covering index. As I see it, that SEEK simply does a keyed lookup to avoid reading rows unnecessarily.
Possibly. But it's returning a 10th of the table. A scan for that isn't likely to be much worse, if it's worse at all.
March 25, 2014 at 3:39 pm
Grant Fritchey (3/25/2014)
Possibly. But it's returning a 10th of the table. A scan for that isn't likely to be much worse, if it's worse at all.
Huh?? It's doing a keyed read (SEEK predicate) to the exact start of the 1/10 of the index (yes, equivalent to table for this query's use of it), and all the rows that it needs to read are contiguous. How could that not reduce I/O by 90% vs scanning the entire index??
March 25, 2014 at 6:34 pm
ScottPletcher (3/25/2014)
Huh?? It's doing a keyed read (SEEK predicate) to the exact start of the 1/10 of the index (yes, equivalent to table for this query's use of it), and all the rows that it needs to read are contiguous. How could that not reduce I/O by 90% vs scanning the entire index??
I know the magic number is supposed to be around 20% of the table, where scan and seek shift in terms of performance, but that is a ballpark figure. We're moving into that range.
You'll note weasel words such as "if" and "much". Calm down, dude.
March 26, 2014 at 3:15 am
The query plan I attached covers 3 months of data. I have also attached the index usage and the actual query.
The table structure for SubOrderTrans and ShipmentSTL stores state transitions, like this:

ID  SubOrderID  FromState         ToState           CreateDate
1   123         Ordered           Payment Received  2013-10-01 11:00
2   123         Payment Received  SomeOtherState    2013-11-01 23:00
I'm just flattening this into a single table. I know the source tables are poorly designed.
March 26, 2014 at 10:20 am
Grant Fritchey (3/25/2014)
I know the magic number is supposed to be 20% of the table when the scan & seek shift in terms of performance, but that is a ballpark figure. We're moving into the range of it.
You'll note weasel words such as "if" and "much". Calm down dude.
You're confusing two different types of reads. We're not doing a key or RID lookup here, so even reading 98% of the rows that way would still save 2% of the I/O.
Therefore, the current seek is exactly what we want to see in the query. The problem lies in some other part of the query.
March 26, 2014 at 3:04 pm
I have attached the query and the index usage above. Hope that helps in identifying the issue.
March 26, 2014 at 5:31 pm
Based on the query plan, I think these two things definitely need to be done and should help:
1)
Table: SubOrderTrans
Index: idx_subordertransday
INCLUDE column SubOrderTransDeleted in that index
2)
Table: SUBORDER_ETA_CORRECTIONS
Index: IDX_NC_SubOrderID
Change it to a clustered index, that is, cluster this table on SUBORDER_ID.
This may or may not help, but it's definitely worth a try, since it will be a very small increase in the index size:
3)
Table: SubOrder
Index: idx_suborderday
INCLUDE column SubOrderStateID in that index
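Sketches of changes 1) and 2), with the caveat that the existing key columns are assumptions, since the full index definitions weren't posted:

```sql
-- 1) Rebuild idx_subordertransday with the extra INCLUDE column.
--    Keep the real key column list; the one shown is an assumption.
CREATE NONCLUSTERED INDEX idx_subordertransday
    ON dbo.SubOrderTrans (SubOrderTransDay)
    INCLUDE (SubOrderTransDeleted)
    WITH (DROP_EXISTING = ON);

-- 2) Recluster SUBORDER_ETA_CORRECTIONS on SUBORDER_ID.
--    If the table already has a different clustered index, drop that first.
DROP INDEX IDX_NC_SubOrderID ON dbo.SUBORDER_ETA_CORRECTIONS;
CREATE CLUSTERED INDEX IDX_SubOrderID
    ON dbo.SUBORDER_ETA_CORRECTIONS (SUBORDER_ID);
```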
Then try the query again. If possible, an actual query plan is much better than only an estimated plan, since actual row counts are available only in an actual plan.