July 10, 2014 at 9:03 am
Hello all,
I have something I can't explain, hope some of you can?
We have a medium sized database with the next tables:
- PA: 525000 records
- PR: 780000 records
- R: 1000 records
- B: 45 records
PA: PK = PAARDCODE
PR: PK = PAARDREGISTERCODE
PR: FK = PAARDCODE
PR: FK = REGISTERCODE
R: PK = REGISTERCODE
R: FK = BOEKCODE
B: PK = BOEKCODE
When I group by B.BOEKCODE the query lasts: 10 (or more when 'where' option is added) seconds
When I group by R.BOEKCODE the query lasts less than 2 seconds.
SELECT B.BOEKOMSCHRIJVING, B.BOEKCODE -- or R.BOEKCODE
FROM PA
INNER JOIN PR ON PA.PAARDCODE = PR.PAARDCODE
INNER JOIN R ON R.REGISTERCODE = PR.REGISTERCODE
INNER JOIN B ON R.BOEKCODE = B.BOEKCODE
GROUP BY BOEKOMSCHRIJVING, B.BOEKCODE -- or R.BOEKCODE
ORDER BY BOEKOMSCHRIJVING
Why is the option of B.BOEKCODE a lot slower compared to R.BOEKCODE?
Thanks in advance,
Peter
July 10, 2014 at 9:18 am
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 10, 2014 at 9:22 am
sgmunson (7/10/2014)
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.
Hi Sgmunson,
Thanks for looking into this, I will look at the execution plan to see what is happening.
From the 'B' table, I need the 'BOEKOMSCHRIJVING', that's why I've added the 'B' table.
For the extra records, Every PA has (at least) one PR, each PR contains 1 R, and each R has one B
July 10, 2014 at 9:34 am
Given your additional info, this suggests that once you have the query plan, take a close look at the indexes on the B table. See if there are any that have both the BOEKCODE as well as the other selected field. I'm guessing not, and by selecting that field from the B table, you have to traverse all the records rather than hit an index that just contains the other selected field, but I wouldn't want to offer any guarantee on that. The query plan will likely make things clear as to what's happening when you select it from the B table instead of the R table. You may be able to anticipate the plan's answer just by running the following query:
SELECT BOEKCODE, BOEKOMSCHRIJVING, COUNT(*)
FROM B
GROUP BY BOEKCODE, BOEKOMSCHRIJVING
HAVING COUNT(*) > 1
If that query returns any rows, you might want to substitute the following into your query in place of the B table:
SELECT DISTINCT BOEKCODE, BOEKOMSCHRIJVING
FROM B
peter 67432 (7/10/2014)
sgmunson (7/10/2014)
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.Hi Sgmunson,
Thanks for looking into this, I will look at the execution plan to see what is happening.
From the 'B' table, I need the 'BOEKOMSCHRIJVING', that's why I've added the 'B' table.
For the extra records, Every PA has (at least) one PR, each PR contains 1 R, and each R has one B
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 11, 2014 at 4:56 am
Hi Steve,
I checked the query execution plan, however it is the first time I am looking in such thing, and I have no idea how to see what is wrong. I tried to follow this website: link[/url], however I still don't understand it, can you help me with reading this plan?
Thanks in advance!
Peter
The slow one:
The fast one, also divided in 2 screens for readability.
July 11, 2014 at 5:50 am
One thing I noticed right away was that the "slow" one has two "Nested Loop" icons, while the "fast" one does not, and includes parallelism. Taking advantage of parallel operations usually speeds things up. Having no nested loops is also an advantage. I may not be an expert at reading these plans, but the conclusion I can draw from seeing the nested loop plan going a lot slower than one that takes good advantage of the indexes and operates in parallel is that my original thinking was quite likely correct. Having to retrieve the BOEKCODE from the B table is costing a lot, as it probably comes from the index in the other table, so not having to traverse the individual records from the B table gets you out of a costly pair of nested loops. Next time you want to post a SQL execution plan, be sure to save the plan with the .sqlplan extension, then zip the file and attach it to your post. It is much easier for others to look at the plan that way, and because I can't see the properties of the individual icons, I can't be certain of my conclusion, but I don't think I'm likely to learn differently by examining those properties in this specific case. Someone with more experience reading them might be in a better position to do so, and to better explain it than I can.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 11, 2014 at 11:41 am
If possible, please attach the actual query plan xml as an xml file, rather than just a picture of the plan. There are row counts and other stats available in the query plan that are critical to analyzing it but that can't be seen from a static picture alone.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 15, 2014 at 1:29 am
Hi,
Sorry for the late response, was a weekend out of town. The zip file should be attached.
Peter
July 15, 2014 at 10:42 am
Interesting, but not definitive.
Would you please run these commands on that database and post the results? That will show what indexes SQL "thinks" are missing, and how existing indexes are being used:
USE [DH_KWPN]
SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case
DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname
--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
SET @list_missing_indexes = 1 --1=list missing index(es); 0=don't.
PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)
IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
ca1.max_days_active,
migs.*,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID()
AND OBJECT_NAME(mid.object_id) IN (
'SHA_PAARDACT',
'SHA_PAARDREGISTER',
'SHA_REGISTER',
'SHA_BOEK'
)
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF
PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
i.name AS index_name,
OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ca1.max_days_active,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys WITH (NOLOCK)
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name IN (
'SHA_PAARDACT',
'SHA_PAARDREGISTER',
'SHA_REGISTER',
'SHA_BOEK'
)
)
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans+user_seeks
-- list clustered index first, if any, then other index(es)
db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name
PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)
Edit: Corrected CROSS APPLY column alias.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 16, 2014 at 8:20 am
Hi Scott,
In attachment the results (hopefully good format)
Peter
July 16, 2014 at 10:17 am
I think that Query1 data got scrambled or something.
Can you use a spreadsheet instead? After you run the queries, in the Results/Output area, left-click in the empty box to the left of the first column name, which should highlight the entire result set. Then right-click, and do "Copy with Headers" and paste it into the spreadsheet. Both results can go into the same spreadsheet.
Based on analysis so far of the query2, you'd definitely gain performance by clustering SHA_PAARDREGISTER on PAARDCODE instead of on PAARDREGISTERCODE, although you'd definitely want to create a nonclustered index on PAARDREGISTERCODE.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 17, 2014 at 1:33 am
As asked also the results in grid format.
July 17, 2014 at 1:17 pm
Yep, looks great, thanks!
Here's my scripted recommendations for index changes/rebuilds. I don't have time right now, but can explain more later if/when you have qs. Hope this helps!
------------------------------------------------------------------------------------------------------------------------
--Table: SHA_PAARDACT
-- DROPs
DROP INDEX [IDX_FPS_PAARDACT_ACTUEEL] ON SHA_PAARDACT --will be covered by recreated IDX_FPS_PAARDACT_GESLACHT
--!!(see Comment1 below in CREATEs)
DROP INDEX [IDX_FPS_PAARDACT_GEBOORTEDATUM] ON SHA_PAARDACT --will be covered by recreated IDX_FPS_PAARDACT_GESLACHT
DROP INDEX [IDX_FPS_PAARDACT_GESLACHT] ON SHA_PAARDACT --to be recreated with included columns
DROP INDEX [IDX_FPS_PAARDACT_KEURSEIZ] ON SHA_PAARDACT --will be covered by recreated IDX_FPS_PAARDACT_GESLACHT
DROP INDEX [IDX_FPS_PAARDACT_REGISTER] ON SHA_PAARDACT --will be covered by recreated IDX_FPS_PAARDACT_GESLACHT
DROP INDEX [IX_SHA_PAARDACT] ON SHA_PAARDACT --will be covered by recreated IDX_FPS_PAARDACT_GESLACHT
-- CREATEs
CREATE NONCLUSTERED INDEX [IDX_FPS_PAARDACT_EXPORTDATUM_IMPORTDATUM]
ON SHA_PAARDACT ( EXPORTDATUM, IMPORTDATUM )
WITH ( FILLFACTOR = 99, ONLINE = ON, SORT_IN_TEMPDB = ON )
ON [PRIMARY]
--Comment1: i think [GEBOORTEDATUM] is birth date??(name looks Dutch(?), but based on my very limited knowledge of German);
-- if instead it's a long (~20+ bytes) column, you can leave it in its own index and remove it from this one
CREATE UNIQUE NONCLUSTERED INDEX [IDX_FPS_PAARDACT_GESLACHT]
ON SHA_PAARDACT ( GESLACHT, PAARDCODE )
INCLUDE ( ACTUEEL, DATUMOVERLIJDEN, GEBOORTEDATUM, KEURINGSSEIZOEN, REGISTERCODE )
WITH ( FILLFACTOR = 99, ONLINE = ON, SORT_IN_TEMPDB = ON )
ON [PRIMARY]
------------------------------------------------------------------------------------------------------------------------
--Table: SHA_PAARDREGISTER !*changing clustering key*!
-- DROPs
DROP INDEX [IDX_FPS_PAARDREGISTER_DATUMTOT] ON SHA_PAARDREGISTER --will be covered by recreated IDX_FPS_PAARDREGISTER_REG
DROP INDEX [IDX_FPS_PAARDREGISTER_DATUMVAN] ON SHA_PAARDREGISTER --will be covered by recreated IDX_FPS_PAARDREGISTER_REG
DROP INDEX [IDX_FPS_PAARDREGISTER_PAARD] ON SHA_PAARDREGISTER --will become new clustering key, so nonclus index no longer needed
DROP INDEX [IDX_FPS_PAARDREGISTER_REG] ON SHA_PAARDREGISTER --to be recreated with included columns
ALTER TABLE SHA_PAARDREGISTER DROP CONSTRAINT [PK_FPS_PAARDREGISTER] --changing clustered index, must drop existing one first
-- CREATEs
CREATE UNIQUE CLUSTERED INDEX [CLX_FPS_PAARDREGISTER]
ON SHA_PAARDREGISTER ( PAARDCODE, PAARDREGISTERCODE )
WITH ( FILLFACTOR = 99, ONLINE = ON, SORT_IN_TEMPDB = ON )
ON [PRIMARY]
ALTER TABLE SHA_PAARDREGISTER
ADD CONSTRAINT [PK_FPS_PAARDREGISTER] PRIMARY KEY ( PAARDREGISTERCODE )
WITH ( FILLFACTOR = 100, ONLINE = ON, SORT_IN_TEMPDB = ON )
ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IDX_FPS_PAARDREGISTER_REG]
ON SHA_PAARDREGISTER ( REGISTERCODE )
INCLUDE ( DATUMVAN, DATUMTOT, PAARDCODE )
WITH ( FILLFACTOR = 99, ONLINE = ON, SORT_IN_TEMPDB = ON )
ON [PRIMARY]
------------------------------------------------------------------------------------------------------------------------
--Table: SHA_REGISTER
-- DROPs
DROP INDEX [IDX_FPS_REGISTER_BOEK] ON SHA_REGISTER --will be covered by UIX_FPS_REGISTER
DROP INDEX [IDX_FPS_REGISTER_GESLACHT] ON SHA_REGISTER --will be covered by UIX_FPS_REGISTER
DROP INDEX [IDX_FPS_REGISTER_PREDIKAAT] ON SHA_REGISTER --will be covered by UIX_FPS_REGISTER
DROP INDEX [IDX_FPS_REGISTER_REGISTERTYPE] ON SHA_REGISTER --will be covered by UIX_FPS_REGISTER
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 25, 2014 at 1:13 am
Hi Scott,
When I try to run the scripts I get:
'Online index operations can only be performed in Enterprise edition of SQL Server.'
Is there a way to do this in the not enterprise version?
Peter
July 25, 2014 at 8:28 am
peter 67432 (7/25/2014)
Hi Scott,When I try to run the scripts I get:
'Online index operations can only be performed in Enterprise edition of SQL Server.'
Is there a way to do this in the not enterprise version?
Peter
Sorry, sure, just remove "ONLINE = ON," from all the index commands.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply