January 4, 2006 at 6:22 pm
Is there any way to speed up this query it takes almost 3 minutes to complete? I am running this on a dev server (2-way, 1GB RAM, all data files on 1 drive, 90GB database). I know....let's try to shoot a watermelon through a straw!!!
PS_KK_ACTIVITY_LOG = 12 million rows
PS_KK_SOURCE_HDR = 1.5 million rows
Query:
SELECT DISTINCT H.BUSINESS_UNIT ,A.LEDGER_GROUP FROM PS_KK_ACTIVITY_LOG A , PS_KK_SOURCE_HDR H WHERE H.KK_TRAN_ID = A.KK_TRAN_ID AND H.KK_TRAN_DT = A.KK_TRAN_DT
IO Statistics:
Table 'PS_KK_SOURCE_HDR'. Scan count 1, logical reads 7371, physical reads 2, read-ahead reads 7397.
Table 'PS_KK_ACTIVITY_LOG'. Scan count 1, logical reads 95857, physical reads 3, read-ahead reads 96598.
January 4, 2006 at 7:53 pm
What are the indexes on the 2 tables ?
January 5, 2006 at 3:55 am
Not sure any indexes is going to do much good.
The query is a join without further filtering, which means you'll have to scan both tables anyway since the requested result is a distinct list of all units + groups..?
Wihtout the distinct I'd assume the result would be all 12 million rows from the childtable.
/Kenneth
January 5, 2006 at 10:18 am
Here are the indexes. I have run out of ideas to fix.
PS_KK_ACTIVITY_LOG:
nonclustered located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT, KK_TRAN_LN
nonclustered located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT, LEDGER
nonclustered located on PRIMARY - LEDGER, KK_TRAN_DT, BUSINESS_UNIT, FISCAL_YEAR, ACCOUNTING_PERIOD, ACCOUNT, BUDGET_REF
nonclustered located on PRIMARY - LEDGER, DEPTID, FUND_CODE, ACCOUNT, BUDGET_REF
nonclustered located on PRIMARY - LEDGER, PROJECT_ID, DEPTID, FUND_CODE, ACCOUNT, BUDGET_REF
nonclustered located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT, KK_TRAN_LN, BALANCING_LINE, BUSINESS_UNIT, LEDGER_GROUP, BUDGET_REF
clustered, unique located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT, KK_TRAN_LN, REFERENCED_BUDGET, BALANCING_LINE, LEDGER_GROUP, LEDGER, FISCAL_YEAR, ACCOUNTING_PERIOD, SEQNBR, RVRSL_FLG
PS_KK_SOURCE_HDR:
nonclustered located on PRIMARY - KK_PROC_INSTANCE, KK_PROCESS_STATUS, KK_SOURCE_TRAN, SEQUENCE_NBR_9
nonclustered located on PRIMARY - KK_SOURCE_TRAN, BUSINESS_UNIT, REQ_ID
nonclustered located on PRIMARY - KK_SOURCE_TRAN, BUSINESS_UNIT, PO_ID
nonclustered located on PRIMARY - KK_SOURCE_TRAN, BUSINESS_UNIT, VOUCHER_ID
nonclustered located on PRIMARY - - KK_SOURCE_TRAN, BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE
nonclustered located on PRIMARY - KK_PROC_INSTANCE, KK_PROCESS_STATUS, KK_SOURCE_TRAN, SEQUENCE_NBR_9, BUSINESS_UNIT, VOUCHER_ID, KK_TRAN_ID, KK_TRAN_DT
nonclustered located on PRIMARY - DT_TIMESTAMP, KK_TRAN_ID, KK_TRAN_DT
nonclustered located on PRIMARY - BUSINESS_UNIT, PO_ID, KK_TRAN_ID, KK_TRAN_DT
nonclustered located on PRIMARY - BUSINESS_UNIT, VOUCHER_ID, KK_TRAN_ID, KK_TRAN_DT
clustered, unique located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT
January 5, 2006 at 10:30 am
Well I re-ran a DBCC REINDEXDB on the two tables. Got the time down to 1 minute 15 seconds. Looking at perfmon I am CPU and DISK bound (queue lengths all are above 4).
January 5, 2006 at 10:40 am
We also need to see the query execution plan. Put this before the SQL and post the results:
Set Showplan_text on
go
>>The query is a join without further filtering, which means you'll have to scan both tables anyway
I disagree with this. You want to see scanning of the smaller PS_KK_SOURCE_HDR table, but for each attempt to join 1 of these rows to the 12 million details, you'd want an index or clustered-index seek.
I suspect the biggest issue here is the number of columns in the clustered index on PS_KK_ACTIVITY_LOG.
This makes the index huge, and don't forget, this also gets replicated in each non-clustered index of the table for use in bookmark lookups.
eg, run sp_spaceused on PS_KK_ACTIVITY_LOG. Note the results.
Now change the clustered index:
clustered, non-unique located on PRIMARY - KK_TRAN_ID, KK_TRAN_DT
Re-run your DBCC DbReindex and re-run sp_spaceused - how does the size usage change ? How does the query plan change ?
January 5, 2006 at 11:00 am
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT DISTINCT H.BUSINESS_UNIT ,A.LEDGER_GROUP FROM PS_KK_ACTIVITY_LOG A ,
PS_KK_SOURCE_HDR H WHERE H.KK_TRAN_ID = A.KK_TRAN_ID AND H.KK_TRAN_DT
= A.KK_TRAN_DT
(1 row(s) affected)
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Hash Match(Aggregate, HASH[H].[BUSINESS_UNIT], [A].[LEDGER_GROUP]), RESIDUAL[H].[BUSINESS_UNIT]=[H].[BUSINESS_UNIT] AND [A].[LEDGER_GROUP]=[A].[LEDGER_GROUP]))
|--Merge Join(Inner Join, MERGE[H].[KK_TRAN_ID], [H].[KK_TRAN_DT])=([A].[KK_TRAN_ID], [A].[KK_TRAN_DT]), RESIDUAL[A].[KK_TRAN_ID]=[H].[KK_TRAN_ID] AND [A].[KK_TRAN_DT]=[H].[KK_TRAN_DT]))
|--Clustered Index Scan(OBJECT[FXDUSR].[dbo].[PS_KK_SOURCE_HDR].[PS_KK_SOURCE_HDR] AS [H]), ORDERED FORWARD)
|--Index Scan(OBJECT[FXDUSR].[dbo].[PS_KK_ACTIVITY_LOG].[PSBKK_ACTIVITY_LOG] AS [A]), ORDERED FORWARD)
(4 row(s) affected)
StmtText
------------------------
set statistics io off
(1 row(s) affected)
January 5, 2006 at 2:28 pm
Looks like the index size grew.
1. clustered, unique
name rows reserved data index_size unused
--------------------------------------- ----------- ------------------ ------------------ ------------------ -----------
PS_KK_ACTIVITY_LOG 12556668 10376000 KB 4018136 KB 6357624 KB 240 KB
2. clustered, non-unique
name rows reserved data index_size unused
--------------------------------------- ----------- ------------------ ------------------ ------------------ -----------
PS_KK_ACTIVITY_LOG 12556668 10685552 KB 4018136 KB 6667176 KB 240 KB
Here is the showplan:
ELECT DISTINCT H.BUSINESS_UNIT ,A.LEDGER_GROUP FROM PS_KK_ACTIVITY_LOG A ,
PS_KK_SOURCE_HDR H WHERE H.KK_TRAN_ID = A.KK_TRAN_ID AND H.KK_TRAN_DT
= A.KK_TRAN_DT
(1 row(s) affected)
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Hash Match(Aggregate, HASH[H].[BUSINESS_UNIT], [A].[LEDGER_GROUP]), RESIDUAL[H].[BUSINESS_UNIT]=[H].[BUSINESS_UNIT] AND [A].[LEDGER_GROUP]=[A].[LEDGER_GROUP]))
|--Merge Join(Inner Join, MERGE[H].[KK_TRAN_ID], [H].[KK_TRAN_DT])=([A].[KK_TRAN_ID], [A].[KK_TRAN_DT]), RESIDUAL[A].[KK_TRAN_ID]=[H].[KK_TRAN_ID] AND [A].[KK_TRAN_DT]=[H].[KK_TRAN_DT]))
|--Clustered Index Scan(OBJECT[FXDUSR].[dbo].[PS_KK_SOURCE_HDR].[PS_KK_SOURCE_HDR] AS [H]), ORDERED FORWARD)
|--Index Scan(OBJECT[FXDUSR].[dbo].[PS_KK_ACTIVITY_LOG].[PSBKK_ACTIVITY_LOG] AS [A]), ORDERED FORWARD)
(4 row(s) affected)
January 5, 2006 at 4:17 pm
I have a similar, but slightly smaller data set that I was able to test with. An Account table (500K rows) and transactions on each account (5 million rows). Selecting a Distinct column pair, with no WHERE clause that hits every record but summarises down to 3K rows, my query will always clustered index seek on the larger table and return in less than 10 seconds.
Depending on the relative selectivity of your columns, you could try changing the column order in the clustered index - are there multiple KK_TRAN_ID's per KK_TRAN_DT value or vice versa ?
You could also try creating a non-clustered covering index - since the only 3 columns needed from your larger table are KK_TRAN_ID, KK_TRAN_DT and LEDGER_GROUP, you could create a 3-column index on these 3 columns, and the query would be able to use the index only without needing bookmark lookups to the data pages.
January 6, 2006 at 3:40 pm
I added the covering key same results. Besides the disk queue length is stuck at between 4 and 12. Until then....waiting on hardware. Thanks for your help.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply