April 5, 2004 at 7:56 am
Hi!
I have a query that makes tempdb as large as 20GB and exceed the disk space on the drive where tempdb is located. This query is generated by application and I have very limited possibilities optimizing it. Probably all I can do is to change the order of tables I use in the numerous joints, based on table's size.
This is the query:
-----------------------------------------------------------------------------------------------------
select 'Document date'= datepart(yy, h.Doc_Date) ,'Sales summ'=sum( h.SALE_AMT_ALL)
from VSALES_ITEM_M h
join CUSTOMER d (NOLOCK) on h.Customer=d.IDCUST and convert (int,h .Customer) not in (1, 3)
join EXTENDED_ACC_SET eac (nolock) on eac.extended_acc_set=d.extended_acc_set
and EXTENDED_ACC_SET_TYPE@isLogistic (eac.extended_acc_set_type) =0
join ENTITY en (nolock) on d.entity = en.entity
left join GEOGRAPHY r0 (NOLOCK) on d.geography = r0.geography
left join CUSTOMER mc (NOLOCK) on d.parent = mc.customer
left join SUPPLIER sp (nolock) on sp.supplier = h.primary_supplier
left join CUS_REGION r (NOLOCK) on r0.cus_region = r.cus_region
left join DISCOUNT_TYPE dt (nolock) on mc.discount_type = dt.discount_type
left join CUSTOMER_REL crt (nolock) on crt.customer = d.customer and crt.customer_rel_type = 1
left join SPAY_TERM spt (nolock) on spt.spay_term = h.Terms
left join OA_TERMS_GROUP otg (nolock) on spt.oa_terms_group = otg.oa_terms_group
left join EXTENDED_ACC_SET eas (nolock) on eas.extended_acc_set = d.extended_acc_set
left join legacy..S11_SHIP_ORDER s11 (nolock) on s11.codinv = h.Invoice
left join OA_DDOCUMENT oad (nolock) on oad.int_ref=h.Invoice
left join legacy..ARRTB arr (nolock) on arr.CODETERM = spt.spay_term
left join EXTENDED_ACC_SET_TYPE east (nolock) on east.extended_acc_set_type = eas.extended_acc_set_type
where h.O_I_CRNote <> 'O' and h.Doc_Date between '20020301' and '22 Mar 2004'
and h.SALE_QTY_ALL <> 0 group by datepart(yy, h.Doc_Date)
order by 1
-----------------------------------------------------------------------------------------------------
this is EXTENDED_ACC_SET_TYPE@isLogistic function:
-----------------------------------------------------------------------------------------------------
create function dbo.EXTENDED_ACC_SET_TYPE@isLogistic
(@extended_acc_set_type int)
returns tinyint as
begin
declare @res tinyint
if @extended_acc_set_type in (3)
set @res = 1
else
set @res = 0
return @res
end
-----------------------------------------------------------------------------------------------------
These are statistics on all tables:
name | rows | reserved | data | index_size | unused | ||||
CUS_REGION | 9 | 24 | KB | 8 | KB | 16 | KB | 0 | KB |
CUSTOMER | 30413 | 12584 | KB | 8632 | KB | 3800 | KB | 152 | KB |
CUSTOMER_REL | 15404 | 1352 | KB | 600 | KB | 696 | KB | 56 | KB |
DISCOUNT_TYPE | 10 | 24 | KB | 8 | KB | 16 | KB | 0 | KB |
ENTITY | 24350 | 4552 | KB | 4280 | KB | 280 | KB | -8 | KB |
EXTENDED_ACC_SET | 49 | 40 | KB | 8 | KB | 32 | KB | 0 | KB |
EXTENDED_ACC_SET_TYPE | 12 | 24 | KB | 8 | KB | 16 | KB | 0 | KB |
GEOGRAPHY | 58177 | 3968 | KB | 3912 | KB | 16 | KB | 40 | KB |
OA_DDOCUMENT | 977721 | 521224 | KB | 232808 | KB | 41792 | KB | 246624 | KB |
OA_TERMS_GROUP | 7 | 24 | KB | 8 | KB | 16 | KB | 0 | KB |
legacy..S11_SHIP_ORDER | 700902 | 592688 | KB | 197072 | KB | 278360 | KB | 117256 | KB |
SPAY_TERM | 130 | 56 | KB | 24 | KB | 32 | KB | 0 | KB |
legacy..ARRTB | 129 | 112 | KB | 32 | KB | 24 | KB | 56 | KB |
SUPPLIER | 732 | 192 | KB | 160 | KB | 16 | KB | 16 | KB |
Is there a chance I can optimize it (not to allow tempdb to grow that much), just by linking large tables first or something like that?
Thanks.
April 5, 2004 at 8:53 am
separate group by, sum and order by from data extraction will help.
April 5, 2004 at 10:56 am
All the left joins will kill you because you have to hit EVERY single row in these tables, i.e. table scans.
If you can and there is not a lot of updating to the underlying tables, you might try denormalizing and pre-staging some of this information in a separate table that you query. You'd need triggers or modified SPs to ensure it's up to date, but it might save some query time.
April 6, 2004 at 12:19 am
This and other similar queries are generated in an application with Access like interface (more advanced though, as the number of queried tables and the number of conditions concerned). I can do some denormalization and prestaging for this particular query, but it may never be used again. So I need a simple rule, to avoid tempdb grouth to huge sizes (even at the cost of query time) and to minimize application changes. These and other tables used in such queries are on reporting server replicated from production server. There are a lot of updates on some of them.
April 6, 2004 at 3:44 am
"All the left joins will kill you because you have to hit EVERY single row in these tables, i.e. table scans."
Left joins do not automatically imply table-scans....if proper (and uptodate statistics-wise) indices exist to support the join-conditions, then the the indices will be used.
However in this situation, are you saying that it is possible to
sum( h.SALE_AMT_ALL) from VSALES_ITEM_M h
for customers which do not exist....(interesting concept...selling something to somebody who may not exist....i'm wondering how this turns into cash..)
and likewise for the other tables.....would inner joins not be more appropriate?
also...if you give index hints....you may be able to nudge the sql engine to going down a particular performance path and thus minimise the temp db utilisation.
finally...and at a more basic level....do you gain anything (ie does the result change) if you leave out all (most)the tables to which you are left joining...given that you are not applying any restrictions on the result set by their inclusion....in effect they seem to be clutter?
select sum(t1.amount) from table1 left join table2 on t1.code = t2.code
will give the exact same result as
select sum(t1.amount) from table1
in the particular instance where table2 is a parent(lookup) table in the relationship.....and where no restrictions are applied to table2.
In essence you end up doing far more work (dataaccess) than may be necessary.
April 8, 2004 at 9:53 pm
I agree w/Andrew Murphy. You should either eliminate all tables from GEOGRAPHY to EXTENDED_ACC_SET_TYPE--they're not contributing anything, given the outer joins; or else change these joins to inner joins so that they're properly restricting your results--if that's really what you want.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply