A query with too many joints...

  • Hi!

    I have a query that makes tempdb as large  as 20GB and exceed the disk space on the drive where tempdb is located.  This query is generated by application and I have very limited possibilities optimizing it.  Probably all I can do is to change the order of tables I use in the numerous joints, based on table's size.

    This is the query:

    -----------------------------------------------------------------------------------------------------

    select 'Document date'= datepart(yy, h.Doc_Date) ,'Sales summ'=sum( h.SALE_AMT_ALL)

    from VSALES_ITEM_M h

    join CUSTOMER d (NOLOCK) on h.Customer=d.IDCUST  and convert (int,h .Customer) not in (1, 3)

    join EXTENDED_ACC_SET eac (nolock) on eac.extended_acc_set=d.extended_acc_set

     and EXTENDED_ACC_SET_TYPE@isLogistic (eac.extended_acc_set_type) =0

    join ENTITY en (nolock) on d.entity = en.entity

    left join GEOGRAPHY r0 (NOLOCK) on d.geography = r0.geography

    left join CUSTOMER mc (NOLOCK) on d.parent = mc.customer

    left join SUPPLIER sp (nolock) on sp.supplier = h.primary_supplier

    left join CUS_REGION r (NOLOCK) on r0.cus_region = r.cus_region

    left join DISCOUNT_TYPE dt (nolock) on mc.discount_type = dt.discount_type

    left join CUSTOMER_REL crt (nolock) on crt.customer = d.customer and crt.customer_rel_type = 1

    left join SPAY_TERM spt (nolock) on spt.spay_term = h.Terms

    left join OA_TERMS_GROUP otg (nolock) on spt.oa_terms_group = otg.oa_terms_group

    left join EXTENDED_ACC_SET eas (nolock) on eas.extended_acc_set = d.extended_acc_set

    left join legacy..S11_SHIP_ORDER s11 (nolock) on s11.codinv = h.Invoice

    left join OA_DDOCUMENT oad (nolock) on oad.int_ref=h.Invoice

    left join legacy..ARRTB arr (nolock) on arr.CODETERM = spt.spay_term

    left join EXTENDED_ACC_SET_TYPE east (nolock) on east.extended_acc_set_type = eas.extended_acc_set_type

    where h.O_I_CRNote <> 'O' and h.Doc_Date between '20020301' and '22 Mar 2004'

      and h.SALE_QTY_ALL <> 0 group by datepart(yy, h.Doc_Date)

    order by 1

    -----------------------------------------------------------------------------------------------------

    this is EXTENDED_ACC_SET_TYPE@isLogistic  function:

    -----------------------------------------------------------------------------------------------------

    create function dbo.EXTENDED_ACC_SET_TYPE@isLogistic

    (@extended_acc_set_type int)

    returns tinyint as

    begin

    declare @res tinyint

    if @extended_acc_set_type in (3)

     set @res = 1

    else

     set @res = 0

    return @res

    end

    -----------------------------------------------------------------------------------------------------

     

    These are statistics on all tables:

    namerowsreserved data index_size unused 
    CUS_REGION924KB8KB16KB0KB
    CUSTOMER3041312584KB8632KB3800KB152KB
    CUSTOMER_REL154041352KB600KB696KB56KB
    DISCOUNT_TYPE1024KB8KB16KB0KB
    ENTITY243504552KB4280KB280KB-8KB
    EXTENDED_ACC_SET4940KB8KB32KB0KB
    EXTENDED_ACC_SET_TYPE1224KB8KB16KB0KB
    GEOGRAPHY581773968KB3912KB16KB40KB
    OA_DDOCUMENT977721521224KB232808KB41792KB246624KB
    OA_TERMS_GROUP724KB8KB16KB0KB
    legacy..S11_SHIP_ORDER700902592688KB197072KB278360KB117256KB
    SPAY_TERM13056KB24KB32KB0KB
    legacy..ARRTB129112KB32KB24KB56KB
    SUPPLIER732192KB160KB16KB16KB

     

    Is there a chance I can optimize it (not to allow tempdb to grow that much), just by linking large tables first or something like that?

    Thanks.

  • separate group by, sum and order by from data extraction will help.

  • All the left joins will kill you because you have to hit EVERY single row in these tables, i.e. table scans.

    If you can and there is not a lot of updating to the underlying tables, you might try denormalizing and pre-staging some of this information in a separate table that you query. You'd need triggers or modified SPs to ensure it's up to date, but it might save some query time.

  • This and other similar queries are generated in an application with Access like interface (more advanced though, as the number of queried tables and the number of conditions concerned).  I can do some denormalization and prestaging for this particular query, but it may never be used again.  So I need a simple rule, to avoid tempdb grouth to huge sizes (even at the cost of query time) and to minimize application changes.  These and other tables used in such queries are on reporting server replicated from production server.  There are a lot of updates on some of them.

  • "All the left joins will kill you because you have to hit EVERY single row in these tables, i.e. table scans."

    Left joins do not automatically imply table-scans....if proper (and uptodate statistics-wise) indices exist to support the join-conditions, then the the indices will be used.

     

    However in this situation, are you saying that it is possible to

    sum( h.SALE_AMT_ALL) from VSALES_ITEM_M h

    for customers which do not exist....(interesting concept...selling something to somebody who may not exist....i'm wondering how this turns into cash..)

    and likewise for the other tables.....would inner joins not be more appropriate?

     

    also...if you give index hints....you may be able to nudge the sql engine to going down a particular performance path and thus minimise the temp db utilisation.

     

     

    finally...and at a more basic level....do you gain anything (ie does the result change) if you leave out all (most)the tables to which you are left joining...given that you are not applying any restrictions on the result set by their inclusion....in effect they seem to be clutter?

     

    select sum(t1.amount) from table1 left join table2 on t1.code = t2.code

    will give the exact same result as

    select sum(t1.amount) from table1

    in the particular instance where table2 is a parent(lookup) table in the relationship.....and where no restrictions are applied to table2.

    In essence you end up doing far more work (dataaccess) than may be necessary.

     

  • I agree w/Andrew Murphy.  You should either eliminate all tables from GEOGRAPHY to EXTENDED_ACC_SET_TYPE--they're not contributing anything, given the outer joins; or else change these joins to inner joins so that they're properly restricting your results--if that's really what you want.

     

     

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply