Urgent help needed for tuning this failing query on big tables

  • Hello guys,

    I am sorry to post this question here as the query is actually from Impala, not SQL, but since it is SQL-like, and more important is I received lots of helpful advices from this forum, I decided to seek help again from here.

    I have a client submitting a query which fails due to memory limit, average peak memory usage is 33GB, I believe we can do some tuning on the query so that it eats less memory, here is the query:

    select p_year,
    p_month,
    client,
    grandparent_acti,
    origacctmnc,
    count(orderid) num_of_orders,
    sum(orderquantity) total_order_qty,
    sum(orderquantity * orderprice) total_notional
    FROM (
    select distinct d.dw_process_date, 
    d.p_year, 
    d.p_month,
    a.grandparent_acti,
    d.origacctmnc,
    case when grandparent_acti = '12345' then 'TIAA-CREF'
    when grandparent_acti = '23456' then 'Jane Street Capital'
    when (origacctmnc='ABCDEF' or grandparent_acti = '12345') then 'Blue Fire Capital'
    when (origacctmnc in ('BCDEFG', 'BCDEFG1','BCDEFG2') OR grandparent_acti = 'BCDEFG') then 'Marshall Wace LLP'
    else null end as Client, 
    d.orderid,
    d.orderquantity, 
    d.orderprice 
    from (
    select * from dim_order_2012
    where orderid in (
    select distinct rootorderid
    from dim_order_2012
    where directedmpid='IMAT'
    and p_month =5

    and p_month =5
    union
    select * from dim_order_2012
    where orderid in (
    select distinct root_orderid
    from fact_execution_Detail
    where dw_process_date between '2012-05-01' and '2012-05-31'
    and exec_destination_contra = 'ABCD'

    and p_month = 5
    ) d
    left outer join dim_account a on d.dw_acct_id=a.dw_account_id and a.grandparent_acti in ('00732', '58851', '71685', '00746', '00918', '02896', '09804' )left outer join dim_account a on d.dw_acct_id=a.dw_account_id and a.grandparent_acti in ('00732', '58851', '71685', '00746', '00918', '02896', '09804' )

    and d.sourceorderstatus like 'NEW_REQ%'
    and d.p_month = 5
    ) a
    group by p_year, p_month, client, grandparent_acti, origacctmnc
    order by p_year, p_month, client, grandparent_acti, origacctmnc

    Can anyone help?

    The highlighted part is eating too much memory, according to log it uses 23GB memory, so I would say there is something to do here.

    Thank you very much. Any clue is appreciated.

    Again, sorry to post this question here, thanks for understanding.

  • After closely looked into every sub set of query result, I was able to fix the query:

    The client (business side) might be using sort of fancy auto tool to create the query, a table contains 200 columns and the query has a select * from that table, which consumes too much un-needed resource.

    Thank you for watching. This really has nothing to do with Impala at all, and that's why I post it here hoping to get some clue. I am happy to sort it out myself. Anyway, hope this case could be of some help to you.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply