June 16, 2017 at 10:07 am
Hello guys,
I am sorry to post this question here as the query is actually from Impala, not SQL, but since it is SQL-like, and more important is I received lots of helpful advices from this forum, I decided to seek help again from here.
I have a client submitting a query which fails due to memory limit, average peak memory usage is 33GB, I believe we can do some tuning on the query so that it eats less memory, here is the query:
select p_year,
p_month,
client,
grandparent_acti,
origacctmnc,
count(orderid) num_of_orders,
sum(orderquantity) total_order_qty,
sum(orderquantity * orderprice) total_notional
FROM (
select distinct d.dw_process_date,
d.p_year,
d.p_month,
a.grandparent_acti,
d.origacctmnc,
case when grandparent_acti = '12345' then 'TIAA-CREF'
when grandparent_acti = '23456' then 'Jane Street Capital'
when (origacctmnc='ABCDEF' or grandparent_acti = '12345') then 'Blue Fire Capital'
when (origacctmnc in ('BCDEFG', 'BCDEFG1','BCDEFG2') OR grandparent_acti = 'BCDEFG') then 'Marshall Wace LLP'
else null end as Client,
d.orderid,
d.orderquantity,
d.orderprice
from (
select * from dim_order_2012
where orderid in (
select distinct rootorderid
from dim_order_2012
where directedmpid='IMAT'
and p_month =5
)
and p_month =5
union
select * from dim_order_2012
where orderid in (
select distinct root_orderid
from fact_execution_Detail
where dw_process_date between '2012-05-01' and '2012-05-31'
and exec_destination_contra = 'ABCD'
)
and p_month = 5
) d
left outer join dim_account a on d.dw_acct_id=a.dw_account_id and a.grandparent_acti in ('00732', '58851', '71685', '00746', '00918', '02896', '09804' )left outer join dim_account a on d.dw_acct_id=a.dw_account_id and a.grandparent_acti in ('00732', '58851', '71685', '00746', '00918', '02896', '09804' )
and d.sourceorderstatus like 'NEW_REQ%'
and d.p_month = 5
) a
group by p_year, p_month, client, grandparent_acti, origacctmnc
order by p_year, p_month, client, grandparent_acti, origacctmnc
Can anyone help?
The highlighted part is eating too much memory, according to log it uses 23GB memory, so I would say there is something to do here.
Thank you very much. Any clue is appreciated.
Again, sorry to post this question here, thanks for understanding.
June 16, 2017 at 12:50 pm
After closely looked into every sub set of query result, I was able to fix the query:
The client (business side) might be using sort of fancy auto tool to create the query, a table contains 200 columns and the query has a select * from that table, which consumes too much un-needed resource.
Thank you for watching. This really has nothing to do with Impala at all, and that's why I post it here hoping to get some clue. I am happy to sort it out myself. Anyway, hope this case could be of some help to you.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply