January 7, 2004 at 5:08 pm
Friends,
Here is a quite complicated SQL written for DB2.
The aim of this query is to select a lot of infor about the products BUT only list at the end the rows which have been modified after a given date.
My problem is that it runs for ever. The biggest tables (SB_PRODUCTREPORT, sb_product, product) have approximaty 30K rows which is not ver much. All other contains only severa hundreds or just a dozen of rows. The problem is that this version of DB2 doesnot support the temporary tables.
Normally this query should run not more then a minute.
Any idea?
SELECT
pd.pd_pdref,
af.af_pct,
pd.pd_author,
pd.pd_barcode,
bc.bc_name,
pd.pd_can_cons,
ca.ca_name,
CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0) THEN 'Utánrendelheto' ELSE st.ds_name END as ds_name,
pd.pd_edition_year,
pd.pd_has_image,
pd.pd_hun_title,
pd.pd_lang,
pd.pd_nov_exp,
pd.pd_nov_msg,
pd.pd_numpages,
pd.pd_orig_title,
ad.ad_name,
py.py_name,
pd.pd_serial_name,
pd.pd_serial_num,
pd.pd_spec_exp,
pd.pd_spec_msg,
pd.pd_stock_qty,
pd.pd_vtref,
pr.prheight,
pr.prlngth,
pr.prwght,
pr.prwidth,
pr.prldesc1,
pr.prnbr,
REP.PP_RETAILPRICE,
REP.DI_PCT
FROM DB2NC.SB_PRODUCTREPORT REP JOIN db2nc.sb_product pd ON REP.pd_pdref = pd.pd_pdref
JOIN db2nc.sb_afakod af ON pd.pd_afref=af.af_afref
LEFT JOIN db2nc.sb_binder_code bc ON pd.pd_bcref=bc.bc_bcref
JOIN db2nc.sb_prod_cat ca ON pd.pd_caref=ca.ca_caref
JOIN db2nc.sb_prod_stat st ON pd.pd_dsref=st.ds_dsref
JOIN db2nc.sb_partner pn ON pd.pd_puref=pn.pn_pnref
JOIN db2nc.sb_address ad ON pn.pn_sarfnbr_hq=ad.ad_adref
JOIN db2nc.sb_prod_type py ON pd.pd_pyref=py.py_pyref
JOIN db2nc.product pr ON pd.pd_pdref=pr.prrfnbr
WHERE REP.BS_BSREF = 318515
AND (EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'PD' AND MT_CODE = PD.PD_PDREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'AF' AND MT_CODE = AF_AFREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'BC' AND MT_CODE = BC_BCREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'CA' AND MT_CODE = CA_CAREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'ST' AND MT_CODE = DS_DSREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'PN' AND MT_CODE = PN_PNREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'AD' AND MT_CODE = AD_ADREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'PY' AND MT_CODE = PY_PYREF AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_TABLE = 'PR' AND MT_CODE = PRRFNBR AND MT_DATE >= '2003-11-29 00:00:00.000')
OR EXISTS(SELECT REP1.PD_PDREF FROM DB2NC.SB_PRODUCTREPORT REP1
WHERE REP1.BS_BSREF = REP.BS_BSREF AND REP1.PD_PDREF = REP.PD_PDREF
AND (REP1.PP_STARTDATE >= '2003-11-29' OR REP1.di_startdate >= ('2003-11-29')))
Bye
Gabor
January 8, 2004 at 2:59 am
This looks like an indexing problem.
Can you execute this query with set showplan_text on and paste the results here? This will allow us to look at the query plan for you.
Also it looks like your long list of exists statements could be reduced to a single statement. Something like
Select distinct g.MT_TABLE, g.lastdate, m.MT_CODE
from (
SELECT MT_TABLE, max(MT_DATE) as 'lastdate'
FROM DB2NC.SB_MODIFIED_TABLES
GROUP BY MT_TABLE
) g join DB2NC.SB_MODIFIED_TABLES m on g.MT_TABLE = m.MT_TABLE and g.lastdate = m.MT_DATE
where m.MT_DATE >= '2003-11-29 00:00:00.000'
This would give you the tables and codes last updated after that date in a smaller statement, but whether it was quicker would depend on the plan
Keith Henry
January 8, 2004 at 10:53 am
Keith,
Thanks for reply.
Some infos:
All indexes are fine and they are used in the query. With the exception of one table SB_PRODUCTREPORT where a table scan is used which is normal, because I'm going trough every rows. This is the base of the report and I'm joining all the other tables to it.
The SB_MODIFIED_TABLES is a special table which has 3 columns, MT_CODE, MT_TABLE, MT_DATE.
The purpose of this table is to store through the triggers on insert/update the modified rows of the parent tables here.
As an example take the table db2nc.sb_prod_cat which contains the product category. When I change now a product category #10, I will insert into the SB_MODIFIED_TABLES the followin row:
insert into SB_MODIFIED_TABLES(MT_CODE, MT_TABLE, MT_DATE) (values 'CA', 10, getdate())
The aim of my complicated query is to select ONLY the modified rows and not the whole product table.
That's why I think that you need all those exists subqueries. Because I have to select every rows where the product category, the price, the statistics... has been changed.
Bye
Gabor
January 8, 2004 at 7:05 pm
I'm not certain I understand your problem correctly from your post but it looks to me like what you really want is all the records from the SB_PRODUCTREPORT and SB_PRODUCT tables and only those from the rest of the tables where the joins are correct and the data is in the modified table. If that is the case try using something like this...
SELECT PD.pd_pdref,
AF.af_pct,
PD.pd_author,
PD.pd_barcode,
BC.bc_name,
PD.pd_can_cons,
CA.ca_name,
CASE WHEN (PD.pd_dsref = 4 AND PD.pd_stock_qty = 0) THEN 'Utánrendelheto'
ELSE st.ds_name
END as ds_name,
PD.pd_edition_year,
PD.pd_has_image,
PD.pd_hun_title,
PD.pd_lang,
PD.pd_nov_exp,
PD.pd_nov_msg,
PD.pd_numpages,
PD.pd_orig_title,
AD.ad_name,
PY.py_name,
PD.pd_serial_name,
PD.pd_serial_num,
PD.pd_spec_exp,
PD.pd_spec_msg,
PD.pd_stock_qty,
PD.pd_vtref,
PR.prheight,
PR.prlngth,
PR.prwght,
PR.prwidth,
PR.prldesc1,
PR.prnbr,
REP.PP_RETAILPRICE,
REP.DI_PCT
FROM SB_PRODUCTREPORT REP
JOIN SB_MODIFIED_TABLES MT ON MT.MT_CODE = PD.PD_PDREF
JOIN sb_product PD ON REP.pd_pdref = PD.pd_pdref
LEFT JOIN sb_afakod AF ON PD.pd_afref = AF.af_afref AND MT.MT_TABLE = 'AF'
LEFT JOIN sb_binder_code BC ON PD.pd_bcref = BC.bc_bcref AND MT.MT_TABLE = 'BC'
LEFT JOIN sb_prod_cat CA ON PD.pd_caref = CA.ca_caref AND MT.MT_TABLE = 'CA'
LEFT JOIN sb_prod_stat ST ON PD.pd_dsref = ST.ds_dsref AND MT.MT_TABLE = 'ST'
LEFT JOIN sb_partner PN ON PD.pd_puref = PN.pn_pnref AND MT.MT_TABLE = 'PN'
LEFT JOIN sb_address AD ON PN.pn_sarfnbr_hq = AD.ad_adref AND MT.MT_TABLE = 'AD'
LEFT JOIN sb_prod_type PY ON PD.pd_pyref = PY.py_pyref AND MT.MT_TABLE = 'PY'
LEFT JOIN product PR ON PD.pd_pdref = PR.prrfnbr AND MT.MT_TABLE = 'PR'
WHERE REP.BS_BSREF = 318515
AND MT_DATE >= '2003-11-29 00:00:00.000'
Please note I did remove the "db2nc." and it should be added back if needed.
Gary Johnson
Microsoft Natural Language Group
DBA, Sr. DB Engineer
This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.
January 8, 2004 at 8:33 pm
With reference to gljjr reply:
Will it make much of a difference if "REP.BS_BSREF = 318515 AND MT_DATE >= '2003-11-29 00:00:00.000'" is moved into the relevant ON part.
Lastly I have the habit of casting string dates to DateTime.
AND MT_DATE >= Cast('2003-11-29 00:00:00.000' as DateTime)
January 8, 2004 at 9:16 pm
Without having data to test with I can only shoot a bit here but this should work better simple because it replaces the exists which are you primary reason for bogdown.
SELECT
pd.pd_pdref,
af.af_pct,
pd.pd_author,
pd.pd_barcode,
bc.bc_name,
pd.pd_can_cons,
ca.ca_name,
(CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0) THEN 'Utánrendelheto' ELSE st.ds_name END) as ds_name,
pd.pd_edition_year,
pd.pd_has_image,
pd.pd_hun_title,
pd.pd_lang,
pd.pd_nov_exp,
pd.pd_nov_msg,
pd.pd_numpages,
pd.pd_orig_title,
ad.ad_name,
py.py_name,
pd.pd_serial_name,
pd.pd_serial_num,
pd.pd_spec_exp,
pd.pd_spec_msg,
pd.pd_stock_qty,
pd.pd_vtref,
pr.prheight,
pr.prlngth,
pr.prwght,
pr.prwidth,
pr.prldesc1,
pr.prnbr,
REP.PP_RETAILPRICE,
REP.DI_PCT
FROM
DB2NC.SB_PRODUCTREPORT REP
INNER JOIN
db2nc.sb_product pd
ON
REP.pd_pdref = pd.pd_pdref
INNER JOIN
db2nc.sb_afakod af
ON
pd.pd_afref = af.af_afref
LEFT JOIN
db2nc.sb_binder_code bc
ON
pd.pd_bcref = bc.bc_bcref
INNER JOIN
db2nc.sb_prod_cat ca
ON
pd.pd_caref = ca.ca_caref
INNER JOIN
db2nc.sb_prod_stat st
ON
pd.pd_dsref = st.ds_dsref
INNER JOIN
db2nc.sb_partner pn
ON
pd.pd_puref = pn.pn_pnref
INNER JOIN
db2nc.sb_address ad
ON
pn.pn_sarfnbr_hq = ad.ad_adref
INNER JOIN
db2nc.sb_prod_type py
ON
pd.pd_pyref = py.py_pyref
INNER JOIN
db2nc.product pr
ON
pd.pd_pdref = pr.prrfnbr
LEFT JOIN
DB2NC.SB_MODIFIED_TABLES MT
ON
MT.MT_CODE = (CASE MT.MT_TABLE
WHEN 'PD' THEN PD.PD_PDREF
WHEN 'AF' THEN AF.AF_AFREF
WHEN 'BC' THEN BC.BC_BCREF
WHEN 'CA' THEN CA.CA_CAREF
WHEN 'ST' THEN DS.DS_DSREF
WHEN 'PN' THEN PN.PN_PNREF
WHEN 'AD' THEN AD.AD_ADREF
WHEN 'PY' THEN PY.PY_PYREF
WHEN 'PR' THEN PR.PRRFNBR)
END)
AND MT.MT_DATE >= '20031129'
LEFT JOIN
DB2NC.SB_PRODUCTREPORT REP1
ON
REP1.BS_BSREF = REP.BS_BSREF AND
REP1.PD_PDREF = REP.PD_PDREF AND
(
REP1.PP_STARTDATE >= '20031129' OR
REP1.di_startdate >= '20031129'
)
WHERE
REP.BS_BSREF = 318515 AND
(
MT.MT_CODE IS NOT NULL OR
REP1.PD_PDREF IS NOT NULL
)
If I see anything else to help I will reply.
January 9, 2004 at 1:09 am
I guess you're stuck to the exists, because any join could give n-rows because of more than one MT_TABLE qualifies.
How about this ? (check the execution plan)
SELECT pd.pd_pdref,
af.af_pct,
pd.pd_author,
pd.pd_barcode,
bc.bc_name,
pd.pd_can_cons,
ca.ca_name,
CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0)
THEN 'Utánrendelheto'
ELSE st.ds_name
END as ds_name,
pd.pd_edition_year,
pd.pd_has_image,
pd.pd_hun_title,
pd.pd_lang,
pd.pd_nov_exp,
pd.pd_nov_msg,
pd.pd_numpages,
pd.pd_orig_title,
ad.ad_name,
py.py_name,
pd.pd_serial_name,
pd.pd_serial_num,
pd.pd_spec_exp,
pd.pd_spec_msg,
pd.pd_stock_qty,
pd.pd_vtref,
pr.prheight,
pr.prlngth,
pr.prwght,
pr.prwidth,
pr.prldesc1,
pr.prnbr,
REP.PP_RETAILPRICE,
REP.DI_PCT
, RepCrit.PD_PDREF
FROM DB2NC.SB_PRODUCTREPORT REP
INNER JOIN db2nc.sb_product pd ON REP.pd_pdref = pd.pd_pdref
INNER JOIN db2nc.sb_afakod af ON pd.pd_afref=af.af_afref
LEFT JOIN db2nc.sb_binder_code bc ON pd.pd_bcref=bc.bc_bcref
INNER JOIN db2nc.sb_prod_cat ca ON pd.pd_caref=ca.ca_caref
INNER JOIN db2nc.sb_prod_stat st ON pd.pd_dsref=st.ds_dsref
INNER JOIN db2nc.sb_partner pn ON pd.pd_puref=pn.pn_pnref
INNER JOIN db2nc.sb_address ad ON pn.pn_sarfnbr_hq=ad.ad_adref
INNER JOIN db2nc.sb_prod_type py ON pd.pd_pyref=py.py_pyref
INNER JOIN db2nc.product pr ON pd.pd_pdref=pr.prrfnbr
left join
(SELECT REP1.PD_PDREF, REP1.BS_BSREF
FROM DB2NC.SB_PRODUCTREPORT REP1
AND (REP1.PP_STARTDATE >= '2003-11-29' --index ?
OR REP1.di_startdate >= '2003-11-29' ) --index ?
group by REP1.PD_PDREF, REP1.BS_BSREF
) RepCrit
on RepCrit.BS_BSREF = REP.BS_BSREF
AND RepCrit.PD_PDREF = REP.PD_PDREF
WHERE REP.BS_BSREF = 318515
and ( RepCrit.PD_PDREF is not null
or exists (select *
FROM DB2NC.SB_MODIFIED_TABLES
WHERE MT_DATE >= '2003-11-29 00:00:00.000' -- index ?
--and MT_TABLE in ('PD','AF','BC','CA','ST','PN','AD','PY','PR') -- index ??
and case MT_TABLE
when 'PD' then PD.PD_PDREF
when 'AF' then AF_AFREF
when 'BC' then BC_BCREF
when 'CA' then CA_CAREF
when 'ST' then DS_DSREF
when 'PN' then PN_PNREF
when 'AD' then AD_ADREF
when 'PY' then PY_PYREF
when 'PR' then PRRFNBR
end
)
)
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
January 9, 2004 at 1:58 am
Lastly I have the habit of casting string dates to DateTime.
AND MT_DATE >= Cast('2003-11-29 00:00:00.000' as DateTime)
This is implit anyway, so it will make no difference to the plan, but you should always do an explicit cast, usually using convert rather than cast because then you can specify the date format, and this means that you are not dependant on the windows date settings of the server to recognise the string.
Keith Henry
January 9, 2004 at 3:32 pm
Thanks guys for all the replies.
I've liked very the solution of Antares and alzdba but those solutions is still doing a table scan on SB_MODIFIED_TABLES.
As info in the SB_MODIFIED_TABLES there are over 40K rows and in the SB_PRODUCT table over 30K rows.
So you can imagine how fast can be 40K times a full table scan of 40K rows!
Gary's solution is using all the indexes but and retrieves the main data for the modified rows but all the columns which contain the data coming from the left joined AND NOT MODIFIED tables are NULL. Which of cours is not correct.
Bye
Gabor
January 9, 2004 at 4:26 pm
As an additional info a propiety (price, partner info...) could have been modified more then once since the given date.
We only have to retrieve each product info once, regardles how often andhow many of its propiety it has been changed.
Bye
Gabor
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply