August 15, 2002 at 7:01 pm
Hi,
This issue is not really a problem - I'm more curious to know if there is any way to improve the performance of a stored proc which effectively mines data every morning.
The situation is the daily rebuild of a reporting table, which is based from a transaction history table containing around 30,000,000 records. The sql to do this is as follows:
SELECT * INTO reporting_table
FROM transaction_history th1
WHERE NOT EXISTS
(SELECT *
FROM transaction_history th2
WHERE th1.field1 = th2.field1 AND
th1.field2 = th2.field2 AND
th1.field3 = th2.field3 AND
th2.trans_date > th1.trans_date
)
As you can see, the table is referencing itself to get the last existing record based on the required key(fields). The process takes around 1hr 15min to get approx 1.7 million records.
Is there a way to improve the performance with an index (index tuning wizard was no real help), or a change in the structure of the sql.
I've already thought of changing the process to be a cumulative effect instead (finding only those records that haven't been selected already), but I was wondering if there was anything else that could enhance what is currently.
thanks
August 15, 2002 at 8:02 pm
What do you have indexed? Where is the clustered index? You could optimize some by creating a permanent table and just truncating it each time. Not sure how good 'not exists' is, nots are usually not great. Certainly changing to only process changed records might help. How about using a trigger to keep the reporting table current as you go? Do you really need to select * into the reporting table? How about just putting the primarykey? That would reduce disk io a lot. What kind of hardware do you have?
Andy
August 16, 2002 at 12:45 am
I've tried different indexes with little result. A cluster index isn't really going to help, considering that the history table is updated daily with 5000+ records.
I don't particularly like "Not Exists", yet the only other way of trying to achieve the same ends is by using a cursor (yuck!).
Unfortunately the table only has 7 columns and the primary key is a compound of 5 of them. Apart from that, I also need all fields for other purposes.
Platform is NT 4.0, running on SQL 7 on twin PIII 550 with 780Mb RAM, SCSI equipped.
I'm experimenting with a cumulative option, and will see how it goes....
August 16, 2002 at 4:44 am
If you put just the primary key into your table you'd just have to join back to get the other details. Would be interesting to see if it made much difference, you could just add an identity col to your table and index unique no nulls, treat it like a pkey.
Suggest post the query asking for alternative solutions, pretty sure can be done without not exists without a cursor.
Not sure I agree about clustered - then again, I havent tested your scenario!
780m is not a lot. Hows the server holding up during the query? Could you post the query plan?
Andy
August 16, 2002 at 5:03 am
This might be a method to get rid of the NOT EXISTS ... (I think). Don't know about performance though...
SELECT * INTO reporting_table
FROM transaction_history th1
WHERE th1.trans_date =
(SELECT MAX(th2.trans_date) FROM
transaction_history th2
WHERE th1.field1 = th2.field1 AND
th1.field2 = th2.field2 AND
th1.field3 = th2.field3)
Edited by - NPeeters on 08/16/2002 05:03:13 AM
August 18, 2002 at 11:53 pm
cheers,
gave the max date a whirl, but it was still going after 2 hours, so not much benefit there.
I tried a cumulative option, but it's harder than I first thought, since every record added may be replacing one already there - ie. it's the last record occurence for that group, so adding records isn't the go.
I'll post the question on T-sql forum and see what people suggest.
thanks
August 19, 2002 at 11:00 pm
A different thought.....
Apprently there is not much can be done but if you follow this method performance will certainly improve......
SP_PH_WHO2
SELECT
FIELD1 + FIELD3 + FIELD3 AS COMB_FIELDS,
FIELD1,
FIELD3,
FIELD3,
TRANS_DATE
INTO ##TMP_TRANSACTION_HISTORY
FROM TRANSACTION_HISTORY
CREATE CLUSTERED INDEX P_##TMP_TRANSACTION_HISTORY ON ##TMP_TRANSACTION_HISTORY(COMB_FIELDS,TRANS_DATE) WITH PAD_INDEX,FILLFACTOR = 100,STATISTICS_NORECOMPUTE
SELECT * INTO REPORTING_TABLE
FROM ##TMP_TRANSACTION_HISTORY TH1
WHERE NOT EXISTS
(SELECT *
FROM ##TMP_TRANSACTION_HISTORY TH2
WHERE TH1.COMB_FIELDS = TH2.COMB_FIELDS AND
TH2.TRANS_DATE > TH1.TRANS_DATE
)
DROP TABLE ##TMP_TRANSACTION_HISTORY
This method will not take more than 30 minutes on 3 million records.
Three things are here
1. Adding theses columns will not create any problem as you want exactly theses three columns to be matched
2. Move processing into tempdb as that is usually 0+1 drive and logging is much less there.
3. Clustered index will make it very fast for comparing
In REPORTING_TABLE table if you need some extra columns to be added add it in the first step while inserting data into ##tmp table.
Cheers,
Prakash
Prakash Heda
Lead DBA Team - www.sqlfeatures.com
Video sessions on Performance Tuning and SQL 2012 HA
August 20, 2002 at 5:55 pm
Note really sure you need the subquery as you have it.
SELECT * INTO reporting_table
FROM transaction_history th1
WHERE th1.trans_date =
(SELECT MAX(th2.trans_date) FROM
transaction_history th2
WHERE th1.field1 = th2.field1 AND
th1.field2 = th2.field2 AND
th1.field3 = th2.field3)
or
SELECT * INTO reporting_table
FROM transaction_history th1
WHERE NOT EXISTS
(SELECT *
FROM transaction_history th2
WHERE th1.field1 = th2.field1 AND
th1.field2 = th2.field2 AND
th1.field3 = th2.field3 AND
th2.trans_date > th1.trans_date
)
Looks to me like you want to get the MAX value of th1.trans_date but you are adding overhead by having to match values first.
Try
WHERE th1.trans_date = (SELECT MAX(th2.trans_date) FROM transaction_history th2)
Now the other thing is SELECT INTO is notorious for locking on tables and can lead to other issues.
Instead it is considered to do INSERT (INTO) instead. Now I assume the reporting_table is already built so you should be able to do
INSERT reporting_table (collist)
SELECT collist FROM transaction_history th1
WHERE th1.trans_date = (SELECT MAX(th2.trans_date) FROM transaction_history th2)
If this is slow then try do
SET SHOWPLAN_TEXT ON
GO
INSERT reporting_table (collist)
SELECT collist FROM transaction_history th1
WHERE th1.trans_date = (SELECT MAX(th2.trans_date) FROM transaction_history th2)
GO
SET SHOWPLAN_TEXT OFF
GO
And post the execution plan here so we can look at what is happening and offer suggestions.
"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)
August 22, 2002 at 12:23 am
Confused, can you explain what exactly are you trying?
Finding newly added records or for each combination of (fld1 + fld2 + fld3) latest records........
Cheers,
Prakash
Prakash Heda
Lead DBA Team - www.sqlfeatures.com
Video sessions on Performance Tuning and SQL 2012 HA
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply