December 30, 2002 at 12:23 pm
I have a query that's been running fine with the following WHERE clause:
WHERE P.PH1 IS NOT NULL
It froze last week, so I took the query apart and found that if I excluded the above WHERE clause, it ran in 15 seconds.
I then modified the WHERE clause as follows:
WHERE ISNUMERIC(SUBSTRING(P.PH1,1,1)) = 1)
Result: Query still froze.
I then modified the WHERE clause as follows:
WHERE SUBSTRING(P.PH1,1,1) IN ('1','2','3','4','5','6','7','8','9')
Result: Query ran in 3 seconds
What gives?????
December 31, 2002 at 5:14 am
Query optimizer is stumbling somewhere. If you have an index you may have out of date statistics causing this or the index is fragmented or any of serveral other issues.
As a basic practice I do the foloowing once a week on most of my DBs
DBCC DBREINDEX on 7/2000 or DBCC INDEXDEFRAG on 2000
sp_updatestats
DBCC UPDATEUSAGE
ALso, even when things are in clean and smooth state the other queries may execute better based on the condition.
Consider in the ISNUMERIC item you get the value of a particular position in the string, then you check if it is a number and you make sure that the return from ISNUMERIC is 1. In the second thou you get the value of the string position and compare to the a list of string values 1-9 which is much less complex and has a much lower cost per row porcessed. Not to mention the comppiler may build a temp table for the list of values 1-9 and actually perform a join so now you have an added performance enhancement you cannot get the other way.
In the first case thou where you have IS NOT NULL then you may need to add an index or stats are out of wack.
January 2, 2003 at 11:34 am
I think indexes won't help here. IS NOT NULL (or <>) condition has to go through every row anyway. This is different from IS NULL (or =). Also, once you are using a function (ISNUMERIC, SUBSTRING< etc.) on a column, forget about index.
On the other hand, try to replace SUBSTRING( x, 1, 1) with LEFT(x, 1). LEFT is much faster than SUBSTRING.
I think Antares686 is right about temp table that might be created by the optimizer in case of IN('1', ...).
Here is what I whould try in this situation to be able to use index:
WHERE P.PH1 > 0
There is no need to switch to the char types.
Let us know what will happen at the end.
January 6, 2003 at 6:03 pm
Thanks for the help from each of you. mromm seems to be on the right track about indexes - DBCC DEFRAG, sp_udatestats and DBCC updateusage had no effect on the query (but I think that stale indexes will be a problem in the future, so I appreciate your direction on this, antares686!).
Your clues on eliminating NOT constructs were helpful as well as your alerts about the cost of functions in the WHERE clause.
Hoever, subsequent to my posting and your kind replies, the query hung again, despite the syntax
WHERE SUBSTRING(P.PH1,1,1) IN ('1','2','3','4','5','6','7','8','9')
I finally postulated that I was running into an index size and complexity problem. Here's the complete code for the query:
INSERT MOAT_Component
(MxC_Price,
MxC_Type,
MxC_Qty,
MxC_ProcessDate,
MxC_HasReturnEnvelope,
MxC_PrinterPlexType,
MxC_Trailer,
TSP_ID,
MxC_LoadDate,
CL_ID)
SELECT DISTINCT
CASE
WHEN (M.M_PostageRateType = '1' AND
M.M_Trailer = 'N') THEN MAX(TSP.TSP_DomesticPostageRate)
WHEN (M.M_PostageRateType = '2' AND
M.M_Trailer = 'N') THEN MAX(TSP.TSP_MexCanPostageRate)
WHEN (M.M_PostageRateType = '3' AND
M.M_Trailer = 'N') THEN MAX(TSP.TSP_OffContPostageRate)
WHEN (M.M_PostageRateType = '1' AND
M.M_Trailer = 'Y') THEN MAX(TSP.TSP_DomesticPostageRate) +
MAX(TSP.TSP_TrailerSurcharge)
WHEN (M.M_PostageRateType = '2' AND
M.M_Trailer = 'Y') THEN MAX(TSP.TSP_MexCanPostageRate) +
MAX(TSP.TSP_TrailerSurcharge)
WHEN (M.M_PostageRateType = '3' AND
M.M_Trailer = 'Y') THEN MAX(TSP.TSP_OffContPostageRate) +
MAX(TSP.TSP_TrailerSurcharge)
END,
CASE
WHEN M.M_PostageRateType = '1' THEN 'P' --
Domestic
WHEN M.M_PostageRateType = '2' THEN 'M' --
Continental (Mexico/Canada)
WHEN M.M_PostageRateType = '3' THEN 'O' --
Off-Continent
END,
COUNT(P.CID),
@EarliestDate,
M.M_HasReturnEnvelope,
M.M_PrinterPlexType,
M.M_Trailer,
TSP.TSP_ID,
GETDATE(),
M.CL_ID
FROM PACLOADER P INNER JOIN MOAT M ON ((P.CID = M.M_CID) AND
(M.M_RecordID = p.RID) AND (M.M_ProcessDate = @EarliestDate) )
INNER JOIN WorkTransSpec TSP ON
(TSP.CL_ID = M.CL_ID AND TSP.TSP_Name = P.Ltr AND TSP.PG_ID = M.PG_ID)
WHERE (M.M_PricingCode = 'Y') AND
(M.M_RecordType IN ('TrHd','LL','TJ','JL')) AND
(M.M_ServiceLevel IN ('0','1','2','3','4','5')) AND
(P.Prt > '0')
GROUP BY M.CL_ID, M.M_PostageRateType, M.M_Trailer,
M.M_HasReturnEnvelope, M.M_PrinterPlexType, TSP.TSP_ID
The MOAT table is about 15 million rows and growing by about 300,000 records per day. I then use the above query to post records to MOAT_Component. M_ID is an identity column. I decided to change M_ID to a non-identity column and implement an identically structured WorkMOAT table with M_ID as the identity field. I do MAX(MID) on MOAT before the load, add 1 to it and reset the seed on WorkMOAT. I then load to WorkMOAT and use it (only 300,000 records) instead of MOAT to do the complex joins above. I finally INSERT WorkMOAT into MOAT.
Result: The entire load (there are many more pieces)runs quickly and predictably. I assume that my hypothesis about index size is correct, but if you have any other explanations, I'm all ears.
January 6, 2003 at 9:16 pm
1) Migrate all the where clauses into the ON part of the statement if possible. Move the m.(MOAT) to be the first table in the FROM and make the M.M_ProcessDate = @EarliestDate the first in the ON part. Presumably there is a covering date index.
2) Save a cycle or two by shorten the case statements to
CASE M.M_PostageRateType
WHEN '1' THEN MAX(TSP.TSP_DomesticPostageRate) +
CASE When M.M_Trailer = 'Y' Then MAX(TSP.TSP_TrailerSurcharge) Else 0 End
WHEN '2' THEN MAX(TSP.TSP_MexCanPostageRate) +
Case When M.M_Trailer = 'Y' Then MAX(TSP.TSP_TrailerSurcharge) Else 0 End
WHEN '3' THEN MAX(TSP.TSP_OffContPostageRate) +
Case When M.M_Trailer = 'Y' Then MAX(TSP.TSP_TrailerSurcharge) Else 0 End
END,
CASE M.M_PostageRateType
WHEN '1' THEN 'P' -- Domestic
WHEN '2' THEN 'M' -- Continental (Mexico/Canada)
WHEN '3' THEN 'O' -- Off-Continent
END
Interested to know what the final solution was!
January 7, 2003 at 10:15 am
5409045121009, thanks for the ideas on query optimization. Yes, the date field is indexed. I can see the reasons for all your suggestions except placing MOAT as the first table in the FROM clause. I was under the impression that the optimal arrangement is ordering the tables in the FROM clause by ascending table size. Why are you recommending placing MOAT first?
January 7, 2003 at 1:05 pm
AS Antares686 was saying "Query optimiser is stumbling somewhere." I thought it may just tickle it to generate a "better" / different plan. Agree in principal with your statement "ascending table sizes".
January 7, 2003 at 3:13 pm
Actually order of tables does not always work like you would assume. I find the joins faster when you start from the least resultset size to the larger since you deal with the minimal data output that way. There still are no guarantees as the Query Engine makes so odd choices sometimes, just test different ways if any exist.
ANd since you are doing INNER JOINS thru this you may try
FROM PACLOADER P INNER JOIN MOAT M ON ((P.CID = M.M_CID) AND
(M.M_RecordID = p.RID) AND (M.M_ProcessDate = @EarliestDate) )
INNER JOIN WorkTransSpec TSP ON
(TSP.CL_ID = M.CL_ID AND TSP.TSP_Name = P.Ltr AND TSP.PG_ID = M.PG_ID)
WHERE (M.M_PricingCode = 'Y') AND
(M.M_RecordType IN ('TrHd','LL','TJ','JL')) AND
(M.M_ServiceLevel IN ('0','1','2','3','4','5')) AND
(P.Prt > '0')
as
FROM
PACLOADER P
INNER JOIN
MOAT M
ON
(P.CID = M.M_CID AND
M.M_RecordID = p.RID AND
M.M_ProcessDate = @EarliestDate AND
(M.M_PricingCode = 'Y') AND
(M.M_RecordType IN ('TrHd','LL','TJ','JL')) AND
(M.M_ServiceLevel IN ('0','1','2','3','4','5')) AND
(P.Prt > '0') )
INNER JOIN
WorkTransSpec TSP
ON
(TSP.CL_ID = M.CL_ID AND
TSP.TSP_Name = P.Ltr AND
TSP.PG_ID = M.PG_ID)
Or
FROM PACLOADER P INNER JOIN
(SELECT columnsyouneedinyourfinaloutputandtocompletejoins FROM MOAT
WHERE M_ProcessDate = @EarliestDate AND (M_PricingCode = 'Y') AND
(M_RecordType IN ('TrHd','LL','TJ','JL')) AND
(M_ServiceLevel IN ('0','1','2','3','4','5')) ) AS M
ON ((P.CID = M.M_CID) AND
(M.M_RecordID = p.RID))
INNER JOIN WorkTransSpec TSP ON
(TSP.CL_ID = M.CL_ID AND TSP.TSP_Name = P.Ltr AND TSP.PG_ID = M.PG_ID)
WHERE (P.Prt > '0')
And there are some other options, just try them and see what it does to the query output and time.
January 10, 2003 at 12:19 pm
Thanks to all for your help. The most effective tweak by far was to avoid doing joins on the archive table, using instead a temp version of the archive table containing only the data currently being added. Seems that index size (as a function of table size) is a crucial determinant.
January 13, 2003 at 2:27 am
I Agree with Carl. Most of my speed improvemnets was moving the small resultset to a amemory table and then join that large resultset with a small memory table,
or sometimes even better
Count group etc... the bigger resultset into a mem table as well and join both mem tables.... (only if there ar not too many records)
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply