October 24, 2006 at 10:18 am
I have been asked to make a recommendation on this worst-case query. It is generated by a reporting application that gives the user the ability to select "anything"...
I will run an explain plan to confirm, but my memory (and, granted, Oracle experience) tells me that three or more ANDs and/or ORs revert to a sequential scan with no index use.
Short of telling them "don't do this," what do I tell my client?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 11:44 am
Actually, a single OR can be enough to make the optimizer ignore an index.
I'm curious about the 165 ORs. Is this something you could put in a table and join to?
Also, it would be very helpful to see a little more of the query if you want a more specific answer.
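To make the table-and-join idea concrete, here is a minimal sketch; the temp table name is an assumption, only a few of the 165 IDs are shown, and the column names follow the query posted later in the thread:

```sql
-- Sketch (hypothetical names): load the user's selected IDs once,
-- then join instead of repeating 165 OR predicates.
CREATE TABLE #SelectedCommunities (OrigCommunityID varchar(10) PRIMARY KEY)

INSERT INTO #SelectedCommunities (OrigCommunityID) VALUES ('30167')
INSERT INTO #SelectedCommunities (OrigCommunityID) VALUES ('30260')
-- ... one INSERT per ID the user selected ...

SELECT cc.ClientID, cc.DateContacted
FROM client_contacts cc
INNER JOIN #SelectedCommunities sc
    ON cc.OrigCommunityID = sc.OrigCommunityID
WHERE cc.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
```

With an index on client_contacts (OrigCommunityID, DateContacted), the optimizer can seek once per selected ID instead of re-evaluating a long OR chain for every row.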
October 24, 2006 at 2:28 pm
Okay, I had to clean it up a little bit so as not to "reveal" my customer... and I snipped out of the middle to make it shorter:
SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,
COALESCE([objects].ClientLastName, '') As ClientLastName,
COALESCE([objects].ClientFirstName, '') As ClientFirstName,
COALESCE([ranks].Rate, '') As Rank,
COALESCE([objects].MaritalStatus, '') As MaritalStatus,
(SELECT ats_units.UnitName FROM ats_units WHERE ObjectID = [objects].Unit) AS Unit,
COALESCE(client_contacts.ServiceID, '') As ServiceID,
COALESCE([objects].Branch, '') As Branch,
COALESCE([objects].Status, '') As Status,
COALESCE(client_contacts.ReferralSource, '') As ReferralSource,
COALESCE(x.Name, '') As PRP,
COALESCE(x_genders.Name, '') As Gender,
XWHierarchy.Name As CommunityID,
COALESCE(client_contacts.StaffContact, '') As StaffMember,
COALESCE(client_contacts.CaseType, '') As CaseType,
COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD,
[objects].ObjectID,
COALESCE(x_genders.Name, '') As SummaryGender,
COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,
COALESCE(x.Name, '') As SummaryPRP,
COALESCE([objects].Branch, '') As SummaryBranch,
COALESCE([objects].Status, '') As SummaryDutyStatus
FROM client_contacts
INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID
LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID
LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value
LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID
LEFT OUTER JOIN x ON [objects].PRP = x.Value
WHERE (client_contacts.OrigCommunityID = '30167'
OR client_contacts.OrigCommunityID = '30167'
<SNIP>
OR client_contacts.OrigCommunityID = '30260'
OR client_contacts.OrigCommunityID = '30261'
OR client_contacts.OrigCommunityID = '30262')
AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')
I ran some explain plans and there are only three full table scans... two of those tables each have only two rows (Yes/No, Male/Female) and the third only 164 rows, so that shouldn't be the problem (no indices).
Although, they are the biggest CPU hit.
I also tried putting the ORed info in an "in list", and then a wildcard "like '%'" (which returns the same result as this query). This query was given to me as the "worst-case scenario."
I am not finding anything. The temp table idea sounded good, but I think the "in list" would give about the same result.
Ideas appreciated.
David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 2:42 pm
Are these
OR client_contacts.OrigCommunityID = '30260'
hardcoded in your query or built using some kind of dynamic sql from the application?
October 24, 2006 at 2:48 pm
They are built dynamically in the report writer application. The customer specifies what they want to see. At the "store" level the performance is okay, since a store can only see what's in that store. But at headquarters, they can see everything, and that's where the problem is... five minutes, or an application timeout.
I could have some leverage over how it is parsed, if I can make a case for why something else is necessary. The developers are in this building, and I am billing them for this task. They want recommendations.
Got any?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 2:48 pm
Guessing that it's built in the query by the application.
Anyway, this:
WHERE (client_contacts.OrigCommunityID = '30167'
OR client_contacts.OrigCommunityID = '30167'
<SNIP>
OR client_contacts.OrigCommunityID = '30260'
OR client_contacts.OrigCommunityID = '30261'
OR client_contacts.OrigCommunityID = '30262')
is the equivalent of joining two tables. You could certainly use a permanent table to hold those IDs and use a join. I'm sure you'd get a boost with correct indexing, but it's hard to tell you how much without much more information.
October 24, 2006 at 2:51 pm
Well, it could be a permanent table, but the contents would have to float based on customer input... I will try. I'm not sure how this will be any better than the "in list" syntax, though.
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 3:45 pm
Here are some of the results from the explain plan before and after your recommended change. The change added five steps to the explain plan, and seems to have increased the overall time to execute; hard to judge, but at least it's not subjective.
Before (with multiple ORs) 16 steps
Estimated Rows  IO            CPU           Avg Row  Total Subtree
27.663755       NULL          NULL          NULL     0.22645532
27.663755       0.0           2.7663755E-6  1294     0.22645532
27.663755       0.0           1.156345E-4   2753     0.22645254
27.663755       0.0           2.3126899E-4  2627     0.21946308
27.663755       0.0           1.156345E-4   2564     0.17730561
27.663755       0.0           2.3126899E-4  2434     0.17170592
27.663755       0.0           1.8278537E-2  2397     0.12954845
26.663755       0.0           1.114545E-4   880      6.0498733E-2
26.663755       0.0           0.01149122    464      5.3103823E-2
26.663755       4.1282203E-2  3.3040001E-4  464      4.1612603E-2
1.0             3.2034251E-3  7.9603E-5     426      7.2834538E-3
164.0           4.0541463E-2  2.5889999E-4  1526     4.9459316E-2
2.0             3.7578501E-2  8.0700003E-5  45       4.1904069E-2
1.0             3.2034251E-3  7.9603997E-5  139      5.4840706E-3
2.0             3.7578501E-2  8.0700003E-5  71       4.1904069E-2
1.0             3.2034251E-3  7.9603E-5     136      6.8738232E-3
After (with table containing value and joined) 21 steps
Estimated Rows  IO            CPU           Avg Row  Total Subtree
72.0            NULL          NULL          NULL     0.3119736
72.0            0.0           7.1999998E-6  1294     0.3119736
72.0            0.0           0.00030096    1323     0.31196639
72.0            0.0           6.0192001E-4  1197     0.30047402
72.0            0.0           0.00030096    1134     0.25085211
72.0            0.0           6.0192001E-4  1004     0.24153772
72.0            3.4429999E-3  6.6084801E-3  967      0.1919158
71.0            0.0           0.00029678    861      0.12781723
71.0            0.002504      7.6416912E-3  443      0.11508917
71.0            1.1261261E-2  7.8158401E-4  438      5.3856969E-2
71.0            4.1282203E-2  3.3040001E-4  464      4.1612603E-2
163.0           1.1261261E-2  1.9686506E-3  16       5.1082511E-2
163.0           0.0           0.0000163     16       0.0378526
163.0           3.7578501E-2  2.5780001E-4  20       3.7836298E-2
1.0             3.2034251E-3  7.9603E-5     426      1.2431274E-2
164.0           1.1261261E-2  1.9824712E-3  115      5.4044094E-2
164.0           4.0541463E-2  2.5889999E-4  1526     4.0800363E-2
2.0             3.7578501E-2  8.0700003E-5  45       4.8962399E-2
1.0             3.2034251E-3  7.9603997E-5  139      9.0134135E-3
2.0             3.7578501E-2  8.0700003E-5  71       4.8962399E-2
1.0             3.2034251E-3  7.9603E-5     136      1.1191417E-2
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:05 pm
Yes, the permanent table is the best way to do it. Especially when you deal with big sets.
But I know how difficult it is for developers to think in a "table" way. Almost impossible.
So in that case you may concatenate all your OrigCommunityIDs into a delimited string and use a split table-valued function inside the stored procedure.
Then you can join this function to the query:
...
INNER JOIN dbo.SplitIDs(@StrOrigCommunityIDs, ',') V ON client_contacts.OrigCommunityID = V.IDValue
It still does not use an index, so when the set is too big for your memory it will cause delays.
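For SQL Server versions of this era, which lack a built-in splitter, a minimal sketch of such a function might look like this; the function and column names follow the dbo.SplitIDs / IDValue usage above, but the implementation itself is an assumption:

```sql
-- Hypothetical implementation sketch of the split function used above.
-- Walks the delimited string and returns one row per ID.
CREATE FUNCTION dbo.SplitIDs (@List varchar(8000), @Delim char(1))
RETURNS @IDs TABLE (IDValue varchar(20))
AS
BEGIN
    DECLARE @Pos int
    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(@Delim, @List)
        IF @Pos = 0
        BEGIN
            -- no delimiter left: the remainder is the last ID
            INSERT INTO @IDs (IDValue) VALUES (@List)
            SET @List = ''
        END
        ELSE
        BEGIN
            -- take everything before the delimiter, then drop it
            INSERT INTO @IDs (IDValue) VALUES (LEFT(@List, @Pos - 1))
            SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        END
    END
    RETURN
END
```

Calling SELECT IDValue FROM dbo.SplitIDs('30167,30260,30261', ',') would then return one row per ID, ready to join against OrigCommunityID.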
_____________
Code for TallyGenerator
October 24, 2006 at 4:09 pm
I respect your advice; but doesn't the explain plan I posted say the opposite?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:16 pm
BTW, speaking of explain plans... I decided I had better test production to see whether I was getting the same plan there... and I am not. Here are the same rows from that DB:
Production:
Estimated Rows  IO            CPU           Avg Row  Total Subtree
83975.727       NULL          NULL          NULL     90.123978
83975.727       0.0           8.3975727E-3  1246     90.123978
83975.727       0.0           1.5245925     409      90.115578
10534.0         4.2122584E-2  5.8329501E-3  170      9.5911063E-2
83975.727       0.0           0.48099968    372      88.495071
2.0             0.0           0.0000002     17       3.7659399E-2
2.0             3.7578501E-2  8.0700003E-5  26       3.7659202E-2
83975.727       0.0           0.6598534     369      87.97641
28.0            3.7578501E-2  0.0001093     183      3.7687801E-2
83975.727       0.0           8.3975727E-3  317      87.27887
83975.727       0.0           0.6585502     321      87.27047
2.0             0.0           0.0000002     40       3.7659399E-2
2.0             3.7578501E-2  8.0700003E-5  45       3.7659202E-2
83975.727       0.0           0.68848419    302      86.574257
160.0           4.0541463E-2  2.5449999E-4  1514     4.0795963E-2
83975.727       0.0           19.206863     200      85.844971
84096.414       0.0           0.63704693    96       12.384162
164.0           0.0           1.64157E-4    11       1.64157E-4
84757.367       11.301282     0.2476269     693      11.548909
787198.0        53.387947     0.8659963     273      54.253944
But it is a TOTALLY different plan... I think the first thing for me to do is get the database locally and perform maintenance to get it behaving the same way in both places.
Keep the ideas and recommendations coming, though... I will try them as long as I have a charge number... and THANKS! David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:57 pm
I don't see anything opposite.
Keep in mind that "OR" means "rescan" (or "re-seek", if you are lucky enough to have proper indexes set up).
Either way, it means "repeat the search". If you have 165 "OR"s, then you have 165 queries united implicitly, behind the scenes.
Joining an in-memory table will almost certainly perform a HASH JOIN, but it will do it once. In your case it will be much faster.
_____________
Code for TallyGenerator
October 25, 2006 at 12:49 am
I'm no expert here, but here are some ideas:
1) Measure actual I/O. (I can rarely seem to read query plans...)
2) Try iteratively simplifying the query in steps until it's faster. E.g., create a sample query, remove one join at a time, and run each version until you see an order-of-magnitude speedup. Or sometimes you can try the reverse: start with a simple query and add one join at a time.
3) Try replacing the correlated subquery in the column list with a LEFT OUTER JOIN. I'm also wondering if this value could be denormalized into the table, or if you could create an indexed (materialized) view with this value already in it.
(SELECT ats_units.UnitName
FROM ats_units
WHERE ObjectID = [objects].Unit)
4) Might it make a difference to use integers instead of string literals in your OR list? Data types sometimes affect index usage.
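As a sketch of point 4, the predicate could use literals that match the column's datatype; the thread never confirms whether OrigCommunityID is an int or a varchar column, so treat this as a what-if:

```sql
-- Hypothetical: literals matched to an assumed int column type.
-- By SQL Server data type precedence, if the column were varchar and
-- the literals int, the COLUMN side would be converted, which blocks
-- index seeks; keeping literal and column types identical avoids the
-- question entirely.
SELECT cc.ClientID
FROM client_contacts cc
WHERE cc.OrigCommunityID IN (30167, 30260, 30261, 30262)
  AND cc.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
```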
5) Maybe a materialized view could be created on top of this table with all your joins in it already, so that you can query a denormalized table directly. Use WITH SCHEMABINDING, and create a clustered index, perhaps on OrigCommunityID, DateContacted. Just be aware that updates to the base table will be slower if you create a materialized view.
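A hedged sketch of point 5; note that SQL Server indexed views cannot contain OUTER JOINs or subqueries, so only the INNER JOIN core of the original query could be materialized this way (the view and index names here are made up):

```sql
-- Sketch: materialize the inner-join core with SCHEMABINDING,
-- then add the unique clustered index that makes it an indexed view.
CREATE VIEW dbo.vClientContactCore
WITH SCHEMABINDING
AS
SELECT cc.OrigCommunityID,
       cc.DateContacted,
       o.ObjectID,
       o.ClientLastName,
       o.ClientFirstName
FROM dbo.client_contacts cc
INNER JOIN dbo.[objects] o ON o.ObjectID = cc.ClientID
GO

CREATE UNIQUE CLUSTERED INDEX IX_vClientContactCore
ON dbo.vClientContactCore (OrigCommunityID, DateContacted, ObjectID)
```

As noted above, every insert or update on the base tables then also maintains the view's index, so writes get slower in exchange for faster reads.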
6) A clustered index on date might help, if most of your queries are confined to small date ranges. It might be tempting to add OrigCommunityID, too, but be careful when creating a clustered index not to use too many columns. Also, if a date is the first column of the clustered index key, and there are varying time values in the dates, then I wouldn't add any more columns to the key.
7) Some wild speculation: I wonder if full-text search might work better for this. Making keywords out of the OrigCommunityID seems a bit strange, and I suspect a hash join is similar to what text search does anyway, so I rather doubt this would help unless you can make keywords out of the date ranges too. Think of it as an optimization similar to the bounding boxes used for geographic lookups, or to storing a hash column in a table to speed certain queries. You would change the table to add a text column, backfill it with keywords relating to your data; then at search time you would generate corresponding keywords, add them to your query, and filter those results with your more precise criteria. So you'd add a column, call it textkeywords, and your query might contain:
... AND CONTAINS(textkeywords, '(DT200510 OR DT200511 OR ... DT200610) AND (CommID30167 OR CommID30260 [...] OR CommID30262)')
AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')
I know, this is complicated to set up and requires some schema changes, backfilling, and update/insert/delete triggers, so it may not be practical for you; but if you are frequently performing complicated queries against this table, perhaps it's worth it. You would want to run some benchmarks against a sample dataset first. Again, I think a hash join should be almost the same thing (?). Text indexing typically uses bitmap joins, I think, which can be quite quick, as evidenced by Google, etc.
You would have to schedule updates to the text index if you use this approach.
October 25, 2006 at 8:35 am
There are joins to look up the values "Yes/No" and "Male/Female"? OK...
Why not try to eliminate those? It may not help much, but you said a lot of the CPU time was coming from those full table scans. You can replace those lookup tables with a CASE expression.
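As a sketch of that idea, the two-row gender lookup could become an inline CASE; the actual codes in x_genders.Value are not shown in the thread, so the 'M'/'F' values here are assumptions:

```sql
-- Hypothetical: replace the LEFT OUTER JOIN to the two-row x_genders
-- table with a CASE over the code column, eliminating one scan per row.
SELECT CASE o.Gender
           WHEN 'M' THEN 'Male'
           WHEN 'F' THEN 'Female'
           ELSE ''
       END AS Gender
FROM [objects] o
```

The same substitution would work for the Yes/No lookup, removing two of the three full table scans from the plan.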
October 25, 2006 at 9:24 am
This is a complete stab in the dark but this is the initial approach I would take...
SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,
COALESCE([objects].ClientLastName, '') As ClientLastName,
COALESCE([objects].ClientFirstName, '') As ClientFirstName,
COALESCE([ranks].Rate, '') As Rank,
COALESCE([objects].MaritalStatus, '') As MaritalStatus,
U.UnitName AS Unit,
COALESCE(client_contacts.ServiceID, '') As ServiceID,
COALESCE([objects].Branch, '') As Branch,
COALESCE([objects].Status, '') As Status,
COALESCE(client_contacts.ReferralSource, '') As ReferralSource,
COALESCE(x.Name, '') As PRP,
COALESCE(x_genders.Name, '') As Gender, XWHierarchy.Name As CommunityID,
COALESCE(client_contacts.StaffContact, '') As StaffMember,
COALESCE(client_contacts.CaseType, '') As CaseType,
COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD, [objects].ObjectID,
COALESCE(x_genders.Name, '') As SummaryGender,
COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,
COALESCE(x.Name, '') As SummaryPRP,
COALESCE([objects].Branch, '') As SummaryBranch,
COALESCE([objects].Status, '') As SummaryDutyStatus
FROM client_contacts
INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID
LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID
LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value
LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID
LEFT OUTER JOIN x ON [objects].PRP = x.Value
INNER JOIN ats_units as U ON U.ObjectID = [objects].Unit
WHERE client_contacts.OrigCommunityID IN ('30167', '30167', <SNIP>, '30260', '30261', '30262')
AND client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
One thing I did was eliminate the subquery you had in your original query, i.e., what Aaron West suggested (although I used an inner join instead of an outer join). My experience has been that the SQL Server query optimizer doesn't handle those very well.
One thing that would help us is knowing the row counts of all the tables involved, as well as which indexes the query plan was using, if any. Also, how long does it take SQL Server to begin returning rows when you execute this query? Is the problem the processing of the query, or the time required to marshal the data between SQL Server and the report writer?