October 24, 2006 at 10:18 am
I have been asked to make a recommendation on this worst-case query. It is generated by a reporting application that gives the user the ability to select "anything"...
I will run an explain plan to confirm, but my memory (and, granted, Oracle experience) tells me that three or more ANDs and/or ORs revert to a sequential scan with no index use.
Short of telling them "don't do this," what do I tell my client?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 11:44 am
Actually, a single OR can be enough to make the optimizer ignore an index.
I'm curious about the 165 ORs. Is this something you could put in a table and join to?
Also, it would be very helpful to see a little more of the query if you want a more specific answer.
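To make the table-and-join idea concrete, here is a minimal sketch; the temp table name is an assumption, only a few of the 165 IDs are shown, and the column names follow the query posted later in the thread:

```sql
-- Sketch (hypothetical names): load the user's selected IDs once,
-- then join instead of repeating 165 OR predicates.
CREATE TABLE #SelectedCommunities (OrigCommunityID varchar(10) PRIMARY KEY)

INSERT INTO #SelectedCommunities (OrigCommunityID) VALUES ('30167')
INSERT INTO #SelectedCommunities (OrigCommunityID) VALUES ('30260')
-- ... one INSERT per ID the user selected ...

SELECT cc.ClientID, cc.DateContacted
FROM client_contacts cc
INNER JOIN #SelectedCommunities sc
    ON cc.OrigCommunityID = sc.OrigCommunityID
WHERE cc.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
```

With an index on client_contacts (OrigCommunityID, DateContacted), the optimizer can seek once per selected ID instead of re-evaluating a long OR chain for every row.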
October 24, 2006 at 2:28 pm
Okay, I had to clean it up a little bit so as not to "reveal" my customer... and I snipped out of the middle to make it shorter:
SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,
COALESCE([objects].ClientLastName, '') As ClientLastName,
COALESCE([objects].ClientFirstName, '') As ClientFirstName,
COALESCE([ranks].Rate, '') As Rank,
COALESCE([objects].MaritalStatus, '') As MaritalStatus,
(SELECT ats_units.UnitName FROM ats_units WHERE ObjectID = [objects].Unit) AS Unit,
COALESCE(client_contacts.ServiceID, '') As ServiceID,
COALESCE([objects].Branch, '') As Branch,
COALESCE([objects].Status, '') As Status,
COALESCE(client_contacts.ReferralSource, '') As ReferralSource,
COALESCE(x.Name, '') As PRP,
COALESCE(x_genders.Name, '') As Gender,
XWHierarchy.Name As CommunityID,
COALESCE(client_contacts.StaffContact, '') As StaffMember,
COALESCE(client_contacts.CaseType, '') As CaseType,
COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD,
[objects].ObjectID,
COALESCE(x_genders.Name, '') As SummaryGender,
COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,
COALESCE(x.Name, '') As SummaryPRP,
COALESCE([objects].Branch, '') As SummaryBranch,
COALESCE([objects].Status, '') As SummaryDutyStatus
FROM client_contacts
INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID
LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID
LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value
LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID
LEFT OUTER JOIN x ON [objects].PRP = x.Value
WHERE (client_contacts.OrigCommunityID = '30167'
OR client_contacts.OrigCommunityID = '30167'
<SNIP>
OR client_contacts.OrigCommunityID = '30260'
OR client_contacts.OrigCommunityID = '30261'
OR client_contacts.OrigCommunityID = '30262')
AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')
I ran some explain plans and there are only three full table scans... two of those tables each have only two rows (Yes/No, Male/Female) and the third only 164 rows, so that shouldn't be the problem (no indices).
Although, they are the biggest CPU hit.
I also tried putting the ORed info in an "in list", and then a wildcard "like '%'" (which returns the same result as this query). This query was given to me as the "worst-case scenario."
I am not finding anything. The temp table idea sounded good, but I think the "in list" would give about the same result.
Ideas appreciated.
David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 2:42 pm
Are these
OR client_contacts.OrigCommunityID = '30260'
hardcoded in your query or built using some kind of dynamic sql from the application?
October 24, 2006 at 2:48 pm
They are built dynamically in the report writer application. The customer specifies what they want to see. At the "store" level the performance is okay, since a store can only see what's in that store. But at headquarters, they can see everything, and that's where the problem is... five minutes, or an application timeout.
I could have some leverage over how it is parsed, if I can make a case for why something else is necessary. The developers are in this building, and I am billing them for this task. They want recommendations.
Got any?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 2:48 pm
Guessing that it's built in the query by the application.
Anyway, this:
WHERE (client_contacts.OrigCommunityID = '30167'
OR client_contacts.OrigCommunityID = '30167'
<SNIP>
OR client_contacts.OrigCommunityID = '30260'
OR client_contacts.OrigCommunityID = '30261'
OR client_contacts.OrigCommunityID = '30262')
is the equivalent of joining two tables. You could certainly use a permanent table to hold those IDs and use a join. I'm sure you'd get a boost with correct indexing, but it's hard to tell you how much without much more information.
October 24, 2006 at 2:51 pm
Well, it could be a permanent table, but the contents would have to float based on customer input... I will try. I'm not sure how this will be any better than the "in list" syntax, though.
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 3:45 pm
Here are some of the results from the explain plan before and after your recommended change. The change added five steps to the explain plan, and seems to have increased the overall time to execute; hard to judge, but at least it's not subjective.
Before (with multiple ORs) 16 steps
Estimated Rows  IO            CPU           Avg Row  Total Subtree
27.663755       NULL          NULL          NULL     0.22645532
27.663755       0.0           2.7663755E-6  1294     0.22645532
27.663755       0.0           1.156345E-4   2753     0.22645254
27.663755       0.0           2.3126899E-4  2627     0.21946308
27.663755       0.0           1.156345E-4   2564     0.17730561
27.663755       0.0           2.3126899E-4  2434     0.17170592
27.663755       0.0           1.8278537E-2  2397     0.12954845
26.663755       0.0           1.114545E-4   880      6.0498733E-2
26.663755       0.0           0.01149122    464      5.3103823E-2
26.663755       4.1282203E-2  3.3040001E-4  464      4.1612603E-2
1.0             3.2034251E-3  7.9603E-5     426      7.2834538E-3
164.0           4.0541463E-2  2.5889999E-4  1526     4.9459316E-2
2.0             3.7578501E-2  8.0700003E-5  45       4.1904069E-2
1.0             3.2034251E-3  7.9603997E-5  139      5.4840706E-3
2.0             3.7578501E-2  8.0700003E-5  71       4.1904069E-2
1.0             3.2034251E-3  7.9603E-5     136      6.8738232E-3
After (with table containing value and joined) 21 steps
Estimated Rows  IO            CPU           Avg Row  Total Subtree
72.0            NULL          NULL          NULL     0.3119736
72.0            0.0           7.1999998E-6  1294     0.3119736
72.0            0.0           0.00030096    1323     0.31196639
72.0            0.0           6.0192001E-4  1197     0.30047402
72.0            0.0           0.00030096    1134     0.25085211
72.0            0.0           6.0192001E-4  1004     0.24153772
72.0            3.4429999E-3  6.6084801E-3  967      0.1919158
71.0            0.0           0.00029678    861      0.12781723
71.0            0.002504      7.6416912E-3  443      0.11508917
71.0            1.1261261E-2  7.8158401E-4  438      5.3856969E-2
71.0            4.1282203E-2  3.3040001E-4  464      4.1612603E-2
163.0           1.1261261E-2  1.9686506E-3  16       5.1082511E-2
163.0           0.0           0.0000163     16       0.0378526
163.0           3.7578501E-2  2.5780001E-4  20       3.7836298E-2
1.0             3.2034251E-3  7.9603E-5     426      1.2431274E-2
164.0           1.1261261E-2  1.9824712E-3  115      5.4044094E-2
164.0           4.0541463E-2  2.5889999E-4  1526     4.0800363E-2
2.0             3.7578501E-2  8.0700003E-5  45       4.8962399E-2
1.0             3.2034251E-3  7.9603997E-5  139      9.0134135E-3
2.0             3.7578501E-2  8.0700003E-5  71       4.8962399E-2
1.0             3.2034251E-3  7.9603E-5     136      1.1191417E-2
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:05 pm
Yes, the permanent table is the best way to do it. Especially when you deal with big sets.
But I know how difficult it is for developers to think in a "table" way. Almost impossible.
So in that case you may concatenate all your OrigCommunityIDs into a delimited string and use a split table-valued function inside the stored procedure.
Then you can join this function to the query:
...
INNER JOIN dbo.SplitIDs(@StrOrigCommunityIDs, ',') V ON client_contacts.OrigCommunityID = V.IDValue
It still does not use an index, so when the set is too big for your memory it will cause delays.
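For SQL Server versions of this era, which lack a built-in splitter, a minimal sketch of such a function might look like this; the function and column names follow the dbo.SplitIDs / IDValue usage above, but the implementation itself is an assumption:

```sql
-- Hypothetical implementation sketch of the split function used above.
-- Walks the delimited string and returns one row per ID.
CREATE FUNCTION dbo.SplitIDs (@List varchar(8000), @Delim char(1))
RETURNS @IDs TABLE (IDValue varchar(20))
AS
BEGIN
    DECLARE @Pos int
    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(@Delim, @List)
        IF @Pos = 0
        BEGIN
            -- no delimiter left: the remainder is the last ID
            INSERT INTO @IDs (IDValue) VALUES (@List)
            SET @List = ''
        END
        ELSE
        BEGIN
            -- take everything before the delimiter, then drop it
            INSERT INTO @IDs (IDValue) VALUES (LEFT(@List, @Pos - 1))
            SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        END
    END
    RETURN
END
```

Calling SELECT IDValue FROM dbo.SplitIDs('30167,30260,30261', ',') would then return one row per ID, ready to join against OrigCommunityID.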
_____________
Code for TallyGenerator
October 24, 2006 at 4:09 pm
I respect your advice; but doesn't the explain plan I posted say the opposite?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:16 pm
BTW, speaking of explain plans... I decided I had better test production to see whether I was getting the same plan there... and I am not. Here are the same rows from that DB:
Production:
Estimated Rows  IO            CPU           Avg Row  Total Subtree
83975.727       NULL          NULL          NULL     90.123978
83975.727       0.0           8.3975727E-3  1246     90.123978
83975.727       0.0           1.5245925     409      90.115578
10534.0         4.2122584E-2  5.8329501E-3  170      9.5911063E-2
83975.727       0.0           0.48099968    372      88.495071
2.0             0.0           0.0000002     17       3.7659399E-2
2.0             3.7578501E-2  8.0700003E-5  26       3.7659202E-2
83975.727       0.0           0.6598534     369      87.97641
28.0            3.7578501E-2  0.0001093     183      3.7687801E-2
83975.727       0.0           8.3975727E-3  317      87.27887
83975.727       0.0           0.6585502     321      87.27047
2.0             0.0           0.0000002     40       3.7659399E-2
2.0             3.7578501E-2  8.0700003E-5  45       3.7659202E-2
83975.727       0.0           0.68848419    302      86.574257
160.0           4.0541463E-2  2.5449999E-4  1514     4.0795963E-2
83975.727       0.0           19.206863     200      85.844971
84096.414       0.0           0.63704693    96       12.384162
164.0           0.0           1.64157E-4    11       1.64157E-4
84757.367       11.301282     0.2476269     693      11.548909
787198.0        53.387947     0.8659963     273      54.253944
But it is a TOTALLY different plan... I think the first thing for me to do is get the database locally and perform maintenance to get it behaving the same way in both places.
Keep the ideas and recommendations coming, though... I will try them as long as I have a charge number... and THANKS! David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 24, 2006 at 4:57 pm
I don't see anything opposite.
Keep in mind that "OR" means "rescan" (or "re-seek", if you are lucky enough to have proper indexes set up).
Either way, it means "repeat the search". If you have 165 "OR"s, then you have 165 queries united implicitly, behind the scenes.
Joining an in-memory table will almost certainly perform a HASH JOIN, but it will do it once. In your case it will be much faster.
_____________
Code for TallyGenerator
October 25, 2006 at 12:49 am
I'm no expert here, but here are some ideas:
1) Measure actual I/O. (I can rarely seem to read query plans...)
2) Try iteratively simplifying the query in steps until it's faster. E.g., create a sample query, remove one join at a time, and run each version until you see an order-of-magnitude speedup. Or sometimes you can try the reverse: start with a simple query and add one join at a time.
3) Try replacing the correlated subquery in the column list with a LEFT OUTER JOIN. I'm also wondering if this value could be denormalized into the table, or if you could create an indexed (materialized) view with this value already in it.
(SELECT ats_units.UnitName
FROM ats_units
WHERE ObjectID = [objects].Unit)
4) Might it make a difference to use integers instead of string literals in your OR list? Data types sometimes affect index usage.
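As a sketch of point 4, the predicate could use literals that match the column's datatype; the thread never confirms whether OrigCommunityID is an int or a varchar column, so treat this as a what-if:

```sql
-- Hypothetical: literals matched to an assumed int column type.
-- By SQL Server data type precedence, if the column were varchar and
-- the literals int, the COLUMN side would be converted, which blocks
-- index seeks; keeping literal and column types identical avoids the
-- question entirely.
SELECT cc.ClientID
FROM client_contacts cc
WHERE cc.OrigCommunityID IN (30167, 30260, 30261, 30262)
  AND cc.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
```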
5) Maybe a materialized view could be created on top of this table with all your joins in it already, so that you can query a denormalized table directly. Use WITH SCHEMABINDING, and create a clustered index, perhaps on OrigCommunityID, DateContacted. Just be aware that updates to the base table will be slower if you create a materialized view.
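A hedged sketch of point 5; note that SQL Server indexed views cannot contain OUTER JOINs or subqueries, so only the INNER JOIN core of the original query could be materialized this way (the view and index names here are made up):

```sql
-- Sketch: materialize the inner-join core with SCHEMABINDING,
-- then add the unique clustered index that makes it an indexed view.
CREATE VIEW dbo.vClientContactCore
WITH SCHEMABINDING
AS
SELECT cc.OrigCommunityID,
       cc.DateContacted,
       o.ObjectID,
       o.ClientLastName,
       o.ClientFirstName
FROM dbo.client_contacts cc
INNER JOIN dbo.[objects] o ON o.ObjectID = cc.ClientID
GO

CREATE UNIQUE CLUSTERED INDEX IX_vClientContactCore
ON dbo.vClientContactCore (OrigCommunityID, DateContacted, ObjectID)
```

As noted above, every insert or update on the base tables then also maintains the view's index, so writes get slower in exchange for faster reads.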
6) A clustered index on date might help, if most of your queries are confined to small date ranges. It might be tempting to add OrigCommunityID, too, but be careful when creating a clustered index not to use too many columns. Also, if a date is the first column of the clustered index key, and there are varying time values in the dates, then I wouldn't add any more columns to the key.
7) Some wild speculation: I wonder if full-text search might work better for this. Making keywords out of the OrigCommunityID seems a bit strange, and I suspect a hash join is similar to what text search does anyway, so I rather doubt this would help unless you can make keywords out of the date ranges too. Think of it as an optimization similar to the bounding boxes used for geographic lookups, or to storing a hash column in a table to speed certain queries. You would change the table to add a text column, backfill it with keywords relating to your data; then at search time you would generate corresponding keywords, add them to your query, and filter those results with your more precise criteria. So you'd add a column, call it textkeywords, and your query might contain:
... AND CONTAINS(textkeywords, '(DT200510 OR DT200511 OR ... DT200610) AND (CommID30167 OR CommID30260 [...] OR CommID30262)')
AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')
I know, this is complicated to set up and requires some schema changes, backfilling, and update/insert/delete triggers, so it may not be practical for you; but if you are frequently performing complicated queries against this table, perhaps it's worth it. You would want to run some benchmarks against a sample dataset first. Again, I think a hash join should be almost the same thing (?). Text indexing typically uses bitmap joins, I think, which can be quite quick, as evidenced by Google, etc.
You would have to schedule updates to the text index if you use this approach.
October 25, 2006 at 8:35 am
There are joins to look up the values "Yes/No" and "Male/Female"? OK...
Why not try to eliminate those? It may not help much, but you said a lot of the CPU time was coming from those full table scans. You can replace those lookup tables with a CASE expression.
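As a sketch of that idea, the two-row gender lookup could become an inline CASE; the actual codes in x_genders.Value are not shown in the thread, so the 'M'/'F' values here are assumptions:

```sql
-- Hypothetical: replace the LEFT OUTER JOIN to the two-row x_genders
-- table with a CASE over the code column, eliminating one scan per row.
SELECT CASE o.Gender
           WHEN 'M' THEN 'Male'
           WHEN 'F' THEN 'Female'
           ELSE ''
       END AS Gender
FROM [objects] o
```

The same substitution would work for the Yes/No lookup, removing two of the three full table scans from the plan.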
October 25, 2006 at 9:24 am
This is a complete stab in the dark but this is the initial approach I would take...
SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,
COALESCE([objects].ClientLastName, '') As ClientLastName,
COALESCE([objects].ClientFirstName, '') As ClientFirstName,
COALESCE([ranks].Rate, '') As Rank,
COALESCE([objects].MaritalStatus, '') As MaritalStatus,
U.UnitName AS Unit,
COALESCE(client_contacts.ServiceID, '') As ServiceID,
COALESCE([objects].Branch, '') As Branch,
COALESCE([objects].Status, '') As Status,
COALESCE(client_contacts.ReferralSource, '') As ReferralSource,
COALESCE(x.Name, '') As PRP,
COALESCE(x_genders.Name, '') As Gender, XWHierarchy.Name As CommunityID,
COALESCE(client_contacts.StaffContact, '') As StaffMember,
COALESCE(client_contacts.CaseType, '') As CaseType,
COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD, [objects].ObjectID,
COALESCE(x_genders.Name, '') As SummaryGender,
COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,
COALESCE(x.Name, '') As SummaryPRP,
COALESCE([objects].Branch, '') As SummaryBranch,
COALESCE([objects].Status, '') As SummaryDutyStatus
FROM client_contacts
INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID
LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID
LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value
LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID
LEFT OUTER JOIN x ON [objects].PRP = x.Value
INNER JOIN ats_units as U ON U.ObjectID = [objects].Unit
WHERE client_contacts.OrigCommunityID IN ('30167', '30167', <SNIP>, '30260', '30261', '30262')
AND client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'
One thing I did was eliminate the subquery you had in your original query, i.e., what Aaron West suggested (although I used an inner join instead of an outer join). My experience has been that the SQL Server query optimizer doesn't handle those very well.
One thing that would help us is knowing the row counts of all the tables involved, as well as which indexes the query plan was using, if any. Also, how long does it take SQL Server to begin returning rows when you execute this query? Is the problem the processing of the query, or the time required to marshal the data between SQL Server and the report writer?