5 table join with 165 ORs - returns 93,000 rows in 5 minutes...

  • I have been asked to make recommendation to this worst-case query.  It is created by a reporting application that gives the user the ability to select "anything"...

    I will run an explain plan to confirm; but my memory (and granted, Oracle experience) tells me that three or more ANDs and/or ORs reverts to a sequential search without indices.

    Short of telling them "don't do this" what do I tell my client?

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Actually a single or can be enough to ignore an index.

     

    I'm curious about the 165 ors. It this something you could put in a table and join to it??

     

    Also it would be very helpfull to see a little more of the query if you want a more specific answer .

  • Okay, I had to clean it up a little bit so as not to "reveal" my customer... and I snipped out of the middle to make it shorter:

    SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,
    COALESCE([objects].ClientLastName, '') As ClientLastName,
    COALESCE([objects].ClientFirstName, '') As ClientFirstName,
    COALESCE([ranks].Rate, '') As Rank,
    COALESCE([objects].MaritalStatus, '') As MaritalStatus,
    (SELECT ats_units.UnitName
    FROM ats_units
    WHERE ObjectID = [objects].Unit) AS Unit,
    COALESCE(client_contacts.ServiceID, '') As ServiceID,
    COALESCE([objects].Branch, '') As Branch,
    COALESCE([objects].Status, '') As Status,
    COALESCE(client_contacts.ReferralSource, '') As ReferralSource,
    COALESCE(x.Name, '') As PRP,
    COALESCE(x_genders.Name, '') As Gender, XWHierarchy.Name As CommunityID,
    COALESCE(client_contacts.StaffContact, '') As StaffMember,
    COALESCE(client_contacts.CaseType, '') As CaseType,
    COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD, [objects].ObjectID,
    COALESCE(x_genders.Name, '') As SummaryGender,
    COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,
    COALESCE(x.Name, '') As SummaryPRP,
    COALESCE([objects].Branch, '') As SummaryBranch,
    COALESCE([objects].Status, '') As SummaryDutyStatus
    FROM client_contacts
    INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID
    LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID
    LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value
    LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID
    LEFT OUTER JOIN x ON [objects].PRP = x.Value
    WHERE (client_contacts.OrigCommunityID = '30167'
    OR client_contacts.OrigCommunityID = '30167'
    <SNIP>
    OR client_contacts.OrigCommunityID = '30260'
    OR client_contacts.OrigCommunityID = '30261'
    OR client_contacts.OrigCommunityID = '30262')
    AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')
     

    I ran some explain plans and there are only three full table scans... two of those tables each have only two rows (Yes/No, Male/Female) and the third only 164 rows, so that shouldn't be the problem (no indices).  Although, they are the biggest CPU hit.

    I also tried putting the ORed info in an "into list", and then a wildcard "like '%'" (which returns the same result as this query.  This query was given to me as the "worst case scenario."

    I am not finding anything.  The temp table idea sounded good; but I think the "in list" would be about the same result.

    Ideas appreciated.

    David

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Are these

    OR client_contacts.OrigCommunityID = '30260'

    hardcoded in your query or built using some kind of dynamic sql from the application?

  • They are built dynamically in the report writer application.  The customer specifies what they want to see... and at the "store" level they are okay, performance wise, can only see what's in their store.  but at headquarters, they can see everything, and that's where the problem is... 5 minutes, or application time out.

    I could have some leverage over how it is parsed, if I can make a statement as to why something else is necessary.  The developer's are in this building, and I am charging to them for this task.  They want recommendations.

    Got any?

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Guessing that it's built in the query by the application.

     

    Anyways this :

    WHERE (client_contacts.OrigCommunityID = '30167'
    OR client_contacts.OrigCommunityID = '30167'
    <SNIP>
    OR client_contacts.OrigCommunityID = '30260'
    OR client_contacts.OrigCommunityID = '30261'
    OR client_contacts.OrigCommunityID = '30262')
     
    is the equivalent of joining 2 tables.  You certainly could use a permenant table to keep those ids and use a join.  I'm sure you'd get a boost with correct indexing, but it's hard to tell you how much without having much more information.
  • Well, it could be a permanent table; but the contents would have to float based on customer input... I will try.  Not sure how this will be any better than the "in list" syntax though.

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Here are some of the results from explain plan before and after your recommended change.  The change added five steps to the expain plan, and seems to have increased the overall time to execute - hard to judge; but at least not subjective.

    Before (with multiple ORs) 16 steps

    Estimated RowsIO  CPU  AVG ROW Total Subtree
    27.663755 NULL  NULL  NULL 0.22645532
    27.663755 0.0  2.7663755E-6 1294 0.22645532
    27.663755 0.0  1.156345E-4 2753 0.22645254
    27.663755 0.0 2.3126899E-4 2627 0.21946308
    27.663755 0.0  1.156345E-4 2564 0.17730561
    27.663755 0.0  2.3126899E-4 2434 0.17170592
    27.663755 0.0  1.8278537E-2 2397 0.12954845
    26.663755 0.0  1.114545E-4 880 6.0498733E-2
    26.663755 0.0  0.01149122 464 5.3103823E-2
    26.663755 4.1282203E-2 3.3040001E-4 464 4.1612603E-2
    1.0  3.2034251E-3 7.9603E-5 426 7.2834538E-3
    164.0  4.0541463E-2 2.5889999E-4 1526 4.9459316E-2
    2.0  3.7578501E-2 8.0700003E-5 45 4.1904069E-2
    1.0  3.2034251E-3 7.9603997E-5 139 5.4840706E-3
    2.0  3.7578501E-2 8.0700003E-5 71 4.1904069E-2
    1.0  3.2034251E-3 7.9603E-5 136 6.8738232E-3
    After (with table containing value and joined) 21 steps
    Estimated Rows IO  CPU  AVG ROW Total Subtree
    72.0  NULL  NULL  NULL 0.3119736
    72.0  0.0  7.1999998E-6 1294 0.3119736
    72.0  0.0  0.00030096 1323 0.31196639
    72.0  0.0  6.0192001E-4 1197 0.30047402
    72.0  0.0  0.00030096 1134 0.25085211
    72.0  0.0  6.0192001E-4 1004 0.24153772
    72.0  3.4429999E-3 6.6084801E-3 967 0.1919158
    71.0  0.0  0.00029678 861 0.12781723
    71.0  0.002504 7.6416912E-3 443 0.11508917
    71.0  1.1261261E-2 7.8158401E-4 438 5.3856969E-2
    71.0  4.1282203E-2 3.3040001E-4 464 4.1612603E-2
    163.0 1.1261261E-2 1.9686506E-3 16 5.1082511E-2
    163.0 0.0  0.0000163 16 0.0378526
    163.0  3.7578501E-2 2.5780001E-4 20 3.7836298E-2
    1.0  3.2034251E-3 7.9603E-5 426 1.2431274E-2
    164.0  1.1261261E-2 1.9824712E-3 115 5.4044094E-2
    164.0  4.0541463E-2 2.5889999E-4 1526 4.0800363E-2
    2.0  3.7578501E-2 8.0700003E-5 45 4.8962399E-2
    1.0  3.2034251E-3 7.9603997E-5 139 9.0134135E-3
    2.0  3.7578501E-2 8.0700003E-5 71 4.8962399E-2
    1.0  3.2034251E-3 7.9603E-5 136 1.1191417E-2
    

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Yes, the permanent table is the best way to do it. Especially when you deal with big sets.

    But I know how it's difficult for developers to think in "table" way. Almost impossible.

    So in that case you may concatenate all yours OrigCommunityID into delimited string and use Split table function inside SP.

    Then yu may join this function to the query:

    ...

    INNER JOIN dbo.SplitIDs(@StrOrigCommunityIDs, ',') V ON client_contacts.OrigCommunityID = V.IDValue

    It still does not use index, so when it's too big for your memory it will cause delays.

    _____________
    Code for TallyGenerator

  • I respect your advice; but doesn't the explain plan I posted say the opposite?

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • BTW, speaking of explain plans... I decided that I had better test production to see that I was getting the same result in the plan... and I am not.  Here are the same rows from that DB:

    Production:
    Estimated Rows IO  CPU  AVG ROW Total Subtree
    83975.727 NULL  NULL NULL 90.123978
    83975.727 0.0  8.3975727E-3 1246 90.123978
    83975.727 0.0  1.5245925 409 90.115578
    10534.04.2122584E-2 5.8329501E-3 170 9.5911063E-2
    83975.727 0.0  0.48099968 372 88.495071
    2.0  0.0  0.0000002 17 3.7659399E-2
    2.0  3.7578501E-2 8.0700003E-5 26 3.7659202E-2
    83975.727 0.0  0.6598534 369 87.97641
    28.0  3.7578501E-2 0.0001093 183 3.7687801E-2
    83975.727 0.0  8.3975727E-3 317 87.27887
    83975.727 0.0  0.6585502 321 87.27047
    2.0 0.0  0.0000002 40 3.7659399E-2
    2.0  3.7578501E-2 8.0700003E-5 45 3.7659202E-2
    83975.7270.0 0.68848419 302 86.574257
    160.0  4.0541463E-2 2.5449999E-4 1514 4.0795963E-2
    83975.727 0.0  19.206863 200 85.844971
    84096.414 0.0  0.63704693 96 12.384162
    164.0  0.0  1.64157E-4 11 1.64157E-4
    84757.367 11.301282 0.2476269 693 11.548909
    787198.0 53.387947 0.8659963 273 54.253944
    

    But, it is a TOTALLY different plan... I think the first thing for me to do is get the database locally and perform maintenance to get it acting the same way at both places.

    Keep coming with the ideas and recommendations though... I will try them as long as I have a charge number... and THANKS!  David

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Don't see anything opposite.

    Keep in mind that "OR" means "rescan" (or "re-seek", if you are that lucky to have proper indexes set up ).

    Anyway, it means "repeat search". If you have 165 "OR"s than you have 165 queries united implicitly, behind the scene.

    Join memory table will almost definitely perform HASH JOIN, but it will do it once. In your case it will be much faster.

    _____________
    Code for TallyGenerator

  • I'm no expert here, but here are some ideas:

    1) Measure actual I/O. (I rarely can seem to read query plans...)

    2) Try iteratively simplifying the query in steps until it's faster. Eg. create a sample query, remove one join at a time, run each until you see an order of magnitude speedup. Or sometimes you can try the reverse: start with a simple query, and add one join at a time.

    3) Try changing the correlated subquery in the column list with a LEFT OUTER JOIN. I'm also wondering if this value could be denormalized into the table, or if you could create an indexed view (materialized) with this value already in it.

    (SELECT ats_units.UnitName

    FROM ats_units

    WHERE ObjectID = [objects].Unit)

    4) Might it make a difference to use integers instead of string literals in your OR list? Sometimes datatypes affect index usage.

    5) Maybe a materialized view could be created on top of this table with all your joins in it already, so that you can query a denormalized table directly. Use WITH SCHEMABINDING, and create a clustered index, perhaps on OrigCommunityID, DateContacted. Just be aware that updates to the base table will be slower if you create a materialized view.

    6) A clustered index on date might help, if most of your queries are confined to small date ranges. It might be tempting to add OrigCommunityID, too, but be careful when creating a clustered index not to use too many columns. Also, if a date is the first column of the clustered index key, and there are varying time values in the dates, then I wouldn't add any more columns to the key.

    6) Some wild speculation: I wonder if text search might work better for this. Making keywords out of the OrigCommunityID seems a bit strange, and I suspect a "hash join" is similar to what text search does, anyway, so I rather doubt this would help, unless you can make keywords out of the date ranges too. Think of it as an optimization similar to bounding boxes used for geographic lookups, or to the technique of storing a hash column in a table to speed certain queries. You would create change the table to have a text column, backfill it with keywords relating to your data, then at search time you would generate corresponding keywords, and add them to your query, then filter those results with your more precise criteria. So you'd have to add a column, we'll call it textkeywords, and your query might contain:

    ... AND CONTAINS(textkeywords, '(DT200510 OR DT200511 OR ... DT200610) AND (CommID30167 OR CommID30260 [...] OR CommID30262)')

    AND (client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006')

    I know, this is complicated to set up and requires some schema changes, backfilling, update insert and delete triggers, so may not be practical for you, but if you frequently performing complicated queries against this table perhaps it's worth it. You would want to run some benchmarks against a sample dataset first. Again, I think a hash join should be almost the same thing (?). Text indexing typically use bitmap joins, I think, which can be quite quick, as evidenced by Google, etc.

    You would have to schedule updates to the text index, if you use this approach.

  • There are joins to look up the values "Yes/No" and "Male/Female"? OK...

    Why not try to eliminate those? It may not help much, but you said there was a lot of CPU time from the full table scans. You can replace those tables with a CASE expression.

  • This is a complete stab in the dark but this is the initial approach I would take...

     

    SELECT COALESCE(CAST(client_contacts.DateContacted As smalldatetime), '') As DateContacted,

    COALESCE([objects].ClientLastName, '') As ClientLastName,

    COALESCE([objects].ClientFirstName, '') As ClientFirstName,

    COALESCE([ranks].Rate, '') As Rank,

    COALESCE([objects].MaritalStatus, '') As MaritalStatus,

    U.UnitName AS Unit,

    COALESCE(client_contacts.ServiceID, '') As ServiceID,

    COALESCE([objects].Branch, '') As Branch,

    COALESCE([objects].Status, '') As Status,

    COALESCE(client_contacts.ReferralSource, '') As ReferralSource,

    COALESCE(x.Name, '') As PRP,

    COALESCE(x_genders.Name, '') As Gender, XWHierarchy.Name As CommunityID,

    COALESCE(client_contacts.StaffContact, '') As StaffMember,

    COALESCE(client_contacts.CaseType, '') As CaseType,

    COALESCE(CAST([objects].TAFMSD As smalldatetime), '') As TAFMSD, [objects].ObjectID,

    COALESCE(x_genders.Name, '') As SummaryGender,

    COALESCE([objects].MaritalStatus, '') As SummaryMaritalStatus,

    COALESCE(x.Name, '') As SummaryPRP,

    COALESCE([objects].Branch, '') As SummaryBranch,

    COALESCE([objects].Status, '') As SummaryDutyStatus

    FROM client_contacts

    INNER JOIN [objects] ON [objects].ObjectID = client_contacts.ClientID

    LEFT JOIN XWHierarchy ON client_contacts.OrigCommunityID = XWHierarchy.LegacyID

    LEFT OUTER JOIN x_genders ON [objects].Gender = x_genders.Value

    LEFT OUTER JOIN [ranks] ON [objects].Rank = [ranks].ItemID

    LEFT OUTER JOIN x ON [objects].PRP = x.Value

    INNER JOIN ats_units as U ON U.ObjectID = [objects].Unit

    WHERE client_contacts.OrigCommunityID IN ('30167', '30167', <SNIP>, '30260', '30261', '30262')

    AND client_contacts.DateContacted BETWEEN '10/23/2005' AND '10/23/2006'

    One thing I've done was eliminate the sub query you had in your original query, ie. what Aaron West suggested (although I used an inner join instead of an outer join).  My experience has shown that the SQL Server query optimizer doesn't deal with those very well.

    One thing that would help us is to know the counts of all the records of all the tables involved as well as what indexes the query plan was using if any.  Also, how long does it take SQL Server to begin returning rows when you execute this query?  Is the problem the processing of the query or the time required to marshal the data between SQL Server and the report writer?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply