Multiple OUTER JOIN query is slow

  • GilaMonster (7/9/2011)


    pdanes2 (7/9/2011)


    How do YOU deal with possible overflows of text input? Same logic in the front-end as in the database? Wait for the DB to throw an error?

    Validation in the application. Max lengths defined in the application. Fail fast always.

    Hm, makes sense - let the user know what's going on as soon as possible. So if you have a field of varchar(100), you would code a test for length 100 in your front-end and reject oversize immediately? I guess that the DB field length is a second check, sort of a 'safety-net'? Do you even bother testing for DB length rejection, or do you trust your own front-end code to do the job?

    What about changes - you manually change both the code and the DB design? In Access, you can interrogate the DB engine as to permitted field length, so front-end code can actually reference the underlying table design. I've done stuff like that, so that when a change was made to the table design, the code automatically conformed. Can SQL Server provide such info to a front-end?

    That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.

    But that's one of the issues here - I'm using outer joins and so can't index the query. Or can I? SSMSE says I can't.

    I wasn't talking about indexing the view.

    OK, you've lost me. What, then?

    Then the tuning efforts need to be focused on the queries, not the view definition. Trying to tune a select * from view with no where clause when all the queries have filters is almost as pointless as trying to tune select * from table when all the queries have filters.

    I'm beginning to think my whole approach is wrong. The origin of this is an Excel spreadsheet, back in the mid-1990s, before I ever got involved. The users have gotten used to seeing a bunch of data on start-up, then scrolling, searching and filtering. If the app opens and they don't see anything, they get tense, which is why I've been opening with this monster view. But maybe I should just rig it so they have to define some filter criteria before displaying anything - that makes the view much quicker. They'll whine at first, but if I can tell them that is what is necessary to make it go fast and it does noticeably improve performance, they'll adapt. Nobody can realistically scroll through 30,000 records anyway - they always wind up searching or filtering for something.

    Test the common queries. Find the ones that are unacceptably slow. Tune them.

    I have done so, and the results are good. What I was trying to do was speed up this one all-encompassing view, which is the default on opening. Maybe it would even be enough just to make this not the default - have it available for those who want it, with the warning that it's slow, but open with NoFilter=Nothing, rather than NoFilter=Everything.

    p.s. What's with the timestamp columns?

    I've read in several design books and tutorials that it's a good field to include, that SQL Server automatically uses it (if present) to determine whether a record was changed by someone else while it was being edited, that otherwise it has to compare the entire record. Is that not the case?

    No.

Timestamp is great if the front-end app wants to tell whether the record has changed since it was read. The SQL engine does not need an additional, manually added column to do that. It's purely for ease of front-end application development.
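For illustration, this is roughly how a front end uses such a column — a minimal sketch with assumed column names (RowVer, Lokalita), not code from this thread:

    -- Optimistic-concurrency check using a rowversion (timestamp) column.
    -- The UPDATE affects no rows if the record changed after it was read.
    UPDATE dbo.Akces
    SET    Lokalita = @NewLokalita
    WHERE  AkcesAutoID = @AkcesAutoID
      AND  RowVer = @RowVerAsRead;
    -- Check @@ROWCOUNT in the caller: 0 means someone else modified the row.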

Drat. Found it in several 'best practices' texts, but maybe the front-end part was assumed. Or maybe the authors didn't know what they were talking about. Okay, out they go.

  • LutzM (7/9/2011)


    Here's what I would do:

Step 1: fix the data type used in / returned by the functions involved (e.g. [fnsFormatAkcesit] will never exceed 17 bytes, therefore the data type for the return value should be varchar(17) instead of varchar(max))

    Step 2: change the CONSTRAINT [PK_Podrobnosti] on the Podrobnosti table to PRIMARY KEY CLUSTERED

    Step 3: change the CONSTRAINT [PK_Akces] on the Akces table to PRIMARY KEY CLUSTERED

Step 4: add an index on dbo.Podrobnosti.AkcesAutoID (as Gail suggested: one index per foreign key reference)

    Step 5: add indexes on dbo.Podrobnosti for the other foreign key references. Start with the lookup tables with the most rows (e.g. TableOfDepozitars preferred over TableOfSystems)

Step 6: if needed, add additional indexes to the lookup tables in the structure of *AutoID INCLUDE(columns_used_in_SELECT)
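For reference, steps 2 through 4 might look roughly like this in T-SQL — a sketch only; the PK column name on Podrobnosti is assumed, and any foreign keys referencing the existing constraints would have to be dropped and recreated around these changes:

    -- Steps 2 and 3: recreate the primary keys as clustered.
    ALTER TABLE dbo.Akces DROP CONSTRAINT PK_Akces;
    ALTER TABLE dbo.Akces ADD CONSTRAINT PK_Akces
        PRIMARY KEY CLUSTERED (AkcesAutoID);

    ALTER TABLE dbo.Podrobnosti DROP CONSTRAINT PK_Podrobnosti;
    ALTER TABLE dbo.Podrobnosti ADD CONSTRAINT PK_Podrobnosti
        PRIMARY KEY CLUSTERED (PodrobnostiAutoID);   -- column name assumed

    -- Step 4: index the foreign key column on the many side of the join.
    CREATE NONCLUSTERED INDEX IX_Podrobnosti_AkcesAutoID
        ON dbo.Podrobnosti (AkcesAutoID);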

    Ok, I'll get on that and see what happens. But not tonight, it's 2am here and I'm beat.

One of the things I was looking at with MAX in functions was not laziness, but reducing re-write necessity. After all, if I code a function for 17 characters and a field changes, I now have to modify the function as well. With MAX it just continues to work. I see the issue with field sizes and indexes, but is there a problem with functions? What about the built-in functions, like substring and charindex? They take arbitrary length strings - is there a problem with a UDF doing the same?

If those steps won't help to bring the elapsed time into an acceptable range, you might have to consider using pass-through queries from your ACCESS front end. With a pass-through query you could use an inline table-valued function (AKA parameterized view) and rewrite that function to limit the rows of dbo.Podrobnosti before the joins even start. You might even need to write two separate functions: one for EvidenceLetter IN and one for AkcesAutoID.
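The "parameterized view" idea might look something like this — a sketch only, since the real view definition isn't posted, so the column list and join are illustrative:

    -- Inline table-valued function that filters dbo.Podrobnosti before joining the lookups.
    CREATE FUNCTION dbo.tfnPodrobnostiByAkces (@AkcesAutoID int)
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT p.PodrobnostiAutoID,                 -- columns assumed for illustration
               p.AkcesAutoID,
               f.Family
        FROM   dbo.Podrobnosti       AS p
        LEFT JOIN dbo.TableOfFamilys AS f ON f.FamilyAutoID = p.FamilyAutoID
        WHERE  p.AkcesAutoID = @AkcesAutoID         -- rows limited before the outer joins
    );

    -- From an Access pass-through query, it would then just be:
    -- SELECT * FROM dbo.tfnPodrobnostiByAkces(12345);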

If you need to go down that route, I strongly recommend setting up a dedicated login that will only be allowed to call those functions, since the login and password are stored in the ACCESS frontend in plain text, visible to everyone who just looks a little closer....

    I'm using an ODBC link with Windows security - wouldn't that work for the pass-throughs as well?

  • In addition to all of the advice given so far - you really need to look at how you are using those functions. Scalar functions are notoriously slow and will definitely cause performance issues.

    If those functions are used throughout your application code, try redesigning them as inline table valued functions (iTVF) and using CROSS APPLY to get the results. Search for Paul White and CROSS APPLY for some articles on how this can improve performance for you.

    If you can - move the code into the view instead of using the functions. I have used both techniques to improve performance on quite a few queries.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams-493691 (7/9/2011)


    In addition to all of the advice given so far - you really need to look at how you are using those functions. Scalar functions are notoriously slow and will definitely cause performance issues.

    If those functions are used throughout your application code, try redesigning them as inline table valued functions (iTVF) and using CROSS APPLY to get the results. Search for Paul White and CROSS APPLY for some articles on how this can improve performance for you.

    If you can - move the code into the view instead of using the functions. I have used both techniques to improve performance on quite a few queries.

In general I agree. But in this case all the functions perform a conditional string concatenation of the input parameters with no other table access. I don't see an issue with this solution. If there had been calls to other tables, it would have been a different issue.



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • ... They take arbitrary length strings - is there a problem with a UDF doing the same?

Have a look at this link. There seems to be a significant performance degradation.

    I'm using an ODBC link with Windows security - wouldn't that work for the pass-throughs as well?

    Yes, Windows security will work. The issue I mentioned was related to SQL Server security and storing the login data in the connection string.

    As a side note:

    I found an interesting comment at stackoverflow:

    Many if not most client APIs (i.e. ODBC drivers, JDBC drivers, etc) allocate memory buffers on the client that are large enough to store the maximum size of a particular column. So even though the database only has to store the actual data, you may substantially increase the amount of memory the client application uses.



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • pdanes2 (7/9/2011)


    GilaMonster (7/9/2011)


    pdanes2 (7/9/2011)


    How do YOU deal with possible overflows of text input? Same logic in the front-end as in the database? Wait for the DB to throw an error?

    Validation in the application. Max lengths defined in the application. Fail fast always.

    Hm, makes sense - let the user know what's going on as soon as possible. So if you have a field of varchar(100), you would code a test for length 100 in your front-end and reject oversize immediately? I guess that the DB field length is a second check, sort of a 'safety-net'? Do you even bother testing for DB length rejection, or do you trust your own front-end code to do the job?

No need to do any DB checks. If you try to insert 101 characters into a varchar(100), it will fail. Just make sure that the code handles errors properly.
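A minimal illustration of that behaviour:

    DECLARE @t TABLE (col varchar(100));
    INSERT INTO @t (col) VALUES (REPLICATE('x', 101));
    -- Fails with "String or binary data would be truncated" rather than silently trimming.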

    What about changes - you manually change both the code and the DB design? In Access, you can interrogate the DB engine as to permitted field length, so front-end code can actually reference the underlying table design. I've done stuff like that, so that when a change was made to the table design, the code automatically conformed. Can SQL Server provide such info to a front-end?

I prefer manual changes. You can get the table definitions from SQL, but unless you have cause to believe that the schema will change often and without notice, it's easier just to define the validation in the front end and leave it. Like that, when people file bugs asking for the column to be made larger, they ask for the application to allow it, so you change the app and then make sure that the DB changes as well (or pick up that you forgot in early testing)

    That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.

    But that's one of the issues here - I'm using outer joins and so can't index the query. Or can I? SSMSE says I can't.

    I wasn't talking about indexing the view.

    OK, you've lost me. What, then?

    The tables that are used in the view.

    But maybe I should just rig it so they have to define some filter criteria before displaying anything - that makes the view much quicker. They'll whine at first, but if I can tell them that is what is necessary to make it go fast and it does noticeably improve performance, they'll adapt. Nobody can realistically scroll through 30,000 records anyway - they always wind up searching or filtering for something.

    Sounds like a good plan. Showing 30 000 records just so the user can ignore it and start filtering is a massive waste of resources all over the place.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeffrey Williams-493691 (7/9/2011)


    In addition to all of the advice given so far - you really need to look at how you are using those functions. Scalar functions are notoriously slow and will definitely cause performance issues.

    Okay, I'll see if that helps. I like the tidiness of defined functions, but sometimes performance requirements dictate less elegant solutions.

    If those functions are used throughout your application code, try redesigning them as inline table valued functions (iTVF) and using CROSS APPLY to get the results. Search for Paul White and CROSS APPLY for some articles on how this can improve performance for you.

    I looked at that - interesting article. I'll have to give that a try as well.

    If you can - move the code into the view instead of using the functions. I have used both techniques to improve performance on quite a few queries.

Ok, I just tried it. One of the things that gives me fits here is that the performance varies - a LOT! Yesterday, this query was executing in around a minute. When I changed it to get rid of some MAX field definitions, it slowed down to almost four minutes, which is where I left it and went to bed. Just now I made some of the function changes you suggested (not all, some make use of a temp variable to concatenate fields), and the execute time was 15 seconds. So I tried the same query as yesterday, and today it runs in 18 seconds. Stuff like this would turn my hair gray, if it wasn't already - I made no changes AT ALL from yesterday, besides creating a second copy of the view with inline replacements, and the exec time went from 3:42 to 0:18.

    Nonetheless, your suggestion does make a difference. Changing two of the four concatenating functions to inline (and the two less complicated ones, at that) consistently produces run times of 15-16 seconds, while the unchanged one always runs in 18 seconds (today, anyway). Not a huge difference, but a significant percentage. Since you brought this up and have done this elsewhere, do you have any thoughts on how to convert functions with temp variables to inline code? Here's one of them:

ALTER FUNCTION [dbo].[fnsStratigrafie](@system varchar(max), @series varchar(max), @stage varchar(max), @substage varchar(max), @LithographicUnit varchar(max), @LithographicSubUnit varchar(max), @zone varchar(max))
RETURNS varchar(max)
WITH SCHEMABINDING
AS
BEGIN
    -- Declare the return variable here
    DECLARE @StTmp varchar(max)

    IF @system <> ''
        SET @StTmp = @system
    IF @series <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @series ELSE SET @StTmp = @series END
    IF @stage <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @stage ELSE SET @StTmp = @stage END
    IF @substage <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @substage ELSE SET @StTmp = @substage END
    IF @LithographicUnit <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @LithographicUnit ELSE SET @StTmp = @LithographicUnit END
    IF @LithographicSubUnit <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @LithographicSubUnit ELSE SET @StTmp = @LithographicSubUnit END
    IF @zone <> ''
        BEGIN IF @StTmp <> '' SET @StTmp = @StTmp + '; ' + @zone ELSE SET @StTmp = @zone END

    RETURN @StTmp
END
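One way the same conditional concatenation could be expressed without the temp variable — a sketch, not tested against the original's edge cases — is as an inline table-valued function built from CASE expressions, with STUFF() stripping the leading separator:

    CREATE FUNCTION [dbo].[tfnStratigrafie]
    (@system varchar(255), @series varchar(255), @stage varchar(255), @substage varchar(255), @LithographicUnit varchar(255), @LithographicSubUnit varchar(255), @zone varchar(255))
    RETURNS TABLE
    AS
    RETURN
    (
        -- Each non-empty piece contributes '; ' + value; STUFF removes the leading '; '.
        SELECT Stratigrafie = STUFF(
              CASE WHEN @system              <> '' THEN '; ' + @system              ELSE '' END
            + CASE WHEN @series              <> '' THEN '; ' + @series              ELSE '' END
            + CASE WHEN @stage               <> '' THEN '; ' + @stage               ELSE '' END
            + CASE WHEN @substage            <> '' THEN '; ' + @substage            ELSE '' END
            + CASE WHEN @LithographicUnit    <> '' THEN '; ' + @LithographicUnit    ELSE '' END
            + CASE WHEN @LithographicSubUnit <> '' THEN '; ' + @LithographicSubUnit ELSE '' END
            + CASE WHEN @zone                <> '' THEN '; ' + @zone                ELSE '' END
            , 1, 2, '')
    );

It would then be used with CROSS APPLY rather than called per row in the select list (the column names here are placeholders, since the view definition isn't posted):

    SELECT p.PodrobnostiAutoID, s.Stratigrafie
    FROM   dbo.Podrobnosti AS p
    CROSS APPLY dbo.tfnStratigrafie(p.System, p.Series, p.Stage, p.Substage, p.LithographicUnit, p.LithographicSubUnit, p.Zone) AS s;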

  • LutzM (7/10/2011)


    ... They take arbitrary length strings - is there a problem with a UDF doing the same?

Have a look at this link. There seems to be a significant performance degradation.

Wow. When I first saw the MAX attribute, it seemed a neat and painless way of dealing with arbitrary length text strings. Never would have guessed that it could cause so many problems. Just goes to show - something, I suppose, not sure quite what. In any case, I'm convinced. To quote Watterson, 'Max is gone'.

    As a side note:

    I found an interesting comment at stackoverflow:

    Many if not most client APIs (i.e. ODBC drivers, JDBC drivers, etc) allocate memory buffers on the client that are large enough to store the maximum size of a particular column. So even though the database only has to store the actual data, you may substantially increase the amount of memory the client application uses.

    Hm, I'll see if I can track down any indication of that in Access. 2GB is the absolute maximum size for an Access DB file, and I can vouch from personal experience that it corrupts and crashes when that is exceeded. Obviously, my app isn't trying to allocate the full 2GB for each field, or I'd have blown my memory allocations through the roof, but it may still be trying to do more than necessary. I'll look into it and post back if I find out something more on the subject.

  • GilaMonster (7/10/2011)


    pdanes2 (7/9/2011)


    GilaMonster (7/9/2011)


    pdanes2 (7/9/2011)

    That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.

    But that's one of the issues here - I'm using outer joins and so can't index the query. Or can I? SSMSE says I can't.

    I wasn't talking about indexing the view.

    OK, you've lost me. What, then?

    The tables that are used in the view.

    Sorry for being so dense, but I'm still mixed up. Let me see if I can pick through an example:

TableOfFamilys is one of the lookup tables in this pesky joined view. It has two columns - an autonumber ID field, FamilyAutoID, which is also the primary key, and an associated text field, Family. That was one of the MAX fields, now it's 255. Besides the primary key, I also have an index on the table, Family_Idx, defined one of two ways - Index or Unique Key - I still can't figure out the difference:

/****** Object: Index [Family_Idx] Script Date: 07/11/2011 19:44:20 ******/
-- As Index
CREATE UNIQUE NONCLUSTERED INDEX [Family_Idx] ON [dbo].[TableOfFamilys]
    ([Family] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

or

/****** Object: Index [Family_Idx] Script Date: 07/11/2011 19:43:32 ******/
-- As Unique Key
ALTER TABLE [dbo].[TableOfFamilys] ADD CONSTRAINT [Family_Idx] UNIQUE NONCLUSTERED
    ([Family] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

SSMSE shows both as an index; when defined as a Unique Key, it also shows up in the Keys branch.

    You say index the table with included columns, but the primary key is clustered automatically (and not just by default, as you wrote earlier, but unchangeably so), and I can't include columns in a clustered index. Should I add another index on the autonumber field, unclustered, duplicating the primary key but including the associated text column? Does the primary key even make any sense to have then?

    I read that included columns are actually only included in the leaf nodes. The leaf nodes of the (clustered) primary key are the actual records, aren't they? So a lookup (or join) on the primary key should automatically yield the associated text field, rendering an included column moot (is that why you can't include columns in a clustered index?).

    If I'm joining on the primary key, is there any other index that even makes sense?

    Sounds like a good plan. Showing 30 000 records just so the user can ignore it and start filtering is a massive waste of resources all over the place.

    Old habits die hard. Excel shows data by default and Access at least makes it cheap to do so. Users get accustomed to an app opening with a screen full of data. True client-server apps require a whole new way of thinking for me - not just knowing what happens, but remembering and being able to apply that knowledge in daily design. I'm working on it, though - these forums help a lot.

  • pdanes2 (7/11/2011)


    Family_Idx, defined one of two ways - Index or Unique Key - I still can't figure out the difference:

    There isn't much of one.

    You say index the table with included columns, but the primary key is clustered automatically (and not just by default, as you wrote earlier, but unchangeably so)

By default only. It's perfectly changeable.

    and I can't include columns in a clustered index.

    No, because it already implicitly includes all columns, being the clustered index.

    If I'm joining on the primary key, is there any other index that even makes sense?

    Not on the lookup tables unless you are filtering on them by a non-key column. You put indexes on foreign keys (on the non-primary key side) and on columns you're filtering on.

    Old habits die hard. Excel shows data by default and Access at least makes it cheap to do so.

    Access makes it easy to do so, not cheap. The cost of getting all those rows over is still there, you just don't see it until cases like this. Excel shows data for the same reason that Word does. They're documents, not databases

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/11/2011)


    pdanes2 (7/11/2011)


    You say index the table with included columns, but the primary key is clustered automatically (and not just by default, as you wrote earlier, but unchangeably so)

By default only. It's perfectly changeable.

I guess that's an artefact of SSMSE, then. It's grayed out and unchangeable there - I didn't try actually doing anything with T-SQL.

    If I'm joining on the primary key, is there any other index that even makes sense?

    Not on the lookup tables unless you are filtering on them by a non-key column. You put indexes on foreign keys (on the non-primary key side) and on columns you're filtering on.

    I will be, and more so now that I'm going away from the default of showing everything. So it seems that for all my simple (two-field) lookup tables, I should have the primary key on the autonumber ID field, and an index on the text field with the autonumber ID as an included field. Does that sound right?

    Old habits die hard. Excel shows data by default and Access at least makes it cheap to do so.

    Access makes it easy to do so, not cheap. The cost of getting all those rows over is still there, you just don't see it until cases like this. Excel shows data for the same reason that Word does. They're documents, not databases

    Yes, and Access seems to cache a lot of stuff in memory, besides being local to begin with. You don't realize how much stuff is being shoveled around until you have to talk to another machine about it.

  • pdanes2 (7/11/2011)


    GilaMonster (7/11/2011)


    pdanes2 (7/11/2011)


    You say index the table with included columns, but the primary key is clustered automatically (and not just by default, as you wrote earlier, but unchangeably so)

By default only. It's perfectly changeable.

I guess that's an artefact of SSMSE, then. It's grayed out and unchangeable there - I didn't try actually doing anything with T-SQL.

    You can't change it without dropping the constraint (and the foreign keys), however this is entirely valid:

CREATE TABLE SomeTable
(
    SomeKey INT IDENTITY PRIMARY KEY NONCLUSTERED ...
    ...
);

    Creates a nonclustered primary key.

There are a lot of limitations in the GUI tools, which is one reason many people use T-SQL scripts when they want to do anything complex. Besides, these days I can define a table faster in a T-SQL script than I can with the GUI.

    If I'm joining on the primary key, is there any other index that even makes sense?

    Not on the lookup tables unless you are filtering on them by a non-key column. You put indexes on foreign keys (on the non-primary key side) and on columns you're filtering on.

    I will be, and more so now that I'm going away from the default of showing everything. So it seems that for all my simple (two-field) lookup tables, I should have the primary key on the autonumber ID field, and an index on the text field with the autonumber ID as an included field. Does that sound right?

    Sounds like a place to start.

It's all about giving the optimiser options and optimal access paths. What it'll choose is another matter.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/11/2011)


    So it seems that for all my simple (two-field) lookup tables, I should have the primary key on the autonumber ID field, and an index on the text field with the autonumber ID as an included field. Does that sound right?

    Sounds like a place to start.

It's all about giving the optimiser options and optimal access paths. What it'll choose is another matter.

    Ok, I'll give all that a try - get rid of all the MAX field definitions, fix the functions and make sure all these indexes are set up this way - and see where it lands me. Many thanks to you and others for all the explanations - I'm looking forward to getting good enough at this to where I can start contributing something as well, like I used to in the MSOffice newsgroups.
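As a reference for that setup, the lookup-table layout being described might look like this — a sketch using the names from the thread (the PK constraint name is made up); note that with a clustered primary key on the ID column, every nonclustered index carries the clustering key anyway, so an explicit INCLUDE of the ID isn't needed:

    CREATE TABLE dbo.TableOfFamilys
    (
        FamilyAutoID int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_TableOfFamilys PRIMARY KEY CLUSTERED,
        Family       varchar(255) NOT NULL
    );

    -- Covering index for joins/filters on the text column; FamilyAutoID rides along implicitly.
    CREATE UNIQUE NONCLUSTERED INDEX Family_Idx
        ON dbo.TableOfFamilys (Family);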

  • Gila, it was my understanding that SQL uses the timestamp (rowversion) column for incremental population of Full Text Indexes. It's not purely for Front-End app use.

  • Never worked with full text indexes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
