Query optimization

  • Hello,

    I want to optimise the following query, which is running very slowly.

    Please help or give me some advice.

    MERGE dbo.LocalizedCategories AS T

    USING (SELECT L.LanguageID,SC.CategoryID,BC.Value

    FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC

    CROSS APPLY(SELECT TOP 1 CategoryID FROM dbo.SourceCategories Where Identifier=CONVERT(NVARCHAR(10),BC.BetradarCategoryID) AND SourceID=@SourceID) SC

    CROSS APPLY(SELECT TOP 1 LanguageID FROM dbo.Languages WHERE LanguageCulture=BC.[Language] )L)

    AS S (LanguageID,CategoryID,Value)

    ON (T.CategoryID = S.CategoryID AND T.LanguageID=S.LanguageID)

    WHEN MATCHED AND T.Name<>S.Value

    THEN UPDATE SET

    T.Name=S.Value

    WHEN NOT MATCHED BY TARGET

    THEN INSERT

    ( CategoryID ,LanguageID, Name )

    VALUES ( S.CategoryID,S.LanguageID,S.Value);

  • DDL & DML would help.

    First of all, you've created a temp table #LocalizedCategories, but you are doing a Select Distinct on it. Is there any reason you couldn't have selected Distinct results into it at creation? You're adding the Distinct overhead to an already complicated query.

    TheBI (12/5/2016)


    MERGE dbo.LocalizedCategories AS T

    USING (SELECT L.LanguageID,SC.CategoryID,BC.Value

    FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC

    Another thing I would do is move your Merge source query into a CTE to visually separate the source logic from your Merge. That would allow you to tune your source query separately, which is my next question.

    Are you seeing performance issues with your source query on its own?
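    A minimal sketch of that restructuring, reusing the query exactly as posted (nothing here is tuned; the point is only that the source becomes a separate block you can highlight and run on its own):

    ;WITH MergeSource AS
    (
        SELECT L.LanguageID, SC.CategoryID, BC.Value
        FROM (SELECT DISTINCT BetradarCategoryID, Value, [Language]
              FROM #LocalizedCategories
              WHERE [Language] <> 'BET') AS BC
        CROSS APPLY (SELECT TOP 1 CategoryID
                     FROM dbo.SourceCategories
                     WHERE Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
                       AND SourceID = @SourceID) AS SC
        CROSS APPLY (SELECT TOP 1 LanguageID
                     FROM dbo.Languages
                     WHERE LanguageCulture = BC.[Language]) AS L
    )
    MERGE dbo.LocalizedCategories AS T
    USING MergeSource AS S
        ON T.CategoryID = S.CategoryID AND T.LanguageID = S.LanguageID
    WHEN MATCHED AND T.Name <> S.Value THEN
        UPDATE SET T.Name = S.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CategoryID, LanguageID, Name)
        VALUES (S.CategoryID, S.LanguageID, S.Value);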

    Wes
    (A solid design is always preferable to a creative workaround)

  • TheBI (12/5/2016)


    Hello,

    I want to optimise the following query, which is running very slowly.

    Please help or give me some advice.

    MERGE dbo.LocalizedCategories AS T

    USING (SELECT L.LanguageID,SC.CategoryID,BC.Value

    FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC

    CROSS APPLY(SELECT TOP 1 CategoryID FROM dbo.SourceCategories Where Identifier=CONVERT(NVARCHAR(10),BC.BetradarCategoryID) AND SourceID=@SourceID) SC

    CROSS APPLY(SELECT TOP 1 LanguageID FROM dbo.Languages WHERE LanguageCulture=BC.[Language] )L)

    AS S (LanguageID,CategoryID,Value)

    ON (T.CategoryID = S.CategoryID AND T.LanguageID=S.LanguageID)

    WHEN MATCHED AND T.Name<>S.Value

    THEN UPDATE SET

    T.Name=S.Value

    WHEN NOT MATCHED BY TARGET

    THEN INSERT

    ( CategoryID ,LanguageID, Name )

    VALUES ( S.CategoryID,S.LanguageID,S.Value);

    For those of us who struggle to read T-SQL presented as plain text, here is a formatted version:

    MERGE dbo.LocalizedCategories T
    USING
    (
        SELECT
            L.LanguageID
          , SC.CategoryID
          , BC.Value
        FROM
        (
            SELECT DISTINCT
                BetradarCategoryID
              , Value
              , Language
            FROM #LocalizedCategories
            WHERE Language <> 'BET'
        ) BC
        CROSS APPLY
        (
            SELECT TOP 1
                CategoryID
            FROM dbo.SourceCategories
            WHERE
                Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID) AND
                SourceID = @SourceID
        ) SC
        CROSS APPLY
        (
            SELECT TOP 1
                LanguageID
            FROM dbo.Languages
            WHERE LanguageCulture = BC.Language
        ) L
    ) S
    (LanguageID, CategoryID, Value)
    ON
    (
        T.CategoryID = S.CategoryID AND
        T.LanguageID = S.LanguageID
    )
    WHEN MATCHED AND T.Name <> S.Value THEN
        UPDATE SET T.Name = S.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT
        (
            CategoryID
          , LanguageID
          , Name
        )
        VALUES
        (
            S.CategoryID, S.LanguageID, S.Value
        );

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • One thing you could also try is putting the results of the entire source query into a temp table, then indexing the temp table, then merging from only the temp table.

    Whether it improves things much depends on where your slowness originates.
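    A rough sketch of that approach, reusing the source query as posted; the index key is just a guess at what would support the MERGE join predicate, so adjust it to the actual plan:

    SELECT L.LanguageID, SC.CategoryID, BC.Value
    INTO #MergeSource
    FROM (SELECT DISTINCT BetradarCategoryID, Value, [Language]
          FROM #LocalizedCategories
          WHERE [Language] <> 'BET') AS BC
    CROSS APPLY (SELECT TOP 1 CategoryID
                 FROM dbo.SourceCategories
                 WHERE Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
                   AND SourceID = @SourceID) AS SC
    CROSS APPLY (SELECT TOP 1 LanguageID
                 FROM dbo.Languages
                 WHERE LanguageCulture = BC.[Language]) AS L;

    -- Index to support the ON clause of the MERGE
    CREATE CLUSTERED INDEX IX_MergeSource ON #MergeSource (CategoryID, LanguageID);

    MERGE dbo.LocalizedCategories AS T
    USING #MergeSource AS S
        ON T.CategoryID = S.CategoryID AND T.LanguageID = S.LanguageID
    WHEN MATCHED AND T.Name <> S.Value THEN
        UPDATE SET T.Name = S.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CategoryID, LanguageID, Name)
        VALUES (S.CategoryID, S.LanguageID, S.Value);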

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • whenriksen (12/5/2016)


    DDL & DML would help.

    First of all, you've created a temp table #LocalizedCategories, but you are doing a Select Distinct on it. Is there any reason you couldn't have selected Distinct results into it at creation? You're adding the Distinct overhead to an already complicated query.

    TheBI (12/5/2016)


    MERGE dbo.LocalizedCategories AS T

    USING (SELECT L.LanguageID,SC.CategoryID,BC.Value

    FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC

    Another thing I would do is move your Merge source query into a CTE to visually separate the source logic from your Merge. That would allow you to tune your source query separately, which is my next question.

    Are you seeing performance issues with your source query on its own?

    I'll have to check about the DISTINCT. It seems it can be removed, but I'm not sure whether there could be duplicates. Will it make a bigger result set?

    I'll see about implementing it with a CTE.

    The source query is the performance issue.

    Thank you

    Igor Micev, My blog: www.igormicev.com

    1) What does the query plan look like? Post up the estimated query plan for review.

    2) Table structures? Post those up too, including indexing on them.

    3) Do you have any mismatched data types?

    4) You have a function around a column in the WHERE clause. BIG NO NO!!!

    5) As others mentioned, the DISTINCT is a performance hog, and since you are just hitting a temp table I have to wonder if you could have just populated it with DISTINCT and been done with it. Or skip the temp table and go straight to the real source. (A sketch follows this list.)

    6) Any triggers or foreign keys in play here?

    7) Did you use sp_whoisactive to check for simple blocking?
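    Regarding points 4 and 5, a sketch of one way to address both when the temp table is populated; the column types and the staging source below are assumptions, since no DDL or DML has been posted:

    CREATE TABLE #LocalizedCategories
    (
        BetradarCategoryID NVARCHAR(10)  NOT NULL,  -- converted once here, so the MERGE source compares matching types
        Value              NVARCHAR(200) NULL,      -- assumed type
        [Language]         NVARCHAR(10)  NOT NULL   -- assumed type
    );

    INSERT #LocalizedCategories (BetradarCategoryID, Value, [Language])
    SELECT DISTINCT CONVERT(NVARCHAR(10), src.BetradarCategoryID), src.Value, src.[Language]
    FROM dbo.BetradarFeed AS src   -- hypothetical source table, for illustration only
    WHERE src.[Language] <> 'BET';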

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Igor Micev (12/5/2016)


    I'll have to check about the DISTINCT. It seems it can be removed, but I'm not sure whether there could be duplicates. Will it make a bigger result set?

    I'll see about implementing it with a CTE.

    The source query is the performance issue.

    - If there are duplicates, the result set will be bigger if you completely remove the Distinct. However, my suggestion is to move the distinct to the statement that populates the temp table, rather than perform the Distinct in the Merge.

    - Using the CTE by itself won't fix anything. I just think it will make it easier to read/debug.

    - If the query used as the Merge source is slow, the Merge will also be slow. Work on tuning that before you start troubleshooting the Merge statement.

    Wes
    (A solid design is always preferable to a creative workaround)

    Slight detour from the task at hand. You have a couple of subqueries here using TOP 1, but there is no ORDER BY. Without an ORDER BY you have no way of ensuring which row will be returned. Perhaps in this case they would all be the same, so it doesn't matter, but I do find it worth mentioning.
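    For illustration, making one of those lookups deterministic would just mean adding an ORDER BY; the tie-breaker column here is a guess, since we don't know which row should win:

    SELECT TOP (1) LanguageID
    FROM dbo.Languages
    WHERE LanguageCulture = N'en-GB'   -- hypothetical culture value, for illustration
    ORDER BY LanguageID;               -- hypothetical tie-breaker; without it, any matching row may be returned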

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

    If there is a temp table in the query batch, that is known to give you recompiles [Edit: depending on the difference in stats].

    So I suggest adding OPTION (RECOMPILE) so that you can benefit from the extra level of constant folding, since you will be doing a traditional recompile anyway [if the above is true].

    Also, OPTION (RECOMPILE) goes through a different code path from the normal batch compilation, so you can get a different plan: most likely better, but possibly worse.

    Anyway, something to try.
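    For clarity, the hint is appended to the end of the statement; a minimal illustration on its own (in the OP's case it would go right before the MERGE's terminating semicolon):

    SELECT DISTINCT BetradarCategoryID, Value, [Language]
    FROM #LocalizedCategories
    WHERE [Language] <> 'BET'
    OPTION (RECOMPILE);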

    Catch-all queries done right
    Gail Shaw's Performance Blog

  • >> I want to optimise the following query that is running so slowly. <<

    First of all, we need some DDL; what you did post is really bad. You put attributes in tables, and you have columns with names like "category_id" which violate basic data modeling as well as ISO 11179 naming rules. Let me be specific: you can have a "<something in particular>_category" or a "<something in particular>_id" as the attribute of some entity, but not a "category_id" by itself. That is a combination of what ISO calls attribute properties. The format for a column name is [<role>_]<attribute>_<property>. Just think about how silly something like "category_value_id_nbr" would be. But the important point of attribute properties is that they are all distinct. This means that crap like "T.generic_name <> S.generic_value" is absurd, and shows that we have a horrible, improperly done schema on our hands.

    Then you give vague attribute names like "language"; is this the ISO standard language code (it looks like it, because of the three letters in the constant, but this is still a guess because you did not post specs), a name, a human language or a programming language, etc.?

    Finally, things like a table of language codes along with their names are referenced in other tables, not joined to them. This is where we get into the data modeling thing about strong attributes, etc. In a correctly designed schema, we do not include metadata. Things like "identifier" are metadata; they tell us how something is used and what it means at a higher level in the design. In short, you probably have what we call an EAV design. We hate it. We do not do it.

    Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult and portability impossible.

    Just think about a name like "LocalizedCategories"; why are these totally different from the other generic <nothing in particular>_categories in your data model? Localized is an attribute or type or something, not a whole separate kind. You are telling us that these things are as unlike as automobiles and squids, totally different from mere "<nothing in particular>_categories"! Also, camel or Pascal case does not work for several reasons. The first is that many ISO standards (and products) are not case-sensitive, and some are. Thus, "LocalizedCategories", "localizedcategories", "localizedCategories", etc. may or may not be the same name.

    The second reason is that your eye jumps to uppercase letters and pauses. This is why the ISO and metadata committee standards use spaces or underscores. Underscores are actually easy to read because the Latin alphabet has what is called a baseline (nothing to do with music). This is where the eye adjusts and starts looking for spatial separation between units of language. Did you know that, at one point, manuscripts in the Middle Ages were written as continuous strings of letters, without spacing?

    Did you know that VALUE is both too vague to be a column name and a reserved word in ANSI/ISO standard SQL? "Identifier" is metadata and has no place in the table at all.

    I have been doing this for 30+ years, so please trust me when I tell you that most of the time, the problems start with bad DDL and an improper data model. After that you spend far too much time in the DML trying to correct these bad designs.

    Can you throw out this garbage, start over and do it right?

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • CELKO (12/6/2016)


    Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult

    Now you're just making up stuff to support your claims.

    and portability impossible.

    Who cares?!?! Most people here are not at all interested in creating portable implementations of their code.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (12/6/2016)


    CELKO (12/6/2016)


    Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult

    Now you're just making up stuff to support your claims.

    and portability impossible.

    Who cares?!?! Most people here are not at all interested in creating portable implementations of their code.

    Drew

    Agreed. This is a SQL Server site.

    ANSI_SQL_central.com does not exist.

    Wes
    (A solid design is always preferable to a creative workaround)

  • CELKO (12/6/2016)


    Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult and portability impossible.

    Joe, you make me laugh. You know that we have two options, either portable code or fast code. You can't have both. Your continued preaching about portability is seriously one of the most tired and absurd arguments. Businesses don't just decide to switch to a new DBMS as frequently as you seem to think. In my experience that has happened exactly 0 times.

    How would you propose we get the TOP row? We could use a query to get that one on its own, but all the DBMSs have their own version of TOP, LIMIT, etc.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • You know that we have two options, either portable code or fast code. You can't have both.

    No, I do not know that, because I have built an entire career on having portable code along with decent performance. But more than performance, I look at the whole system. What is the total cost of ownership? When I upgrade to the next release of my current database (forget about the fact that it may have to be ported to other platforms), what does it cost me to move the code? In the SQL Server world, let me give you some real examples. (1) When we replaced *= with LEFT OUTER JOIN and had to rewrite code, my clients only had to take the comment marks off the ANSI standard code I left for them and drop the old code. I had tested it both ways. What took other people weeks or months or years, my clients did in a weekend. (2) When BIT stopped being an assembly language bit flag and became a numeric data type (all data types are NULL-able), my clients had never used bit-flag assembly language coding, so they did not even notice.

    If you really, really wanted fast code you would write in assembly language and skip SQL altogether. But in the real world, with large systems, 90% of the total cost of the system is maintenance. The more maintainable a database is, the more portable a database is, the more standardized a database is, the cheaper it is over the total life of the system. Are you familiar with the comedy routine that features "Larry the Cable Guy"? His tag phrase, as he does horrible broad-brush quick kludges, is "git'er done"; some of them will blow up immediately and some will last long enough for him to get out the door to the next job. This is cowboy coding, and all too many programmers do it.

    In my experience that has happened exactly 0 times.

    I built a whole career on this, and it is why I can charge an insanely large hourly rate :-). But frankly, I would rather help train people to write good code at the start than correct bad systems after the fact. I know that writing proprietary code gives you job security. I just do not think that is professional.

    How would you propose we get the TOP row? We could use a query to get that one on its own, but all the DBMSs have their own version of TOP, LIMIT, etc.

    Yes, this is true; but they also have the ANSI/ISO standard "MAX(x) OVER (...)" and you can usually use that. If you have done a decent schema design, this will work; if you have to do multiple conditions to create a sorted "table", then I have found, after 30 years, that it probably has some design flaws.
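    For anyone curious, a small sketch of that standard-SQL pattern applied to the Languages lookup from the original query; choosing the highest LanguageID per culture is an arbitrary assumption here (TOP 1 with no ORDER BY does not define a winner either):

    SELECT DISTINCT
        LanguageCulture,
        MAX(LanguageID) OVER (PARTITION BY LanguageCulture) AS LanguageID
    FROM dbo.Languages;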

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • CELKO (12/6/2016)


    You know that we have two options, either portable code or fast code. You can't have both.

    No, I do not know that, because I have built an entire career on having portable code along with decent performance. But more than performance, I look at the whole system. What is the total cost of ownership? When I upgrade to the next release of my current database (forget about the fact that it may have to be ported to other platforms), what does it cost me to move the code? In the SQL Server world, let me give you some real examples. (1) When we replaced *= with LEFT OUTER JOIN and had to rewrite code, my clients only had to take the comment marks off the ANSI standard code I left for them and drop the old code. I had tested it both ways. What took other people weeks or months or years, my clients did in a weekend. (2) When BIT stopped being an assembly language bit flag and became a numeric data type (all data types are NULL-able), my clients had never used bit-flag assembly language coding, so they did not even notice.

    If you really, really wanted fast code you would write in assembly language and skip SQL altogether. But in the real world, with large systems, 90% of the total cost of the system is maintenance. The more maintainable a database is, the more portable a database is, the more standardized a database is, the cheaper it is over the total life of the system. Are you familiar with the comedy routine that features "Larry the Cable Guy"? His tag phrase, as he does horrible broad-brush quick kludges, is "git'er done"; some of them will blow up immediately and some will last long enough for him to get out the door to the next job. This is cowboy coding, and all too many programmers do it.

    *= and bits? Most people who are working with databases today don't even understand what that join syntax means. It has been gone from the product for a long time. All of your arguments here are about language changes that happened 20 years ago. Can you update your arguments for portable code to something that is relevant in today's world?

    I agree 10000% about helping people understand how to do it right instead of right now. But the reality is that most businesses don't care enough about the database to consider switching. As long as it stores and retrieves data accurately and fast, it doesn't really matter that much. I am glad you can charge crazy high rates for your consulting time. You have certainly put enough effort into databases that it is well deserved. But also keep in mind that most people couldn't care less about portability, and that by not using some of the proprietary portions you will continue to have code that is portable and performance that is decent. I prefer my code to be screaming fast. Each option has a cost; for me the cost is less portable code. For you, the cost is in performance. I am not saying that either one is better than the other. We clearly have different opinions on what is most important there. 😀

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
