December 5, 2016 at 7:47 am
Hello,
I want to optimise the following query, which is running very slowly.
Please help or give me some advice.
MERGE dbo.LocalizedCategories AS T
USING (SELECT L.LanguageID,SC.CategoryID,BC.Value
FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC
CROSS APPLY(SELECT TOP 1 CategoryID FROM dbo.SourceCategories Where Identifier=CONVERT(NVARCHAR(10),BC.BetradarCategoryID) AND SourceID=@SourceID) SC
CROSS APPLY(SELECT TOP 1 LanguageID FROM dbo.Languages WHERE LanguageCulture=BC.[Language] )L)
AS S (LanguageID,CategoryID,Value)
ON (T.CategoryID = S.CategoryID AND T.LanguageID=S.LanguageID)
WHEN MATCHED AND T.Name<>S.Value
THEN UPDATE SET
T.Name=S.Value
WHEN NOT MATCHED BY TARGET
THEN INSERT
( CategoryID ,LanguageID, Name )
VALUES ( S.CategoryID,S.LanguageID,S.Value);
December 5, 2016 at 8:17 am
DDL & DML would help.
First of all, you've created a temp table #LocalizedCategories, but you are doing a Select Distinct on it. Is there any reason you couldn't have selected Distinct results into it at creation? You're adding the Distinct overhead to an already complicated query.
TheBI (12/5/2016)
MERGE dbo.LocalizedCategories AS T USING (SELECT L.LanguageID,SC.CategoryID,BC.Value
FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC
Another thing I would do is move your Merge source query into a CTE to visually separate the source logic from your Merge. That would allow you to tune your source query separately, which is my next question.
Are you seeing performance issues with your source query on its own?
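For illustration, the CTE version might look something like this (just a sketch built from the code you posted, untested; it assumes @SourceID is declared earlier in the batch):
;WITH src AS
(
    SELECT
        L.LanguageID
        , SC.CategoryID
        , BC.Value
    FROM
    (
        SELECT DISTINCT BetradarCategoryID, Value, [Language]
        FROM #LocalizedCategories
        WHERE [Language] <> 'BET'
    ) BC
    CROSS APPLY
    (
        SELECT TOP 1 CategoryID
        FROM dbo.SourceCategories
        WHERE Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
          AND SourceID = @SourceID
    ) SC
    CROSS APPLY
    (
        SELECT TOP 1 LanguageID
        FROM dbo.Languages
        WHERE LanguageCulture = BC.[Language]
    ) L
)
MERGE dbo.LocalizedCategories AS T
USING src AS S
    ON T.CategoryID = S.CategoryID
   AND T.LanguageID = S.LanguageID
WHEN MATCHED AND T.Name <> S.Value THEN
    UPDATE SET T.Name = S.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CategoryID, LanguageID, Name)
    VALUES (S.CategoryID, S.LanguageID, S.Value);
You can then run the SELECT inside the CTE on its own and look at its plan separately from the MERGE.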
Wes
(A solid design is always preferable to a creative workaround)
December 5, 2016 at 8:48 am
TheBI (12/5/2016)
Hello, I want to optimise the following query, which is running very slowly.
Please help or give me some advice.
MERGE dbo.LocalizedCategories AS T
USING (SELECT L.LanguageID,SC.CategoryID,BC.Value
FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC
CROSS APPLY(SELECT TOP 1 CategoryID FROM dbo.SourceCategories Where Identifier=CONVERT(NVARCHAR(10),BC.BetradarCategoryID) AND SourceID=@SourceID) SC
CROSS APPLY(SELECT TOP 1 LanguageID FROM dbo.Languages WHERE LanguageCulture=BC.[Language] )L)
AS S (LanguageID,CategoryID,Value)
ON (T.CategoryID = S.CategoryID AND T.LanguageID=S.LanguageID)
WHEN MATCHED AND T.Name<>S.Value
THEN UPDATE SET
T.Name=S.Value
WHEN NOT MATCHED BY TARGET
THEN INSERT
( CategoryID ,LanguageID, Name )
VALUES ( S.CategoryID,S.LanguageID,S.Value);
For those of us who struggle to read T-SQL presented as plain text, here is a formatted version:
MERGE dbo.LocalizedCategories T
USING
(
SELECT
L.LanguageID
, SC.CategoryID
, BC.Value
FROM
(
SELECT DISTINCT
BetradarCategoryID
, Value
, Language
FROM #LocalizedCategories
WHERE Language <> 'BET'
) BC
CROSS APPLY
(
SELECT TOP 1
CategoryID
FROM dbo.SourceCategories
WHERE
Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID) AND
SourceID = @SourceID
) SC
CROSS APPLY
(
SELECT TOP 1
LanguageID
FROM dbo.Languages
WHERE LanguageCulture = BC.Language
) L
) S
(LanguageID, CategoryID, Value)
ON
(
T.CategoryID = S.CategoryID AND
T.LanguageID = S.LanguageID
)
WHEN MATCHED AND T.Name <> S.Value THEN
UPDATE SET T.Name = S.Value
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
CategoryID
, LanguageID
, Name
)
VALUES
(
S.CategoryID, S.LanguageID, S.Value
);
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
December 5, 2016 at 8:55 am
One thing you could also try is putting the results of the entire source query into a temp table, then indexing the temp table, then merging from only the temp table.
Whether it improves things much depends on where your slowness originates.
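Something along these lines (a sketch only; #MergeSource and the index name are names I've made up, and the index columns are just a guess based on your join):
SELECT
    L.LanguageID
    , SC.CategoryID
    , BC.Value
INTO #MergeSource   -- hypothetical holding table for the MERGE source
FROM
(
    SELECT DISTINCT BetradarCategoryID, Value, [Language]
    FROM #LocalizedCategories
    WHERE [Language] <> 'BET'
) BC
CROSS APPLY
(
    SELECT TOP 1 CategoryID
    FROM dbo.SourceCategories
    WHERE Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
      AND SourceID = @SourceID
) SC
CROSS APPLY
(
    SELECT TOP 1 LanguageID
    FROM dbo.Languages
    WHERE LanguageCulture = BC.[Language]
) L;

CREATE CLUSTERED INDEX cx_MergeSource ON #MergeSource (CategoryID, LanguageID);

MERGE dbo.LocalizedCategories AS T
USING #MergeSource AS S
    ON T.CategoryID = S.CategoryID
   AND T.LanguageID = S.LanguageID
WHEN MATCHED AND T.Name <> S.Value THEN
    UPDATE SET T.Name = S.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CategoryID, LanguageID, Name)
    VALUES (S.CategoryID, S.LanguageID, S.Value);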
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
December 5, 2016 at 9:18 am
whenriksen (12/5/2016)
DDL & DML would help. First of all, you've created a temp table #LocalizedCategories, but you are doing a Select Distinct on it. Is there any reason you couldn't have selected Distinct results into it at creation? You're adding the Distinct overhead to an already complicated query.
TheBI (12/5/2016)
MERGE dbo.LocalizedCategories AS T USING (SELECT L.LanguageID,SC.CategoryID,BC.Value
FROM (SELECT DISTINCT BetradarCategoryID,Value,[Language] FROM #LocalizedCategories WHERE [Language]<>'BET') AS BC
Another thing I would do is move your Merge source query into a CTE to visually separate the source logic from your Merge. That would allow you to tune your source query separately, which is my next question.
Are you seeing performance issues with your source query on its own?
I'll have to check about the DISTINCT. It seems it can be removed, but I'm not sure whether there could be duplicates. Will that make the result set bigger?
I'll see about implementing it with a CTE.
The source query is the performance issue.
Thank you
Igor Micev,My blog: www.igormicev.com
December 5, 2016 at 9:35 am
1) What does the query plan look like? Post up the estimated query plan for review.
2) Table structures? Post those up too, including indexing on them.
3) Do you have any mismatched data types?
4) You have a function around a column in the WHERE clause. BIG NO NO!!! (See the sketch after this list.)
5) As others mentioned the DISTINCT is a performance hog and since you are just hitting a temp table I have to wonder if you could have just populated it with DISTINCT and be done with it. Or skip the temp table and go straight to the real source.
6) Any triggers or foreign keys in play here?
7) Did you use sp_whoisactive to check whether it is simply blocking?
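For point 4, one option (just a sketch, untested; BetradarIdentifier is a column name I've invented, and I'm guessing that dbo.SourceCategories.Identifier is NVARCHAR(10)) is to do the conversion once when the temp table is loaded, rather than per row inside the APPLY:
-- hypothetical extra column, typed to match dbo.SourceCategories.Identifier
ALTER TABLE #LocalizedCategories
    ADD BetradarIdentifier NVARCHAR(10) NULL;

UPDATE #LocalizedCategories
SET BetradarIdentifier = CONVERT(NVARCHAR(10), BetradarCategoryID);

-- the lookup can then compare like with like, with no function in the predicate:
-- CROSS APPLY (SELECT TOP 1 SC.CategoryID
--              FROM dbo.SourceCategories SC
--              WHERE SC.Identifier = BC.BetradarIdentifier
--                AND SC.SourceID = @SourceID) SC
The DISTINCT derived table would then carry BetradarIdentifier instead of BetradarCategoryID.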
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 5, 2016 at 9:47 am
Igor Micev (12/5/2016)
I'll have to check about the DISTINCT. It seems it can be removed, but I'm not sure whether there could be duplicates. Will that make the result set bigger? I'll see about implementing it with a CTE.
The source query is the performance issue.
- If there are duplicates, the result set will be bigger if you completely remove the Distinct. However, my suggestion is to move the Distinct to the statement that populates the temp table, rather than perform the Distinct in the Merge (see the sketch after this list).
- Using the CTE by itself won't fix anything. I just think it will make it easier to read/debug.
- If the query used as the Merge source is slow, the Merge will also be slow. Work on tuning that before you start troubleshooting the Merge statement.
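The populate-with-Distinct idea, roughly (a sketch only; dbo.YourRealSource is a made-up name standing in for whatever currently feeds the temp table):
SELECT DISTINCT
    BetradarCategoryID
    , Value
    , [Language]
INTO #LocalizedCategories
FROM dbo.YourRealSource;   -- hypothetical; substitute your actual source
The derived table inside the MERGE source then no longer needs its own DISTINCT.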
Wes
(A solid design is always preferable to a creative workaround)
December 5, 2016 at 10:16 am
Slight detour from the task at hand. You have a couple of subqueries here using TOP 1, but there is no ORDER BY. Without an ORDER BY, you have no way of ensuring which row will be returned. Perhaps in this case they would all be the same, so it doesn't matter, but I do find it worth mentioning.
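For example (the ORDER BY column here is just a guess; pick whatever makes "first" meaningful for your data):
SELECT BC.BetradarCategoryID, SC.CategoryID
FROM
(
    SELECT DISTINCT BetradarCategoryID
    FROM #LocalizedCategories
    WHERE [Language] <> 'BET'
) BC
CROSS APPLY
(
    SELECT TOP 1 CategoryID
    FROM dbo.SourceCategories
    WHERE Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
      AND SourceID = @SourceID
    ORDER BY CategoryID   -- deterministic tie-breaker; without it the row returned is arbitrary
) SC;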
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 6, 2016 at 6:23 am
If there is a temp table in the query batch, that is known to give you recompiles [Edit: depending on the difference in stats].
So I suggest adding OPTION (RECOMPILE), so that you can benefit from that extra level of constant folding, since you will be doing a traditional recompile anyway [if the above is true].
Also, OPTION (RECOMPILE) goes through a different code path from normal batch compilation, so you can get a different plan, most likely better, but possibly worse.
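That is, append the hint to the very end of the statement, before the terminating semicolon (everything above the last line below is the MERGE exactly as posted):
-- ... the MERGE statement exactly as posted, unchanged, down to:
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CategoryID, LanguageID, Name)
        VALUES (S.CategoryID, S.LanguageID, S.Value)
OPTION (RECOMPILE);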
Anyway, something to try.
December 6, 2016 at 11:53 am
>> I want to optimise the following query, which is running very slowly. <<
First of all, we need some DDL; what you did post is really bad. You put attributes in tables, and you have columns with names like "category_id" which violate basic data modeling as well as ISO 11179 naming rules. Let me be specific; you can have a "<something in particular>_category" or a "<something in particular>_id" as the attribute of some entity, but not "category_id" by itself. That is a combination of what ISO calls attribute properties. The format for a column name is [<role>_]<attribute>_<property>. Just think about how silly something like "category_value_id_nbr" would be. But the important point of attribute properties is that they are all distinct. This means that crap like "T.generic_name <> S.generic_value" is absurd, and shows that we have a horrible, improperly done schema on our hands.
Then you give vague attribute names like "language"; is this the ISO standard language code (it looks like it, because of the three-letter constant, but that is still a guess because you did not post specs), a name, a human language, a programming language, etc.?
Finally, things like a table of language codes along with their names are referenced in other tables, not joined to them. This is where we get into the data modeling idea of strong attributes, etc. In a correctly designed schema, we do not include metadata. Things like "identifier" are metadata; they tell us how something is used and what it means at a higher level in the design. In short, you probably have what we call an EAV design. We hate it. We do not do it.
Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult and portability impossible.
Just think about a name like "LocalizedCategories"; why are these totally different from the other generic <nothing in particular>_categories in your data model? Localized is an attribute or a type or something, not a whole separate kind. You are telling us that these things are totally different, as unlike as automobiles and squids, from mere "<nothing in particular>_categories"! Also, camel or Pascal case does not work, for several reasons. The first is that many ISO standards (and products) are not case-sensitive, and some are. Thus "LocalizedCategories", "localizedcategories", "localizedCategories", etc. may or may not be the same name.
The second reason is that your eye jumps to uppercase letters and pauses; this is why the ISO and metadata committee standards separate words with underscores rather than case changes. Underscores are actually easy to read because the Latin alphabet has what is called a baseline (nothing to do with music); this is where the eye settles and starts looking for spatial separation between units of language. Did you know that at one point, manuscripts in the Middle Ages were written as continuous strings of letters, without spacing?
Did you know that VALUE is both too vague to be a column name and a reserved word in ANSI/ISO standard SQL? "Identifier" is metadata and has no place in the table at all.
I have been doing this for 30+ years, so please trust me when I tell you that most of the time, the problems start with bad DDL and an improper data model. After that you spend far too much time in the DML trying to correct these bad designs.
Can you throw out this garbage, start over and do it right?
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
December 6, 2016 at 12:34 pm
CELKO (12/6/2016)
Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult
Now you're just making up stuff to support your claims.
and portability impossible.
Who cares?!?! Most people here are not at all interested in creating portable implementations of their code.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 6, 2016 at 12:42 pm
drew.allen (12/6/2016)
CELKO (12/6/2016)
Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult
Now you're just making up stuff to support your claims.
and portability impossible.
Who cares?!?! Most people here are not at all interested in creating portable implementations of their code.
Drew
Agreed. This is a SQL Server site.
ANSI_SQL_central.com does not exist.
Wes
(A solid design is always preferable to a creative workaround)
December 6, 2016 at 1:03 pm
CELKO (12/6/2016)
Using proprietary dialect constructs like CROSS APPLY and TOP makes optimization difficult and portability impossible.
Joe, you make me laugh. You know that we have two options, either portable code or fast code. You can't have both. Your continued preaching about portability is seriously one of the most tired and absurd arguments. Businesses don't just decide to switch to a new DBMS as frequently as you seem to think. In my experience, that has happened exactly 0 times.
How would you propose we get the TOP row? We could use a query to get that one on its own, but all the DBMSs have their own version of TOP, LIMIT, etc.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 6, 2016 at 2:39 pm
You know that we have two options, either portable code or fast code. You can't have both.
No, I do not know that, because I have built an entire career on having portable code along with decent performance. But more than performance, I look at the whole system. What is the total cost of ownership? When I upgrade to the next release of my current database (forget the fact that it may have to be ported to other platforms), what does it cost me to move the code? In the SQL Server world, let me give you some real examples. (1) When *= was replaced with LEFT OUTER JOIN and code had to be rewritten, my clients only had to take the comment marks off the ANSI standard code I had left for them and drop the old code. I had tested it both ways. What took other people weeks or months or years, my clients did in a weekend. (2) When BIT stopped being an assembly language bit flag and became a numeric data type (all data types are NULL-able), my clients had never used bit flag assembly language coding, so they did not even notice.
If you really, really wanted fast code, you would write in assembly language and skip SQL altogether. But in the real world, with large systems, 90% of the total cost of the system is maintenance. The more maintainable a database is, the more portable a database is, the more standardized a database is, the cheaper it is over the total life of the system. Are you familiar with the comedy routine that features "Larry the Cable Guy"? His tag phrase, as he does horrible broad-brush quick kludges, is "git'er done"; some of them will blow up immediately and some will last long enough for him to get out the door to the next job. This is cowboy coding, and all too many programmers do it.
In my experience that has happened exactly 0 times.
I built a whole career on this, and it is why I can charge an insanely large hourly rate :-). But frankly, I would rather help train people to write good code at the start than correct bad systems after the fact. I know that writing proprietary code gives you job security. I just do not think that is professional.
How would you propose we get the TOP row? We could use a query to get that one on its own but all the DBMS have their own version of TOP, LIMIT etc.
Yes, this is true; but they also have the ANSI/ISO standard "MAX(x) OVER (...)" and you can usually use that. If you have done a decent schema design, this will work; if you have to pile up conditions to create a sorted "table", then, as I have found after 30 years, the design probably has some flaws.
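For what it is worth, a windowed rewrite of the TOP 1 lookups in the original query might look something like this (a rough sketch only, untested, using ROW_NUMBER rather than MAX since whole rows are wanted; the ORDER BY is an arbitrary tie-breaker, just as the original TOP 1 without ORDER BY was arbitrary):
SELECT LanguageID, CategoryID, Value
FROM
(
    SELECT
        L.LanguageID
        , SC.CategoryID
        , BC.Value
        , ROW_NUMBER() OVER (PARTITION BY BC.BetradarCategoryID, BC.Value, BC.[Language]
                             ORDER BY SC.CategoryID) AS rn
    FROM #LocalizedCategories BC
    JOIN dbo.SourceCategories SC
        ON SC.Identifier = CONVERT(NVARCHAR(10), BC.BetradarCategoryID)
       AND SC.SourceID = @SourceID
    JOIN dbo.Languages L
        ON L.LanguageCulture = BC.[Language]
    WHERE BC.[Language] <> 'BET'
) x
WHERE rn = 1;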
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
December 6, 2016 at 2:52 pm
CELKO (12/6/2016)
You know that we have two options, either portable code or fast code. You can't have both.
No, I do not know that, because I have built an entire career on having portable code along with decent performance. But more than performance, I look at the whole system. What is the total cost of ownership? When I upgrade to the next release of my current database (forget the fact that it may have to be ported to other platforms), what does it cost me to move the code? In the SQL Server world, let me give you some real examples. (1) When *= was replaced with LEFT OUTER JOIN and code had to be rewritten, my clients only had to take the comment marks off the ANSI standard code I had left for them and drop the old code. I had tested it both ways. What took other people weeks or months or years, my clients did in a weekend. (2) When BIT stopped being an assembly language bit flag and became a numeric data type (all data types are NULL-able), my clients had never used bit flag assembly language coding, so they did not even notice.
If you really, really wanted fast code, you would write in assembly language and skip SQL altogether. But in the real world, with large systems, 90% of the total cost of the system is maintenance. The more maintainable a database is, the more portable a database is, the more standardized a database is, the cheaper it is over the total life of the system. Are you familiar with the comedy routine that features "Larry the Cable Guy"? His tag phrase, as he does horrible broad-brush quick kludges, is "git'er done"; some of them will blow up immediately and some will last long enough for him to get out the door to the next job. This is cowboy coding, and all too many programmers do it.
*= and bits? Most people who are working with databases today don't even understand what that join syntax means. It was removed a long time ago. All of your arguments here are about language changes that happened 20 years ago. Can you update your arguments for portable code to something that is relevant in today's world?
I agree 10000% about helping people understand how to do it right instead of right now. But the reality is that most businesses don't care enough about the database to consider switching. As long as it stores and retrieves data accurately and quickly, it doesn't really matter that much. I am glad you can charge crazy high rates for your consulting time. You have certainly put enough effort into databases that it is well deserved. But also keep in mind that most people couldn't care less about portability, and that by not using some of the proprietary features you will continue to have code that is portable and performs decently. I prefer my code to be screaming fast. Each option has a cost; for me the cost is less portable code. For you, the cost is in performance. I am not saying that either one is better than the other. We clearly have different opinions on what is most important there. 😀
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/