April 18, 2016 at 11:14 am
It's absolutely not coding "laziness". It's simply to allow for data type changes later without causing performance issues in the code.
Completely disagree with the first sentence. And even more vehemently disagree with the implied acceptance of changing data types on columns without doing an impact analysis and changing ALL references to said column (including other table's matching columns) to the new data type!!
I will go on the record though as being QUITE happy that most developers I have come across feel exactly like you do. Well, they do until I get through showing them how bad their practices and the resulting code actually is. Thanks to this there is a nearly infinite amount of work out there for consultants like myself. :hehe:
Hmm, perhaps another Guruism in the making:
Developers should do a little extra work so SQL Server doesn't have to.
Yeah, I like that one.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 18, 2016 at 11:21 am
Developers using implicit conversions is how I earned my director's award during a major application upgrade once. The developer mentality says that fewer queries is better and simpler activity is better. So they had a stored proc that had an input of one data type which was used in a single two-table query to get a discrete value. The input filtered the first table, and the join was needed to get the required value. However, the two columns on the join did not have the same data type. So even though it was logical to do this in one query, I convinced them (on the fly, as there was a large and time-constrained load going on) to change it to two queries -- one to get the value from the first table, converted to the proper data type for the join to the second table, and one to get the final value from the second table. As non-intuitive as that was to them, what was going to take 3 days at the rate it was going finished with the desired results in 6 hours and on time -- thus making my director very happy with me. You will never convince me that data type mismatch is not the single biggest issue when dealing with development problems.
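The split described above might look roughly like this sketch. The table and column names here are hypothetical (the post doesn't name them); the point is that the mismatched key gets converted once, on the variable side, instead of forcing an implicit conversion across every row of the joined table:

```sql
-- Hypothetical schema: OrderHeader.OrderKey is VARCHAR(20) while
-- OrderDetail.OrderKey is INT, so a direct join would force an
-- implicit conversion of one column on every row (non-sargable).

-- Step 1: get the key from the first table and cast it ONCE,
-- into the data type the second table actually uses.
DECLARE @OrderKey INT;

SELECT @OrderKey = CONVERT(INT, oh.OrderKey)
FROM dbo.OrderHeader AS oh
WHERE oh.OrderNumber = @InputOrderNumber;  -- the proc's input parameter

-- Step 2: a clean, sargable seek against the second table.
SELECT od.FinalValue
FROM dbo.OrderDetail AS od
WHERE od.OrderKey = @OrderKey;
```

Two simple seeks replace one join whose predicate could never use an index on the mismatched column.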
April 18, 2016 at 12:25 pm
A little background...
All of the views and functions defined in this database are as-is from Microsoft. This is the Microsoft System Center 2012 Configuration Manager database.
The data in v_R_System is what SCCM has discovered about a system from Active Directory, including the AD attribute managedBy, which is stored as the DN of the system owner (CN=Smith\, John,OU=Users,DC=domain,DC=pri). We use v_R_System.managedBy0 to lookup the user in v_R_User.Distinguished_Name0, and bring back v_R_User.Full_User_Name0 (Smith, John). It is this Full_User_Name0 value that I'd like to use for Assignee. If multiple values were possible (hence the 'IN'), it would be nice to have, but not necessary.
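For reference, the lookup described above would be shaped something like this. The view and column names are as given in the post; the exact join and output columns are an assumption, not confirmed SCCM code:

```sql
-- Sketch of the managedBy lookup described above. managedBy0 holds
-- the owner's DN (e.g. 'CN=Smith\, John,OU=Users,DC=domain,DC=pri'),
-- which matches v_R_User.Distinguished_Name0.
SELECT sys.Name0            AS SystemName,
       usr.Full_User_Name0  AS Assignee     -- e.g. 'Smith, John'
FROM dbo.v_R_System AS sys
JOIN dbo.v_R_User   AS usr
  ON usr.Distinguished_Name0 = sys.managedBy0;
```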
And a big thank you to all of you who are participating. Interesting to see all the suggestions and debate!
April 18, 2016 at 12:49 pm
TheSQLGuru (4/18/2016)
It's absolutely not coding "laziness". It's simply to allow for data type changes later without causing performance issues in the code.
Completely disagree with the first sentence. And even more vehemently disagree with the implied acceptance of changing data types on columns without doing an impact analysis and changing ALL references to said column (including other table's matching columns) to the new data type!!
I will go on the record though as being QUITE happy that most developers I have come across feel exactly like you do. Well, they do until I get through showing them how bad their practices and the resulting code actually is. Thanks to this there is a nearly infinite amount of work out there for consultants like myself. :hehe:
Hmm, perhaps another Guruism in the making:
Developers should do a little extra work so SQL Server doesn't have to.
Yeah, I like that one.
I'm not a developer, I'm a DBA, and I understand the data types involved. Unnecessary forced data typing causes far more issues than it would solve. IF a conversion does in fact help SQL, yes do it, as I noted above. But your point was that it was an absolute rule, PERIOD, to force a specific data type on all comparisons. That's a horrible idea, with tons of wasted code and increased opportunity for bad performance later.
It's not my fault that many of the developers you speak of are too ignorant to know when they should and should not convert. But that's not enough justification for a blanket rule that forces vast amounts of extra code and additional re-work.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 18, 2016 at 12:57 pm
Can you give any example of where converting a literal value to appropriate non-unicode literal could ever cause SQL a performance issue?
Or for dates: our rule is to convert a variable date/datetime value to a varchar in the format YYYYMMDD[ hh:mm[:ss]|[ss.sss]|[ss.sssssss] ]. SQL will implicitly convert it back to the appropriate data type for the column.
Edit: SQL now will automatically do this for, say, a datetime variable against a date column, but that used to force an implicit conversion of the table column which caused the usual non-sargable issues.
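A minimal sketch of that rule, with an assumed table and index (not from the post), showing the string-formatted variable on the comparison side so the column itself is never converted:

```sql
-- Assume dbo.SomeLog has a DATE column LogDate with an index on it.
DECLARE @when DATETIME = GETDATE();

-- Sargable: the variable is converted once to a YYYYMMDD string
-- (CONVERT style 112); SQL implicitly converts that single value
-- back to DATE, and the indexed column is left untouched.
SELECT COUNT(*)
FROM dbo.SomeLog
WHERE LogDate = CONVERT(VARCHAR(8), @when, 112);

-- The shape to avoid is any predicate that wraps the COLUMN:
-- WHERE CONVERT(DATETIME, LogDate) = @when   -- non-sargable, scans
```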
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 18, 2016 at 4:10 pm
Sean Lange (4/15/2016)
Well then apparently you think myself and many others are constantly speaking nothing but BS???
Only if you say so.
You forgot to mention scalar functions and multi-statement table valued functions.
And CTE's. Beloved CTE's.
So beloved that I've recently seen a query where 6 tables were joined in a single CTE, which was then "re-used" 4 times. Each time, columns from only 1 or 2 of those tables were retrieved, and when they needed data from a look-up table they had to use DISTINCT to get unique values (the values were already unique in that table by themselves).
You have been around this technology long enough to know that all of these things make sense in the right time and place.
Exactly my point.
"All of these things make sense" - absolutely right.
Same as kitchen knives, ladders, chain saws, etc.
After so many people have died or been badly injured by those tools - they should be forbidden, right?
Right?
The problem is that they very often get over used and abused to the point where entire systems are crippled. We see it time and time again on forums here and everywhere else.
Don't blame tools.
Grapevine is not guilty of anything.
Guns don't kill. People do.
_____________
Code for TallyGenerator
April 18, 2016 at 4:25 pm
ScottPletcher (4/18/2016)
I'm not a developer, I'm a DBA, and I understand the data types involved.
That's it.
This condition can allow you, or me, or somebody with the same level of understanding to use implicit conversions - we understand the implications and can think it through.
As for developers who cannot tell the difference between values 5 and '5' - implicit conversions in their code is a source of continuous errors of all kinds.
_____________
Code for TallyGenerator
April 18, 2016 at 7:06 pm
Good discussion but still off the rails. Anyone going to help the OP? 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2016 at 7:33 pm
Jeff Moden (4/18/2016)
Good discussion but still off the rails. Anyone going to help the OP? 😉
At 11:09am two days ago I gave the OP 2 things to try (although I just corrected a copy-paste error to get to this: and vru.Full_User_Name0 = (@Assignee) instead of this: and vru.Full_User_Name0 IN (@Assignee)). I don't think the OP ever reported back on his testing with those things.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 18, 2016 at 7:35 pm
Mike Frazer (4/14/2016)
Thank you...attached.Edit: I added the DDL for the related objects as scripts.txt
I looked for the 68% cost node in the execution plan and I'm not finding it. What I am finding is that there are a number of index seeks that contain more than 1200 individual executions and some scans in the same condition except they are returning more than 18 million rows behind the scenes. Those rows still "travel" behind the scenes.
The fact that you also have a DISTINCT in the code to eliminate duplicates means that something isn't quite right with the JOIN or WHERE criteria or, perhaps, the wrong index is being picked up because there's no other to choose from. That may be what's happening when you add that final condition... it can and will change this rather large and complicated execution plan in ways you never imagined, and it can do it in different ways depending on what's already in memory or not.
The views aren't necessarily bad, but they hide a lot from you and will frequently add complexity and extra rows where none are needed. Depending on the columns they return, they may actually cause the optimizer to make bad decisions as to which indexes to use for your particular query.
My recommendation would be to "Divide'n'Conquer" with the biggest goal being the one to avoid the use of DISTINCT or GROUP BY (some folks use that to hide the fact that they need a DISTINCT).
The best thing to do would be to figure out what the core of this query is... that is, which table or two tables contain the core information that the rest of the query will rely on. If the views are written in a generic fashion and they return more columns than you need, you may want to write some code to replace the view in your query.
And, to be sure, it's not necessary to do this in a single query. A lot of people will scream "avoid the use of TEMP TABLES!" and they'd be right, but for the wrong reasons. Where do you think the two 18-million-row and the one 72-million-row hash matches are going to resolve?
Break this query up. Store the "core" data in a temp table if you need to (and don't mistake a CTE for any form of data materialization because it's not and neither is a view). Remember that "set based" has nothing to do with "all in one huge query".
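The "Divide'n'Conquer" approach above might be sketched like this. The table names come from the plan details quoted later in the thread (CI_LocalizedProperties, CI_ConfigurationItems, Update_ComplianceStatus); the filter and output columns are assumptions for illustration only:

```sql
-- Step 1: materialize the small "core" row set first (the ~1,247
-- rows the plan eventually filters down to), instead of letting
-- the 18M/72M-row tables hash-match against each other.
SELECT cs.CI_ID
INTO #CoreCI
FROM dbo.Update_ComplianceStatus AS cs
WHERE cs.Status = 2;            -- assumed filter; use the real criteria

CREATE CLUSTERED INDEX cx ON #CoreCI (CI_ID);

-- Step 2: the big tables are now probed only for rows that matter.
SELECT loc.DisplayName, ci.CI_UniqueID   -- assumed output columns
FROM #CoreCI AS c
JOIN dbo.CI_ConfigurationItems  AS ci  ON ci.CI_ID  = c.CI_ID
JOIN dbo.CI_LocalizedProperties AS loc ON loc.CI_ID = c.CI_ID;
```

The temp table is the materialization point you control, rather than the multi-million-row hash matches the optimizer would otherwise build behind the scenes.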
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2016 at 12:00 am
Jeff Moden (4/18/2016)
I looked for the 68% cost node in the execution plan and I'm not finding it. What I am finding is that there are a number of index seeks that contain more than 1200 individual executions and some scans in the same condition except they are returning more than 18 million rows behind the scenes. Those rows still "travel" behind the scenes.
I'm with Jeff here.
We can see 18 million rows selected from [CM_ZCM].[dbo].[CI_LocalizedProperties], 72 million rows selected from [CM_ZCM].[dbo].[CI_ConfigurationItems], then another 18 million rows selected from [CM_ZCM].[dbo].[CI_LocalizedProperties] once again.
All those massive row sets are computed, hash matched, computed again, hash matched again, and in the end filtered down to 1,247 rows:
[CM_ZCM].[dbo].[CI_LocalizedProperties].[CI_ID] as [loc].[CI_ID]=[CM_ZCM].[dbo].[Update_ComplianceStatus].[CI_ID] as [cs].[CI_ID]
We have yet to see the code where all of that is happening.
But this is exactly where the problem is.
_____________
Code for TallyGenerator