August 21, 2012 at 3:27 pm
Our proprietary reporting module calls stored procedures that have no parameters but make extensive use of local variables.
A single query (and many of them are dynamic SQL) may utilize many local variables in the WHERE clause. And some of the stored procs are very complex, containing more than 2000 lines of code.
I'm just beginning to learn the importance of accurate cardinality estimates and am wondering if anyone can share how they might enhance optimization of procs that make extensive use of local variables that can affect the size of result set.
For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.
I hope this makes sense. Please don't yell at me for not yet posting a code sample. I will be glad to do that when asked. I just want to be sure that whatever I put up is useful.
August 21, 2012 at 11:59 pm
Where local variables get their values from?
You should probably post some code. (Pick the simplest of your procedures).
It would be hard to recommend (or not recommend) the use of Optimize or Recompile options. Most people would probably say “it depends” and you’ll have to experiment with those in your environment.
--Vadim R.
August 22, 2012 at 1:30 am
jshahan (8/21/2012)
For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.
Not by default I wouldn't. I'd look at the procedures (starting with the ones that perform worst) and see if there are any cardinality errors. If there are, I might go for recompile, optimise for or splitting the procedure into sub-procedures. No blanket rule.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2012 at 5:01 am
jshahan (8/21/2012)
For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.
Using WITH RECOMPILE on the procedure is rarely the best option; it causes every query inside the procedure to compile at least once. OPTION (RECOMPILE) is a much more targeted weapon - it recompiles only the query it is attached to. This option allows SQL Server 2008 to 'see' the values in the local variables at the time, and produce a plan optimized for that value. The downside is that compilation is not free, and you might see a significant increase in CPU and memory if the queries are complex and/or called very frequently. OPTIMIZE FOR can be useful, if you can identify a suitable value to optimize for. It can be tough to choose such a value, especially one that will remain a good choice for all time as the data changes. As Gail said, I would look to analyse procedures and queries that perform unacceptably now, and use the tool that best suits the individual case.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 22, 2012 at 7:46 am
You make some good points, Joe, but I fear the "proprietary module" allows for limited rewrite & redesign opportunities.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 22, 2012 at 8:12 am
-- create a temporary stored procedure
create procedure #sp_TempReportMaster AS
create table #iFromDate (iFromDate DateTime)
insert into #iFromDate values ('JAN 1 2012')
create table #iThruDate (iThruDate DateTime)
insert into #iThruDate values ('JAN 31 2012')
create table #iProdSort1 (ID int)
insert into #iProdSort1 values (6537)
exec Rpt_TestProcedure
-- the temporary proc to calls the stored procedure
-- and makes the values available for the local variables
CREATE PROCEDURE [dbo].[Rpt_TestPRocedure]
AS
Declare @FromDate datetime,
@ThruDate datetime,
@Sort1 nvarchar(4000)
set @FromDate = (select iFromDate from #iFromDate)
set @ThruDate = (select iThruDate from #iThruDate)
set @Sort1 = (select Query from ReportOptionsQuery where OptionCode =
(select OptionCode from ReportOptions where id =
(select id from #iProdSort1)))
Do you all know how wonderful it is to know that you’re getting some of the best advice available on the planet regarding a complex subject that you are just beginning to learn? It’s like I asked how to play an E chord and got an overnight response from Clapton and Santana. I thank you all.
In sequence of your kind responses:
rVadim
The local variables are usually passed a value by a temporary stored procedure created by the report module. Example provided above. The temporary proc is assembled dynamically and is extremely versatile. Sometimes dozens of parameters are passed.
Gail and Paul
Your blogs/columns/forum responses are a big part of my attempt to enhance my skills and are really helping me now. I’m trying to encourage our other developers to make use of them.
I think to paraphrase each of your responses, you are advising against a sledgehammer approach in favor of nuanced troubleshooting that identifies precisely where problems actually exist and addressing them appropriately. I need to find out if I can use multiple values for OPTIMIZE FOR should I decide to try it but I’ll check documentation for that.
Joe
I’m sure you know that you are equally famous for your knowledge as you are for your beat downs. I wear my ignorance on my sleeve in these posts and I hope you don’t find cause for blasting me for my questions.
I didn’t realize I was further revealing my ignorance by mentioning our use of dynamic SQL. I don’t know what is meant by the use of the term “cohesion” in this context. If you have an additional moment to educate me, I’d appreciate it.
And as an FYI, the background of the primary developer of the product was Paradox.
And Paul, you are dead on. No way a rewrite is happening at this point in time.
John
August 22, 2012 at 8:21 am
jshahan (8/22/2012)
I think to paraphrase each of your responses, you are advising against a sledgehammer approach in favor of nuanced troubleshooting that identifies precisely where problems actually exist and addressing them appropriately.
Yup. I'm not a fan of what I call shotgun query tuning. Find the problems, address the problems and be specific with the solutions.
I need to find out if I can specific multiple values for OPTIMIZE FOR should I decide to try it but I’ll check documentation for that.
Not in a single query, no.
I didn’t realize I was further revealing my ignorance by mentioning our use of dynamic SQL
Nothing wrong with dynamic SQL used appropriately. It's a tool. Whether you're using it appropriately or not is hard to tell with this little information.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2012 at 9:59 am
set @SqlString3 = N'
left join ClientProgram cpgm on cpgm.EnrollmentKey = ClientEnrollment.ID
and '''+convert(char,@ThruDate,101)+''' between
isnull(cpgm.effectivedate,'''+convert(char,@FromDate,101)+''') and
isnull(cpgm.expirationdate,'''+convert(char,@ThruDate,101)+''')
left join Program pgm on pgm.id = cpgm.ProgramKey
left join Program spgm on spgm.id = cpgm.SubProgramKey
where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment. Enrollment_Date between
'''+convert(char(10),@FromDate,101)+'''and '''+convert(char(10),@ThruDate,101)+''''
if @ProviderCount >0
set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'
if @SiteCount >0
set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select ProviderKey from ProviderSites where ProviderSites.ProviderSiteKey in (select ID from #Site))'
if @RBHACount >0
set @SqlString3 = @SqlString3 + N' and dbo.RBHA.ID in (select ID from #RBHA)'
if @RBHAGroupCount >0
set @SqlString3 = @SqlString3 + N' and dbo.RBHA.ID in (select RbhaKey from RBHAXrefGroup where RBHAXrefGroup.GroupKey in (select ID from #RBHAGroup))'
if @EnrollType = 'C'
set @SqlString3 = @SqlString3 + N' and dbo.ClientEnrollment.CrisisEnrollment = 1'
if @EnrollType = 'R'
set @SqlString3 = @SqlString3 + N' and (dbo.ClientEnrollment.CrisisEnrollment = 0
or dbo.ClientEnrollment.CrisisEnrollment is null)'
Above is an example of how we typically use dynamic SQL. I'm not an authority on injection but it seems pretty airtight to me.
August 22, 2012 at 11:49 am
jshahan (8/22/2012)
if @ProviderCount >0
set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'
Above is an example of how we typically use dynamic SQL. I'm not an authority on injection but it seems pretty airtight to me.
That's generally a bad way to handle it. IN can be a very expensive operator.
You're very close to a catch-all type of thing here... if you need to solve the problem dynamically, I'd consider building your SELECT, FROM and WHERE clauses (and potentially GROUP BY/HAVING) as separate SQL variables.
Then concatenate those pieces back together.
Gail has a great blog on this: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.
August 22, 2012 at 12:07 pm
where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment. Enrollment_Date between
'''+convert(char(10),@FromDate,101)+''' and '''+convert(char(10),@ThruDate,101)+''''
Nope, this is injectable. Look up sp_executeSQL, and in particular parameterization. Any time you're directly including a variable into the SQL Execute string without a parameter 'wash' of some kind you're open to injection.
Our proprietary reporting module calls stored procedures that have no parameters but make extensive use of local variables.
A single query (and many of them are dynamic SQL) may utilize many local variables in the WHERE clause. And some of the stored procs are very complex, containing more than 2000 lines of code.
How the heck do they use local variables that heavily without them being proc parameters, or am I misunderstanding this a little?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 22, 2012 at 12:08 pm
mtassin (8/22/2012)
The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.
No it doesn't.
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/
General conclusion: In is a very slight bit more optimal than join for the purpose of just checking if the rows match (SQL can use a semi-join not a full join)
In is not an inefficient operator. I suspect, given the prevalence of that thought that it used to be at some point in SQL's history.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2012 at 12:18 pm
Evil Kraig F (8/22/2012)
where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment. Enrollment_Date between'''+convert(char(10),@FromDate,101)+''' and '''+convert(char(10),@ThruDate,101)+''''
Nope, this is injectable.
Depends. Very very hard to inject anything with 10 characters to play with (the convert to char 10), and if that variable's defined as a datetime it's possible for it to contain rogue input. Also, can't see from there where @FromDate comes from. It could be a calculation, from a table or user input.
The general form is certainly injectable (variables concatenated into a string), the specific example, can't say.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2012 at 12:30 pm
GilaMonster (8/22/2012)
mtassin (8/22/2012)
The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.No it doesn't.
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/
General conclusion: In is a very slight bit more optimal than join for the purpose of just checking if the rows match (SQL can use a semi-join not a full join)
In is not an inefficient operator. I suspect, given the prevalence of that thought that it used to be at some point in SQL's history.
Thanks for the Article Gail
I do recall back with SQL 7/2000 that IN was very expensive. Figures that at some point MS would make it better. I recall watching IN get blown up into a chained set of OR's. I would see it bypass indexes and go for table scans and a bunch of other junk.
I also enjoyed reading the comparison of NOT EXISTS vs JOIN, did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?
August 22, 2012 at 12:45 pm
mtassin (8/22/2012)
I do recall back with SQL 7/2000 that IN was very expensive. Figures that at some point MS would make it better. I recall watching IN get blown up into a chained set of OR's.
That's IN with a list of values. Not IN with a subquery. Very different behaviour. SQL 2000 had problems with OR statements, so that translated over to IN with values lists.
I would see it bypass indexes and go for table scans and a bunch of other junk.
It will go for scans if there are too many values in the list to make multiple seeks an option (just like with key lookups vs table scan)
Consider this:
SELECT <stuff> FROM SomeTable WHERE SomeColumn IN (<List of 1000 values>)
Now let's say the table has 1500 pages and SomeColumn is indexed (and it's even a covering index). Which will be more efficient, a single table scan that reads 1500 pages, or 1000 nonclustered index seeks that each read 2 pages?
Table scan != inefficient. That's another of those irritating performance myths.
did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?
How would I compare things that do the complete opposite to each other?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 22, 2012 at 12:54 pm
GilaMonster (8/22/2012)
did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?
How would I compare things that do the complete opposite to each other?
EXISTS vs IN
NOT EXISTS vs NOT IN
Not exactly opposite, and I've seen them used interchangeably to do the same thing.
i.e.
back to this code
if @ProviderCount >0
set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'
This could be written this way... (I realize one is dynamic, the other isn't... ignoring that fact I'm mostly talking about the dbo.enrollment.providerkey IN (select id from #Provider)
I'm pretending dbo.enrollment is aliased with a in this case and not making it a dynamic fragment... but I hope my question makes sense.
WHERE
EXISTS (select 1 from #Provider x where a.ProviderKey = x.id)
It's effectively very similar to the IN, but I've seen differences in performance between them myself. Maybe I'm just imagining things?
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply