May 21, 2022 at 1:13 pm
Hello.
I am not a SQL Developer - just an IT generalist (if that) who dips in and out of SQL Server
stuff and even writing this down helps me to better understand the 'problem'.
From reading, the perceived wisdom is that you should not do DB lookups in Scalar functions for performance reasons.
I am looking at some code for a job that used to run for 13 hours until it got canned, due to the fact that the results were incorrect as much as the runtime (it was pointless).
The job used a Scalar function to perform multiple DB lookups (10's of millions per hour).
I now want to do something similar to what this job did but avoid the pitfalls of doing the lookups in a Scalar function.
I see that 2019 implicitly converts to inline TVFs - and in fact I tried converting the code myself but think it will face the same issues with being non-performant, so did not pursue this (as I never actually got it working).
What the Scalar function is doing is looking up a price for an item based on a hierarchy, and if the result is NULL then and only then performing the next lookup in the hierarchy for the item.
A scaled down version of the correct hierarchical lookup looks like this :
A) Specific Item Price for a Customer
B) Specific Item Parent Item Price for a Customer
C) Generic Item Price for a Customer
D) Specific Item Price for all Customers
E) Specific Item Parent Price for all Customers
F) Generic Item Price for all Customers
The exiting code is similar, potentially performing 6 lookups - but using a different (incorrect) hierarchy.
Pricing should always exist at D, E and F - this is Base pricing - primarily to trap the non-existence of A, B,C
which is an error.
In most circumstances the pricing exists at A, but some prices exist at B and C, which is valid and needs to
be catered for.
I can see why a Scalar function might have looked like the answer from a coding perspective as it simplifies the SQL query - at the expense of performance.
The DB contains 135 million items - there are some predicates that reduce the input to the function perhaps by half - but that still means somewhere from 75 - 450 million lookups required to me.
All I have been able to come up with to date (conceptually) is to pull all itemids with customer ids and prices into a Temp table and add a derived from column to say where in the hierarchy the price came from - applying a value or weight. Then sort them by customerid/hierarchy and somehow filter by the row that applies based on the hierarchy - maybe a 2nd Temp table (or some kind of nested Selects to do both).
Not looking for somebody to solve this for me, but rather looking for some guidance to develop my SQL coding.
Based on what I am trying to achieve and knowing (?) that Scalar function with multiple DB lookups are the wrong answer - what pointers can anybody offer as to what the right answer to how to do something like this is?
I think my proposed method will reduce the runtime from 13 hours to n minutes but still may not be the best way to do this even if it works and this requirement for an iterative lookup must be common.
Thanks in advance for any input.
May 21, 2022 at 2:21 pm
I think that knowing why and how that function is called so many times may help pointing you to a better solution.
how all those hierarchical rules are applied may also change how this can be implemented
below is a possible approach style - not necessarily the best or fastest
select ...
, coalesce(cust_prices.price, all_prices.price) as price
from master_table mt
outer apply (select top 1 case
when p.pricetype = 'specific'
then p.price
when parent.pricetype is not null
and parent.pricetype = 'specific'
then parent.price
else p.price
end
from prices p
left outer join prices parent
on parent.priceid = p.parentid
and parent.customerid = mt.customerid
where p.customerid = mt.customerid
and p.itemid = mt.itemid
order by case
when p.pricetype = 'specific'
then 1
when parent.pricetype is not null
and parent.pricetype = 'specific'
then 2
else 3
end
) cust_prices
outer apply (select top 1 case
when p.pricetype = 'specific'
then p.price
when parent.pricetype is not null
and parent.pricetype = 'specific'
then parent.price
else p.price
end
from prices p
left outer join prices parent
on parent.priceid = p.parentid
and parent.customerid is null -- how do all customers get flagged
where p.customerid is null -- how do all customers get flagged?
and p.itemid = mt.itemid
order by case
when p.pricetype = 'specific'
then 1
when parent.pricetype is not null
and parent.pricetype = 'specific'
then 2
else 3
end
) all_prices
May 21, 2022 at 2:46 pm
Thanks Frederico for the quick reply. I will work though that example to understand what it is doing (Not used Outer Apply to date). I think that the Top 1 may replace the idea of sorting / assigning a weight but just a guess. I will attempt to construct the DDL and show the data when I can figure that out.
At this point I am running some queries to identify the scale of the issue where an exact match on object/customer/price does not exist and try to break it down to verify results as I go. My first query is still running after 43 minutes - the other 2 methods return the same results after 10 seconds each. I am going to let the first query finish to see if it returns the same results.
May 22, 2022 at 4:17 am
Are Items A thru F all in a single parent-child table?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 23, 2022 at 3:05 pm
We need to see the actual lookup code to offer more specific advice here.
Proper indexing could also be critical to getting best performance for this task.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 29, 2022 at 11:21 am
Hello Jeff.
Thanks for the reply and aplogogies for not replying sooner - my account is not linked to my work email and not checked my personal emails for a while - just came back here in downtime to review Fredico's code.
The 'Price' table has links to an Object Table and the actual Monetary Value is stored in another PriceDetail Table so currently not in the same table as such but I had complemplated pulling the data into 1 Temp table and passing that to a function.
If I can get my head around this myself and what I am actually trying to do, and work out how, I will post som DDL.
Thanks
Steve O.
May 29, 2022 at 11:26 am
Hello Scott.
Thanks for the reply.
I have already determined that iterative DB lookups in a Scalar Function is not the way to do this and, so am looking at starting from scratch with the requirements so I did not see much point posting the current function and the DDL used.
Once I have something that works I can look at indexing and execution plans but at this stage I am looking to understand the proper way to do this if not using a scalar function.
If you think that there is still value in seeing the scalar function I can try to have a look for it and post it?
Steve O.
May 29, 2022 at 11:36 am
Hello again Jeff.
Re-reading your reply and perhaps I misunderstood the question.
There is 1 Price Table containing all Price records and there is 1 Price Detail Table containing all monetary values. I guess they are separate for a reason (probably because there can be a one to many realtionship between a price record and monetary value with concept of current price and dates from which the price applies).
For any customer, any of A-C may or may not exist - but D-F should exist (there are other checks for that).
What this code is trying to do is identify instances where all of A-C do not exist as in most instances (but not all) this is an error state and will be highlighted by the monetary values returned for D-F being abnormally high.
Thanks
Steve O.
May 30, 2022 at 3:30 am
I'm thinking that I'm not understanding much either. 😀
You probably don't actually mean "Hierarchical"... you just mean you have some relational tables, correct? If that's the case, and ERD might help a bit here.
Have you tried working with Frederico's code at all? How'd that work out?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2022 at 12:10 pm
As a general rule of thumb, you should not write any scalar valued functions. At all.
Even tough SQL2019 and up are evolving in this troublesome area, just don't use SVFs.
Even returning a single value as a SET, will enhance performance in most of the cases. (cross apply / outer apply are your friends)
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 30, 2022 at 3:37 pm
Hello Jeff.
Thanks for hanging in there. I have a meeting tomorrow to discuss my thoughts on this with peers.
However, your sig has always been something that was at the back of my mind (I was not aware it was yours) and what I determined was that I don't actually need to check 135 million entries. I ran a quick query to group by accountid and objectid and all I need to do is use the output from that to check for pricing - amounts to 65K rows before I add predicates.
At this stage I don't even need the monetary value - all I need to to know there is a price - even if set to £0
It doesn't matter if there are 1 or a million hits - the same rule will apply to all and the check is only requried once per account in the hierarchy.
Not sure if that is what your sig means, but it prompted me to think of the above just the same so works for me.
I have looked at whether i can use Frederico's code and still unclear - what I was doing last night was creating tmp tables with the fields I wanted in separate queries and trying to document the fields so I could generate the DLL statements as I was running test queries.
I will update with progress.
Regards
Steve O.
May 30, 2022 at 3:44 pm
Hello Johan.
I have a couple of Scalar Functions that are sub-second.
1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).
I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.
Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).
Regards
Steve O.
May 30, 2022 at 11:29 pm
Hello Johan.
I have a couple of Scalar Functions that are sub-second.
1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).
I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.
Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).
Regards
Steve O.
Heh... all I can say there is "famous last words" and I'll leave it at that. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2022 at 5:46 am
If you have 200 entries, at least use a form of binary search rather than searching sequentially thru the list. Even better would be just looking it up in a table clustered on the lookup value.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 31, 2022 at 8:52 am
Hello Johan.
I have a couple of Scalar Functions that are sub-second.
1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).
I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.
Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).
Regards
Steve O.
We are currently experiencing HUGE cpu consumption due to SVF usage on a bit larger sets with SQLServer 2019 CU14/15/16.
Here's a sample SVF ( noting to it, right ? )
( just don't ask "Why"!)
CREATE FUNCTION [dbo].[F_ISWOSTATUSWAITPOSSIBLE] (
@WO_StatusVARCHAR(2)
)
RETURNS BIT
AS
BEGIN
DECLARE @ISWOSTATUSWAITPOSSIBLEBIT;
SELECT @ISWOSTATUSWAITPOSSIBLE = (CASE WHEN @WO_Status in ('30', '40', '50', '60', '65', '70') THEN 1 ELSE 0 END);
RETURN @ISWOSTATUSWAITPOSSIBLE
END
This function has these settings in sys.modules : is_inlineable = 1 / inline_type = 1
Queries now hitting timeouts ! ( +30sec )
Here's the inline table value function that replaces it
CREATE FUNCTION [dbo].[tvF_ISWOSTATUSWAITPOSSIBLE_JoBi] (
@WO_StatusVARCHAR(2)
)
RETURNS Table
AS
return (
Select convert(bit, CASE WHEN @WO_Status in ('30', '40', '50', '60', '65', '70') THEN 1
ELSE 0
END) as ISWOSTATUSWAITPOSSIBLE
)
GO
Same queries (only modified to use cross apply with the itvf ) now run very stable consuming < 100ms
We have been warned ever since SQL2008 by MS to replace SVFs to (i)TVFs for performance reasons.
So, for 14 years these functions have been consuming below any alert level, right?
Now it is total panic to get this fixed by yesterday.
Edited:
We managed to turn things back to "normal", performing
ALTER DATABASE SCOPED CONFIGURATION set LEGACY_CARDINALITY_ESTIMATION = ON -- Default = OFF
This should be avoided as much as possible !
( it's like driving a Ferrari but shutting off most of its marvels )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply