Performance issue on a udf - Better way to rewrite?

  • Hi all,

    I have a function that needs to work to export some numeric values to an old Cobol system.

    For those of you familiar with the Format, I need to export the price (e.g. 123.45) as a 9(8)V99 format.

    In example, the value 123.45 would give 0000012345, since we want 8 digits for the numeric part, and 2 for the decimal part. We always have to pad with 0's, for the Cobol format, and need to remove the decimal separator.

    I have come up with this function, and test data, (10K rows), but it does not scale well, it takes about 7-10 seconds for 10K rows, and I'm looking to extract roughly 1 million weekly, so that tends to take a long time.

    Anywayz, if anyone wants to take a shot a it, I'd really appreciate. Here's the code:

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].[fnGICSPFParseNumericForCobol]')

    AND type IN ( N'FN', N'IF', N'TF', N'FS', N'FT' ) )

    DROP FUNCTION [dbo].[fnGICSPFParseNumericForCobol]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION fnGICSPFParseNumericForCobol

    (

    @InputNumber NUMERIC(38, 15) ,

    @IntegerPart INT ,

    @DecimalPart INT

    )

    RETURNS VARCHAR(50)

    AS

    BEGIN

    DECLARE @StrIntegerPart VARCHAR(50) -- Integerpart in String

    DECLARE @StrDecimalPart VARCHAR(50) -- Decimalpart in String

    DECLARE @StrInputNumber VARCHAR(50) -- InputNumber in string

    SET @StrInputNumber = CONVERT(VARCHAR(50), @InputNumber) -- Parse the number to make a string.

    -- Substring the integer part of the number and

    -- trim to the desired amount of characters (@IntegerPart)

    SET @StrIntegerPart = RIGHT(SUBSTRING(@StrInputNumber, 0,

    PATINDEX('%[.,]%',

    @StrInputNumber)),

    @IntegerPart)

    -- Pad with '0' to fill the number in the desired value.

    -- i.e. 12.123 -> 9(04)V9(03) = '0012.123' = 0012123

    WHILE LEN(@StrIntegerPart) < @IntegerPart

    BEGIN

    SET @StrIntegerPart = '0' + @StrIntegerPart

    END

    -- Get the decimal part of the InputNumber. Then fill with '0' at the right.

    SET @StrDecimalPart = LEFT(SUBSTRING(@StrInputNumber,

    PATINDEX('%[.,]%',

    @StrInputNumber) + 1,

    LEN(@StrInputNumber)),

    @DecimalPart)

    -- Return Integer + Decimal parts.

    RETURN @StrIntegerPart

    + @StrDecimalPart

    END

    GO

    --=== Drop the table if it exists

    IF OBJECT_ID('TestFormat') IS NOT NULL

    DROP TABLE TestFormat ;

    --=== Create the test Table for performance testing

    CREATE TABLE TestFormat

    (

    ID INT IDENTITY(1, 1) ,

    Val1 NUMERIC(10, 2) ,

    val2 NUMERIC(10, 2)

    )

    --=== Create 10K rows of numeric data, for performance testing

    INSERT INTO TestFormat

    ( Val1 ,

    Val2

    )

    SELECT TOP 10000

    CAST(ABS(CHECKSUM(NEWID()) % 1000000000000 / 100.0) AS DECIMAL(10,

    2)) ,

    CAST(ABS(CHECKSUM(NEWID()) % 1000000000000 / 100.0) AS DECIMAL(10,

    2))

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2

    SET STATISTICS TIME ON ;

    --=== Try the poor performing function, on 10K Rows

    SELECT Val1,

    dbo.fnGICSPFParseNumericForCobol(Val1, 8, 2) AS ComputedVal1,

    Val2,

    dbo.fnGICSPFParseNumericForCobol(Val2, 6, 2) AS ComputedVal2

    FROM TestFormat

    SET STATISTICS TIME OFF ;

    Thanks in advance, There must be a way to use a Cte, or Inline function, to replace this Scalar one, but I can't find it.

    Cheers,

    J-F

  • J-F if possilbe, can you get rid of the function and just add a persisted calculated column instead? persisted stores the value in the data pages, and would be much faster to access;

    would this work for you?

    typical results:

    ID Val1 val2 Val1_formatted Val2_formatted

    --- ---- ---- ------------ ------------- -------------- --------------

    10 10 1 21471601.93 15630264.86 2147160193 63026486

    10 10 2 5960743.79 3111563.78 0596074379 11156378

    10 10 3 3246489.88 2933431.45 0324648988 93343145

    10 10 4 16442183.35 13033996.38 1644218335 03399638

    10 10 5 4365166.84 3532365.53 0436516684 53236553

    the change:

    ALTER TABLE TestFormat ADD Val1_formatted AS RIGHT('0000000000' + CONVERT(VARCHAR,CONVERT(INT,Val1 * 100)),10) PERSISTED

    ALTER TABLE TestFormat ADD Val2_formatted AS RIGHT('00000000' + CONVERT(VARCHAR,CONVERT(INT,Val2 * 100)),8) PERSISTED

    SELECT len(Val1_formatted),len(Val1_formatted),* FROM TestFormat

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Even if he can't add persisted columns... if I run a straight select and then a select with your computations it runs in the exact same time over 1M records (1 sec on a low end work station WITH XP SP3 AND SQL 2008 SP1 Standard).

    I agree that I'd add the persisted columns as pers choice, but for speed only, it does not make that much of a difference with your code.

  • Not really, that is not an option, since the function is used across the system, everytime we need an output for cobol. It is in many tables, and sometimes the number of decimals/digits change, that is the challenge with the function...

    Cheers,

    J-F

  • Ninja's_RGR'us (1/6/2010)


    Even if he can't add persisted columns... if I run a straight select and then a select with your computations it runs in the exact same time over 1M records (1 sec on a low end work station WITH XP SP3 AND SQL 2008 SP1 Standard).

    I agree that I'd add the persisted columns as pers choice, but for speed only, it does not make that much of a difference with your code.

    Well, how do you handle it when I need 6 digits, and 4 decimals...

    Eg. 100.35 would give 0001003500

    You need the leading AND trailing zeros to fit the requirement, and remove the demiliter.. that is what I have trouble with! 😉

    Cheers,

    J-F

  • Going in a meeting, i'll be back in a few, if you have any other suggestions, i'd greatly appreciate!

    Cheers,

    J-F

  • EDIT: HOLD ON, I think I have a problem!!!

    2nd EDIT: Actually I don't, it was the data, it was getting truncated.

    The test data is all NUMERIC(10,2) so when I rechecked my results the values didn't match. When I changed the test query to use

    10, 2 instead of 8, 2 and 6, 2 the values looked good.

    Since it appears you are using SQL Server 2005, try this:

    CREATE FUNCTION dbo.fnGICSPFParseNumericForCobol

    (

    @InputNumber NUMERIC(38, 15) ,

    @precision INT ,

    @Scale INT

    )

    RETURNS TABLE

    AS return (select right(replicate('0', @precision) + cast(cast(@InputNumber * power(10,@Scale) as int) as varchar(64)), @precision) as CobolData)

    GO

    SET STATISTICS TIME ON ;

    --=== Try the poor performing function, on 10K Rows

    SELECT

    tf.Val1,

    parse1.CobolData AS ComputedVal1,

    tf.Val2,

    parse2.CobolData AS ComputedVal2

    FROM

    dbo.TestFormat tf

    cross apply dbo.fnGICSPFParseNumericForCobol(tf.Val1, 8, 2) parse1

    cross apply dbo.fnGICSPFParseNumericForCobol(tf.Val2, 6, 2) parse2

    SET STATISTICS TIME OFF ;

    On my server here at work against 10,000 rows:

    (10000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 207 ms.

  • Lynn, would schema binding that function make any difference to the performance? Could it be modified to a scalar UDF and would that make it perform any differently?

    Just questions - I guess I could test it myself 😉

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Not really sure. I just sort of stumbled on to this method of converting data. We used an scaler UDF to convert dates to a specified format for an export. I found out that turning it into an in-line TVF actually sped the export up significantly. At this time, I really can't explain why.

  • To continue, I did write a blog about it Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions

  • Just finished a quick test - converting to a scalar UDF took the following on my system:

    SQL Server Execution Times:

    CPU time = 984 ms, elapsed time = 1044 ms.

    So, using Lynn's approach is going to be the best - I think.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Lynn Pettis (1/6/2010)


    EDIT: HOLD ON, I think I have a problem!!!

    2nd EDIT: Actually I don't, it was the data, it was getting truncated.

    The test data is all NUMERIC(10,2) so when I rechecked my results the values didn't match. When I changed the test query to use

    10, 2 instead of 8, 2 and 6, 2 the values looked good.

    Since it appears you are using SQL Server 2005, try this:

    CREATE FUNCTION dbo.fnGICSPFParseNumericForCobol

    (

    @InputNumber NUMERIC(38, 15) ,

    @precision INT ,

    @Scale INT

    )

    RETURNS TABLE

    AS return (select right(replicate('0', @precision) + cast(cast(@InputNumber * power(10,@Scale) as int) as varchar(64)), @precision) as CobolData)

    GO

    SET STATISTICS TIME ON ;

    --=== Try the poor performing function, on 10K Rows

    SELECT

    tf.Val1,

    parse1.CobolData AS ComputedVal1,

    tf.Val2,

    parse2.CobolData AS ComputedVal2

    FROM

    dbo.TestFormat tf

    cross apply dbo.fnGICSPFParseNumericForCobol(tf.Val1, 8, 2) parse1

    cross apply dbo.fnGICSPFParseNumericForCobol(tf.Val2, 6, 2) parse2

    SET STATISTICS TIME OFF ;

    On my server here at work against 10,000 rows:

    (10000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 207 ms.

    Lynn, this is more than what I was looking for. I'm really surprised you've come with a solution that works with the parametrised leading/trailing zeros. The trailing zeroes gave me such a hard time to think of!

    I will test your solution a bit more, but this looks exactly like what I need!

    Thanks a lot, this is really nice.

    I guess that's what happens when you provide sample data, 😉

    Cheers,

    J-F

  • Previous employer, Retail System was COBOL. When we started using SQL Server for reporting and other activities, we found it easier to complete updates on the SQL side then simply update the ISAM databases for the application through simple COBOL update programs instead of writing the updates in COBOL to update the system. The later would take 2 weeks to complete a price change (for instance) and using SQL we could do it in a couple of hours.

  • Ya beat me to it on the POWER thing. Nicely done, Lynn.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn Pettis (1/6/2010)


    To continue, I did write a blog about it Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions

    Nice... I'm curious... why blog these instead of submitting them as articles? Articles don't have to be long to be effective. This would have been a perfect article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply