Comparing 2 functions for any differences in the result set.

  • Recently, we've done some optimizations to the functions and now the BI Manager has tasked me with a request to compare the result of the 2 functions as below making sure the row counts match, along with the exact values of each field. In the same request he is expecting this to be a while loop that will loop all the YTD ranges dating back 20 years. (not sure if this is really practical in order to test the validity). However, the while loop is not a mandate for the same. Is there a way to compare the result of the 2 functions without hard coding the ranges and automating the same like for e.g declaring a variable and then based on the date range calculating the next date range etc without the user intervention.

    Last but not the least a script that would validate there are no differences in the results of both the functions which means that the optimization did work. I am thinking of using except clause but I am not sure if it would work with functions. Also, if that won't work what else would be a possible way of doing it.

    I would really appreciate for all the help. It is a big impact for the company if I could complete this task. However, I am not even sure if this is something that should be even handled by a DBA.

    Functions
    Original Function:  report.fnReportDealCore_original(@StartDate, @EndDate, -1)
    New Optimized Function: report.fnReportDealCore(@StartDate, @EndDate, -1)

    Example of Run Dates dating back 20 years
    1.       1/1/1997 – 1/31/2997
    2.       2/1/1997 – 2/28/1997
    3.       3/1/1997 – 3/31/1997
    4.       ….
    5.       1/1/1998 – 1/31/1998
    6.       2/1/1988 – 2/28/1998
    7.      
    8.       Through 9/30/2017

    Thanks
     

  • That's not entirely outside of a DBA's duties.   I guess the question is why would you need to test each individual data range over the last 20 years?   Why not just do a one time comparison across a 20 year span e.g. @StartDate = '1997-01-01', @EndDate = '2017-09-30' ?   However you do it, there are a couple of ways.   You can use EXCEPT or INTERSECT.   Both ways mean you'll end up doing something twice.  In the case of EXCEPT, you would have to see that the results of SELECT * from the original function with the date range EXCEPT the results of SELECT * from the new function produces zero rows, and doing the same thing just reversing the order of the queries between the EXCEPT keyword also produces zero rows.   In the case of INTERSECT, you need to see that the results for both queries as used with EXCEPT but instead used with INTERSECT, return identical results to what comes out of both queries individually, so I suspect that using EXCEPT is more practical.

    A calendar table would certainly help immensely if you were going to down the WHILE loop road, however.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • ffarouqi - Thursday, October 5, 2017 8:47 AM

    Recently, we've done some optimizations to the functions and now the BI Manager has tasked me with a request to compare the result of the 2 functions as below making sure the row counts match, along with the exact values of each field. In the same request he is expecting this to be a while loop that will loop all the YTD ranges dating back 20 years. (not sure if this is really practical in order to test the validity). However, the while loop is not a mandate for the same. Is there a way to compare the result of the 2 functions without hard coding the ranges and automating the same like for e.g declaring a variable and then based on the date range calculating the next date range etc without the user intervention.

    Last but not the least a script that would validate there are no differences in the results of both the functions which means that the optimization did work. I am thinking of using except clause but I am not sure if it would work with functions. Also, if that won't work what else would be a possible way of doing it.

    I would really appreciate for all the help. It is a big impact for the company if I could complete this task. However, I am not even sure if this is something that should be even handled by a DBA.

    Functions


    Original Function:  report.fnReportDealCore_original(@StartDate, @EndDate, -1)

    New Optimized Function: report.fnReportDealCore(@StartDate, @EndDate, -1)

    Example of Run Dates dating back 20 years
    1.       1/1/1997 – 1/31/2997

    2.       2/1/1997 – 2/28/1997

    3.       3/1/1997 – 3/31/1997

    4.       ….

    5.       1/1/1998 – 1/31/1998

    6.       2/1/1988 – 2/28/1998

    7.       …

    8.       Through 9/30/2017

    Thanks

     

    Are these scalar or table functions?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares - Thursday, October 5, 2017 9:44 AM

    ffarouqi - Thursday, October 5, 2017 8:47 AM

    Recently, we've done some optimizations to the functions and now the BI Manager has tasked me with a request to compare the result of the 2 functions as below making sure the row counts match, along with the exact values of each field. In the same request he is expecting this to be a while loop that will loop all the YTD ranges dating back 20 years. (not sure if this is really practical in order to test the validity). However, the while loop is not a mandate for the same. Is there a way to compare the result of the 2 functions without hard coding the ranges and automating the same like for e.g declaring a variable and then based on the date range calculating the next date range etc without the user intervention.

    Last but not the least a script that would validate there are no differences in the results of both the functions which means that the optimization did work. I am thinking of using except clause but I am not sure if it would work with functions. Also, if that won't work what else would be a possible way of doing it.

    I would really appreciate for all the help. It is a big impact for the company if I could complete this task. However, I am not even sure if this is something that should be even handled by a DBA.

    Functions


    Original Function:  report.fnReportDealCore_original(@StartDate, @EndDate, -1)

    New Optimized Function: report.fnReportDealCore(@StartDate, @EndDate, -1)

    Example of Run Dates dating back 20 years
    1.       1/1/1997 – 1/31/2997

    2.       2/1/1997 – 2/28/1997

    3.       3/1/1997 – 3/31/1997

    4.       ….

    5.       1/1/1998 – 1/31/1998

    6.       2/1/1988 – 2/28/1998

    7.       …

    8.       Through 9/30/2017

    Thanks

     

    Are these scalar or table functions?

    Table

  • sgmunson - Thursday, October 5, 2017 9:08 AM

    That's not entirely outside of a DBA's duties.   I guess the question is why would you need to test each individual data range over the last 20 years?   Why not just do a one time comparison across a 20 year span e.g. @StartDate = '1997-01-01', @EndDate = '2017-09-30' ?   However you do it, there are a couple of ways.   You can use EXCEPT or INTERSECT.   Both ways mean you'll end up doing something twice.  In the case of EXCEPT, you would have to see that the results of SELECT * from the original function with the date range EXCEPT the results of SELECT * from the new function produces zero rows, and doing the same thing just reversing the order of the queries between the EXCEPT keyword also produces zero rows.   In the case of INTERSECT, you need to see that the results for both queries as used with EXCEPT but instead used with INTERSECT, return identical results to what comes out of both queries individually, so I suspect that using EXCEPT is more practical.

    A calendar table would certainly help immensely if you were going to down the WHILE loop road, however.

    Thank you for your suggestion. However, do you have a script or anything similar that I can use as a while loop or any other automated way of comparing the range of dates dating 20 years etc. I really appreciate your help on this.

  • This would generate a test script for all dates from 20 years ago till today. It uses a YTD range.

    DECLARE @Start datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    WITH
    E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    E2(n) AS(
      SELECT a.n FROM E a, E b
    ),
    E4(n) AS(
      SELECT a.n FROM E2 a, E2 b
    ),
    cteCalendar(calDate) AS(
      SELECT TOP (DATEDIFF( DD, @Start, GETDATE()) + 1)
       DATEADD( DD, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @Start) calDate
      FROM E4
    )
    SELECT REPLACE(REPLACE('IF EXISTS( (SELECT * FROM report.fnReportDealCore_original( @StartDate, @EndDate, -1)
           EXCEPT
           SELECT * FROM report.fnReportDealCore( @StartDate, @EndDate, -1))
           UNION ALL
           (SELECT * FROM report.fnReportDealCore( @StartDate, @EndDate, -1)
           EXCEPT
           SELECT * FROM report.fnReportDealCore_original( @StartDate, @EndDate, -1)))
        SELECT ''Differences'''
        , '@StartDate', '''' + CAST( YEAR(calDate) AS char(4)) + '0101''')
        , '@EndDate', '''' + CONVERT( char(8), calDate, 112) + '''')
    FROM cteCalendar;

    For month by month ranges, you could try this calculation and change the previous script.

    DECLARE @Start datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    WITH
    E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    E2(n) AS(
      SELECT a.n FROM E a, E b
    ),
    E4(n) AS(
      SELECT a.n FROM E2 a, E2 b
    ),
    cteCalendar(calDate) AS(
      SELECT TOP (DATEDIFF( MM, @Start, GETDATE()) + 1)
       DATEADD( MM, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @Start) calDate
      FROM E4
    )
    SELECT calDate AS MonthStart,
      DATEADD( MM, 1, calDate) - 1 AS MonthEnd
    FROM cteCalendar;  

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • My take on it is that "It Depends".  If it was an optimization made by the DBAs, then the DBAs are responsible for proving identical functionality and results.  As a bit of a side bar, it probably took longer to write the original post on this thread than it would take to prove the two functions produce the identical results.  Get crackin' 😉

    If someone else made the changes, let them prove the QA on it.  Just keep in mind that anything that goes to prod should still be reviewed by the DBA for possible security risks (not likely in this case) and performance issues.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This migh also work.

    DECLARE @Start datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    WITH
    E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    E2(n) AS(
      SELECT a.n FROM E a, E b
    ),
    E4(n) AS(
      SELECT a.n FROM E2 a, E2 b
    ),
    cteCalendar(calDate) AS(
      SELECT TOP (DATEDIFF( DD, @Start, GETDATE()) + 1)
       DATEADD( DD, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @Start) calDate
      FROM E4
    )
    SELECT CAST( CAST( YEAR(calDate) AS char(4)) AS datetime),
       calDate
    FROM cteCalendar
    WHERE EXISTS((SELECT * FROM report.fnReportDealCore_original( CAST( YEAR(calDate) AS char(4)), calDate, -1)
           EXCEPT
           SELECT * FROM report.fnReportDealCore( CAST( YEAR(calDate) AS char(4)), calDate, -1))
           UNION ALL
           (SELECT * FROM report.fnReportDealCore( CAST( YEAR(calDate) AS char(4)), calDate, -1)
           EXCEPT
           SELECT * FROM report.fnReportDealCore_original( CAST( YEAR(calDate) AS char(4)), calDate, -1))) ;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • BTW... please post the 2 functions.  I'm always interested in reporting functions and have some old "Black Arts" tricks that might provide additional performance benefits.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Luis - Please review the code below - will this perform better from your code -

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (
    Select @st as dte
    union all
    Select dte+1 as dte2 from cte where dte<getdate()
    )

    select *,min(dte)over (partition by year(dte),month(dte)) as MinDate ,max(dte)over (partition by year(dte),month(dte)) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)

    First solve the problem then write the code !

  • Here is something strange i found   - Can anyone help me explain why is this that the scan counts are less in the below code

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (
    Select @st as dte
    union all
    Select dte+1 as dte2 from cte where dte<getdate()
    )

    select *,min(dte)over (partition by year(dte),month(dte)) as MinDate ,max(dte)over (partition by year(dte),month(dte)) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)
    go

    /*
    Table 'Worktable'. Scan count 8, logical reads 77860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 141 ms, elapsed time = 384 ms.

    */

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (
    Select @st as dte,year(@st) as yr,month(@st) as mnth
    union all
    Select dte+1 as dte2,year(dte),month(dte) from cte where dte<getdate()
    )

    select *,min(dte)over (partition by yr,mnth) as MinDate ,max(dte)over (partition by yr,mnth) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)

    /*
    Table 'Worktable'. Scan count 5, logical reads 61686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 156 ms, elapsed time = 224 ms.

    */

    First solve the problem then write the code !

  • TheCTEGuy - Friday, October 6, 2017 4:53 AM

    Hey Luis - Please review the code below - will this perform better from your code -

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (

    ; is a terminator, not an initiator.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin - Friday, October 6, 2017 6:43 AM

    TheCTEGuy - Friday, October 6, 2017 4:53 AM

    Hey Luis - Please review the code below - will this perform better from your code -

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (

    ; is a terminator, not an initiator.

    Are you trying to say that the code should like the below code ?


    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    with cte as (
    Select @st as dte,year(@st) as yr,month(@st) as mnth
    union all
    Select dte+1 as dte2,year(dte),month(dte) from cte where dte<getdate()
    )

    select *,min(dte)over (partition by yr,mnth) as MinDate ,max(dte)over (partition by yr,mnth) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)

    First solve the problem then write the code !

  • TheCTEGuy - Friday, October 6, 2017 6:50 AM

    Phil Parkin - Friday, October 6, 2017 6:43 AM

    TheCTEGuy - Friday, October 6, 2017 4:53 AM

    Hey Luis - Please review the code below - will this perform better from your code -

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (

    ; is a terminator, not an initiator.

    Are you trying to say that the code should like the below code ?


    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    with cte as (
    Select @st as dte,year(@st) as yr,month(@st) as mnth
    union all
    Select dte+1 as dte2,year(dte),month(dte) from cte where dte<getdate()
    )

    select *,min(dte)over (partition by yr,mnth) as MinDate ,max(dte)over (partition by yr,mnth) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)

    Pretty much. Starting CTEs with a semicolon is unnecessary (and may mislead others who read your code into thinking that it is the correct way of doing things) if you properly terminate the rest of your code.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • TheCTEGuy - Friday, October 6, 2017 4:53 AM

    Hey Luis - Please review the code below - will this perform better from your code -

    declare @st datetime = DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 20, 0);

    ;with cte as (
    Select @st as dte
    union all
    Select dte+1 as dte2 from cte where dte<getdate()
    )

    select *,min(dte)over (partition by year(dte),month(dte)) as MinDate ,max(dte)over (partition by year(dte),month(dte)) as MAXDate
    from
    cte
    order by dte
    option(maxrecursion 0)

    Can you prove that? I would but won't have access to my laptop until Monday.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply