Omitting 'DISTINCT'

  • I have the following code which is returning the required result, however I think I am coding it wrong because it runs really slow on large data-sets and the hunch I have is I need to omit DISTINCT, even though it gives me what I want.

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #n

    GO

    CREATE TABLE [dbo].[#n]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #t

    GO

    CREATE TABLE [dbo].[#t]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #k

    GO

    CREATE TABLE [dbo].[#k]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',5,2)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',8,-11)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',11,7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,51)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-4,23)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',8,-7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',5,-18)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,11)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',0,40)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',7,6)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,8)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',11,5)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',10,1)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-7,3)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,9)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-3,2)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-30',7)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-29',1)

    DECLARE @set AS INT = 3;

    SELECT DISTINCT --omit this

    SUM((n.identity_1-t.identity_1)*k.identity_1) OVER (ORDER BY n.timing DESC)+

    SUM((n.identity_2-t.identity_2)*k.identity_1) OVER (ORDER BY n.timing DESC)

    FROM

    (SELECT *, ROW_NUMBER() OVER (ORDER BY [timing] DESC) AS r FROM #n) AS n JOIN

    (SELECT *, ROW_NUMBER() OVER (ORDER BY [timing] DESC) AS r FROM #t) AS t ON IIF(n.r=@set,0,n.r)=t.r%@set JOIN

    (SELECT * FROM #k) AS k ON t.timing = k.timing

    Is there a better way to code this?

    You're all familiar with 'there's an app for that'... here's another... there's a CTE for that! 🙂
  • How would I use CROSS APPLY in this case?

    You're all familiar with 'there's an app for that'... here's another... there's a CTE for that! 🙂
  • Don't recode on a hunch??. Look at the astimated execution plan! Given that I too do not have that plan, and given your inserted data is not the large amount you mentioned, I can only suggest you first carefully consider how the JOIN upon the IIF function is being handled in that plan, and its cost when compared to the plan's total cost.

  • sql_only (12/25/2015)


    I have the following code which is returning the required result, however I think I am coding it wrong because it runs really slow on large data-sets and the hunch I have is I need to omit DISTINCT, even though it gives me what I want.

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #n

    GO

    CREATE TABLE [dbo].[#n]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #t

    GO

    CREATE TABLE [dbo].[#t]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    IF OBJECT_ID('[tempdb].dbo.tb_p', 'U') IS NOT NULL DROP TABLE #k

    GO

    CREATE TABLE [dbo].[#k]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    GO

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',5,2)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',8,-11)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',11,7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,51)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-4,23)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',8,-7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',5,-18)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,11)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',0,40)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',7,6)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,8)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',11,5)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',10,1)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-7,3)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,9)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-3,2)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-30',7)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-29',1)

    DECLARE @set AS INT = 3;

    SELECT DISTINCT --omit this

    SUM((n.identity_1-t.identity_1)*k.identity_1) OVER (ORDER BY n.timing DESC)+

    SUM((n.identity_2-t.identity_2)*k.identity_1) OVER (ORDER BY n.timing DESC)

    FROM

    (SELECT *, ROW_NUMBER() OVER (ORDER BY [timing] DESC) AS r FROM #n) AS n JOIN

    (SELECT *, ROW_NUMBER() OVER (ORDER BY [timing] DESC) AS r FROM #t) AS t ON IIF(n.r=@set,0,n.r)=t.r%@set JOIN

    (SELECT * FROM #k) AS k ON t.timing = k.timing

    Is there a better way to code this?

    Dunno... I don't have the time to analyze someone's code to figure out what it's doing? What does the code do?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sql_only (12/25/2015)


    Is there a better way to code this?

    Yes there is, not only in terms of performance but it's also more legible and maintainable

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#n') IS NOT NULL DROP TABLE #n;

    CREATE TABLE [dbo].[#n]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;

    CREATE TABLE [dbo].[#t]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL,[identity_2] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    IF OBJECT_ID('tempdb..#k') IS NOT NULL DROP TABLE #k;

    CREATE TABLE [dbo].[#k]([timing][datetime] NOT NULL,[identity_1] [decimal](18, 0) NOT NULL) ON [PRIMARY]

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',5,2)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',8,-11)

    INSERT INTO #n ([timing],[identity_1],[identity_2]) VALUES ('2015-10-01',11,7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,51)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-4,23)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',8,-7)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',5,-18)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',1,11)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',0,40)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',7,6)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,4)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',3,8)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',11,5)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',10,1)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-7,3)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-9,9)

    INSERT INTO #t ([timing],[identity_1],[identity_2]) VALUES ('2015-09-30',-3,2)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-30',7)

    INSERT INTO #k ([timing],[identity_1]) VALUES ('2015-09-29',1)

    ;WITH N_INFO AS

    (

    SELECT

    COUNT(*) AS N_COUNT

    FROM #n

    )

    ,THE_N_SET AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    ORDER BY N.timing

    ) AS N_RID

    ,N.timing

    ,N.identity_1

    ,N.identity_2

    FROM #n N

    )

    ,BASE_DATA AS

    (

    SELECT

    ISNULL(NULLIF(ROW_NUMBER() OVER

    (

    ORDER BY T.timing

    ) % NI.N_COUNT,0),NI.N_COUNT) AS T_RID

    ,T.timing

    ,T.identity_1 AS T_identity_1

    ,T.identity_2 AS T_identity_2

    ,K.identity_1 AS K_identity_1

    FROM #t T

    INNER JOIN #k K

    ON T.timing = K.timing

    CROSS APPLY N_INFO NI

    )

    SELECT

    SUM(

    (TNS.identity_1 - BD.T_identity_1) * BD.K_identity_1

    + (TNS.identity_2 - BD.T_identity_2) * BD.K_identity_1

    ) AS OUTPUT_VALUE

    FROM BASE_DATA BD

    INNER JOIN THE_N_SET TNS

    ON BD.T_RID = TNS.N_RID;

    Result

    OUTPUT_VALUE

    -343

    Edit: Replaced a local variable with a CTE, close to 50% improvement in performance;-)

  • Yes there is, not only in terms of performance but it's also more legible and maintainable

    Edit: Replaced a local variable with a CTE, close to 50% improvement in performance;-)

    Thank you Sir. 🙂

    You're all familiar with 'there's an app for that'... here's another... there's a CTE for that! 🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply