Best way to find certain records.

  • Jeff Moden (5/18/2013)


    The key is to not put columns into functions so that indexes can actually use them as part of a seek.

    Hi Jeff...appreciate you comment.

    however been playing around...always dangerous for me to get too deep into analytics 😀

    running some test code on a 10M row table your version is slower than mine...see code below..(results in comments).

    running on 1M row table....code was on a par.

    maybe its the way I have the indexes...but would be keen to see your comments if you have time to try the code.

    Always keen to understand my misunderstandings :hehe:

    USE [tempdb]

    GO

    --select @@version

    --=================================================================================================

    --Version

    --Microsoft SQL Server 2012 - 11.0.2218.0 (Intel X86)

    --Enterprise Evaluation Edition on Windows NT 6.1 <X86> (Build 7601: Service Pack 1)

    --=================================================================================================

    -- set up test harness with 10M rows...takes around 2 minutes on my laptop

    --=================================================================================================

    IF object_id('dbo.TPC') IS NOT NULL

    BEGIN

    DROP TABLE dbo.TPC;

    END;

    CREATE TABLE [dbo].[TPC](

    [ClientID] [int] NOT NULL,

    [ActivityID] [int] NOT NULL,

    [CreateDate] [datetime] NOT NULL

    ) ON [PRIMARY]

    GO

    INSERT INTO [dbo].[TPC]

    ([ClientID]

    ,[ActivityID]

    ,[CreateDate])

    SELECT TOP 10000000 ---- NOTE 10 MILLION rows ...

    ClientID = CAST(Abs(Checksum(Newid()) % 9000 + 1) AS INT),

    ActivityID = CAST(Abs(Checksum(Newid()) % 90 + 1) AS INT),

    CreateDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2011', Getdate()), '2011')

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    GO

    --=================================================================================================

    --create indexes...these delivered the fastest response I could get for JLS code

    -- any ideas to improve performance...greatfully received <grin>

    --=================================================================================================

    CREATE CLUSTERED INDEX [cix_JLS] ON [dbo].[TPC]

    ([ClientID] ASC,[ActivityID] ASC,[CreateDate] ASC)

    CREATE NONCLUSTERED INDEX [nix_JLS] ON [dbo].[TPC]

    ([ActivityID] ASC,[CreateDate] ASC)

    -- setup end

    --=================================================================================================

    --code variants below

    --=================================================================================================

    -- JLS code...

    SET STATISTICS TIME , IO ON;

    SELECT DISTINCT ClientId

    FROM TPC

    WHERE ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 72) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 60) )

    AND ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 33) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 90) )

    AND ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 34) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 120) );

    SET STATISTICS TIME , IO OFF;

    --=================================================================================================

    --JLS RESULTS for 10M row table

    --(193 row(s) affected)

    --Table 'TPC'. Scan count 4, logical reads 32027, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- SQL Server Execution Times:

    -- CPU time = 1545 ms, elapsed time = 1545 ms.

    --=================================================================================================

    --JLS RESULTS for 1M row table

    --(6118 row(s) affected)

    --Table 'TPC'. Scan count 4, logical reads 3216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- SQL Server Execution Times:

    -- CPU time = 250 ms, elapsed time = 250 ms.

    --=================================================================================================

    --DBCC freeproccache

    --=================================================================================================

    -- JModen code ..

    SET STATISTICS TIME , IO ON;

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ClientId NOT IN (SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 72 AND CreateDate > GETDATE()-60)

    AND ClientId NOT IN (SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 33 AND CreateDate > GETDATE()-90)

    AND ClientId NOT IN (SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 34 AND CreateDate > GETDATE()-120)

    SET STATISTICS TIME , IO OFF;

    --=================================================================================================

    --JModen RESULTS for 10M row table

    --(193 row(s) affected)

    --Table 'TPC'. Scan count 1208400, logical reads 3659573, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- SQL Server Execution Times:

    -- CPU time = 4087 ms, elapsed time = 4105 ms.

    --=================================================================================================

    --JModen RESULTS for 1M row table

    --(6118 row(s) affected)

    --Table 'TPC'. Scan count 4, logical reads 3134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- SQL Server Execution Times:

    -- CPU time = 250 ms, elapsed time = 290 ms.

    --=================================================================================================

    --suggest thats tests are re run on various row numbers...when intially run on 1M row table there was little difference

    --this code has also been run on

    --Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)

    --Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    --with similar differences.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thanks, Graham, especially for the test harness. I'll check it out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi!

    J Livingston SQL what thinks about this query?

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ClientId NOT IN (

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 72 AND CreateDate > GETDATE()-60

    union

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 33 AND CreateDate > GETDATE()-90

    union

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 34 AND CreateDate > GETDATE()-120)

    /*

    My test is here:

    JLS RESULTS for 10M row table

    (202 row(s) affected)

    Table 'TPC'. Scan count 4, logical reads 32024, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1826 ms, elapsed time = 1869 ms.

    My RESULTS for 10M row table

    (202 row(s) affected)

    Table 'TPC'. Scan count 4, logical reads 31205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1450 ms, elapsed time = 1752 ms.

    */

    I think than unlikely the indexed column in the function.

    Thanks

  • Well I'll be damned. Talk about laying waste to more than one supposed "Best Practice"! The only good thing that comes of this is that the following tests prove, once again, that you have to test to know for sure.

    Referring to the test code and the SQL Profiler runs below, the "JLS Original" code is more than twice as fast as the supposed best practice code ("JBM Original") and uses more than 100 times fewer logical reads. I'm not sure of the cause but the problem is that the "JBM Original" code caused the optimizer to use a Nested Loop instead of a Merge Join. I'll have to let someone like Grant Fritchey or Paul White explain that one.

    Wondering what other "Best Practices" might be laid to waste, I also included code that uses "OR" ("JBM New OR") and code that uses both "OR" and column/formula encapsulation ("JBM New Both"), both of which are supposed "Worst Practices" and, as you can see in the SQL Profiler results, it ROCKS!

    I also threw in a version with "OR" and "EXCEPT". CPU and READ-wise, it's a bit worse than the "JBM New Both" run, but it is the only code in the test that allows parallelism to come into play which, in this case, cuts the duration of the run almost in half.

    Here's the SQL Profiler results for the test code that follows that. Hat's off to JLS for the 10 Mega row test harness.

    Here's the test code I used.

    DBCC FREEPROCCACHE

    GO

    --===== JLS Original ==============================================================================

    SELECT DISTINCT ClientId

    FROM TPC

    WHERE ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 72) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 60) )

    AND ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 33) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 90) )

    AND ClientId

    NOT IN (SELECT DISTINCT ClientId

    FROM TPC

    WHERE (ActivityId = 34) AND (DATEDIFF ( d , CreateDate , GETDATE ( )) < 120) )

    ;

    GO 3

    DBCC FREEPROCCACHE

    GO

    --===== JBM Original ==============================================================================

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ClientId NOT IN (SELECT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 72 AND CreateDate > GETDATE()-60)

    AND ClientId NOT IN (SELECT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 33 AND CreateDate > GETDATE()-90)

    AND ClientId NOT IN (SELECT ClientId

    FROM dbo.TPC

    WHERE ActivityId = 34 AND CreateDate > GETDATE()-120)

    ;

    GO 3

    DBCC FREEPROCCACHE

    GO

    --===== JBM New OR =================================================================================

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ClientId NOT IN

    (

    SELECT ClientId

    FROM dbo.TPC

    WHERE (ActivityId = 72 AND CreateDate > GETDATE()-60)

    OR (ActivityId = 33 AND CreateDate > GETDATE()-90)

    OR (ActivityId = 34 AND CreateDate > GETDATE()-120)

    )

    ;

    GO 3

    DBCC FREEPROCCACHE

    GO

    --===== JBM New BOTH =================================================================================

    SELECT DISTINCT ClientId

    FROM dbo.TPC

    WHERE ClientId NOT IN

    (

    SELECT ClientId

    FROM dbo.TPC

    WHERE (ActivityId = 72 AND DATEDIFF(dd, CreateDate, GETDATE()) < 60)

    OR (ActivityId = 33 AND DATEDIFF(dd, CreateDate, GETDATE()) < 90)

    OR (ActivityId = 34 AND DATEDIFF(dd, CreateDate, GETDATE()) < 120)

    )

    ;

    GO 3

    DBCC FREEPROCCACHE

    GO

    --===== JBM New OR EXCEPT=================================================================================

    SELECT ClientId

    FROM dbo.TPC

    EXCEPT

    SELECT ClientId

    FROM dbo.TPC

    WHERE (ActivityId = 72 AND CreateDate > GETDATE()-60)

    OR (ActivityId = 33 AND CreateDate > GETDATE()-90)

    OR (ActivityId = 34 AND CreateDate > GETDATE()-120)

    ;

    GO 3

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    as always, you have done a lot more detailed testing on this.....have you come to any further conclusions ??

    sorry for late reply but have had a few days vacation.

    regards

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • No... haven't had the time, unfortunately. It would really be interesting to figure out why the supposed "Best Practice" method caused the optimizer to shift to a Nested Loop instead of a merge join but I just haven't had the time to dig into it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply