Choosing the Birthday of a Customer by month?

  • Lynn Pettis - Wednesday, November 22, 2017 12:48 PM

    ScottPletcher - Wednesday, November 22, 2017 12:44 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:34 PM

    ScottPletcher - Wednesday, November 22, 2017 12:25 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:05 PM

    I'd rather store the date as a date and use computed columns if I need the month, day, or year separately on a regular basis.

    I don't know if computed columns still prevent parallel execution plans and/or still have other execution quirks. 

    But it doesn't really matter.  Either way, I'd still prefer to properly normalize data whenever possible, and for the standard reason: it prevents issues and more work later.  You can bet that sometime after you compute and store the month, they'll be a requirement to notify based on birthdays in the upcoming week, and you're back to adjusting the table yet again to add yet another computed column, day.

    I think I covered that in my post, or am I imagining that I wrote month, day, or year?

    If you're going to store them all separately, then it's redundant to store them combined, and again violates normal form (redundant data).

    I'm not against computed columns per se, but I don't believe they are the proper solution for this problem.

    A date is a date and should be stored as a date. Just saying.

    Even if you have to duplicate every part of the date in separate columns just to write queries?

    Not to me.  If month and day alone are that significant, I'd store them with a base year, then store the actual year separately.  It's trivial to combine them when you need to display the actual birthday.  And in the meantime, I'm not duplicating the entire column's data in other columns in the row.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher - Wednesday, November 22, 2017 1:01 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:48 PM

    ScottPletcher - Wednesday, November 22, 2017 12:44 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:34 PM

    ScottPletcher - Wednesday, November 22, 2017 12:25 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:05 PM

    I'd rather store the date as a date and use computed columns if I need the month, day, or year separately on a regular basis.

    I don't know if computed columns still prevent parallel execution plans and/or still have other execution quirks. 

    But it doesn't really matter.  Either way, I'd still prefer to properly normalize data whenever possible, and for the standard reason: it prevents issues and more work later.  You can bet that sometime after you compute and store the month, they'll be a requirement to notify based on birthdays in the upcoming week, and you're back to adjusting the table yet again to add yet another computed column, day.

    I think I covered that in my post, or am I imagining that I wrote month, day, or year?

    If you're going to store them all separately, then it's redundant to store them combined, and again violates normal form (redundant data).

    I'm not against computed columns per se, but I don't believe they are the proper solution for this problem.

    A date is a date and should be stored as a date. Just saying.

    Even if you have to duplicate every part of the date in separate columns just to write queries?

    Not to me.  If month and day alone are that significant, I'd store them with a base year, then store the actual year separately.  It's trivial to combine them when you need to display the actual birthday.  And in the meantime, I'm not duplicating the entire column's data in other columns in the row.

    What ever.  But you will reconstitute the date when you need to query based on the date, right?

  • ScottPletcher - Wednesday, November 22, 2017 12:25 PM

    Lynn Pettis - Wednesday, November 22, 2017 12:05 PM

    I'd rather store the date as a date and use computed columns if I need the month, day, or year separately on a regular basis.

    I don't know if computed columns still prevent parallel execution plans and/or still have other execution quirks. 

    But it doesn't really matter.  Either way, I'd still prefer to properly normalize data whenever possible, and for the standard reason: it prevents issues and more work later.  You can bet that sometime after you compute and store the month, they'll be a requirement to notify based on birthdays in the upcoming week, and you're back to adjusting the table yet again to add yet another computed column, day.

    That would be when I shifted to some form of Calendar table...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • chef423 - Wednesday, November 22, 2017 10:12 AM

    Lynn Pettis - Wednesday, November 22, 2017 8:52 AM

    chef423 - Wednesday, November 22, 2017 8:51 AM

    Jeff Moden - Tuesday, November 21, 2017 8:46 PM

    If finding rows by birth month without regard to year or day is something important, it's probably worthwhile to materialize the birth month in the table using a persisted computed column.  Like this....


     CREATE TABLE dbo.birthdays
            (
             cust_id    INTEGER         NOT NULL PRIMARY KEY
            ,cust_fname NVARCHAR(50)    NOT NULL
            ,cust_lname NVARCHAR(50)    NOT NULL
            ,cust_dob   DATETIME        NOT NULL
            ,cust_dobmo AS DATEPART(mm,cust_dob) PERSISTED --<----<<< Added this column    
            )
    ;

    Then the code become trivial for such things.  For example...


    DECLARE @pSomeDate DATETIME = GETDATE(); --Could be a parameter for a Stored Procedure or iTVF

     SELECT *
       FROM dbo.birthdays
      WHERE cust_dobmo = DATEPART(mm,@pSomeDate)
    ;

    Thanks Jeff. The original table I get the Customer data from is dbo.Customers

    So I cannot replace birthdays with that name is says the table is already created.

    Then add the computed column to the Customer table.

    ALTER TABLE Customers
    ADD cust_dobmo AS DATEPART(mm,cust_dob) PERSISTED

    Yep.... that's the ticket.  Because it's persisted, you could actually put an index on it if you need it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • saravanatn - Wednesday, November 22, 2017 8:24 AM

    Hi Jeff,

    Thanks for point it out and your contribution to SQL is immense . I am a tester when I run my query it usually takes lot of time.What are the ways to practically improve query performance.

    Regards,
    Saravanan

    Thanks Saravanan.  There are a ton of things that can be done to improve performance.  It usually depends on what the code needs to do.  Things like SARGable queries, datatype matching, early data reduction (number of base rows to work with), a very good understanding of Clustered and Non-Clustered indexes, and a few "Black Arts" techniques are very helpful in that area.  Performance tuning is an alchemy combination of science, math, knowledge of the product, occasional luck, some good ol' "PFM" (Pure Freaking' Magic), and a lot of practice.  😉

    A lot of my "practice" isn't just from what I've encountered at work.  It's real practice where I make a million or more rows of random but constrained data and have at it.  That's probably the best advice I can give you.  Learn how to make lots of test data to practice with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff for sharing your experience.....

    Regards,
    Saravanan

    Saravanan

  • Jeff Moden - Tuesday, November 21, 2017 8:46 PM

    If finding rows by birth month without regard to year or day is something important, it's probably worthwhile to materialize the birth month in the table using a persisted computed column.  Like this....


     CREATE TABLE dbo.birthdays
            (
             cust_id    INTEGER         NOT NULL PRIMARY KEY
            ,cust_fname NVARCHAR(50)    NOT NULL
            ,cust_lname NVARCHAR(50)    NOT NULL
            ,cust_dob   DATETIME        NOT NULL
            ,cust_dobmo AS DATEPART(mm,cust_dob) PERSISTED --<----<<< Added this column    
            )
    ;

    Then the code become trivial for such things.  For example...


    DECLARE @pSomeDate DATETIME = GETDATE(); --Could be a parameter for a Stored Procedure or iTVF

     SELECT *
       FROM dbo.birthdays
      WHERE cust_dobmo = DATEPART(mm,@pSomeDate)
    ;

    I don't think adding a persistent column makes much sense in this case.

    There are only 12 month, so cardinality will be always low.
    Which means - scanning a cluisterd index will be usually a preferred choice for optimizer.

    Clustering by [cust_dobmo] does not look like a good idea, so even if the column would be indexed the corresponding bookmark lookup would be more expensive than a simple table scan (remember - we return about 1/12 of the whole table anyway).

    So, simple
     SELECT *
       FROM dbo.birthdays
      WHERE DATEPART(mm,cust_dob) = DATEPART(mm,@pSomeDate)

    would do, and the performance would be noth worse than of the persistent column.

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, November 23, 2017 1:45 AM

    Jeff Moden - Tuesday, November 21, 2017 8:46 PM

    If finding rows by birth month without regard to year or day is something important, it's probably worthwhile to materialize the birth month in the table using a persisted computed column.  Like this....


     CREATE TABLE dbo.birthdays
            (
             cust_id    INTEGER         NOT NULL PRIMARY KEY
            ,cust_fname NVARCHAR(50)    NOT NULL
            ,cust_lname NVARCHAR(50)    NOT NULL
            ,cust_dob   DATETIME        NOT NULL
            ,cust_dobmo AS DATEPART(mm,cust_dob) PERSISTED --<----<<< Added this column    
            )
    ;

    Then the code become trivial for such things.  For example...


    DECLARE @pSomeDate DATETIME = GETDATE(); --Could be a parameter for a Stored Procedure or iTVF

     SELECT *
       FROM dbo.birthdays
      WHERE cust_dobmo = DATEPART(mm,@pSomeDate)
    ;

    I don't think adding a persistent column makes much sense in this case.

    There are only 12 month, so cardinality will be always low.
    Which means - scanning a cluisterd index will be usually a preferred choice for optimizer.

    Clustering by [cust_dobmo] does not look like a good idea, so even if the column would be indexed the corresponding bookmark lookup would be more expensive than a simple table scan (remember - we return about 1/12 of the whole table anyway).

    So, simple
     SELECT *
       FROM dbo.birthdays
      WHERE DATEPART(mm,cust_dob) = DATEPART(mm,@pSomeDate)

    would do, and the performance would be noth worse than of the persistent column.

    Just to be clear, the is no place in my post that I suggested clustering on cust_dobmo.  As you say, the cardinality is way too low for such a thing.  A PERSISTED computed column in no way requires than any index be imparted to it never mind a CI.  If you do add an NCI to the column and the table does have a CI on it then, of course, the key(s) to the CI would automatically be added to the NCI key column(s) and to the leaf level of the index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It makes zero sense to pull out just the month and index it.  You'd be far better off indexing the date with a dummy year.

    And if one (almost) always searched this table by birth month (or month and day), then you should indeed cluster on it.  Selectivity is only one factor in clustering.  The main point is to reduce I/O.  Therefore, you must go beyond super-simplistic basic rules when setting up indexes.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher - Friday, November 24, 2017 8:02 AM

    It makes zero sense to pull out just the month and index it.  You'd be far better off indexing the date with a dummy year.

    And if one (almost) always searched this table by birth month (or month and day), then you should indeed cluster on it.  Selectivity is only one factor in clustering.  The main point is to reduce I/O.  Therefore, you must go beyond super-simplistic basic rules when setting up indexes.

    As you imply, "It Depends".  Depending on what other queries are played against the table, pulling out just the month and indexing it can make perfect sense and, while I also agree about you thoughts on CI, it could be a horrible thing to use any part of a data or birth as a CI especially when you consider the size of the RID and the Uniquefier (the combination of which is quite wide) that would be automatically added to the CI.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, November 24, 2017 9:13 AM

    ScottPletcher - Friday, November 24, 2017 8:02 AM

    It makes zero sense to pull out just the month and index it.  You'd be far better off indexing the date with a dummy year.

    And if one (almost) always searched this table by birth month (or month and day), then you should indeed cluster on it.  Selectivity is only one factor in clustering.  The main point is to reduce I/O.  Therefore, you must go beyond super-simplistic basic rules when setting up indexes.

    As you imply, "It Depends".  Depending on what other queries are played against the table, pulling out just the month and indexing it can make perfect sense and, while I also agree about you thoughts on CI, it could be a horrible thing to use any part of a data or birth as a CI especially when you consider the size of the RID and the Uniquefier (the combination of which is quite wide) that would be automatically added to the CI.

    No, it's the only logical CI if you (almost) always search by birth month (or month and day; remember, with a "placeholder" year to make lookups more efficient).  Particularly since we're very likely to want to sort by birthday.  With a CI on birthday, no actual sort is required, and sorts are relatively expensive operations.

    As to other indexes, on a table with rows this narrow, the idea is to not need any other indexes, although one might be unavoidable if you can't get the birth day and month to narrow the search.

    As to size, a date's only 3 bytes to go with the 4-byte identity / uniquifier.  No big deal if the clus key exceeds the supposedly "sacred" 4 bytes, esp. if it so drastically improves (almost) all lookups and/or avoid sorting of the results.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • chef423 - Wednesday, November 22, 2017 8:51 AM

    Jeff Moden - Tuesday, November 21, 2017 8:46 PM

    If finding rows by birth month without regard to year or day is something important, it's probably worthwhile to materialize the birth month in the table using a persisted computed column.  Like this....


     CREATE TABLE dbo.birthdays
            (
             cust_id    INTEGER         NOT NULL PRIMARY KEY
            ,cust_fname NVARCHAR(50)    NOT NULL
            ,cust_lname NVARCHAR(50)    NOT NULL
            ,cust_dob   DATETIME        NOT NULL
            ,cust_dobmo AS DATEPART(mm,cust_dob) PERSISTED --<----<<< Added this column    
            )
    ;

    Then the code become trivial for such things.  For example...


    DECLARE @pSomeDate DATETIME = GETDATE(); --Could be a parameter for a Stored Procedure or iTVF

     SELECT *
       FROM dbo.birthdays
      WHERE cust_dobmo = DATEPART(mm,@pSomeDate)
    ;

    Thanks Jeff. The original table I get the Customer data from is dbo.Customers

    So I cannot replace birthdays with that name is says the table is already created.

    Understood.... I was writing it for you example code that you posted.  If the table already exists then, as Lynn suggests, alter the table to add the column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Friday, November 24, 2017 8:02 AM

    It makes zero sense to pull out just the month and index it.  You'd be far better off indexing the date with a dummy year.

    And if one (almost) always searched this table by birth month (or month and day), then you should indeed cluster on it.  Selectivity is only one factor in clustering.  The main point is to reduce I/O.  Therefore, you must go beyond super-simplistic basic rules when setting up indexes.

    If the main point is to reduce I/O, then convert the month to TINYINT and persist it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Friday, November 24, 2017 10:22 AM

    Jeff Moden - Friday, November 24, 2017 9:13 AM

    ScottPletcher - Friday, November 24, 2017 8:02 AM

    It makes zero sense to pull out just the month and index it.  You'd be far better off indexing the date with a dummy year.

    And if one (almost) always searched this table by birth month (or month and day), then you should indeed cluster on it.  Selectivity is only one factor in clustering.  The main point is to reduce I/O.  Therefore, you must go beyond super-simplistic basic rules when setting up indexes.

    As you imply, "It Depends".  Depending on what other queries are played against the table, pulling out just the month and indexing it can make perfect sense and, while I also agree about you thoughts on CI, it could be a horrible thing to use any part of a data or birth as a CI especially when you consider the size of the RID and the Uniquefier (the combination of which is quite wide) that would be automatically added to the CI.

    No, it's the only logical CI if you (almost) always search by birth month (or month and day; remember, with a "placeholder" year to make lookups more efficient).  Particularly since we're very likely to want to sort by birthday.  With a CI on birthday, no actual sort is required, and sorts are relatively expensive operations.

    As to other indexes, on a table with rows this narrow, the idea is to not need any other indexes, although one might be unavoidable if you can't get the birth day and month to narrow the search.

    As to size, a date's only 3 bytes to go with the 4-byte identity / uniquifier.  No big deal if the clus key exceeds the supposedly "sacred" 4 bytes, esp. if it so drastically improves (almost) all lookups and/or avoid sorting of the results.

    You're not considering other things.  If the table is frequently inserted into, you have to consider the physical fragmentation that the table will go through.  12 months has a cardinality of 12, which is 4 less than a GUID.  Done correctly, you won't be sorting the results on the month for this problem at all.

    I also don't recommend having the sister table known as "Birthdays", either.  I made the mistake of thinking the example was just a derivative example rather than posting the whole Customer table.   You are creating the problem of having to maintain a separate table for every INSERT.  It's easier just to create an NCI with the necessary columns INCLUDEd.  An NCI is no more of a duplication of data than a sister table and always stays in sync with the data in the CI.  Remember that an NCI has the same basic structure as a CI except that it also contains the key(s) from the CI.  In other words, it's a miniature version of the table in a different logical order.

    Now that we know that the real table is the Customer table, I definitely wouldn't squander the CI on a birthdate or similar column in the real Customer table.  It's much too valuable for other things, like preventing physical fragmentation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's a quick test, using just over 3.25 million rows of data in both dbo.customer_1 and dbo.customer_2.

    Aside from the persisted computed columns on 2, both tables identical with identical data.

    The results are below...
    -- DROP TABLE dbo.customer_1
    CREATE TABLE dbo.customer_1 (
        cust_id INT NOT NULL IDENTITY(1,1),
        last_name VARCHAR(30) NOT NULL
            CONSTRAINT df_lastlame DEFAULT (''),
        first_name VARCHAR(30) NOT NULL
            CONSTRAINT df_firstname DEFAULT (''),
        date_of_birth DATE NOT NULL
            CONSTRAINT df_dateofbirth DEFAULT ('19000101')
        CONSTRAINT pk_customer PRIMARY KEY (cust_id)
        );
    GO

    INSERT dbo.customer_1 (last_name, first_name, date_of_birth)
    SELECT DISTINCT
        LAG(r.PatientNameLast, 1, 'Doe') OVER (ORDER BY (SELECT NULL)),
        LEAD(r.PatientNameFirst, 1, 'Jane') OVER (ORDER BY (SELECT NULL)),
        r.PatientBirthDate
    FROM
        dbo.Referral r
    WHERE
        r.PatientBirthDate > '19000101';
    GO

    CREATE NONCLUSTERED INDEX ix_dateofbirth ON dbo.customer_1 (date_of_birth)
    INCLUDE (cust_id, last_name, first_name);
    GO

    -- DROP TABLE dbo.customer_2
    CREATE TABLE dbo.customer_2 (
        cust_id INT NOT NULL IDENTITY(1,1),
        last_name VARCHAR(30) NOT NULL
            CONSTRAINT df_lastlame2 DEFAULT (''),
        first_name VARCHAR(30) NOT NULL
            CONSTRAINT df_firstname2 DEFAULT (''),
        date_of_birth DATE NOT NULL
            CONSTRAINT df_dateofbirth2 DEFAULT ('19000101'),
        b_month AS CONVERT(TINYINT, MONTH(date_of_birth)) PERSISTED,
        b_day AS CONVERT(TINYINT, DAY(date_of_birth)) PERSISTED,
        CONSTRAINT pk_customer2 PRIMARY KEY (cust_id)
        );
    GO

    INSERT dbo.customer_2 (last_name, first_name, date_of_birth)
    SELECT
        c1.last_name,
        c1.first_name,
        c1.date_of_birth
    FROM
        dbo.customer_1 c1;
    GO

    CREATE NONCLUSTERED INDEX ix_bday_bmonth_2 ON dbo.customer_2 (b_day, b_month)
    INCLUDE (cust_id, last_name, first_name, date_of_birth)
    GO

    -- DROP TABLE dbo.dob_calendar
    CREATE TABLE dbo.dob_calendar (
        date_of_birth DATE NOT NULL,
        b_month AS CONVERT(TINYINT, MONTH(date_of_birth)) PERSISTED,
        b_day AS CONVERT(TINYINT, DAY(date_of_birth)) PERSISTED,
        CONSTRAINT pk_dobcalendar PRIMARY KEY CLUSTERED (b_day, b_month, date_of_birth)
        );
    GO

    INSERT dbo.dob_calendar (date_of_birth)
    SELECT
        DATEADD(DAY, 0, t.n)
    FROM
        dbo.tfn_Tally(DATEDIFF(DAY, 0, GETDATE()), 0) t;    -- dbo.tfn_Tally is a "ben-gan" style tally/numbers function. I think we've all got one by now, but I'll post the code if anyone wants it...
    GO

    ALTER INDEX pk_dobcalendar ON dbo.dob_calendar REBUILD;
    GO

    --==========================================================================================

    DBCC DROPCLEANBUFFERS;
    GO
    WAITFOR DELAY '00:00:01';
    GO
    DECLARE @_start DATETIME2(7) = SYSDATETIME();
    PRINT(CONCAT('************** start join dob_calendar: ', @_start, ' **************************'));
    SET STATISTICS IO ON;
    ---------------------
        SELECT
            cst.cust_id,
            cst.last_name,
            cst.first_name,
            cst.date_of_birth
        FROM
            dbo.customer_1 cst
            JOIN dbo.dob_calendar dc
                ON cst.date_of_birth = dc.date_of_birth
        WHERE
            dc.b_month = 11
            AND dc.b_day = 27;
    ---------------------
    PRINT(CONCAT('************** finish join dob_calendar... duration in ms: , DATEDIFF(mcs, @_start, SYSDATETIME()) / 1000.0, ' **************************

    '));
    GO

    DBCC DROPCLEANBUFFERS;
    GO
    WAITFOR DELAY '00:00:01';
    GO
    DECLARE @_start DATETIME2(7) = SYSDATETIME();
    PRINT(CONCAT('************** start where exists dob_calendar: ', @_start, ' **************************'));
    SET STATISTICS IO ON;
    ---------------------
        SELECT
            cst.cust_id,
            cst.last_name,
            cst.first_name,
            cst.date_of_birth
        FROM
            dbo.customer_1 cst
        WHERE
            EXISTS (
                    SELECT 1
                    FROM dbo.dob_calendar dc
                    WHERE cst.date_of_birth = dc.date_of_birth
                        AND dc.b_month = 11
                        AND dc.b_day = 27
                    );
    ---------------------
    PRINT(CONCAT('************** finish where exists dob_calendar... duration in ms: ', DATEDIFF(mcs, @_start, SYSDATETIME()) / 1000.0, ' **************************

    '));
    GO

    DBCC DROPCLEANBUFFERS;
    GO
    WAITFOR DELAY '00:00:01';
    GO
    DECLARE @_start DATETIME2(7) = SYSDATETIME();
    PRINT(CONCAT('************** start persisted cols: ', @_start, ' **************************'));
    SET STATISTICS IO ON;
    ---------------------
        SELECT
            cst.cust_id,
            cst.last_name,
            cst.first_name,
            cst.date_of_birth
        FROM
            dbo.customer_2 cst
        WHERE
            cst.b_month = 11
            AND cst.b_day = 27;
    ---------------------
    PRINT(CONCAT('************** finish persisted cols... duration in ms: ', DATEDIFF(mcs, @_start, SYSDATETIME()) / 1000.0, ' **************************

    '));
    GO

    Results... ((query options: discard results after execution)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    ************** start join dob_calendar: 2017-11-28 00:22:15.5127477 **************************

    (8217 rows affected)
    Table 'customer_1'. Scan count 117, logical reads 910, physical reads 3, read-ahead reads 576, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'dob_calendar'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ************** finish join dob_calendar... duration in ms: 289.054000 **************************

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    ************** start where exists dob_calendar: 2017-11-28 00:22:17.2769080 **************************

    (8217 rows affected)
    Table 'customer_1'. Scan count 117, logical reads 910, physical reads 3, read-ahead reads 576, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'dob_calendar'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ************** finish where exists dob_calendar... duration in ms: 275.049000 **************************

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    ************** start persisted cols: 2017-11-28 00:22:19.0060602 **************************

    (8221 rows affected)
    Table 'customer_2'. Scan count 1, logical reads 39, physical reads 2, read-ahead reads 36, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ************** finish persisted cols... duration in ms: 257.038000 **************************

    All 3 plowed through 3.2+ million rows in ~1/4 second. Given the fact that this isn't likely to be executed more than once or twice a day... me personally... I'd be inclined to go with the "dob_calendar" table option and avoid the negative consequences of having the computed columns. The "dob_calendar" table is cheap to create & populate, easy to maintain and won't impact other queries hitting the original table (yes there is the covering index on date_of_birth... but... AFAIC, that's negated by the fact that a similar index would be required by the persisted computed columns solution).
    That said, Jeff has a long history of finding flaws and/or holes in my logic. So, there's a good chance that tradition will continue and I'll either learn something new or gain a new perspective.

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply