Searching in a date field provides inconsistent results

  • I have been working with SQL Server for more years than I can count, and this one is baffling; perhaps someone has a clue to the cause.  I am running this on SQL Server 2022.  I put together a simple function to return the primary key integer from a lookup table that holds mileage rates that are active within a datetime span for a vehicle type.  When I execute the function it returns inconsistent results, and I have narrowed the cause down to the datetime span.  Below are the table and function, followed by the final question:

    Table (Note: ValidFrom and ValidUntil define the time span during which a set of mileage rates is effective.  The most current mileage rate for each vehicle type has a NULL in the ValidUntil column.)

    CREATE TABLE [dbo].[MileageRate](
        [RateId] [INT] IDENTITY(1,1) NOT NULL,
        [VehicleType] [NVARCHAR](50) NOT NULL,
        [Abbreviation] [NVARCHAR](2) NOT NULL,
        [ValidFrom] [DATETIME2](7) NOT NULL,
        [ValidUntil] [DATETIME2](7) NULL,
        CONSTRAINT [PK_MileageRate_RateId] PRIMARY KEY CLUSTERED
        (
            [RateId] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    The function provides inconsistent results: sometimes it comes back with RateId = 0 and sometimes with the correct value.

    CREATE FUNCTION [dbo].[fn_GetRateId](
        @vehicleType VARCHAR(50)
    )
    RETURNS INTEGER
    AS
    /*
    --Testing code executions
    SELECT dbo.fn_GetRateId('pv');
    SELECT dbo.fn_GetRateId('personal');
    SELECT dbo.fn_GetRateId('mp');
    SELECT dbo.fn_GetRateId('motor pool');
    SELECT dbo.fn_GetRateId('gv');
    SELECT dbo.fn_GetRateId('government');
    */
    BEGIN
        DECLARE @rateId INTEGER = 0;
        --Set comparisonDate as consistent value
        DECLARE @compareDate DATETIME = GETDATE();
        --Clean input
        SET @vehicleType = LOWER(TRIM(ISNULL(@vehicleType, '')));
        --Look for abbreviation first
        IF LEN(@vehicleType) = 2
        BEGIN
            SET @rateId = ISNULL((SELECT TOP(1) RateId
                                  FROM [dbo].[MileageRate]
                                  WHERE LOWER(Abbreviation) = @vehicleType
                                    AND @compareDate BETWEEN ValidFrom AND ISNULL(ValidUntil, @compareDate)
                                  ORDER BY ValidFrom), 0);
        END
        --If not found, look by VehicleType
        IF @rateId = 0
        BEGIN
            SET @rateId = ISNULL((SELECT TOP(1) RateId
                                  FROM [dbo].[MileageRate]
                                  WHERE LOWER(VehicleType) = @vehicleType
                                    AND @compareDate BETWEEN ValidFrom AND ISNULL(ValidUntil, @compareDate)
                                  ORDER BY ValidFrom), 0);
        END

        RETURN @rateId;
    END
    GO

    I got the function to return a consistent result by changing the date search to:

    DATEADD(SECOND, -1, @compareDate) BETWEEN ValidFrom AND ISNULL(ValidUntil, @compareDate)

    Does anyone understand why I had to do this?  I have always understood the BETWEEN predicate to be inclusive of both the start and end dates.

    Thank you in advance.

  • Your ValidFrom and ValidUntil are DATETIME2(7), but GETDATE() returns DATETIME, so you have a precision mismatch.  Try using SYSDATETIME() instead of GETDATE().
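
    For what it's worth, a quick side-by-side shows the precision difference (hypothetical example, not from the original post):

    -- GETDATE() returns DATETIME, with milliseconds rounded to .000/.003/.007;
    -- SYSDATETIME() returns DATETIME2(7), with 100 ns precision.
    SELECT GETDATE() AS FromGetDate, SYSDATETIME() AS FromSysDateTime;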

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • For best performance, stop using NULL so you don't have to use ISNULL() as part of the WHERE.  Yes, you will have to go back and change the 99991231 ValidUntil when a newer row is inserted, but you have to change the NULL value now anyway, so six of one ...

    Even more importantly, you should uniquely cluster the lookup table on ( ValidFrom, ValidUntil, VehicleType ) rather than RateId (which was probably just chosen by "default" because of (supposed) "rules" for keys).  This will also prevent duplicate values, which should be important here.

    Also, get rid of the LOWER()s by ensuring that all rows in the table use only lowercase values (an AFTER trigger can ensure that).

    Finally, I removed the variables because I believe they add some overhead, and used inline values instead.
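
    The reworked code itself didn't make it into the post, so here is a minimal sketch of the shape described above (sentinel date instead of NULL, the alternate clustering key, no ISNULL(), no variables); the names, the DATETIME2 precision, and the sentinel literal are assumptions:

    -- Hypothetical rework along the lines above (not the poster's actual code):
    CREATE TABLE dbo.MileageRate
    (
        VehicleType  NVARCHAR(50) NOT NULL,
        Abbreviation NVARCHAR(2)  NOT NULL,
        ValidFrom    DATETIME2(7) NOT NULL,
        ValidUntil   DATETIME2(7) NOT NULL          -- '99991231' sentinel instead of NULL
            CONSTRAINT DF_MileageRate_ValidUntil DEFAULT ('99991231'),
        RateId       INT IDENTITY(1,1) NOT NULL
            CONSTRAINT UQ_MileageRate_RateId UNIQUE
    );

    CREATE UNIQUE CLUSTERED INDEX CL_MileageRate
        ON dbo.MileageRate (ValidFrom, ValidUntil, VehicleType);

    -- The lookup then needs no ISNULL() (and no LOWER(), if a trigger keeps
    -- the stored values lowercase) and no local variables:
    SELECT TOP (1) RateId
    FROM dbo.MileageRate
    WHERE VehicleType = N'motor pool'
      AND SYSDATETIME() BETWEEN ValidFrom AND ValidUntil
    ORDER BY ValidFrom;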

     

    SQL DBA, SQL Server MVP (07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Once you get rid of the LOWER() on the vehicle type, the clustering key could be ( VehicleType, ValidFrom, ValidUntil ).

    SQL DBA, SQL Server MVP (07, 08, 09)

  • Thank you Drew and Scott for the feedback.  This is the first time I have used DATETIME2 instead of DATETIME.  Drew.Allen: I changed ValidFrom and ValidUntil to DATETIME and the comparison is 100% consistent when running it through 50 times.

     

  • FWIW, if you don't need sub-second information in DATETIME2 values, just use DATETIME2(0) (and you save 2 bytes of storage per value).
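
    A quick way to see the storage difference (illustrative snippet, not from the original post; DATALENGTH reports bytes):

    SELECT DATALENGTH(CAST(SYSDATETIME() AS DATETIME2(7))) AS Bytes_p7,  -- 8
           DATALENGTH(CAST(SYSDATETIME() AS DATETIME2(0))) AS Bytes_p0;  -- 6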

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀


  • ScottPletcher wrote:

    For best performance, stop using NULL so you don't have to use ISNULL() as part of the WHERE.  Yes, you will have to go back and change the 99991231 ValidUntil when a newer row is inserted, but you have to change the NULL value now anyway, so six of one ...

    I forgot to mention the NULL values.  I recommend using 9999-12-30 or 9999-01-01 instead of 9999-12-31, to prevent overflows when using certain date/time calculations.
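
    For illustration (hypothetical example, not from the original post), DATEADD() past the 9999-12-31 maximum raises an overflow error:

    -- Fails: the result would exceed the DATETIME2 maximum of 9999-12-31
    SELECT DATEADD(DAY, 1, CAST('99991231' AS DATETIME2(0)));

    -- Fine: a 9999-12-30 sentinel leaves a day of headroom
    SELECT DATEADD(DAY, 1, CAST('99991230' AS DATETIME2(0)));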

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen wrote:


    I forgot to mention the NULL values.  I recommend using 9999-12-30 or 9999-01-01 instead of 9999-12-31, to prevent overflows when using certain date/time calculations.


    That makes sense, seems like a good idea.

    SQL DBA, SQL Server MVP (07, 08, 09)

  • I know you solved it already, but I think the issue only happens when the milliseconds of GETDATE() end in a 3, which is why it was unpredictable.

    SQL considers DATETIME '2023-08-25 15:38:05.863' to be greater than DATETIME2(7) '2023-08-25 15:38:05.8633333'

    ISNULL(ValidUntil, @compareDate) returns a DATETIME2(7) so the @compareDate (datetime) variable was greater than ISNULL(ValidUntil, @compareDate) and the zero RateId was returned.

    Conversely '2023-08-25 15:38:05.867' is less than '2023-08-25 15:38:05.8666667'

    DECLARE @ValidUntil DATETIME = '2023-08-25 15:38:05.863';
    DECLARE @ValidUntil2 DATETIME2(7) = CAST(@ValidUntil AS DATETIME2(7));

    SELECT @ValidUntil, '>', @ValidUntil2
    WHERE @ValidUntil > @ValidUntil2;

    SET @ValidUntil = DATEADD(ms, 3, @ValidUntil);
    SET @ValidUntil2 = CAST(@ValidUntil AS DATETIME2(7));

    SELECT @ValidUntil, '<', @ValidUntil2
    WHERE @ValidUntil < @ValidUntil2;

    A DATETIME can only end in 0, 3, or 7, so something about its rounding affects the comparison.  I don't know exactly what implicit conversion takes place.  When a DATETIME ending in .xx3 is converted to DATETIME2(7) it becomes .xx33333, and converting .xx33333 back to DATETIME doesn't round-trip cleanly.  A good example of why to use consistent data types, but still weird.
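
    To see what a given build actually does, a small probe like this (added sketch, not from the original post) prints the conversions side by side; note that the DATETIME-to-DATETIME2 conversion became more precise starting with database compatibility level 130, so results can differ between levels:

    DECLARE @dt DATETIME = '2023-08-25 15:38:05.863';

    -- Show the DATETIME value, its DATETIME2(7) conversion, and the round trip.
    SELECT @dt                                           AS OriginalDatetime,
           CAST(@dt AS DATETIME2(7))                     AS AsDatetime2,
           CAST(CAST(@dt AS DATETIME2(7)) AS DATETIME)   AS RoundTrip;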


  • The first thing we need to do is get some valid DDL. Your singular table name indicates you have only one rate; I'm pretty sure that's not true. The vehicle type is probably not 50 characters long, but if you allow for it, I'm sure that someday that kind of crap will creep into your table. I also seem to recall that the automobile industry has some standard, shorter codes for vehicle types; why didn't you use them?

    You have a temporal interval construct that goes out to seven decimal places! Wow! I never knew a motor pool that kept data to that degree of accuracy. But you're going to get it when you allow for bad data. I'm going to guess that date-level precision is sufficient for the time in which a given rate will be valid. Also, when you define a temporal interval like this, you really need to assure, as a minimum constraint, that it doesn't begin after it ends. There's actually more you can do with this, using some programming tricks from Kuznetsov, to guarantee that it's impossible to insert temporal gaps in your table (see the sketch after the DDL below). These columns should have been your primary key! Why do you think that the "parking space number" in the physical storage of a row can ever be a valid relational key? SQL people have a term for guys that do this; we call them "ID-iots" because they don't understand what a key is, or how the relational model works.

    Next, why do you think "abbreviation" is a meaningful data element name? What does it abbreviate? A state code? A geographic feature? A job title? Again, the automobile industry has a pretty well-established set of encodings that you can use. But what bothers me the most is that this thing is supposed to be mileage rates, yet there's no such column in the table. I would have guessed it would be a monetary amount (so many cents per mile for a particular type of vehicle):

    CREATE TABLE MileageRates
    (vehicle_type CHAR(5) NOT NULL,
     foobar_abbreviation CHAR(2) NOT NULL,
     rate_start_date DATE NOT NULL,
     rate_end_date DATE,
     mileage_rate DECIMAL(5,2) NOT NULL CHECK (mileage_rate >= 0.00),
     CHECK (rate_start_date < rate_end_date),
     PRIMARY KEY (vehicle_type, foobar_abbreviation, rate_start_date)
    );
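
    A sketch of the Kuznetsov-style gap-free pattern mentioned above (added illustration, not from the original post; simplified, and the table and column names are assumptions): each new interval must start exactly where an existing one ended, which a self-referencing foreign key can enforce.

    -- Hypothetical sketch: temporal gaps become impossible to insert
    CREATE TABLE MileageRatesNoGaps
    (vehicle_type CHAR(5) NOT NULL,
     rate_start_date DATE NOT NULL,
     rate_end_date DATE NOT NULL,
     prev_end_date DATE,                        -- NULL only for the first interval
     mileage_rate DECIMAL(5,2) NOT NULL CHECK (mileage_rate >= 0.00),
     CHECK (rate_start_date < rate_end_date),
     CHECK (prev_end_date = rate_start_date),   -- must butt up against the prior row
     PRIMARY KEY (vehicle_type, rate_start_date),
     UNIQUE (vehicle_type, rate_end_date),
     FOREIGN KEY (vehicle_type, prev_end_date)
         REFERENCES MileageRatesNoGaps (vehicle_type, rate_end_date)
    );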

    I think your improper table design has probably let garbage data into the schema.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • I appreciate your broad explanation.  To be honest, I created the mileage table just as a simple example because the actual situation is more complex; so I agree it misleads you, and I should have included a rate value to at least provide the appearance of completeness.  The crux of the question was to try to understand why a query of a DATETIME vs. a DATETIME2 field yielded unreliable results, so any evaluation of the design and implementation beyond that is superfluous.  The use of DATETIME2 in my example is what I inherited when assigned to a system designed by someone else; I don't understand why they implemented DATETIME2 throughout, as it is proving completely unnecessary.  I agree with the gist of your message that you should always try to use the appropriate data types, and I actually do understand the purpose and proper use of key fields, but thank you for your time and effort at a comprehensive response.

  • jcelko212 32090 wrote:

    Your singular table name indicates you have only one rate.

    Not if you understand that each row is unique and can have only one rate.  Either convention is just fine as long as it's used consistently throughout the database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, there was a discussion inside ANSI on this. Before relational modeling, there were other conventions; one of the most popular was IDEF (I can't remember what it stands for), which was record-at-a-time oriented. Thus, you referred to "employee", the record, and not to "personnel", the set-level concept. You had to figure it out for yourself, just as you are urging people to do.

    But then, one of the flaws in SQL was that we didn't have separate namespaces at the various levels of abstraction in the language. This let me have a column named "employee" and a table, also named "employee", in the same schema. The grammar was such that the column name and the table name could not be used ambiguously. For the compiler 🙂; but it messes up the humans 🙁

    The Metadata Group and other convention committees came up with the idea that a column name should be singular, name an attribute, and be of the form <attribute>_<attribute property>. The underscore was a result of Unicode making it universal to all alphabets. I think it was the University of Maryland that did some human-factors studies on readability. Turns out, all those decades of reading and writing on lined paper paid off.

    Please post DDL and follow ANSI/ISO standards when asking for help. 
