Modeling interval buckets

  • Hi SSC,

    At a high level, I have "events" that happen to a certain stock symbol. For example, say ACME has a dividend coming up 3 days in the future. Five days from now, that dividend will have occurred 2 days in the past. What I'm trying to do is assign a score to how important an event is depending on how far away it is from the current date. The requirement is that at any given point, the distance between getdate() and the event date falls into exactly one scoring bucket. It's storing these intervals and easily finding which bucket a value belongs to that I'm having difficulty with.

    Right now I have it using just dates (e.g. the event is 1 day out, 2 days out, but never 1.5 days out) which simplifies things a lot, but now I have to add in the fractional component. Consider the following test data:

    /*
    Test setup.

    HIGH LEVEL OVERVIEW:
    There should be (in this case) 3 distinct intervals so that whatever value is input, it will fall into exactly one bucket.

    Desired output:
    I'm not so much worried about whether a value on a boundary condition falls into one bucket or the other
    (i.e. I'm not concerned about whether a value of -1 falls into group 1 or group 2).
    What I AM concerned about is a value of 1 falling into TWO buckets. Each bucket must be mutually exclusive.

    Grp1: where num >= -45 and num <= -1
    Grp2: where num > -1 and num < 1
    Grp3: where num >= 1 and num <= 45
    */

    declare @sampleInputs table (RID int identity(1,1), num decimal(38,10))

    declare @testTable table
    (
        LowerBound decimal(9,5),
        UpperBound decimal(9,5)
    )

    insert into @sampleInputs
    values (-5), (-1.6), (-1.5), (-1.4), (-1), (.999995), (.00005), (1), (1.0005), (1.5), (3.9), (4.0), (4.1)

    The simplest solution is just to say where num is between the lower and upper boundaries, but then values such as 1 will fall into two buckets.

    /* First approach
    Use integers and use BETWEEN (inclusive)
    PROS: Super simple to code and understand
    CONS: Values fall into duplicate buckets along boundaries
    */
    insert into @testTable
    values (-45, -1), (-1, 1), (1, 45)

    select
        Method = 'Integer Inclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num between t.LowerBound and t.UpperBound

    delete from @testTable

    The inverse problem occurs if you use non-inclusive intervals (i.e. less than/greater than rather than less than or equal to/greater than or equal to): boundary values then fall into no bucket at all.
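    To illustrate that gap (this variant is mine, not part of the original test script; it assumes the integer bounds from the first approach are loaded into @testTable), swapping BETWEEN for strict inequalities makes the boundary values vanish from the result:

    ```sql
    -- Hypothetical "Integer Exclusive" variant of the first approach.
    -- With bounds (-45,-1), (-1,1), (1,45), the inputs -1 and 1 match no row,
    -- so those sample values silently drop out of the result set.
    select
        Method = 'Integer Exclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num > t.LowerBound and s.num < t.UpperBound
    ```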

    The next thought I had was to model the intervals using decimals so that I can still use inclusive intervals (i.e. BETWEEN). Admittedly it works for almost all cases, but (1) it's ugly, and (2) it doesn't work for ALL cases.

    /* Second approach
    Use decimals and set the boundaries a tiny value smaller or larger than what you want so that the intervals come out right
    PROS: Reasonably easy to understand and implement
    CONS: Still excludes values with precision falling outside the decimal range (e.g. .9999995).
    Adds confusion in configuring these values. If the person doesn't know how much decimal precision to require, you might get .99 and miss a bunch of values.
    Looks sloppy as heck
    */
    insert into @testTable
    values (-45, -1), (-.99999, .99999), (1, 45)

    select
        Method = 'Decimal Inclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num between t.LowerBound and t.UpperBound

    From here I started looking at things like CLR user-defined types, or additional columns indicating whether each boundary is inclusive or exclusive, but both of those solutions quickly get hairy.

    Is there a nice, simple solution I'm overlooking here? Or is this really as confounding a problem as it feels?

    Executive Junior Cowboy Developer, Esq.

  • Using the first of your examples:

    SELECT
        Method = 'Integer Inclusive',
        t.RID,
        t.Num,
        s.LowerBound,
        s.UpperBound
    FROM @sampleInputs t
    CROSS APPLY
    (
        SELECT TOP 1 s.LowerBound, s.UpperBound
        FROM @testTable s
        WHERE t.num between s.LowerBound and s.UpperBound
        ORDER BY s.LowerBound
    ) s;

    Decide whether you want a value that lands in two overlapping intervals to be assigned to the first of them or the last, and set the ORDER BY s.LowerBound to ASC or DESC accordingly.
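    Another convention worth a look (a sketch of my own, not code from the thread): store the buckets as half-open intervals, lower bound inclusive and upper bound exclusive. Two adjacent buckets that share an endpoint can then never both match, and no value between the extremes is missed, with no TOP 1 tie-break needed:

    ```sql
    -- Half-open intervals: [LowerBound, UpperBound).
    -- Assumes @testTable has been cleared of the earlier rows first.
    -- Each boundary value (-1, 1) belongs to exactly one bucket by construction.
    insert into @testTable
    values (-45, -1), (-1, 1), (1, 45)

    select
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @sampleInputs s
    inner join @testTable t
        on s.num >= t.LowerBound and s.num < t.UpperBound
    ```

    The one remaining edge is the very top value (45 here), which the exclusive upper bound drops; that needs either an inclusive final bucket or an upper bound nudged past the maximum you expect.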


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • An elegant solution. I'll try scaling it up and see how it performs. Thanks, Dwain!

    Executive Junior Cowboy Developer, Esq.
