Modeling interval buckets

  • Hi SSC,

    At a high level, I have "events" that happen to a certain stock symbol. For example, say ACME has a dividend coming up 3 days in the future. Five days from now, that dividend will have occurred 2 days in the past. What I'm trying to do is assign a score to how important an event is depending on how far away it is from the current date. The requirement is that at any given point, the distance between getdate() and the event date falls into exactly one scoring bucket. It's storing these intervals and easily finding which bucket a value belongs to that I'm having difficulty with.

    Right now I have it using just dates (e.g. the event is 1 day out, 2 days out, but never 1.5 days out) which simplifies things a lot, but now I have to add in the fractional component. Consider the following test data:

    /*
    Test setup.

    HIGH LEVEL OVERVIEW:
    There should be (in this case) 3 distinct intervals so that whatever value is input, it will fall into exactly one bucket.

    Desired output:
    I'm not so much worried about whether a value on a boundary condition falls into one bucket or the other
    (i.e. I'm not concerned about whether a value of -1 falls into group 1 or group 2).
    What I AM concerned about is a value of 1 falling into TWO buckets. Each bucket must be mutually exclusive.

    Grp1: where num >= -45 and num <= -1
    Grp2: where num > -1 and num < 1
    Grp3: where num >= 1 and num <= 45
    */

    declare @sampleInputs table (RID int identity(1,1), num decimal(38,10))

    declare @testTable table
    (
        LowerBound decimal(9,5),
        UpperBound decimal(9,5)
    )

    insert into @sampleInputs
    values (-5), (-1.6), (-1.5), (-1.4), (-1), (.999995), (.00005), (1), (1.0005), (1.5), (3.9), (4.0), (4.1)

    The simplest solution is just to say where num is between the lower and upper boundaries, but then values such as 1 will fall into two buckets.

    /* First approach
    Use integers and use BETWEEN (inclusive)
    PROS: Super simple to code and understand
    CONS: Values fall into duplicate buckets along boundaries
    */
    insert into @testTable
    values (-45, -1), (-1, 1), (1, 45)

    select
        Method = 'Integer Inclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num between t.LowerBound and t.UpperBound

    delete from @testTable

    The inverse problem occurs if you use non-inclusive intervals (i.e. less than/greater than rather than less than or equal to/greater than or equal to): boundary values then fall into no bucket at all.
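    To illustrate that gap (this variant is mine, not part of the original test script; it assumes the integer bounds from the first approach are loaded into @testTable), swapping BETWEEN for strict inequalities makes the boundary values vanish from the result:

    ```sql
    -- Hypothetical "Integer Exclusive" variant of the first approach.
    -- With bounds (-45,-1), (-1,1), (1,45), the inputs -1 and 1 match no row,
    -- so those sample values silently drop out of the result set.
    select
        Method = 'Integer Exclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num > t.LowerBound and s.num < t.UpperBound
    ```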

    The next thought I had was to model the intervals using decimals so that I can still use inclusive intervals (i.e. BETWEEN). Admittedly it works for almost all cases, but (1) it's ugly, and (2) it doesn't work for ALL cases.

    /* Second approach
    Use decimals and set the boundaries a tiny value smaller or larger than what you want so that the intervals come out right
    PROS: Reasonably easy to understand and implement
    CONS: Still excludes values with precision falling outside the decimal range (e.g. .9999995).
    Adds confusion in configuring these values. If the person doesn't know how much decimal precision to require, you might get .99 and miss a bunch of values.
    Looks sloppy as heck
    */
    insert into @testTable
    values (-45, -1), (-.99999, .99999), (1, 45)

    select
        Method = 'Decimal Inclusive',
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @testTable t
    inner join @sampleInputs s
        on s.num between t.LowerBound and t.UpperBound

    From here I started looking at things like CLR user-defined types, or additional columns indicating whether each boundary is inclusive or exclusive, but both of those solutions quickly get hairy.

    Is there a nice, simple solution I'm overlooking here? Or is this really as confounding a problem as it feels?

    Executive Junior Cowboy Developer, Esq.

  • Using the first of your examples:

    SELECT
        Method = 'Integer Inclusive',
        t.RID,
        t.Num,
        s.LowerBound,
        s.UpperBound
    FROM @sampleInputs t
    CROSS APPLY
    (
        SELECT TOP 1 s.LowerBound, s.UpperBound
        FROM @testTable s
        WHERE t.num between s.LowerBound and s.UpperBound
        ORDER BY s.LowerBound
    ) s;

    Decide whether you want a value that lands in two overlapping intervals to be assigned to the first of them or the last, and set the ORDER BY s.LowerBound to ASC or DESC accordingly.
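    Another convention worth a look (a sketch of my own, not code from the thread): store the buckets as half-open intervals, lower bound inclusive and upper bound exclusive. Two adjacent buckets that share an endpoint can then never both match, and no value between the extremes is missed, with no TOP 1 tie-break needed:

    ```sql
    -- Half-open intervals: [LowerBound, UpperBound).
    -- Assumes @testTable has been cleared of the earlier rows first.
    -- Each boundary value (-1, 1) belongs to exactly one bucket by construction.
    insert into @testTable
    values (-45, -1), (-1, 1), (1, 45)

    select
        s.RID,
        s.Num,
        t.LowerBound,
        t.UpperBound
    from @sampleInputs s
    inner join @testTable t
        on s.num >= t.LowerBound and s.num < t.UpperBound
    ```

    The one remaining edge is the very top value (45 here), which the exclusive upper bound drops; that needs either an inclusive final bucket or an upper bound nudged past the maximum you expect.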


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • An elegant solution. I'll try scaling it up and see how it performs. Thanks, Dwain!

    Executive Junior Cowboy Developer, Esq.
