why no Binary Search operator ?

Question

Post reply

why no Binary Search operator ?

DataDog

Hall of Fame

Points: 3305
More actions
July 7, 2010 at 1:26 pm

#223620

where datediff(day, [field], getdate()) < 240
will force a scan (instead of a seek)
but if the optimizer knew that the output of datediff decreases as [field] increases
(which is always the case as getdate() is constant for this statement)
it could reduce io by using a binary search
it can easily determine the min and max values of the [field] if there is a suitable index available
the property of increasing / decreasing is deterministic for many functions and mathematical expressions
so I am surprised there is no "binary search" operator in SQL Server
or am I missing something?

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Nils Gustav Stråbø SSChampion Points: 11259 More actions · Answer 1

I guess [field] is a datetime column. Then you can rewrite your WHERE clause

WHERE [field]< {dateadd formula here}

Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.

DataDog Hall of Fame Points: 3305 More actions · Answer 2

Nils Gustav Stråbø (7/7/2010)
I guess [field] is a datetime column. Then you can rewrite your WHERE clause
WHERE [field]< {dateadd formula here}
Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.

Yes I understand that - that was just one example

(I don't think) you can always rewrite these type of expressions as:

[field] <seekable operator> <scalar>

my point is that if you can determine whether the function is increasing or decreasing and use a binary search

it *wouldn't* always cause a scan

and it wouldn't be hard for the optimizer to know that information for built-in functions

and there could be a hint for user-defined functions

the question being "why isn't there a binary search operator?"

it is such a basic and fundamental technique for quickly locating data in sorted sets ...

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 3

and it wouldn't be hard for the optimizer to know that information for built-in functions

and there could be a hint for user-defined functions"

That is just your opinion. Had you had any discusion about this with the people at Microsoft about how easy this would be?

"the question being "why isn't there a binary search operator?""

You might want to direct your question to the people at Microsoft that made that decision.

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 4

The problem is, it's not a question of having a function that can do what you need. The issue is how is the data stored and retrieved and can you still retrieve it in the same fashion with this new function. The way data is stored in indexes, you have a double-linked list of hard values. As soon as SQL Server has to start calculating, anything, not just binaries, it can no longer refer to the hard value list, it has to start scanning around and checking values against the function.

The language is set up to work off the mechanism you described.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

WayneS SSC Guru Points: 95460 More actions · Answer 5

If you use a function on a column in the where/join condition, you are essentially forcing a scan. You need to recode your expression to eliminate the function on the column.

In your example:

where datediff(day, [field], getdate()) < 240

Should be replaced with:

declare @StartDate datetime,

@EndDate datetime

select -- get tomorrow's date, no time

@EndDate = DateAdd(day, DateDiff(day, 0, GetDate())+1, 0),

-- get date only of 240 days prior

@StartDate = DateAdd(day, -240, DateAdd(day, DateDiff(day, 0, GetDate()), 0))

where field >= @StartDate

and field < @EndDate

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

DataDog Hall of Fame Points: 3305 More actions · Answer 6

please read and understand the post before you reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 7

doobya (7/8/2010)
please read and understand the post before you reply

There's no specific "binary search", since a search which uses an index IS a binary search (that's what a BTree structure is).

So, it will comes down to feeding something in to the query parse and optimizer engine in a format it can use to pick up. All of the post above give suggestions as to how that is done, so I'm not oging to waste time doing so again.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

doobya (7/8/2010)
please read and understand the post before you reply

Sorry, but it looks to me that everyone who responded DID read and understand the post. Unfortunately, you don't seem to understand the answers being provided as they don't support your version of reality.

You need to learn how SQL Server works if you want to write high performing T-SQL code.

DataDog Hall of Fame Points: 3305 More actions · Answer 9

I know perfectly well how SQL Server works

I am talking about the ability of the optimizer to use a binary search algorithm to quickly find the minimum and maximum values

of a function over the values in an index - this would be possible if it had access to whether the function was increasing or decreasing

for a given parameter

it could then build its own "[field] between [min] and [max]"

thus creating an efficient seek

as a binary search can locate a single entry out of 16,777,216 in only 24 iterations

the horrible "datediff(minute, [field], getdate())" would NOT need a table scan

or, put another way, is there a reason that this functionality doesn't exist, for instance is it the case

that a predicate can be always be rewritten to be seekable?

and if so - why doesn't the optimizer do that?

Steve Jones - SSC Editor SSC Guru Points: 736183 More actions · Answer 10

It is an interesting idea. Perhaps a specialized operator like this, adding in more choices for how to solve a particular set of problems is not enough of a boost to enough workloads to implement?

Perhaps it hasn't been thought of?

I don't necessarily see that your solution helps. The B-tree isn't necessarily binary. It's "balanced". However I'm not sure why we don't have more limited scans in some situations. Perhaps there are other considerations.

I would write this up in a clearer manner, maybe with an example or two, and submit it at Connect.microsoft.com. I have seen them consider a number of my suggestions and offer feedback about why or why not something could be done.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

doobya (7/8/2010)
or, put another way, is there a reason that this functionality doesn't exist, for instance is it the case
that a predicate can be always be rewritten to be seekable?
and if so - why doesn't the optimizer do that?

In each edition the optimiser gets better. In 2008 it's capable of doing similar to this for some functions that it couldn't in earlier editions.

It's a tradeoff, the smarter the optimiser gets, the more expensive it gets and the more CPU and memory it needs to do it's job, hence making query execution more expensive. Optimisation's not a simple process.

If you believe this is an absolutely essential feature for SQL Server, tell Microsoft. Posting here won't get it to their attention

http://connect.microsoft.com/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 12

The problem with this approach is that when the execution plan is created, it has to read ALL records to know which are within the "binary" range you talk about. What if more records are inserted, or updated, due to concurrency?

Even worse, if the plan is cached, there may be more records stored later, within the wanted range. How to deal with them? SQL Server have to scan the table to get those records too.

Or are you suggesting you would need a nonclustered single-column index on every column?

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 13

doobya (7/8/2010)
I know perfectly well how SQL Server works
...
it could then build its own "[field] between [min] and [max]"
thus creating an efficient seek
...

How are you suggesting SQL Server builds the "list" without scanning all values? Unless there already is a clustered index on that column?

And next user want to do a BETWEEN min AND max on another column?

N 56°04'39.16"
E 12°55'05.25"

DataDog Hall of Fame Points: 3305 More actions · Answer 14

SwePeso (7/10/2010)
Or are you suggesting you would need a nonclustered single-column index on every column?

I am saying that IF there is an index on the column it should be possible to perform a binary search to find the min and max values

and that should require less IO than a scan

When you look up a number in the phone book - you know whether to binary search left or right because the alphabet is increasing

If the engine knows that a certain function is increasing/decreasing in relation to a certain parameter - that should be enough information

to enable a binary search:

calculate datediff(minute, [field], getdate())

for the first and last row in the index - detect if all rows are out-of-range of predicate ...

then perform standard binary search

calculate datediff(minute, [field], getdate())

for the middle row in the index and decide which half of the index to exclude

rinse and repeat

this would enable the engine to locate 1 row in 16,777,216 in only 24 iterations (for example)

in other words - 24 index reads versus scan across 16,777,216 rows ...

its just an idea - I am interested in how it could get shot down in flames