July 7, 2010 at 1:26 pm
where datediff(day, [field], getdate()) < 240
will force a scan (instead of a seek)
but if the optimizer knew that the output of datediff decreases as [field] increases
(which is always the case as getdate() is constant for this statement)
it could reduce io by using a binary search
it can easily determine the min and max values of the [field] if there is a suitable index available
the property of increasing / decreasing is deterministic for many functions and mathematical expressions
so I am surprised there is no "binary search" operator in SQL Server
or am I missing something?
July 7, 2010 at 1:38 pm
I guess [field] is a datetime column. Then you can rewrite your WHERE clause
WHERE [field]< {dateadd formula here}
Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.
July 7, 2010 at 2:28 pm
Nils Gustav Stråbø (7/7/2010)
I guess [field] is a datetime column. Then you can rewrite your WHERE clause
WHERE [field]< {dateadd formula here}
Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.
Yes I understand that - that was just one example
(I don't think) you can always rewrite these type of expressions as:
[field] <seekable operator> <scalar>
my point is that if you can determine whether the function is increasing or decreasing and use a binary search
it *wouldn't* always cause a scan
and it wouldn't be hard for the optimizer to know that information for built-in functions
and there could be a hint for user-defined functions
the question being "why isn't there a binary search operator?"
it is such a basic and fundamental technique for quickly locating data in sorted sets ...
July 7, 2010 at 2:35 pm
and it wouldn't be hard for the optimizer to know that information for built-in functions
and there could be a hint for user-defined functions"
That is just your opinion. Had you had any discusion about this with the people at Microsoft about how easy this would be?
"the question being "why isn't there a binary search operator?""
You might want to direct your question to the people at Microsoft that made that decision.
July 8, 2010 at 8:50 am
The problem is, it's not a question of having a function that can do what you need. The issue is how is the data stored and retrieved and can you still retrieve it in the same fashion with this new function. The way data is stored in indexes, you have a double-linked list of hard values. As soon as SQL Server has to start calculating, anything, not just binaries, it can no longer refer to the hard value list, it has to start scanning around and checking values against the function.
The language is set up to work off the mechanism you described.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 8, 2010 at 9:01 am
If you use a function on a column in the where/join condition, you are essentially forcing a scan. You need to recode your expression to eliminate the function on the column.
In your example:
where datediff(day, [field], getdate()) < 240
Should be replaced with:
declare @StartDate datetime,
@EndDate datetime
select -- get tomorrow's date, no time
@EndDate = DateAdd(day, DateDiff(day, 0, GetDate())+1, 0),
-- get date only of 240 days prior
@StartDate = DateAdd(day, -240, DateAdd(day, DateDiff(day, 0, GetDate()), 0))
where field >= @StartDate
and field < @EndDate
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 8, 2010 at 10:15 am
please read and understand the post before you reply
July 8, 2010 at 10:37 am
doobya (7/8/2010)
please read and understand the post before you reply
There's no specific "binary search", since a search which uses an index IS a binary search (that's what a BTree structure is).
So, it will comes down to feeding something in to the query parse and optimizer engine in a format it can use to pick up. All of the post above give suggestions as to how that is done, so I'm not oging to waste time doing so again.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 8, 2010 at 11:33 am
doobya (7/8/2010)
please read and understand the post before you reply
Sorry, but it looks to me that everyone who responded DID read and understand the post. Unfortunately, you don't seem to understand the answers being provided as they don't support your version of reality.
You need to learn how SQL Server works if you want to write high performing T-SQL code.
July 8, 2010 at 2:28 pm
I know perfectly well how SQL Server works
I am talking about the ability of the optimizer to use a binary search algorithm to quickly find the minimum and maximum values
of a function over the values in an index - this would be possible if it had access to whether the function was increasing or decreasing
for a given parameter
it could then build its own "[field] between [min] and [max]"
thus creating an efficient seek
as a binary search can locate a single entry out of 16,777,216 in only 24 iterations
the horrible "datediff(minute, [field], getdate())" would NOT need a table scan
or, put another way, is there a reason that this functionality doesn't exist, for instance is it the case
that a predicate can be always be rewritten to be seekable?
and if so - why doesn't the optimizer do that?
July 8, 2010 at 2:46 pm
It is an interesting idea. Perhaps a specialized operator like this, adding in more choices for how to solve a particular set of problems is not enough of a boost to enough workloads to implement?
Perhaps it hasn't been thought of?
I don't necessarily see that your solution helps. The B-tree isn't necessarily binary. It's "balanced". However I'm not sure why we don't have more limited scans in some situations. Perhaps there are other considerations.
I would write this up in a clearer manner, maybe with an example or two, and submit it at Connect.microsoft.com. I have seen them consider a number of my suggestions and offer feedback about why or why not something could be done.
July 8, 2010 at 2:46 pm
doobya (7/8/2010)
or, put another way, is there a reason that this functionality doesn't exist, for instance is it the casethat a predicate can be always be rewritten to be seekable?
and if so - why doesn't the optimizer do that?
In each edition the optimiser gets better. In 2008 it's capable of doing similar to this for some functions that it couldn't in earlier editions.
It's a tradeoff, the smarter the optimiser gets, the more expensive it gets and the more CPU and memory it needs to do it's job, hence making query execution more expensive. Optimisation's not a simple process.
If you believe this is an absolutely essential feature for SQL Server, tell Microsoft. Posting here won't get it to their attention
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 10, 2010 at 3:59 pm
The problem with this approach is that when the execution plan is created, it has to read ALL records to know which are within the "binary" range you talk about. What if more records are inserted, or updated, due to concurrency?
Even worse, if the plan is cached, there may be more records stored later, within the wanted range. How to deal with them? SQL Server have to scan the table to get those records too.
Or are you suggesting you would need a nonclustered single-column index on every column?
N 56°04'39.16"
E 12°55'05.25"
July 10, 2010 at 4:03 pm
doobya (7/8/2010)
I know perfectly well how SQL Server works...
it could then build its own "[field] between [min] and [max]"
thus creating an efficient seek
...
How are you suggesting SQL Server builds the "list" without scanning all values? Unless there already is a clustered index on that column?
And next user want to do a BETWEEN min AND max on another column?
N 56°04'39.16"
E 12°55'05.25"
July 12, 2010 at 2:49 am
SwePeso (7/10/2010)
Or are you suggesting you would need a nonclustered single-column index on every column?
I am saying that IF there is an index on the column it should be possible to perform a binary search to find the min and max values
and that should require less IO than a scan
When you look up a number in the phone book - you know whether to binary search left or right because the alphabet is increasing
If the engine knows that a certain function is increasing/decreasing in relation to a certain parameter - that should be enough information
to enable a binary search:
calculate datediff(minute, [field], getdate())
for the first and last row in the index - detect if all rows are out-of-range of predicate ...
then perform standard binary search
calculate datediff(minute, [field], getdate())
for the middle row in the index and decide which half of the index to exclude
rinse and repeat
this would enable the engine to locate 1 row in 16,777,216 in only 24 iterations (for example)
in other words - 24 index reads versus scan across 16,777,216 rows ...
its just an idea - I am interested in how it could get shot down in flames
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply