What''s the magic of ''between'' ?

Question

What''s the magic of ''between'' ?

Christopher Klein

SSCrazy

Points: 2648
More actions
January 31, 2006 at 12:54 pm

#96576

Greetings,
This may seem like a stupid question but I've never stopped to look at it beyond the simple 'it works'
I have a column with data in it: '001-1210-105'
When I do the following SQL it returns a result:
SELECT GL.vchrACCTNUMB FROM GENLACCT GL where GL.vchrACCTNUMB between '001-1210' and '001-1219'
My question is WHY? I always used it on date fields before and just figured there was some internal mechanism that just knew how to do date comparisons, but here I have a varchar. I've tested out a few scenarios adding records inside and outside the range and changing the between range and it really seems like it does EXACTLY what I want it to do but I'm just not happy not necessarily knowing why...
Any tech articles or just thoughts would be appreciated.
regards,
Chris

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

David le Quesne SSCertifiable Points: 5229 More actions · Answer 1

Hi Chris,

I'm not an expert, but my comprehension is that when comparing strings, SQL Server does a character by character comparison of the ASCII codes, starting with the leftmost characters.

A string is greater than another string if the ASCII code of the corresponding character is greater.

'001-1219' is greater than '001-1210' because the ASCII code of the 8th character '9' is 57, and the ASCII code of '0' is 48.

A string is also greater than another string if it starts with the same characters, but has additional characters.

'001-1210-105' is greater than '001-1210' because it has additional characters.

However, '001-1210-105' is less than '001-1219' because the comparison is character by character, and the ASCII code of the 8th character is less.

I can't find a reference for this, but perhaps someone else can oblige.

David

If it ain't broke, don't fix it...

Stewart Joslyn SSCertifiable Points: 6131 More actions · Answer 2

Dates are stored internally as numbers (zero is 1 January 1900, then count days with time being the decimal part) so the comparison actually is on a number range (and, as SQL Server only has date-time, it is easy to get the end of your range wrong since the raw date only covers the first couple of milliseconds of that day).

If you are comparing numbers as strings, you have to be very carefull. For example, 0 is less than 0 because the space character is ASCII 32 while zero is 48. Similarly, 01 is less than 1.

David le Quesne SSCertifiable Points: 5229 More actions · Answer 3

I encountered a problem of this nature, when I found that the system i administer had a varchar field containing dates. When you sort the dates they come out in the order;

01/01/2006

01/02/2006

...

01/12/2006

...

02/01/2006

02/02/2006

David

If it ain't broke, don't fix it...

bledu SSChampion Points: 12335 More actions · Answer 4

sql will do a character sort on the varchar filed and then do the comparison, i would not use [between] for textual range comparison, best use it for numbers and datetime columns

Everything you can imagine is real.

Jeff Gray SSChampion Points: 10667 More actions · Answer 5

@W BETWEEN @X and @Y is shorthand for:

@W >= @X AND @W <= @Y

as shown in the sample execution plan below.

hth jg

StmtText

-----------------------

set showplan_text on

(1 row(s) affected)

StmtText

-----------------------------------------------------------------------

select * from HandheldUnit

where SerialNumber between '1' and '2'

(1 row(s) affected)

StmtText

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Clustered Index Seek(OBJECT[USFS].[dbo].[HandheldUnit].[PK_HandheldUnit]), SEEK[HandheldUnit].[SerialNumber] >= [@1] AND [HandheldUnit].[SerialNumber] <= [@2]) ORDERED FORWARD)

Deaf Programmer Grasshopper Points: 19 More actions · Answer 6

The where and between clauses are difficult to work with date expressions.

You can create an user defined function with four date parameters (two user-inputs and two data from a selected table) to be passed on and then returns a boolean of true or false. You can use this function in the where clause. If the two inputs are within the two data then returns 1 otherwise 0.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 7

>>You can create an user defined function with four

Right, but if I have a 100 million row transaction table and want to pull out a small subset of data for 1 day's transactions, I'll use standard SQL BETWEEN and allow the optimizer to use indexes, rather than the 100 million row resulting tablescan from a UDF

Deaf Programmer Grasshopper Points: 19 More actions · Answer 8

Deaf Programmer

Grasshopper

Points: 19

February 1, 2006 at 5:42 pm

#618260

Have you try it ?

PW-201837 SSC-Insane Points: 20805 More actions · Answer 9

>>Have you try it ?

Yep, but here's a small test to convince you that a UDF is a bad idea from a perfomance perspective:

-- Simple function to return Bit on date comparison

Create Function ufn_IsDateBetween(@DateCol As Datetime, @LowerDate As DateTime, @UpperDate As DateTime)

returns bit

as

begin

Return (

Select Case When @DateCol >= @LowerDate And @DateCol <= @UpperDate Then 1 Else 0 End

)

End

-- Temp table to hold dates

Create table #temp (

RowID int identity,

TheDate datetime

)

Create Index #ixDate on #temp (TheDate)

-- Create 10000 records

Insert Into #temp (TheDate)

Select top 10000 Null

From syscomments c1 Cross join syscomments c2

-- Set the dates on each record to be 1 day apart

Update #temp Set TheDate = DateAdd(dd, RowID, getDate())

-- Let's see what the optimiser does ...

Set Showplan_text On

go

-- Use SQL BETWEEN

Select * from #temp Where TheDate between '02 Feb 2006' And '04 Feb 2006'

-- Use a UDF in the where

Select * from #temp Where dbo.ufn_IsDateBetween(TheDate, '02 Feb 2006','04 Feb 2006') = 1

Results:

|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))

|--Index Seek(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017].[#ixDate]), SEEK[#temp].[TheDate] >= 'Feb 2 2006 12:00AM' AND [#temp].[TheDate

|--Filter(WHEREConvert([dbo].[ufn_IsDateBetween]([#temp].[TheDate], 'Feb 2 2006 12:00AM', 'Feb 4 2006 12:00AM'))=1))

|--Table Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 10

That is about as much misuse of UDFs as it is possible to do. A simple set-based statement is easy to write and will perform "infinitely" better, like Jeff shows.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Deaf Programmer Grasshopper Points: 19 More actions · Answer 11

I wouldn't be writing a query for what you wrote above. If you design the algorithm properly then the UDF will be useful. I recommend not to use the UDF in UPDATE, INSERT AND DELETE statements. The right type of query I wrote is similar to this example below:

-- dtmStartDate and dtmEndDate columns are part of table_name

-- Good query

SELECT COUNT(*) AS NumOfRec

FROM table_name

WHERE (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)

AND (field_1= @ID_1)

AND (field_2= @ID_2)

AND (field_3= @ID_3)

-- bad query

SELECT COUNT(*) AS NumOfRec

FROM table_name

WHERE (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)

-- The udf_CheckDates statements extraction:

IF @dtmStartDatePara = @dtmStartDateData

BEGIN

SET @boolean = 1

END

ELSE

IF (@dtmStartDatePara <= @dtmEndDateData) AND (@dtmStartDateData <= @dtmEndDatePara)

BEGIN

SET @boolean = 1

END

ELSE

SET @boolean = 0

END

Stewart Joslyn SSCertifiable Points: 6131 More actions · Answer 12

Surely, the real answer here is : dont use varchars for datetime if you are going to select on them. Add datetime columns if necessary and populate them with CONVERT(). Then you can write efficient queries without resorting to udfs (which I class with cursors - last resort only!).

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 13

You are missing the point. By doing this in a udf that needs to be called for each row you have effectively stopped SQL Server from using an index on the datetime column to decide which rows should be returned. A full scan of the table (heap or clustered index) will be necessary to return the results. This, plus the fact that invocation of the udf is much slower than using the existing functionality of t-sql will result in utterly poor performance.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Jeff Gray SSChampion Points: 10667 More actions · Answer 14

heh. I was just trying to show what BETWEEN does. Chris is right. In general, it is best to avoid performing computations (including convert and cast) on tables. It is far better to get the computations done on the scalar values when selecting rows from a tables, because it allows the best use of an index. This can be hard to see sometimes, because the optimizer can use an index, but it will be a scan, usually followed by a bookmark lookup to get the rest of the row.

I've had to fix many situations in which someone was looking for "today's" transactions by truncating the HHMMSS part of a column and looking for the result to equal today's midnight. It is far better to figure the beginning time that meets the criteria and the begin of the first time that is just outside the criteria, then select using a comparison between the two endpoints. Between is NOT a good way to do this, BTW.

Assuming the dates are all declared as datetime, including the MyDate column:

Select @SDate = '1/2/05', @EDate = '1/3/05'

Select Data from Table where

MyDate >= @SDate and MyDate < @EDate

Will give the best performance, regardless of whether there is an index on the MyDate column, because SQL doesn't have to do anything with each MyDate value in the table besides the comparison. Obviously, an index will make this perform much better given the likely selectivity of the MyDate column.

Note that the first comparison is an inclusive inequality and the second is exclusive. Logically, it must be that way. If I use inclusive on both ends, any MyDate that is exactly midnight will be selected on two different days. Not so good if it would result in a duplicate debit to my bank account.

Since BETWEEN is inclusive on both ends of a range, you must be careful when using it for dates.

hth jg