January 31, 2006 at 12:54 pm
Greetings,
This may seem like a stupid question but I've never stopped to look at it beyond the simple 'it works'
I have a column with data in it: '001-1210-105'
When I do the following SQL it returns a result:
SELECT GL.vchrACCTNUMB FROM GENLACCT GL where GL.vchrACCTNUMB between '001-1210' and '001-1219'
My question is WHY? I always used it on date fields before and just figured there was some internal mechanism that just knew how to do date comparisons, but here I have a varchar. I've tested out a few scenarios adding records inside and outside the range and changing the between range and it really seems like it does EXACTLY what I want it to do but I'm just not happy not necessarily knowing why...
Any tech articles or just thoughts would be appreciated.
regards,
Chris
February 1, 2006 at 2:18 am
Hi Chris,
I'm not an expert, but my comprehension is that when comparing strings, SQL Server does a character by character comparison of the ASCII codes, starting with the leftmost characters.
A string is greater than another string if the ASCII code of the corresponding character is greater.
'001-1219' is greater than '001-1210' because the ASCII code of the 8th character '9' is 57, and the ASCII code of '0' is 48.
A string is also greater than another string if it starts with the same characters, but has additional characters.
'001-1210-105' is greater than '001-1210' because it has additional characters.
However, '001-1210-105' is less than '001-1219' because the comparison is character by character, and the ASCII code of the 8th character is less.
I can't find a reference for this, but perhaps someone else can oblige.
David
If it ain't broke, don't fix it...
February 1, 2006 at 6:09 am
Dates are stored internally as numbers (zero is 1 January 1900, then count days with time being the decimal part) so the comparison actually is on a number range (and, as SQL Server only has date-time, it is easy to get the end of your range wrong since the raw date only covers the first couple of milliseconds of that day).
If you are comparing numbers as strings, you have to be very carefull. For example, 0 is less than 0 because the space character is ASCII 32 while zero is 48. Similarly, 01 is less than 1.
February 1, 2006 at 6:33 am
I encountered a problem of this nature, when I found that the system i administer had a varchar field containing dates. When you sort the dates they come out in the order;
01/01/2006
01/02/2006
...
01/12/2006
...
02/01/2006
02/02/2006
David
If it ain't broke, don't fix it...
February 1, 2006 at 10:12 am
February 1, 2006 at 3:33 pm
@W BETWEEN @X and @Y is shorthand for:
@W >= @X AND @W <= @Y
as shown in the sample execution plan below.
hth jg
StmtText
-----------------------
set showplan_text on
(1 row(s) affected)
StmtText
-----------------------------------------------------------------------
select * from HandheldUnit
where SerialNumber between '1' and '2'
(1 row(s) affected)
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Clustered Index Seek(OBJECT[USFS].[dbo].[HandheldUnit].[PK_HandheldUnit]), SEEK[HandheldUnit].[SerialNumber] >= [@1] AND [HandheldUnit].[SerialNumber] <= [@2]) ORDERED FORWARD)
February 1, 2006 at 4:35 pm
The where and between clauses are difficult to work with date expressions.
You can create an user defined function with four date parameters (two user-inputs and two data from a selected table) to be passed on and then returns a boolean of true or false. You can use this function in the where clause. If the two inputs are within the two data then returns 1 otherwise 0.
February 1, 2006 at 4:44 pm
>>You can create an user defined function with four
Right, but if I have a 100 million row transaction table and want to pull out a small subset of data for 1 day's transactions, I'll use standard SQL BETWEEN and allow the optimizer to use indexes, rather than the 100 million row resulting tablescan from a UDF
February 1, 2006 at 5:42 pm
Have you try it ?
February 1, 2006 at 10:32 pm
>>Have you try it ?
Yep, but here's a small test to convince you that a UDF is a bad idea from a perfomance perspective:
-- Simple function to return Bit on date comparison
Create Function ufn_IsDateBetween(@DateCol As Datetime, @LowerDate As DateTime, @UpperDate As DateTime)
returns bit
as
begin
Return (
Select Case When @DateCol >= @LowerDate And @DateCol <= @UpperDate Then 1 Else 0 End
)
End
-- Temp table to hold dates
Create table #temp (
RowID int identity,
TheDate datetime
)
Create Index #ixDate on #temp (TheDate)
-- Create 10000 records
Insert Into #temp (TheDate)
Select top 10000 Null
From syscomments c1 Cross join syscomments c2
-- Set the dates on each record to be 1 day apart
Update #temp Set TheDate = DateAdd(dd, RowID, getDate())
-- Let's see what the optimiser does ...
Set Showplan_text On
go
-- Use SQL BETWEEN
Select * from #temp Where TheDate between '02 Feb 2006' And '04 Feb 2006'
-- Use a UDF in the where
Select * from #temp Where dbo.ufn_IsDateBetween(TheDate, '02 Feb 2006','04 Feb 2006') = 1
Results:
|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))
|--Index Seek(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017].[#ixDate]), SEEK[#temp].[TheDate] >= 'Feb 2 2006 12:00AM' AND [#temp].[TheDate
|--Filter(WHEREConvert([dbo].[ufn_IsDateBetween]([#temp].[TheDate], 'Feb 2 2006 12:00AM', 'Feb 4 2006 12:00AM'))=1))
|--Table Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))
February 2, 2006 at 12:48 am
That is about as much misuse of UDFs as it is possible to do. A simple set-based statement is easy to write and will perform "infinitely" better, like Jeff shows.
February 2, 2006 at 12:51 am
I wouldn't be writing a query for what you wrote above. If you design the algorithm properly then the UDF will be useful. I recommend not to use the UDF in UPDATE, INSERT AND DELETE statements. The right type of query I wrote is similar to this example below:
-- dtmStartDate and dtmEndDate columns are part of table_name
-- Good query
SELECT COUNT(*) AS NumOfRec
FROM table_name
WHERE (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)
AND (field_1= @ID_1)
AND (field_2= @ID_2)
AND (field_3= @ID_3)
-- bad query
SELECT COUNT(*) AS NumOfRec
FROM table_name
WHERE (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)
-- The udf_CheckDates statements extraction:
IF @dtmStartDatePara = @dtmStartDateData
BEGIN
SET @boolean = 1
END
ELSE
IF (@dtmStartDatePara <= @dtmEndDateData) AND (@dtmStartDateData <= @dtmEndDatePara)
BEGIN
SET @boolean = 1
END
ELSE
SET @boolean = 0
END
February 2, 2006 at 1:49 am
Surely, the real answer here is : dont use varchars for datetime if you are going to select on them. Add datetime columns if necessary and populate them with CONVERT(). Then you can write efficient queries without resorting to udfs (which I class with cursors - last resort only!).
February 2, 2006 at 3:04 am
You are missing the point. By doing this in a udf that needs to be called for each row you have effectively stopped SQL Server from using an index on the datetime column to decide which rows should be returned. A full scan of the table (heap or clustered index) will be necessary to return the results. This, plus the fact that invocation of the udf is much slower than using the existing functionality of t-sql will result in utterly poor performance.
February 2, 2006 at 7:12 am
heh. I was just trying to show what BETWEEN does. Chris is right. In general, it is best to avoid performing computations (including convert and cast) on tables. It is far better to get the computations done on the scalar values when selecting rows from a tables, because it allows the best use of an index. This can be hard to see sometimes, because the optimizer can use an index, but it will be a scan, usually followed by a bookmark lookup to get the rest of the row.
I've had to fix many situations in which someone was looking for "today's" transactions by truncating the HHMMSS part of a column and looking for the result to equal today's midnight. It is far better to figure the beginning time that meets the criteria and the begin of the first time that is just outside the criteria, then select using a comparison between the two endpoints. Between is NOT a good way to do this, BTW.
Assuming the dates are all declared as datetime, including the MyDate column:
Select @SDate = '1/2/05', @EDate = '1/3/05'
Select Data from Table where
MyDate >= @SDate and MyDate < @EDate
Will give the best performance, regardless of whether there is an index on the MyDate column, because SQL doesn't have to do anything with each MyDate value in the table besides the comparison. Obviously, an index will make this perform much better given the likely selectivity of the MyDate column.
Note that the first comparison is an inclusive inequality and the second is exclusive. Logically, it must be that way. If I use inclusive on both ends, any MyDate that is exactly midnight will be selected on two different days. Not so good if it would result in a duplicate debit to my bank account.
Since BETWEEN is inclusive on both ends of a range, you must be careful when using it for dates.
hth jg
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply