Apparent Error with BETWEEN operator

Question

Apparent Error with BETWEEN operator

Jim Howell

Ten Centuries

Points: 1010
More actions
October 18, 2007 at 7:27 am

#75797

We use the BETWEEN operator to find data in character ranges. We just came across a situation where BETWEEN is returning multiple records and I confess that I'm clueless as to the cause.
The example:
CREATE TABLE [dbo].[test] (
[col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
Data:
col 1 Col 2
MSD1558001MSD1560000
MSD154001MSD156000
MSD14001MSD16000
select * from test where 'MSD1558017' between col_1 and col_2
By inspection, this should return the first data set. In reality (our dev system and production system 🙂 ) it returns all three. I have also tried this using >= and the same thing happens.
The only partial clue would be something collation related, but I am truely at a loss. Any ideas would be very much appreciated!
Thx --Jim

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Simon Liddle SSCrazy Points: 2580 More actions · Answer 1

Are you wanting to compare them as strings or are you trying to find whether the numerical part of MSD1558017 falls between the numerical parts of col_1 and col_2?

I'm a bit rusty with my string comparisons but I would expect the statement to return all three rows as the first four characters match and the fifth character falls between the fifth characters of each row (if this makes any sense)! 🙂

mrpolecat SSCertifiable Points: 7014 More actions · Answer 2

with char datatypes between (and > or < for that matter) do a character by character comparison not looking at the whole field as it would with numbers. If the second character of what you are comparing is between the second characters of the two items you are comparing to and the first character is the same it will return true regardless of the rest of the characters in the string.

Jim Howell Ten Centuries Points: 1010 More actions · Answer 3

That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 4

I'm not a 100% on this, but I believe the BETWEEN function does it's comparisons at the binary level. At least that's my best guess.

thus......(extra zeros removed)

MSD154001_________converted to binary would be______0x4D534431353430303100........

MSD156000_________converted to binary would be______0x4D534431353630303000........

and

MSD1558017________converted to binary would be______0x4D534431353538303137........

which IS between the previous values.

SELECT

CAST('MSD154001' AS BINARY) AS 'MSD154001'

,CAST('MSD1558017' AS BINARY) AS 'MSD1558017'

,CAST('MSD156000' AS BINARY) AS 'MSD156000'

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Phil Factor SSC-Insane Points: 20244 More actions · Answer 5

This solution assumes a big table lots of new data coming in. It also assumes that all have got the MSD prefix which I could be wrong about. if it is always three Alpha then use STUFF. If the prefix could be anything, then extract the integer by finding the transition between alpha and number with PATINDEX. You might need to play with the indexing strategy but I suspect that a covering compound index including both col1asinteger and col2AsInteger would do the trick.

[font="Courier New"]CREATE TABLE [dbo].[#test] (

[col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

Col1AsInteger INT,

Col2AsInteger INT

) ON [PRIMARY]

GO

INSERT INTO #test(col_1,col_2) SELECT 'MSD1558001', 'MSD1560000'

INSERT INTO #test(col_1,col_2) SELECT 'MSD154001', 'MSD156000'

INSERT INTO #test(col_1,col_2) SELECT 'MSD14001', 'MSD16000'

--and so on, several thousand of 'em

--now update all new entries in the table

UPDATE #test SET Col1AsInteger=REPLACE(col_1,'MSD',''),

Col2AsInteger=REPLACE(col_2,'MSD','')

WHERE Col1AsInteger+Col2AsInteger IS NULL--where either are null

SELECT col_1,col_2 FROM #test

WHERE REPLACE('MSD1558017','MSD','')

BETWEEN Col1AsInteger AND Col2AsInteger

[/font]

Best wishes,
Phil Factor

Sergiy SSC Guru Points: 110208 More actions · Answer 6

Jim Howell (10/18/2007)
That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!

Slight touch of normalisation always gives a positive effect on databases.

If you'd have Code and Number separated and stored in columns with appropriate datatype you'd never have such problem.

Replacing varchar Code with smallint CodeID would improve performance of you queries, both saving and retrieving data. Not to mention dramatic decrease in total size of the tables.

_____________
Code for TallyGenerator

Jim Howell Ten Centuries Points: 1010 More actions · Answer 7

Phil--

Good assumptions! This is a solution that will work for us. We can't always control the formatting of the barcodes because some come from outside vendors. This is a very elegant solution that I will apply to our development server. Much appreciated.

Thx --Jim

Sergiy SSC Guru Points: 110208 More actions · Answer 8

[font="Courier New"]--Create 2 tables to hold normalized data

IF Object_ID('dbo.PrefixCode', 'U') IS NOT NULL

DROP TABLE dbo.PrefixCode

CREATE TABLE dbo.PrefixCode (

ID smallint IDENTITY(-32767, 1) NOT NULL PRIMARY KEY NONCLUSTERED,

Code nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

)

CREATE UNIQUE CLUSTERED INDEX UX_PrefixCode_Code ON dbo.PrefixCode (Code)

IF Object_ID('dbo.BarCodeRange', 'U') IS NOT NULL

DROP TABLE dbo.BarCodeRange

CREATE TABLE dbo.BarCodeRange (

PrefixCodeID smallint NOT NULL,

StartNumber int NOT NULL,

EndNumber int NOT NULL

)

CREATE INDEX UX_BarCodeRange_EndNumber ON dbo.BarCodeRange (EndNumber)

GO

--Create view to represent data in denormalized form

IF Object_ID('dbo.BarCodeRange_Strings', 'V') IS NOT NULL

DROP VIEW dbo.BarCodeRange_Strings

GO

CREATE VIEW dbo.BarCodeRange_Strings

AS

SELECT C.Code, R.StartNumber, R.EndNumber,

C.Code + convert(nvarchar(9), R.StartNumber) as Col_1,

C.Code + convert(nvarchar(9), R.EndNumber) as Col_2

FROM dbo.PrefixCode C

INNER JOIN dbo.BarCodeRange R ON C.ID = R.PrefixCodeID

Go

--Create a stored procedure to populate tables from denormalized input

CREATE TRIGGER dbo.BarCodeRange_Strings_INSERT

ON dbo.BarCodeRange_Strings

INSTEAD OF INSERT

AS

DECLARE @Temp TABLE (

NewCode nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

NewStartNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

NewEndNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

)

INSERT INTO @Temp (NewCode, NewStartNumber, NewEndNumber)

SELECT SUBSTRING(Col_1, 1, PATINDEX('%[0-9]%', Col_1) -1) as NewCode,

Col_1 as NewStartNumber, Col_2 AS NewEndNumber

FROM inserted

UNION -- we need distinct values

SELECT SUBSTRING(Col_2, 1, PATINDEX('%[0-9]%', Col_2) -1) as NewCode, Col_1, Col_2

FROM inserted

INSERT INTO dbo.PrefixCode

(Code)

SELECT N.NewCode

FROM @Temp N

WHERE NOT EXISTS (select 1 from dbo.PrefixCode C where N.NewCode = C.Code)

GROUP BY N.NewCode

INSERT INTO dbo.BarCodeRange

(PrefixCodeID, StartNumber, EndNumber)

SELECT C.ID, REPLACE(N.NewStartNumber, C.Code, ''), REPLACE(N.NewEndNumber, C.Code, '')

FROM dbo.PrefixCode C

INNER JOIN @Temp N ON N.NewCode = C.Code

WHERE NOT EXISTS (select 1 from dbo.BarCodeRange R

where R.PrefixCodeID = C.ID AND R.StartNumber = REPLACE(N.NewStartNumber, C.Code, '')

)

GO

--Insert test data

INSERT INTO dbo.BarCodeRange_Strings

(Code, StartNumber, EndNumber, Col_1, Col_2)

SELECT 0, '', '', 'MSD1558001', 'MSD1560000'

UNION

SELECT 0, '', '', 'MSD154001', 'MSD156000'

UNION

SELECT 0, '', '', 'MSD14001', 'MSD16000'

UNION

SELECT 0, '', '', 'ASD154001', 'ASD156000'

UNION

SELECT 0, '', '', 'ASD14001', 'ASD16000'

--Check result

SELECT * FROM dbo.BarCodeRange_Strings

--Run simple test

DECLARE @TestCode nvarchar(50)

SET @TestCode = 'MSD1558017'

SELECT * FROM dbo.BarCodeRange_Strings

WHERE Code = SUBSTRING(@TestCode, 1, PATINDEX('%[0-9]%', @TestCode) -1)

AND REPLACE(@TestCode, Code, '') >= StartNumber

AND REPLACE(@TestCode, Code, '') <= EndNumber

[/font]

This will work with any given codes, even in foreign languages, no need to hardcode known codes

_____________
Code for TallyGenerator

Jim Howell Ten Centuries Points: 1010 More actions · Answer 9

Sergiy--

Thank you for this solution. I will try this in our dev system today. Thanks all for your helpfulness on this issue. This is a very worthwhile forum!

--Jim