October 18, 2007 at 7:27 am
We use the BETWEEN operator to find data in character ranges. We just came across a situation where BETWEEN is returning multiple records and I confess that I'm clueless as to the cause.
The example:
CREATE TABLE [dbo].[test] (
[col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
Data:
col 1 Col 2
MSD1558001MSD1560000
MSD154001MSD156000
MSD14001MSD16000
select * from test where 'MSD1558017' between col_1 and col_2
By inspection, this should return the first data set. In reality (our dev system and production system 🙂 ) it returns all three. I have also tried this using >= and the same thing happens.
The only partial clue would be something collation related, but I am truely at a loss. Any ideas would be very much appreciated!
Thx --Jim
October 18, 2007 at 7:44 am
Are you wanting to compare them as strings or are you trying to find whether the numerical part of MSD1558017 falls between the numerical parts of col_1 and col_2?
I'm a bit rusty with my string comparisons but I would expect the statement to return all three rows as the first four characters match and the fifth character falls between the fifth characters of each row (if this makes any sense)! 🙂
October 18, 2007 at 7:44 am
with char datatypes between (and > or < for that matter) do a character by character comparison not looking at the whole field as it would with numbers. If the second character of what you are comparing is between the second characters of the two items you are comparing to and the first character is the same it will return true regardless of the rest of the characters in the string.
October 18, 2007 at 7:51 am
That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!
October 18, 2007 at 7:54 am
I'm not a 100% on this, but I believe the BETWEEN function does it's comparisons at the binary level. At least that's my best guess.
thus......(extra zeros removed)
MSD154001_________converted to binary would be______0x4D534431353430303100........
MSD156000_________converted to binary would be______0x4D534431353630303000........
and
MSD1558017________converted to binary would be______0x4D534431353538303137........
which IS between the previous values.
SELECT
CAST('MSD154001' AS BINARY) AS 'MSD154001'
,CAST('MSD1558017' AS BINARY) AS 'MSD1558017'
,CAST('MSD156000' AS BINARY) AS 'MSD156000'
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgOctober 18, 2007 at 2:51 pm
This solution assumes a big table lots of new data coming in. It also assumes that all have got the MSD prefix which I could be wrong about. if it is always three Alpha then use STUFF. If the prefix could be anything, then extract the integer by finding the transition between alpha and number with PATINDEX. You might need to play with the indexing strategy but I suspect that a covering compound index including both col1asinteger and col2AsInteger would do the trick.
[font="Courier New"]CREATE TABLE [dbo].[#test] (
[col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
Col1AsInteger INT,
Col2AsInteger INT
) ON [PRIMARY]
GO
INSERT INTO #test(col_1,col_2) SELECT 'MSD1558001', 'MSD1560000'
INSERT INTO #test(col_1,col_2) SELECT 'MSD154001', 'MSD156000'
INSERT INTO #test(col_1,col_2) SELECT 'MSD14001', 'MSD16000'
--and so on, several thousand of 'em
--now update all new entries in the table
UPDATE #test SET Col1AsInteger=REPLACE(col_1,'MSD',''),
Col2AsInteger=REPLACE(col_2,'MSD','')
WHERE Col1AsInteger+Col2AsInteger IS NULL--where either are null
SELECT col_1,col_2 FROM #test
WHERE REPLACE('MSD1558017','MSD','')
BETWEEN Col1AsInteger AND Col2AsInteger
[/font]
Best wishes,
Phil Factor
October 18, 2007 at 2:55 pm
Jim Howell (10/18/2007)
That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!
Slight touch of normalisation always gives a positive effect on databases.
If you'd have Code and Number separated and stored in columns with appropriate datatype you'd never have such problem.
Replacing varchar Code with smallint CodeID would improve performance of you queries, both saving and retrieving data. Not to mention dramatic decrease in total size of the tables.
_____________
Code for TallyGenerator
October 18, 2007 at 2:57 pm
Phil--
Good assumptions! This is a solution that will work for us. We can't always control the formatting of the barcodes because some come from outside vendors. This is a very elegant solution that I will apply to our development server. Much appreciated.
Thx --Jim
October 18, 2007 at 3:49 pm
[font="Courier New"]--Create 2 tables to hold normalized data
IF Object_ID('dbo.PrefixCode', 'U') IS NOT NULL
DROP TABLE dbo.PrefixCode
CREATE TABLE dbo.PrefixCode (
ID smallint IDENTITY(-32767, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
Code nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
CREATE UNIQUE CLUSTERED INDEX UX_PrefixCode_Code ON dbo.PrefixCode (Code)
IF Object_ID('dbo.BarCodeRange', 'U') IS NOT NULL
DROP TABLE dbo.BarCodeRange
CREATE TABLE dbo.BarCodeRange (
PrefixCodeID smallint NOT NULL,
StartNumber int NOT NULL,
EndNumber int NOT NULL
)
CREATE INDEX UX_BarCodeRange_EndNumber ON dbo.BarCodeRange (EndNumber)
GO
--Create view to represent data in denormalized form
IF Object_ID('dbo.BarCodeRange_Strings', 'V') IS NOT NULL
DROP VIEW dbo.BarCodeRange_Strings
GO
CREATE VIEW dbo.BarCodeRange_Strings
AS
SELECT C.Code, R.StartNumber, R.EndNumber,
C.Code + convert(nvarchar(9), R.StartNumber) as Col_1,
C.Code + convert(nvarchar(9), R.EndNumber) as Col_2
FROM dbo.PrefixCode C
INNER JOIN dbo.BarCodeRange R ON C.ID = R.PrefixCodeID
Go
--Create a stored procedure to populate tables from denormalized input
CREATE TRIGGER dbo.BarCodeRange_Strings_INSERT
ON dbo.BarCodeRange_Strings
INSTEAD OF INSERT
AS
DECLARE @Temp TABLE (
NewCode nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
NewStartNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
NewEndNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
INSERT INTO @Temp (NewCode, NewStartNumber, NewEndNumber)
SELECT SUBSTRING(Col_1, 1, PATINDEX('%[0-9]%', Col_1) -1) as NewCode,
Col_1 as NewStartNumber, Col_2 AS NewEndNumber
FROM inserted
UNION -- we need distinct values
SELECT SUBSTRING(Col_2, 1, PATINDEX('%[0-9]%', Col_2) -1) as NewCode, Col_1, Col_2
FROM inserted
INSERT INTO dbo.PrefixCode
(Code)
SELECT N.NewCode
FROM @Temp N
WHERE NOT EXISTS (select 1 from dbo.PrefixCode C where N.NewCode = C.Code)
GROUP BY N.NewCode
INSERT INTO dbo.BarCodeRange
(PrefixCodeID, StartNumber, EndNumber)
SELECT C.ID, REPLACE(N.NewStartNumber, C.Code, ''), REPLACE(N.NewEndNumber, C.Code, '')
FROM dbo.PrefixCode C
INNER JOIN @Temp N ON N.NewCode = C.Code
WHERE NOT EXISTS (select 1 from dbo.BarCodeRange R
where R.PrefixCodeID = C.ID AND R.StartNumber = REPLACE(N.NewStartNumber, C.Code, '')
)
GO
--Insert test data
INSERT INTO dbo.BarCodeRange_Strings
(Code, StartNumber, EndNumber, Col_1, Col_2)
SELECT 0, '', '', 'MSD1558001', 'MSD1560000'
UNION
SELECT 0, '', '', 'MSD154001', 'MSD156000'
UNION
SELECT 0, '', '', 'MSD14001', 'MSD16000'
UNION
SELECT 0, '', '', 'ASD154001', 'ASD156000'
UNION
SELECT 0, '', '', 'ASD14001', 'ASD16000'
--Check result
SELECT * FROM dbo.BarCodeRange_Strings
--Run simple test
DECLARE @TestCode nvarchar(50)
SET @TestCode = 'MSD1558017'
SELECT * FROM dbo.BarCodeRange_Strings
WHERE Code = SUBSTRING(@TestCode, 1, PATINDEX('%[0-9]%', @TestCode) -1)
AND REPLACE(@TestCode, Code, '') >= StartNumber
AND REPLACE(@TestCode, Code, '') <= EndNumber
[/font]
This will work with any given codes, even in foreign languages, no need to hardcode known codes
_____________
Code for TallyGenerator
October 19, 2007 at 6:06 am
Sergiy--
Thank you for this solution. I will try this in our dev system today. Thanks all for your helpfulness on this issue. This is a very worthwhile forum!
--Jim
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply