December 7, 2007 at 1:05 pm
I have a character column which will have varying data, and potentially varying lengths. It needs to be standardized to 4 characters, adding in zeroes either to the front or middle, depending if the value starts with a letter(s) or not.
A012
A12
12
1
AB12
AB2
etc.
Then I would be looking at evaluating them to update them all to
A012
A012
0012
0001
AB12
AB02
I know how I want to script the actual update, but first I need to know how many combinations there are in the data, so I know what to accomodate for.
Is there a simple way to poll this information to tell me how many combinations there are? Eg. something that would come back and tell me if each character is a letter or a number and return me the combo like:
letter-num-num-num
letter-num-num
num-num
etc.
I know there's a hard way to do this, but I've got to prep the information to give to some one else to actually gather the answer for me, so I want to make it as easy as possible.
December 7, 2007 at 1:19 pm
You can do it a few ways. Personally I'm a fan of using .NET regular expressions to do just that, but that involves allowing CLR integration, which some are leery to allow.
Failing that - you can use something like patindex to help figure it out. You'll have to run 8 separate searches, or columns.
Patindex allows you to do something like
select count(*),
sum(case when patindex('%[a-zA-Z][0-9][0-9][0-9]%',myfield)=1 then 1 else 0 end) as A000,
sum(case when patindex('%[a-zA-Z][a-zA-Z][0-9][0-9]%',myfield)=1 then 1 else 0 end) as AA00,
etc...
from
table1
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 8, 2007 at 10:59 am
You'll have to run 8 separate searches, or columns
Or... we can cheat like hell 😛
--===== This just creates demonstrable test data and is NOT part of the solution
SET NOCOUNT ON
DECLARE @TestData TABLE(String VARCHAR(10))
INSERT INTO @TestData (String)
SELECT 'A012' UNION ALL
SELECT 'A12' UNION ALL
SELECT '12' UNION ALL
SELECT '1' UNION ALL
SELECT 'AB12' UNION ALL
SELECT 'AB2'
--===== This is the solution for the known constraints on the data as posted
SELECT Original = String,
Modified = CASE
WHEN d.LastLetter = 0 THEN RIGHT('0000'+d.String,4)
WHEN d.LastLetter = 1 THEN LEFT(d.String,d.LastLetter)
+ RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),3)
WHEN d.LastLetter = 2 THEN LEFT(d.String,d.LastLetter)
+ RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),2)
END
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT String,
LastLetter = PATINDEX('%[a-z][0-9]%',String),
FirstDigit = PATINDEX('%[0-9]%',String)
FROM @TestData
)d
RESULTS:
=======
Original Modified
---------- -------------
A012 A012
A12 A012
12 0012
1 0001
AB12 AB12
AB2 AB02
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2007 at 11:14 am
...or, perhaps, something a bit more "auto-magic"... 😀
--===== This just creates demonstrable test data and is NOT part of the solution
SET NOCOUNT ON
DECLARE @TestData TABLE(String VARCHAR(10))
INSERT INTO @TestData (String)
SELECT 'A012' UNION ALL
SELECT 'A12' UNION ALL
SELECT '12' UNION ALL
SELECT '1' UNION ALL
SELECT 'AB12' UNION ALL
SELECT 'AB2'
--===== This is the solution for the known constraints on the data as posted
DECLARE @MaxWidth TINYINT
SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)
SELECT Original = String,
Modified = LEFT(d.String,d.LastLetter)
+ RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),@MaxWidth-d.LastLetter)
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT String,
LastLetter = PATINDEX('%[a-z][0-9]%',String),
FirstDigit = PATINDEX('%[0-9]%',String)
FROM @TestData
)d
RESULTS:
========
Original Modified
---------- ------------------
A012 A012
A12 A012
12 0012
1 0001
AB12 AB12
AB2 AB02
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2007 at 1:06 pm
:)Or in a simple query:
SELECTstring Original
,CASEWHEN ISNUMERIC(string) = 0
THEN LEFT(string, PATINDEX('%[0-9]%',string)-1)
ELSE '' END-- Characters first
+ REPLICATE('0', 4-LEN(string))-- Stuff with zeros
+ SUBSTRING(string, PATINDEX('%[0-9]%',string), 4)-- Add numeric remainder
ASModified
FROM @TestData
Best Regards,
Chris Büttner
December 8, 2007 at 5:15 pm
Drat... both of ours come up with a little problem when we have data like this...
--===== This just creates demonstrable test data and is NOT part of the solution
SET NOCOUNT ON
DECLARE @TestData TABLE(String VARCHAR(10))
INSERT INTO @TestData (String)
SELECT 'A012' UNION ALL
SELECT 'A12' UNION ALL
SELECT '12' UNION ALL
SELECT '1' UNION ALL
SELECT 'AB12' UNION ALL
SELECT 'AB2' UNION ALL
SELECT 'A' UNION ALL
SELECT ''
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2007 at 5:55 pm
This takes care of the single letter and blank situations...
--===== This is the solution for the known constraints on the data as posted
DECLARE @MaxWidth TINYINT
SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)
SELECT Original = String,
Modified = LEFT(d.String,d.LastLetter)
+ RIGHT(
REPLICATE('0',@MaxWidth)
+ SUBSTRING(d.String,d.LastLetter+1,@MaxWidth)
, @MaxWidth-d.LastLetter)
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT String,
LastLetter = PATINDEX('%[a-z][0-9]%',String+'0')
FROM @TestData
)d
If ya really gotta have it as a single query, replace @MaxWidth with 4 and replace d.LastLetter with the formula for LastLetter.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 17, 2007 at 8:04 am
Interesting, definitely some syntax in there I have to learn..
Jeff, your "auto magic" one came the closest but I got some extra zeros I haven't quite figured yet -
Original Modified
-------------------
A14 A00014
B2 B0002
OBT 0000OBT
YH3 YH0003
XX 0000XX
So - too many zeroes, the whole string must be 4. And if there was no digit found it puts the zeros at the beginning of the string rather than after the letters.
Your query accomodating for blanks didn't give me any modification, similarly the earlier non-auto magic query didn't produce a modification.
And the other query offered, the simple one -
SELECTString Original
,CASEWHEN ISNUMERIC(Page) = 0
THEN LEFT(Page, PATINDEX('%[0-9]%',String)-1)
ELSE '' END-- Characters first
+ REPLICATE('0', 4-LEN(Page))-- Stuff with zeros
+ SUBSTRING(String, PATINDEX('%[0-9]%',String), 4)-- Add numeric remainder
ASModified
FROM @TestData
This one gives me some results for short page lengths but none of the more complex ones are returned and I got an error "invalid length parameter passed to the substring function."
The query would only be evaluating records where the field has content (so no worries about blanks) and where the content is less than 4 characters. If it has 4 or more it will be ignored in the update.
December 17, 2007 at 8:36 am
The following query should also take care of values without digits:
SELECTstring Original
,CASEWHEN PATINDEX('%[0-9]%',string) > 0
THEN LEFT(string, PATINDEX('%[0-9]%',string)-1)
ELSE RTRIM(string) END-- Characters first
+ REPLICATE('0', 4-LEN(string))-- Stuff with zeros
+ CASEWHEN PATINDEX('%[0-9]%',string) > 0
THEN SUBSTRING(string, PATINDEX('%[0-9]%',string), 4)
ELSE '' END-- Numeric Trailer
ASModified
FROM @TestData
Best Regards,
Chris Büttner
December 17, 2007 at 11:04 am
Jeff, your "auto magic" one came the closest but I got some extra zeros I haven't quite figured yet -
Original Modified
-------------------
A14 A00014
B2 B0002
OBT 0000OBT
YH3 YH0003
XX 0000XX
So - too many zeroes, the whole string must be 4. And if there was no digit found it puts the zeros at the beginning of the string rather than after the letters.
Yes... in my first attempt, that's what happens... did you try the second attempt? For your convenience, here it is again...
--===== This just creates demonstrable test data and is NOT part of the solution
SET NOCOUNT ON
DECLARE @TestData TABLE(String VARCHAR(10))
INSERT INTO @TestData (String)
SELECT 'A012' UNION ALL
SELECT 'A12' UNION ALL
SELECT '12' UNION ALL
SELECT '1' UNION ALL
SELECT 'AB12' UNION ALL
SELECT 'AB2' UNION ALL
SELECT 'A' UNION ALL
SELECT '' UNION ALL
SELECT 'A14' UNION ALL
SELECT 'B2' UNION ALL
SELECT 'OBT' UNION ALL
SELECT 'YH3' UNION ALL
SELECT 'XX'
--===== This is the solution for the known constraints on the data as posted
DECLARE @MaxWidth TINYINT
SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)
SELECT Original = String,
Modified = LEFT(d.String,d.LastLetter)
+ RIGHT(
REPLICATE('0',@MaxWidth)
+ SUBSTRING(d.String,d.LastLetter+1,@MaxWidth)
, @MaxWidth-d.LastLetter)
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT String,
LastLetter = PATINDEX('%[a-z][0-9]%',String+'0')
FROM @TestData
)d
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2007 at 9:49 am
Yes I did try that one, and it didn't appear to make a modification, but I ran it again, here's a snippet of the results:
Original Modified
-----------------------------------
B10 B0
B101 B01
B11 B1
B12 B2
Here's a test sample of what my data looks like, this should get you the same results as I see.
DECLARE @TestData TABLE(page char(32))
insert @testdata (page)
select 'G18' union all
select 'G2' union all
select 'D99' union all
select 'A08' union all
select '16' union all
select 'ENT' union all
select 'M183' union all
select 'PAGE1'
DECLARE @MaxWidth TINYINT
SET @MaxWidth = (SELECT MAX(LEN(Page)) FROM Arch1)
SELECT Original = Page,
Modified = LEFT(d.Page,d.LastLetter)
+ RIGHT(
REPLICATE('0',@MaxWidth)
+ SUBSTRING(d.Page,d.LastLetter+1,@MaxWidth)
, @MaxWidth-d.LastLetter)
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT Page,
LastLetter = PATINDEX('%[a-z][0-9]%',Page+'0')
FROM @testdata
)d
group by page, d.lastletter
order by page
December 18, 2007 at 10:41 am
Mindy, the problem is that you have the data stored in a CHAR(32)... trailing blanks come into play there. Either change the column to a VARCHAR(32) or modify my code as follows...
DECLARE @TestData TABLE(page char(32))
insert @testdata (page)
select 'G18' union all
select 'G2' union all
select 'D99' union all
select 'A08' union all
select '16' union all
select 'ENT' union all
select 'M183' union all
select 'PAGE1'
--===== This is the solution for the known constraints on the data as posted
DECLARE @MaxWidth TINYINT
SET @MaxWidth = (SELECT MAX(LEN(page)) FROM @TestData)
SELECT Original = page,
Modified = LEFT(d.page,d.LastLetter)
+ RIGHT(
REPLICATE('0',@MaxWidth)
+ SUBSTRING(d.page,d.LastLetter+1,@MaxWidth)
, @MaxWidth-d.LastLetter)
FROM (--==== Derived table "d" finds the interface between letters/digits
SELECT Page = RTRIM(page),
LastLetter = PATINDEX('%[a-z][0-9]%',RTRIM(page)+'0')
FROM @TestData
)d
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2007 at 2:24 pm
Aaah, well I don't have control of the schema but that makes sense. Thanks! 🙂
December 19, 2007 at 7:21 am
Like I said... use the code I just posted... it takes the CHAR(32) into account...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2007 at 7:39 am
It's still putting in an additional zero, but I'll figure that out. The syntax is a little over my head so this is a good project for me.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply