December 5, 2007 at 2:05 pm
I have data that I'm trying to clean and split the string Actual_Extract and get the Type,Code and Amount
1)
When I manipulate the ACTUAL_TEXT result I'm still getting the wrong result on some rows. For Example...See below
'01 759 9991 997 -0.560' Gives the right result = 759 ,it has 3 spaces
'01 759 9991 997 -0.360' Incomplete just gives a 9 instead of 759 , it has one space
'01 759 9991 997 -0.160' Incomplete just gives a 9 instead of 759 , it has one space
'01 759 9991 997 -0.660' Gives the right result = 759,it has 3 spaces
The spaces are not consistent. They could 1,2,3 etc..Tried replacing the double or more spaces
with replace(field,Space(2),space(1)) etc..Too many replace statements
2) Remove the non-numeric Values from the when extracting the Amount. i.e EE,CO,CB etc
Currently using the replace function. Problem is the non numeric value could be anything
Is there a way to remove it without using too many replace functions ?
'-20.000 EE01 6060 3 997' output = -20.000 EE , it should be -20.000
'-20.000 01 6060 3 997' output = -20.000 CB , it should be -20.000
3)
If actual_text = '0997 2247' the type should be NULL,Amount Should be NULL as well as Code because they is
no monetary Value in the String
4)
This data set is over 60 million rows, how do I insert into tbPROCESSED in batches ?
5)
Lastly if Actual_text is as listed below. Type,Amount & Code should be NUll
'33CR RT CHG .000 - 2.000'
'33CR RT CHG .000 - 2.000'
'33TI RT CHG 1.000 - .000'
'33TI RT CHG 1.000 - .000'
December 5, 2007 at 11:41 pm
Ray,
First, I don't see any rows in your test data that look like the following:
'-20.000 EE 01 6060 3 997' output = -20.000 EE , it should be -20.000
'-20.000 01 6060 3 997' output = -20.000 CB , it should be -20.000
Second, in the rows that look like the following, please identify what the Type, Code, and Amount would be...
'01 759 9991 997 -0.560'
'01 6060 3 997 -40.000 CB'
Third, are we allowed to add columns to the tbExtract table?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2007 at 12:19 am
Yes sure you can add columns. Is it for Cleaning the data ?
Actual_Text is currently = '01 759 9991 997 -0.560'
Desired output = Type should be '01'
, Code should be 759 and Amount should be -0.560 after extracting it from Actual_Text
Actual_Text is currently = '01 6060 3 997 -40.000 CB'
= Type should be '01'
, Code should be 6060 and Amount should be -40.000.
Actual_Text is currently = '01 6060 3 997 -50.000 CB'
= Type should be '01'
, Code should be 6060 and Amount should be -50.000.
December 8, 2007 at 12:14 am
Yes sure you can add columns. Is it for Cleaning the data ?
Yes... and for storing it, too. The following works on the data you provided for the tbExtract table. I would not try to update 60 million rows in a single pass... I'd probably build a loop to use the code below 60 times or so to split the updates to only about a million rows.
ALTER TABLE tbExtract
ADD Type VARCHAR(50),
Code VARCHAR(50),
Unused1 VARCHAR(50),
Amount VARCHAR(50),
Unused2 VARCHAR(50)
GO
DECLARE @Dummy VARCHAR(50)
DECLARE @String VARCHAR(50)
UPDATE tbExtract
SET @String = Actual_Text+' ',
@Dummy = Type = LEFT(@String,PATINDEX('%[0-9] %',@String)),
@String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),
@Dummy = Code = LEFT(@String,PATINDEX('%[0-9] %',@String)),
@String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),
@Dummy = Unused1 = LEFT(@String,PATINDEX('%[0-9] %',@String)),
@String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),
@Dummy = Unused2 = LEFT(@String,PATINDEX('%[0-9] %',@String)),
@String = SUBSTRING(@String,PATINDEX('% [-0-9]%',@String)+1,50),
@Dummy = Amount = LEFT(@String,PATINDEX('%[0-9] %',@String))
WHERE Actual_Text NOT LIKE '%[a-z][a-z][a-z]%'
AND Actual_Text LIKE '[0-9][0-9] %'
SELECT * FROM tbEXTRACT
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2007 at 12:23 am
p.s. Code above assumes all "Amounts" are negative... may have to make a second pass for positive "Amounts".
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2007 at 12:41 am
Yes there are both negative and positive values in the monetary column.....
December 8, 2007 at 9:00 am
Ok, Ray... I've changed the code above... minor tweak... should do what you want now... give 'er a try on a test table like the one you posted...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2007 at 12:29 am
Thanks a lot it works....
December 12, 2007 at 12:32 am
Cool... thanks for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2007 at 8:39 am
I seem to be having challenges that are text related..
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply