March 16, 2009 at 12:44 pm
How do I extract the following information from a string. The string varies depending if it is an invoice or a PO. Also, the Invoice number is not always the same field length.
Any help would be appreciated.
This is an example of the string:
V# H019978 I# 95492175 R# 854954
V# H020945 PO# 852194 ACCRUAL
I need a column for Vendor, InvoiceNum, PONum, ReturnNum
If there is no invoice # I need the field to be blank for that column. The same for the PO and return.
So the columns should be like:
Vendor InvoiceNum PONum Return
H019978 95492175 854954
H020945 852194
Wendy
March 16, 2009 at 1:04 pm
Hi Wendy
Sure that is possible with SQL, but I would advice to do it with a client application or SSIS. How do you get this string?
If you really want to do this by SQL give a short feedback.
Greets
Flo
March 16, 2009 at 1:10 pm
Thanks for Responding.
I am using a data repository to create reports that I write for our Hospital Information Systems that utilizes Visual Studio 2005.
I am working under the DATA tab where I am creating my query.
The field is TxnDescription in the dbo.GLBATCHMoneyBatchTransactions table that holds this string.
I am fairly new with utilizing this software and learining SQL at the same time.
Wendy
March 16, 2009 at 1:40 pm
[font="Verdana"]Here's some example code.
Note that I have assumed that there is always a space between the "V#" and the actual vendor number. I've also assumed that vendor numbers are always in the form "H[0-9]+" (an H followed by one or more digits). I've made similar assumptions about the other fields. You can vary the code to cope if any of those assumptions aren't true.
declare @workedExample table(
LineNumber int identity(1,1) not null primary key,
Line varchar(200) not null
);
insert into @workedExample(
Line
)
select 'V# H019978 I# 95492175 R# 854954' as Line union all
select 'V# H020945 PO# 852194 ACCRUAL' as Line union all
select 'V# H0123' as Line;
with
LineDetails as (
select LineNumber,
--
-- add a space on the end of the line to ensure there is a
-- non-digit character to find after the last field
--
Line + ' ' as Line,
charindex('V#', Line) as VendorPosition,
charindex('I#', Line) as InvoicePosition,
charindex('PO#', Line) as POPosition,
charindex('R#', Line) as ReturnPosition
from @workedExample
)
select LineNumber,
Line,
--
-- show the part of the line where we've skipped past the 'V# ' part
--
substring(
Line,
VendorPosition + 3,
200
) as Line_StartWithVendor,
--
-- skip past not only the V#, but also the leading H. then find the first
-- non digit. that should give us the length of the vendor number. note
-- that because we skip past the leading H, but actually include it in the
-- returned string, we don't need the -1 on the length that we do for the
-- other fields (since we're including an extra character)
--
patindex(
'%[^0-9]%',
substring(
Line,
VendorPosition + 4,
200
)) as VendorLength,
--
-- now put them all together to show the finished product: just the vendor
-- number
--
case VendorPosition
when 0 then null
else
substring(
Line,
VendorPosition + 3,
patindex('%[^0-9]%', substring(Line, VendorPosition + 4, 200))
)
end as Vendor,
--
-- repeat the exercise for the other numbers
--
case InvoicePosition
when 0 then null
else
substring(
Line,
InvoicePosition + 3,
patindex('%[^0-9]%', substring(Line, InvoicePosition + 3, 200)) - 1
)
end as InvoiceNum,
case POPosition
when 0 then null
else
substring(
Line,
POPosition + 4,
patindex('%[^0-9]%', substring(Line, POPosition + 4, 200)) - 1
)
end as PONum,
case ReturnPosition
when 0 then null
else
substring(
Line,
ReturnPosition + 3,
patindex('%[^0-9]%', substring(Line, ReturnPosition + 3, 200)) - 1
)
end as [Return]
from LineDetails;
The XML solution GSquared has presented below is probably the more elegant solution, but I'm not that much of an XML guru with SQL Server. So this approach just uses plain old SQL functions. If you are using SQL Server 2000 or earlier, you can replace the part after the with (a common table expression, or CTE) with an inline view.
Note: extensive string manipulation in SQL isn't the easiest thing in the world. You can do it, but you should seriously consider whether you should do it. You may be better using a routine coded in C# or VB and linked in via the CLR, or you may be better writing an SSIS package to do it.
[/font]
March 16, 2009 at 2:00 pm
Here's one way to work with this kind of thing. This solution assumes you have a Numbers table, and that you are working in SQL 2005 (as per the forum you posted in). If you don't have a Numbers table, you can either look up how to create one (there are many examples on this site), or ask and I'll tell you.
declare @String varchar(1000), @String2 varchar(1000), @XML XML;
select @String = 'V# H019978 I# 95492175 R# 854954';
select @String2 = coalesce(@String2 + ' ' +
replace(substring(@String, number-(number-charindex(' ', ' ' + @String, number-2)), charindex(' ', @String+' ', number+2)+2-number), '# ', '="')+'"',
replace(substring(@String, number-(number-charindex(' ', ' ' + @String, number-2)), charindex(' ', @String+' ', number+2)+2-number), '# ', '="')+'"')
from dbo.Numbers
where number <= len(@String)
and substring(@String, number, 1) = '#';
select @XML = ' ';
select
@XML.value('(row/@V)[1]','varchar(100)') as Vendor,
@XML.value('(row/@I)[1]','varchar(100)') as InvoiceNum,
@XML.value('(row/@R)[1]','varchar(100)') as [Return],
@XML.value('(row/@PO)[1]','varchar(100)') as PONum;
select @String2 = null;
select @String = 'V# H020945 PO# 852194 ACCRUAL';
select @String2 = coalesce(@String2 + ' ' +
replace(substring(@String, number-(number-charindex(' ', ' ' + @String, number-2)), charindex(' ', @String+' ', number+2)+2-number), '# ', '="')+'"',
replace(substring(@String, number-(number-charindex(' ', ' ' + @String, number-2)), charindex(' ', @String+' ', number+2)+2-number), '# ', '="')+'"')
from dbo.Numbers
where number <= len(@String)
and substring(@String, number, 1) = '#';
select @XML = ' ';
select
@XML.value('(row/@V)[1]','varchar(100)') as Vendor,
@XML.value('(row/@I)[1]','varchar(100)') as InvoiceNum,
@XML.value('(row/@R)[1]','varchar(100)') as [Return],
@XML.value('(row/@PO)[1]','varchar(100)') as PONum;
You'll have to modify it to work with your set-up, but the string manipulations and XML creation and query are the core of the solution.
If you have a table structure you need to work with, provide the create script for that, and a few lines of inserts for sample data, and we can help modify this to work with that.
Edit: Looks like there was a change in the way that code blocks work on the page since I posted the above. It removed a lot of my line-breaks and blank lines. Trying to put them back in, we'll see how I do on that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 16, 2009 at 2:24 pm
Hi Wendy
If you are using Visual Studio you should either create a CLR function or work with SSIS.
Here an example with C# and RegEx (use a form and a button):
private void button1_Click_1(object sender, EventArgs e)
{
try
{
string input = "V# H019978 I# 95492175 PO# 854954";
string pattern = @"([VIR]|PO)\#\s([a-zA-Z0-9]*)";
Regex regex = new Regex(pattern);
MatchCollection mc = regex.Matches(input);
foreach (Match m in mc)
{
if (m.Groups.Count == 3)
{
string type = m.Groups[1].Value;
string value = m.Groups[2].Value;
if (type == "V")
{
MessageBox.Show(string.Format("Vendor: {0}", value));
}
else if (type == "I")
{
MessageBox.Show(string.Format("Invoice: {0}", value));
}
else if (type == "PO")
{
MessageBox.Show(string.Format("PONum: {0}", value));
}
else if (type == "R")
{
MessageBox.Show(string.Format("Return: {0}", value));
}
else
{
MessageBox.Show(string.Format("Error! Unknown type: {0}", type));
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
If you want to do this with SQL you can use this:
[font="Courier New"]IF (OBJECT_ID('t_test') IS NOT NULL)
DROP TABLE t_test
CREATE TABLE t_test (txt VARCHAR(100))
GO
INSERT INTO t_test VALUES ('V# H019978 I# 95492175 R# 854954')
INSERT INTO t_test VALUES ('V# H020945 PO# 852194 ACCRUAL')
GO
IF (OBJECT_ID('udf_split_reference') IS NOT NULL)
DROP FUNCTION udf_split_reference
GO
CREATE FUNCTION udf_split_reference (@prefix VARCHAR(10), @txt VARCHAR(100))
RETURNS VARCHAR(100)
AS
BEGIN
DECLARE @pos INT
DECLARE @len INT
SET @pos = CHARINDEX(@prefix, @txt) + LEN(@prefix) + 1
SET @len = CHARINDEX(' ', @txt, @pos)
IF (@len = 0)
SET @len = LEN(@txt) - @pos + 1
ELSE
SET @len = @len - @pos
RETURN SUBSTRING(@txt, @pos, @len)
END
GO
SELECT *,
dbo.udf_split_reference('V# ', txt) Vendor,
dbo.udf_split_reference('I# ', txt) Invoice,
dbo.udf_split_reference('R#', txt) Ret,
dbo.udf_split_reference('PO#', txt) PONum
FROM t_test
[/font]
Greets
Flo
March 16, 2009 at 11:14 pm
If the formatting of each element is pretty much cast in stone, then a straight forward method may be in order...
Here's the test data...
--===== Create a demonstration table (This is NOT part of the solution)
IF OBJECT_ID('TempDB..#DemoTable') IS NOT NULL
DROP TABLE #DemoTable
CREATE Table #DemoTable (Line VARCHAR(200))
INSERT INTO #DemoTable (Line)
SELECT 'V# H019978 I# 95492175 R# 854954' UNION ALL
SELECT 'V# H020945 PO# 852194 ACCRUAL' UNION ALL
SELECT 'I# 95492175 R# 854954 V# H019978 ' UNION ALL
SELECT 'PO# 852194 I# 95492175 R# 854954 V# H019978 ' UNION ALL
SELECT 'V# H0123'
... and here's the parsing code...
;WITH
cteStart AS
(--==== Find the starting position of each element and add a trailing delimiter
-- to the data
SELECT NULLIF(CHARINDEX('V# ' ,Line),0)+3 AS VendorStart,
NULLIF(CHARINDEX('I# ' ,Line),0)+3 AS InvoiceStart,
NULLIF(CHARINDEX('R# ' ,Line),0)+3 AS ReturnStart,
NULLIF(CHARINDEX('PO# ',Line),0)+4 AS PoStart,
Line + ' ' AS Line
FROM #DemoTable
)--==== Split the element positions
SELECT SUBSTRING(Line,VendorStart ,CHARINDEX(' ',Line,VendorStart)-VendorStart) AS Vendor,
SUBSTRING(Line,InvoiceStart,CHARINDEX(' ',Line,InvoiceStart)-InvoiceStart) AS Invoice,
SUBSTRING(Line,PoStart ,CHARINDEX(' ',Line,PoStart)-PoStart) AS PoNum,
SUBSTRING(Line,ReturnStart ,CHARINDEX(' ',Line,ReturnStart)-ReturnStart) AS [Return]
FROM cteStart
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply