October 13, 2014 at 10:40 am
I am able to extract a number from a string but there is a case where it is not pulling the data correct. I want to pull the entire number with or without dashes.
It is extracting the number correct from the string below when there is a dash in the number, but when there is no dash it does not pull the entire number:
Contrato Promocional 2010-241772 Descrico CDR TD
Contrato Promocional 2010 241408 Perlodo de Compr
results:
2010-241772
2010
code:
SELECT *, Substring(FullText,PATINDEX('%Contrato%',FullText), 50) as Contract2,
Left(SubString(Substring(FullText,PATINDEX('%Contrato%',FullText), 50), PatIndex('%[0-9.-]%', Substring(FullText,PATINDEX('%Contrato%',FullText), 50)), 50), PatIndex('%[^0-9.-]%', SubString(Substring(FullText,PATINDEX('%Contrato%',FullText), 50), PatIndex('%[0-9.-]%', Substring(FullText,PATINDEX('%Contrato%',FullText), 50)), 8000) + 'X')-1) as Contract3 ,
Substring(FullText,PATINDEX('%Fornecedor%',FullText), 100) as VendorNumber1
FROM [Sonae].[dbo].[Images_Local]
where [FULLTEXT] is not null
AND [FULLTEXT] Like '%Contrato%'
I also want to pull the numbers out and just have the vendor name from the string below into Vendor1 (see code above)
The data in that column looks like this:
Fornecedor 14105 - SASISUNG ELECTRONICA PORTUGUESA SA Promocao Moeda EUR - EURO Negociador 1-!ENR
Fornecedor PromocAo MODELO CONTINENTE HIPERMERCADOS, SA 14105 - SAMSUNG ELECTRONICA PORTUGUESA SA
Fornecedor 14628 - LG ELECTRONICS PORTUGAL,SA Moeda EUR - EURO Presença MODELO CONTINENTE HIPERME
Fornecedor PANASONIC IBERIA S.A. - SUCURSAL PORTUGAL Mo contempla financeiro nem rappel. Ambos os
I want to these results into 2 columns:
VendorNumber1 VendorName1
14105 SASISUNG ELECTRONICA PORTUGUESA SA
14105 SAMSUNG ELECTRONICA PORTUGUESA SA
14628 LG ELECTRONICS PORTUGAL,SA
N/A PANASONIC IBERIA S.A.
October 13, 2014 at 10:59 am
SELECT Fulltext, Contract2,
Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3
FROM (
SELECT 'Contrato Promocional 2010-241772 Descrico CDR TD' AS Fulltext UNION ALL
SELECT 'Contrato Promocional 2010 241408 Perlodo de Compr'
) AS test_data --[Sonae].[dbo].[Images_Local]
cross apply (
SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2
) as assign_alias_names_1
cross apply (
SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start
) as assign_alias_names_2
cross apply (
SELECT PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 as Contract3_Length
) as assign_alias_names_3
WHERE [FULLTEXT] is not null
AND [FULLTEXT] Like '%Contrato%'
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 13, 2014 at 11:01 am
Your too quick for me... I just edited my original post. Please see new requirement.
thanks
October 13, 2014 at 11:46 am
I did part of the job for you. You should be able to complete the rest.
CREATE TABLE table1(
Col1 varchar(8000))
INSERT INTO table1
VALUES
('Fornecedor 14105 - SASISUNG ELECTRONICA PORTUGUESA SA Promocao Moeda EUR - EURO Negociador 1-!ENR'),
('Fornecedor PromocAo MODELO CONTINENTE HIPERMERCADOS, SA 14105 - SAMSUNG ELECTRONICA PORTUGUESA SA '),
('Fornecedor 14628 - LG ELECTRONICS PORTUGAL,SA Moeda EUR - EURO Presença MODELO CONTINENTE HIPERME'),
('Fornecedor PANASONIC IBERIA S.A. - SUCURSAL PORTUGAL Mo contempla financeiro nem rappel. Ambos os')
SELECT Vendor
FROM table1
CROSS APPLY ( SELECT REPLACE( STUFF( Col1, 1, PATINDEX( '%[0-9]%', Col1), ''), 'Fornecedor ', '') String) AS something
CROSS APPLY ( SELECT LEFT( String, MIN(Position)) Vendor
FROM (VALUES(NULLIF( PATINDEX( '%[^a-zA-Z]SA[^a-zA-Z]%', String + ' '), 0) + 2),
(NULLIF( PATINDEX( '%[^a-zA-Z]S.A.[^a-zA-Z]%', String + ' '), 0) + 4))x(Position) ) s1
GO
DROP TABLE table1
October 13, 2014 at 2:37 pm
Thanks, this is very helpful
October 13, 2014 at 3:40 pm
When I changed the fourth line to point to the Table/column it gives me an error msg. I forgot to mention, FullText has a lot more text in the column. I ran OCR on multipage tif's and put the text in the FullText column.
Msg 537, Level 16, State 5, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function.
SELECT Fulltext, Contract2,
Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3
FROM (
SELECT Fulltext From [Sonae].[dbo].[Images_Local]
) AS test_data
cross apply (
SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2
) as assign_alias_names_1
cross apply (
SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start
) as assign_alias_names_2
cross apply (
SELECT PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 as Contract3_Length
) as assign_alias_names_3
WHERE [FULLTEXT] is not null
AND [FULLTEXT] Like '%Contrato%'
October 13, 2014 at 3:45 pm
SELECT Fulltext, Contract2,
Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3
FROM (
SELECT Fulltext From [Sonae].[dbo].[Images_Local]
) AS test_data
cross apply (
SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2
) as assign_alias_names_1
cross apply (
SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start
) as assign_alias_names_2
cross apply (
SELECT CASE WHEN Contract3_Start <= 2 THEN 0 ELSE
PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 END as Contract3_Length
) as assign_alias_names_3
WHERE [FULLTEXT] is not null
AND [FULLTEXT] Like '%Contrato%'
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 14, 2014 at 5:40 am
It'll be interesting when an actual vendor name also contains dashes. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2014 at 8:14 am
Jeff Moden (10/14/2014)
It'll be interesting when an actual vendor name also contains dashes. 😛
In that case, presumably we would search first for a digit only, "%0-9%", then do the search for digits with dashes, etc. after that ;-).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 15, 2014 at 11:45 am
This is data I OCR'd and The contract number can show up multiple times in the file. It is a multi-page tif image. Sometimes the contract number is crossed out or not readable on the first page, so in my data it is not pulling the entire contract. I've seen tif images where the contract is readable on the second or third page and it is coming in fine in the FullText column, but the Patindex is searching and finding the first "%Contrato%". If the first contract number is not readable and only brings in say 4 chars or less, how do I make it search for the next contract to pull in the number from the next position in the text?
October 15, 2014 at 12:56 pm
Do you have some type of id or other unique/key value on the row?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 15, 2014 at 10:38 pm
there is a column called ID, which is a unique column.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply