March 10, 2016 at 8:11 pm
You can also just do:
substring(customernotes,
charindex('PO:',customernotes + 'PO:')+3,
charindex(' ',customernotes + ' PO: ',charindex('PO:',customernotes + ' PO:') )-charindex('PO:',customernotes)-3)
This works where no PO is present also.
Or to also handle the case of "test PO: 201530444. New test"
rtrim(ltrim(
substring(customernotes,
charindex('PO:',customernotes + 'PO:')+3,
PATINDEX('%[^ 0-9]%',SUBSTRING(customernotes ,charindex('PO:',customernotes + 'PO:')+4,1000) + '.' )
)
))
March 11, 2016 at 2:53 am
Great article, Steve, because it illustrates the way that one has to go about finding a solution. However, because you don't like PatIndex, it all gets a bit complicated. Why not just do this? (it checks for anything starting with PO: followed by two numbers. If it finds anything it returns all the numbers following until it finds something that isn't a number: otherwise it returns a blank string ''). OK it may not be entirely intuitive but once one understands the technique, I guarantee that it is useful where you have to pull out a single substring that varies in length.
[font="Courier New"]
SELECT customerid, LEFT(TopOfPO, PATINDEX('%[0-9][^0-9]%', TopOfPO))
FROM
(SELECT
SUBSTRING(
customernotes+' ',
PATINDEX('%PO:[0-9][0-9]%', customerNotes+'PO:99')+3,
100) AS topOfPO,
customerID
FROM customers)f(TopOfPO, customerID)[/font]
This technique is called "Top 'n Tailing". You start by nicking off the start of the string, and then finish by lopping off everything after the end of the substring. The only complication is in dealing with the problem of not finding the substring. You just ensure that it always finds something by adding a string on the end! It is easier to play about with the example than to describe!
Best wishes,
Phil Factor
March 11, 2016 at 10:37 am
my personal favorite is to relly on RegExp, with some CLR you can getit into sql server. What I do is add it to master database. this way its available to all databases.
master.dbo.RegExMatch(d,'(?<=PO:)(\d{7,10})(?=\s)')
basiclay capture 7-10 numbers that are preceded by PO: and followed by a space or end.
March 11, 2016 at 10:45 am
jcuribe (3/11/2016)
my personal favorite is to relly on RegExp, with some CLR you can getit into sql server. What I do is add it to master database. this way its available to all databases.master.dbo.RegExMatch(d,'(?<=PO:)(\d{7,10})(?=\s)')
basiclay capture 7-10 numbers that are preceded by PO: and followed by a space or end.
My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2016 at 10:54 am
Steve Jones - SSC Editor (3/10/2016)
Thanks for the catch. I updated the math to get the PO.I wonder which of these works better in complex queries. It's worth some performance testing to see. I hadn't thought about cross apply here, but it's an interesting approach.
In a case like this, where CROSS APPLY is only used to introduce new columns, there is no performance hit at all.
Let's say you had a table of comma-delimited list of one or more integers:
create table #t
(
ID int identity(1,1)
,ListOfNums varchar(50)
)
insert #t
values ('279,37,972,15,175')
,('17,72')
,('672,52,19,23')
,('153,798,266,52,29')
,('77,349,14')
select * from #t
/*
ID ListOfNums
-- -----------------
1 279,37,972,15,175
2 17,72
3 672,52,19,23
4 153,798,266,52,29
5 77,349,14
*/
And you have to find all the rows where the 4th number in the list is less than 50 and sort the result based on the value of the 3rd number.
Before CROSS APPLY, you'd have to do it like this (eye roll):
select ID
,ListOfNums
from #t
where substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1,
(charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1)-
charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
charindex(',',ListOfNums+',,,,')+1)+1))-1)
< 50
order by substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
charindex(',',ListOfNums+',,,,')+1)+1,(charindex(',',ListOfNums+',,,,',
charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)-
charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1))-1)
/*
ID ListOfNums
-- -------------
2 17,72
5 77,349,14
3 672,52,19,23
1 279,37,972,15
*/
But with CROSS APPLY, you could do it like this... and the query plan is EXACTLY THE SAME as the one above:
select ID
,ListOfNums
from #t
cross apply (select WorkString=ListOfNums+',,,,') F_Str
cross apply (select p1=charindex(',',WorkString)) F_P1
cross apply (select p2=charindex(',',WorkString,p1+1)) F_P2
cross apply (select p3=charindex(',',WorkString,p2+1)) F_P3
cross apply (select p4=charindex(',',WorkString,p3+1)) F_P4
cross apply (select Num3=convert(int,substring(WorkString,p2+1,p3-p2-1))
,Num4=convert(int,substring(WorkString,p3+1,p4-p3-1))) F_Nums
where Num4<50
order by Num3
/*
ID ListOfNums
-- -------------
2 17,72
5 77,349,14
3 672,52,19,23
1 279,37,972,15
*/
March 11, 2016 at 12:08 pm
There's a sneaky trick I like to use in this scenario that can shorten the code, and I feel make it more readable.
'PO' = SUBSTRING(CustomerNotes, CHARINDEX('PO:', CustomerNotes)+3, COALESCE(NULLIF(CHARINDEX(' ', CustomerNotes, CHARINDEX('PO:', CustomerNotes)+3),0),LEN(customernotes)) - CHARINDEX('PO:', CustomerNotes) - 2
The case is replaced with a nested COALESCE NULLIF combo. If the CHARINDEX returns 0 NULLIF converts it to a NULL and COALESCE returns the value from LEN.
This doesn't seem to have any performance impact one way or the other.
March 11, 2016 at 1:45 pm
Great other solutions.
It would be good to showcase a comparison of different methods at scale. I'll try to do one, using Jeff's test data piece to scale up test data and see how things perform. Unless someone wants to beat me to it.
March 12, 2016 at 8:55 am
You're all right... This an all-too-common issue. This piqued my curiosity though and started playing with the script so pull out the PO number from the string where the Purchase Order Number followed "PO:", "PO", "PO: ", or "PO " - since the last thing user's are is consistent.
Here's what I came up with.
Any recommendations on condensing this a little further?
SELECT CustomerID
,REPLACE(
REPLACE(
REPLACE(
SUBSTRING(CustomerNotes
,CHARINDEX('PO', CustomerNotes)
,CASE CHARINDEX(' ', CustomerNotes, CHARINDEX('PO', CustomerNotes)+5)
WHEN 0
THEN LEN(CustomerNotes) - CHARINDEX('PO',CustomerNotes)+1
ELSE CHARINDEX(' ', CustomerNotes, CHARINDEX('PO', CustomerNotes)) - CHARINDEX('PO', CustomerNotes)
END)
,' ','')
,':','')
,'PO','') AS PONumber
FROM dbo.Customers
WHERE CustomerNotes LIKE '%PO%'
March 14, 2016 at 7:05 am
Manic Star (3/10/2016)
This is why i loathe long free-flowing 'note' fields where people embed data they then expect reports or queries to pull from in a consistent way.Good example is credit card #'s in note fields. Even using regular expression processing its difficult to find all types of patterns that represent the way the user entered the CC#'s in the 'note'.
If you need to report or find PO data, then it should go into a field in a table called something like 'PO Number', NOT in a note.
Note fields should be for misc notes, not data fields.
I totally agree.
March 17, 2016 at 10:31 am
My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.
All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?
March 17, 2016 at 2:38 pm
quagmired (3/17/2016)
My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.
All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?
So what happens when something doesn't fit a "compiled pattern"?
Mid string searches in SQL suck. Regex has it's own problems. The reason either of them exist is to be able to interrogate improperly formed data. The best thing (although rarely the easiest thing) to do would be to have a "Moden Style" pork chop dinner with whomever the data provider is and get them to fix their junk. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 17, 2016 at 3:31 pm
Jeff Moden (3/17/2016)
quagmired (3/17/2016)
My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.
All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?
So what happens when something doesn't fit a "compiled pattern"?
Mid string searches in SQL suck. Regex has it's own problems. The reason either of them exist is to be able to interrogate improperly formed data. The best thing (although rarely the easiest thing) to do would be to have a "Moden Style" pork chop dinner with whomever the data provider is and get them to fix their junk. 😉
Not looking to turn this into a drag 'em out, but...
Most runtimes maintain a cache of compiled patterns. If a new one comes along, compile it, and keep it around for the next time. I have no idea, but I'd be willing to bet that SQL Server does this with LIKE patterns.
Regex or any variety of pattern matching exists for a whole lot more than mid-string searches. Think data validation. I absolutely agree that storing structured data out of place in a blob is a piss-poor practice. But pattern matching still has its place in SQL, especially in analytics, and providing a subset of full blown regex is just putting up unnecessary hoops and hurdles.
Name a problem with regex that is not a problem with what you get in standard SQL pattern matching.
Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?
March 18, 2016 at 1:39 pm
quagmired (3/17/2016)
Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?
Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?
Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2016 at 7:40 pm
Jeff Moden (3/18/2016)
quagmired (3/17/2016)
Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?
Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...
Fine, how about a string with 'PO:' preceded by a non-alphanumeric character or nothing at all, followed by an arbitrary number of spaces followed by up to ten decimal digits, terminating, or followed by a non decimal character. When have I ever had to match 10*1? Maybe in automata class. Why does it matter? It's a common class of problems that the SQL community, apparently, does not want to own, and the waste of effort and bloated TSQL code that ensues is really very much a joke, hence this article/forum topic.
Don't pay any attention to the fact that LIKE pattern matching already available in SQL is dinosaur regex.
Handed heads: if that coded "proof" involves a native engine with caching and features that prevent runaway patterns, (non)?greedy, possessive, anchored, etc (again, PCRE), then perhaps that's proof, but I really doubt it. Looking forward to a link.
March 18, 2016 at 9:30 pm
Jeff Moden (3/18/2016)
quagmired (3/17/2016)
Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?
Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...
Found the discussion... and RegEx got its ears folded back pretty badly.
http://www.sqlservercentral.com/Forums/Topic1296195-60-1.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 46 total)
You must be logged in to reply to this topic. Login to reply