March 10, 2016 at 12:00 am
Comments posted to this topic are about the item Find a String in a String in SQL Server
March 10, 2016 at 7:27 am
It's a nice article that explains this problem which unfortunately is more common that it should be.
I'd take a different approach which includes some "safe nets". It seems that your last code is trunctating the last digit.
SELECT *
FROM dbo.Customers
CROSS APPLY ( SELECT CHARINDEX('PO:', CustomerNotes + 'PO:') + 3) x(StartofPO)
CROSS APPLY ( SELECT SUBSTRING(CustomerNotes, StartofPO, 20)) y(TruncNotes)
CROSS APPLY ( SELECT LEFT(TruncNotes, PATINDEX('%[^0-9]%', TruncNotes + '.')-1)) z(PO);
SELECT *
FROM dbo.Customers
CROSS APPLY ( SELECT CHARINDEX('PO:', CustomerNotes + 'PO:') + 2) x(StartofPO)
CROSS APPLY ( SELECT STUFF(CustomerNotes, 1, StartofPO, '')) y(TruncNotes)
CROSS APPLY ( SELECT LEFT(TruncNotes, PATINDEX('%[^0-9]%', TruncNotes + '.')-1)) z(PO);
Basically, I remove the first part of the string and then identifying the end of the PO. I can't make it in a single step because the PATINDEX does not have a starting position parameter.
March 10, 2016 at 8:41 am
Thanks for the catch. I updated the math to get the PO.
I wonder which of these works better in complex queries. It's worth some performance testing to see. I hadn't thought about cross apply here, but it's an interesting approach.
March 10, 2016 at 8:54 am
Good article Steve. I think you under sold the power of PATINDEX...
CHARINDEX can start at a certain position in the string while PATINDEX can take wildcards. In this simplistic case, we can use either one.
PATINDEX can accept patterns, as Luis demonstrates above... wildcard or otherwise. This is what makes it so special. It's what makes things PatternSplitCM (referenced in my signature) possible.
Over the years I have always believed that, for complex string problems, you need Regular Expressions (Regex) but no longer feel that way. With CHARINDEX, PATINDEX, SUBSTRING, STUFF, a few other T-SQL string functions I can't think of at the moment and a Tally table you can do pretty much anything you would do using Regex.
-- Itzik Ben-Gan 2001
March 10, 2016 at 9:03 am
PATINDEX is powerful, without a doubt. I have more of a fondness for charindex, mostly because I can indicate a start position. However, as Luis showed, you can get around this.
Certainly there is power in PATINDEX, and I didn't tackle that here. I wanted to focus on a few common problems that I have seen across the last 6 months in questions. It seems many people are struggling with simple parsing.
A more complex treatment of PATINDEX would have to wait for another piece. Unless you want to write one 😉
March 10, 2016 at 9:04 am
This is why i loathe long free-flowing 'note' fields where people embed data they then expect reports or queries to pull from in a consistent way.
Good example is credit card #'s in note fields. Even using regular expression processing its difficult to find all types of patterns that represent the way the user entered the CC#'s in the 'note'.
If you need to report or find PO data, then it should go into a field in a table called something like 'PO Number', NOT in a note.
Note fields should be for misc notes, not data fields.
March 10, 2016 at 9:17 am
Completely agree. However, when the data is in there, even if you have modified and built a good data model, this is one way you might move that data back to the correct fields.
Ideally, you'd also remove the data from here.
March 10, 2016 at 9:29 am
Steve Jones - SSC Editor (3/10/2016)
Completely agree. However, when the data is in there, even if you have modified and built a good data model, this is one way you might move that data back to the correct fields.Ideally, you'd also remove the data from here.
I've written way too many 'scrub' routines to pull data out of notes and populate a table.
What usually ends up happening is they still keep using the note field, in one case i turned over my regular expressions to the dev so they could write a validater to keep the credit card #'s out of the notes.
Note fields are evil, basically, and should only be used in Notepad or Notepad ++ 🙂
March 10, 2016 at 9:42 am
However, I'm not sure we'll ever get rid of "notes" fields.
March 10, 2016 at 10:14 am
Steve Jones - SSC Editor (3/10/2016)
However, I'm not sure we'll ever get rid of "notes" fields.
*sigh* probably not. Its the 'cheap','fast' way to extend the data model, just add the new field to the 'note'.
Then 12 months down the road, 'pull the <whatever> out of the notes so we can search on it'.
*headdesk*
March 10, 2016 at 10:37 am
Manic Star (3/10/2016)
This is why i loathe long free-flowing 'note' fields where people embed data they then expect reports or queries to pull from in a consistent way.Good example is credit card #'s in note fields. Even using regular expression processing its difficult to find all types of patterns that represent the way the user entered the CC#'s in the 'note'.
If you need to report or find PO data, then it should go into a field in a table called something like 'PO Number', NOT in a note.
Note fields should be for misc notes, not data fields.
I loath such things not so much because there is a notes "field" but because people do really, really sssttttooooppppiiiidddd things like violating PCI and PII standards by putting things like Credit Card Numbers and SSNs in the notes "field". One "easy" way to protect the data at rest is to encrypt notes "fields". That does not, however, do the job correctly because anyone that has the necessary casual access to such "fields" can see that PCI or PII information without the necessary controls. When I did work as a "front ender" (more than a decade ago), I had a module that would replace the digits of such information with "X"s and would still leave phone numbers and other numbers intact. It's not that difficult and well worth the effort to protect the customers, which inherently protects the company.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2016 at 10:44 am
Shifting gears, this is a good introductory article for those people that really have no clue on how to do such a thing. It's not meant to be the end-all to be-all for mid-string lookups. I suspect that some people have read it as such and graded it low because of their misunderstanding of the purpose of the article.
I'll also say that grading an article low without providing feedback to the author of what the problem is accomplishes nothing because it doesn't tell the author what they may have done wrong and that also doesn't help people looking for a solution.
Yes, I realize that 3 stars means "average" but I'd recommend that any vote with 3 stars or less be accompanied with a brief description of why the article is consider to be "only" average and what the author could do to make it better in your opinion.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2016 at 11:26 am
Jeff Moden (3/10/2016)
I loath such things not so much because there is a notes "field" but because people do really, really sssttttooooppppiiiidddd things like violating PCI and PII standards by putting things like Credit Card Numbers and SSNs in the notes "field". One "easy" way to protect the data at rest is to encrypt notes "fields". That does not, however, do the job correctly because anyone that has the necessary casual access to such "fields" can see that PCI or PII information without the necessary controls. When I did work as a "front ender" (more than a decade ago), I had a module that would replace the digits of such information with "X"s and would still leave phone numbers and other numbers intact. It's not that difficult and well worth the effort to protect the customers, which inherently protects the company.
Totally agree. I had a version of that module too 🙂
March 10, 2016 at 2:39 pm
The patindex function and use of substrings is very useful for creating spiders to parse data from web pages.
March 10, 2016 at 3:28 pm
Great article. Steve, especially for those who want to really reach an expert level of SQL use. Microsoft SQL Server has by far the best set of string manipulation operators that I ever saw in 42 years of IT experience, including front-end development, SQL development, database administration and 11 years of IT management. Anyone aspiring to more than a cursory knowledge of query-writing should be sure they completely MASTER string manipulation. I turned 73 years old today, and I still actively create SQL code, now just for myself to use in maintaining and updating my retirement portfolio and investment transaction history. I have used Quicken for my records since 1986 and can account for every penny of my retirement savings and spending since that point. You can't count on software development companies to consistently cover all your needs for data in their packages, so I routinely develop code that will scan and parse data out of text emails and PDF documents and such, run it through SQL Server to store, manipulate, and create proper formats for importing into Quicken. One word of advice here is to never develop code that depends on finding data at a hard-coded position within a string, since any and all textual information is subject to change without notice or reason. Thus, your SQL string-parsing code must always handle text data on a 'this time' basis instead of 'it's always ...' . For instance, you can't count on the PO number being nine digits and beginning in a certain fixed position within the string. Safer is to search for the first non-blank CHARACTER, other than punctuation such as the colon, located following the literal PO (don't assume it's numeric!) and continuing until you find a space, and allowing for special characters such as hyphens which might be included. This usually involves evaluating each character by type and extracting those wanted, appending them to a new string data element, then if appropriate converting to numeric if all individual positions have passed the numeric test, and all such variations of evaluation so you avoid SQL errors in the process. Incidentally, a good way to throw away unneeded spaces to routinely use MyString = LTRIM(RTRIM(MyString)) to remove leading data you have parsed and extracted, and move you to the next succeeding data element. Always, always, ALWAYS test the data-type of an extracted variable BEFORE you move it to it's final variable location. String parsing without causing SQL errors, is definitely an art form, and extremely important for reliable code.
Rick
Disaster Recovery = Backup ( Backup ( Your Backup ) )
Viewing 15 posts - 1 through 15 (of 46 total)
You must be logged in to reply to this topic. Login to reply