May 20, 2018 at 1:40 pm
Greetings,
I am what you will call an accidental DBA. I was a Systems Engineer and now I have to watch over 4 AAG Clusters and 600 DB's. Every once and awhile I am presented with a problem to fix so outside my skill set I just say wow. How in the world can I make this work? I spent days looking for information but cant seem to find anything that has been able to guide me into the correct direction.
I am currently being asked to identify bad check micr's in each database before they are sent to processing. Once and awhile the OCR that reads the checks will add in artifacts due to bad lighting or a fold in the check are a few examples. Working with a coworker familiar with regex. They were able to provide me with one that will identify a bad micr. This was tested on a regex checker site. My issue is how in the world will I be able to do this in SQL? I have read page after page and cannot find a workable written explanation on using regex in SQL. At least nothing I understand. Here is the regex given to me ^(c[\d-]{1,}c)?(\d)?(d[\d-]{9}d)(c?[\d-]{1,}c)?([\d-]{1,4})?(b\d{1,10}b)?$ . Is this even possible in SQL?
I am using SQL 2016.
May 20, 2018 at 5:05 pm
I would say your only option is to use a CLR to do it.
https://www.google.ie/search?q=clr+regex+c%23&oq=regex+clr&aqs=chrome.2.69i57j0l5.5556j0j7&sourceid=chrome&ie=UTF-8 will give you some examples
or eventually R https://www.mssqltips.com/sqlservertip/4748/sql-server-2016-regular-expressions-with-the-r-language/
May 20, 2018 at 7:13 pm
TollHouse - Sunday, May 20, 2018 1:40 PMGreetings,
I am what you will call an accidental DBA. I was a Systems Engineer and now I have to watch over 4 AAG Clusters and 600 DB's. Every once and awhile I am presented with a problem to fix so outside my skill set I just say wow. How in the world can I make this work? I spent days looking for information but cant seem to find anything that has been able to guide me into the correct direction.I am currently being asked to identify bad check micr's in each database before they are sent to processing. Once and awhile the OCR that reads the checks will add in artifacts due to bad lighting or a fold in the check are a few examples. Working with a coworker familiar with regex. They were able to provide me with one that will identify a bad micr. This was tested on a regex checker site. My issue is how in the world will I be able to do this in SQL? I have read page after page and cannot find a workable written explanation on using regex in SQL. At least nothing I understand. Here is the regex given to me ^(c[\d-]{1,}c)?(\d)?(d[\d-]{9}d)(c?[\d-]{1,}c)?([\d-]{1,4})?(b\d{1,10}b)?$ . Is this even possible in SQL?
I am using SQL 2016.
Hold the phone, now... I can't read Regex so you'll have to tell me... does that Regex incorporate a Luhn 10 Checksum check of the bank routing number? If it doesn't, then the Regex check and the bank routing number isn't checked with a Luhn 10 Checksum and isn't played against a full list of back routing numbers, the expense of doing this RegEx check may not be justified.
Since I don't use Regex and can't actually read what this Regex string is actually going to do, can you provide a more humanized definition of that it does, please?.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 20, 2018 at 7:14 pm
Thank you Frederico for the information. I have heard of R server but never thought to use it in this manner. I will read over what you send and update once I figure out how to make it work.
May 20, 2018 at 7:33 pm
TollHouse - Sunday, May 20, 2018 7:14 PMThank you Frederico for the information. I have heard of R server but never thought to use it in this manner. I will read over what you send and update once I figure out how to make it work.
If you can tell me what the Regex does in human terms, you might not have to bother. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2018 at 1:21 am
Steve,
As far as I can see its not Luhn 10 checksum but rather processing the numbers at the bottom of a printed cheque/check - https://en.wikipedia.org/wiki/Magnetic_ink_character_recognition
the regex in question is processing more than that as it has 6 possible capture groups with groups 2, 4 ,5 ,6 being optional (although in reality only 1 or 2 may be missing if not mistaken)
It can be done in SQL.
TollHouse could you supply as a few examples (replacing all digits with a 9) so we could expand on it?
Few variations of it if you can.
And obviously the rules for it to "fail" which I think is what you are after.
examples below that pass the regex with varying groups being returned
number 1 is required groups, any number of digits or (-)
number 2 is required group - exactly 9 chars (any valid combination of digits or (-))
numbers 4 is optional groups
number 5 is optional group but if present must be 4 chars in size (any valid combination of digits or (-))
c1---c4d222222222dc4-c555-b4b
c1-c4d22-2-222-dc4-c555-b4b
c1-cd222---22-dc4-c555-b4b
c1-cd222--222-d555-b4b
c1-cd22222222-dc4-cb4b
c1-cd22222222-dc4-c555
c1-cd22222222-dc4-c555-
c1-cd22222222-db4b
c1-cd22222222-d
first line matches as follows
Group 1. | 0-6 | `c1---c` |
Group 2. | 6-7 | `4` |
Group 3. | 7-18 | `d222222222d` |
Group 4. | 18-22 | `c4-c` |
Group 5. | 22-26 | `555-` |
Group 6. | 26-29 | `b4b` |
Now I may be wrong but I believe that the regex may not be doing exactly what is required as it may return a string of (-) as valid.
May 21, 2018 at 5:20 am
For those who would like to try for themselves, a Regex tester can be found here. It suggests that Frederico's concern is not an issue, because I tested strings containing only hyphens – no match.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 21, 2018 at 5:53 am
frederico_fonseca - Monday, May 21, 2018 1:21 AMSteve,As far as I can see its not Luhn 10 checksum but rather processing the numbers at the bottom of a printed cheque/check - https://en.wikipedia.org/wiki/Magnetic_ink_character_recognition
the regex in question is processing more than that as it has 6 possible capture groups with groups 2, 4 ,5 ,6 being optional (although in reality only 1 or 2 may be missing if not mistaken)
It can be done in SQL.
TollHouse could you supply as a few examples (replacing all digits with a 9) so we could expand on it?
Few variations of it if you can.
And obviously the rules for it to "fail" which I think is what you are after.
examples below that pass the regex with varying groups being returned
number 1 is required groups, any number of digits or (-)
number 2 is required group - exactly 9 chars (any valid combination of digits or (-))
numbers 4 is optional groups
number 5 is optional group but if present must be 4 chars in size (any valid combination of digits or (-))
c1---c4d222222222dc4-c555-b4b
c1-c4d22-2-222-dc4-c555-b4b
c1-cd222---22-dc4-c555-b4b
c1-cd222--222-d555-b4b
c1-cd22222222-dc4-cb4b
c1-cd22222222-dc4-c555
c1-cd22222222-dc4-c555-
c1-cd22222222-db4b
c1-cd22222222-d
first line matches as follows
Group 1. 0-6 `c1---c` Group 2. 6-7 `4` Group 3. 7-18 `d222222222d` Group 4. 18-22 `c4-c` Group 5. 22-26 `555-` Group 6. 26-29 `b4b` Now I may be wrong but I believe that the regex may not be doing exactly what is required as it may return a string of (-) as valid.
Yep, I get that it's processing the magnetic line at the bottom of the check (known as the "MICR" or Magnetic Ink Character Recognition). If all it's doing is checking for length and the type of characters, then I'll like to know which ACH the OP is working for so that I can avoid it. 😀
If there is no Luhn 10 Checksum check for the Bank Routing Number (also known as a "Transit Number") and no indication of playing that number against an actual list of all existing Bank Routing Numbers, the this whole thing is doomed to failure no matter which method of reading you take. Just like credit card numbers, the Bank Routing Number always has a Luhn 10 Checksum checkdigit and it should be verified as correct at the very least.
The software on the machines that read the MICR will typically categorize what the Bank Routing Number, Customer Account Number, Check Number and, once cleared by the bank or ACH, what the amount of the check is. If you're fortunate, the MICR reader will also do the checksum and comparison to a list of actual Bank Routing Numbers so I'm also questioning the actual need for all of this. A simple string splitter should be all you need for this if it actually has been read by a good and proper MICR reader. If they don't have one of those, then they shouldn't be in the business that they're in. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2018 at 6:45 am
We see the issue with mobile deposit and the check is being read from a picture. Which happens to cause issues with the micr being read into the OCR software from time to time. The business side uses a reader and does not have this issue. If you are telling me its a waste of time. Then I will let the party requesting it know that. Remember I am new to DBA role and looking to people much smarter then I on this topic. I also didn't develop this product but I have to support to the best of my abilities, even as limited as they are.
As for what I am checking for it in a simple way a malformed micr. We know that a micr has a specif format it must follow. I want to find ones that do not follow this standard. I am reaching out to the department requesting it for examples of what a bad micr looks like and a good one for reference. I should have an answer to post in the next hour or so.
May 21, 2018 at 6:59 am
TollHouse - Monday, May 21, 2018 6:45 AMWe see the issue with mobile deposit and the check is being read from a picture. Which happens to cause issues with the micr being read into the OCR software from time to time. The business side uses a reader and does not have this issue. If you are telling me its a waste of time. Then I will let the party requesting it know that. Remember I am new to DBA role and looking to people much smarter then I on this topic. I also didn't develop this product but I have to support to the best of my abilities, even as limited as they are.As for what I am checking for it in a simple way a malformed micr. We know that a micr has a specif format it must follow. I want to find ones that do not follow this standard. I am reaching out to the department requesting it for examples of what a bad micr looks like and a good one for reference. I should have an answer to post in the next hour or so.
It's not a waste of time getting this information into SQL Server. What I'm saying is that it's a waste of time to try to verify and validate the MICR in SQL Server. It doesn't matter whether of not it's a physical paper check or a picture of a check. The software that does both should be able to correctly validate the data. All you should need to do is load it once the other software has parsed it, validated it, and split it into importable data.
If they want SQLServer to do the validations, then the OCR or whatever software they're using to read MICR should at least be able to read the embedded tokens and split it correctly and identifiably so that it can be imported to a table. If the OCR/MICR software can't do that at least that, then there's a real problem with that software and either someone needs to spend a bit more time on it or you need to buy stuff from a vendor that does it well and has tested the bejeezus out of it.
I do some pretty insane stuff using SQL Server that a lot of folks say shouldn't be done in SQL Server and I usually tell them to shove off. This wouldn't be one of this cases. In this case, the external software should do the heavy lifting at least on the splitting and identification of the fields in the MICR.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2018 at 7:07 am
My initial reaction is that you should be able to do this in SSIS with a regex extension from either APEX or Pragmatic Works. If not then can you run it through an Azure function called from an Azure Logic App. Logic apps are ridiculously cheap. From memory you get 5M - yes million calls a month for free then then $1.25 per 10000 calls.
https://azure.microsoft.com/en-gb/pricing/details/logic-apps/
May 21, 2018 at 7:32 am
Phil Parkin - Monday, May 21, 2018 5:20 AMFor those who would like to try for themselves, a Regex tester can be found here. It suggests that Frederico's concern is not an issue, because I tested strings containing only hyphens – no match.
better this one for testing https://regex101.com/
regarding the ----
according to both sites this is a valid string c----c4d---------dc--c----b4b - and I do not believe this would be a valid for the purposes we are talking about.
May 21, 2018 at 7:33 am
aaron.reese - Monday, May 21, 2018 7:07 AMMy initial reaction is that you should be able to do this in SSIS with a regex extension from either APEX or Pragmatic Works. If not then can you run it through an Azure function called from an Azure Logic App. Logic apps are ridiculously cheap. From memory you get 5M - yes million calls a month for free then then $1.25 per 10000 calls.https://azure.microsoft.com/en-gb/pricing/details/logic-apps/
Regex is supported natively in SSIS (assuming you're prepared to write a little code – just a few lines of C#). But, in this case, a CLR is a better fit IMO – the data is being checked, rather than moved from a to b.
Having said that, Jeff sounds like he knows about this stuff and he's suggesting a non-SQL Server solution is the way to go.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 21, 2018 at 7:35 am
frederico_fonseca - Monday, May 21, 2018 7:32 AMPhil Parkin - Monday, May 21, 2018 5:20 AMFor those who would like to try for themselves, a Regex tester can be found here. It suggests that Frederico's concern is not an issue, because I tested strings containing only hyphens – no match.better this one for testing https://regex101.com/
regarding the ----
according to both sites this is a valid string c----c4d---------dc--c----b4b - and I do not believe this would be a valid for the purposes we are talking about.
Aha, your original post said "... it may return a string of (-) as valid", so that's what I tested! I know nothing about MICRs.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 21, 2018 at 9:18 am
I really think that cleaning up data before it goes to the database should be done at another tier in your system. Other languages and tools have much much better string handling and specialized edits. The case of MICR , it's pretty easy to write a regular expression for it. But SQL was never meant to be a text or string language, and you need to see which routing numbers are valid even after you do character correction.
For future reference, if Microsoft catches up with the ANSI/ISO standard SQL, you'll have a regular expression function SIMILAR TO, which is based on the POSIX regular expression functions. We picked that for the standards because it looked like POSIX was going to be a federal standard, and supported across many standard languages and packages.
Please post DDL and follow ANSI/ISO standards when asking for help.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply