March 21, 2022 at 3:38 pm
We are using an older version of SQL (2012 I believe) so I know that the SPLIT function is not available. However, I am creating a new SSIS job that intakes consumer results from a store procedure and eventually will become a csv file that is uploaded to a file share and used for different purposes. There is a new request to take the full street address and if the word 'Apt' appears, to split it into a new column called 'Apt'. I know I need to use Derived Column, but what is the function to split this column if Apt appears in the string?
Thanks for the information
March 21, 2022 at 3:48 pm
There is a STRING_Split function. If you are using a SSIS package why not use a Script component to split the string? Its the easiest way to get what yo want.
March 21, 2022 at 3:52 pm
Split implies that "apt" would be a delimiter in a string. This wouldn't be the case.
Ex. 413 W. Third Ave Apt 38 Morrison, NC 23669
You're just looking to put 38 in an Apartment field.
If you're just looking for the term "apt" then in SSIS you could use the FINDSTRING function to get a position which you could feed into a substring. You'd also need to find the length.
These get a little messy especially with a derived column. A combination of FINDSTRING and SUBSTRING functions would be used. The problem with FINDSTRING is it doesn't allow you to specify a start location, just occurrences.
If I was to do this I would use a Script Component in the dataflow setup as a transformation. You can specify a new output column and then you have the power of .net to parse this out and you can also employ regex for more powerful parsing if needed. HTH
Edit: To be clear you could split on apt not saying you can't but you would end up with 2 string:
"413 W. Third Ave" and "38 Morrison, NC 23669"
You can do this in .net easily using String.Split() and the output would be put into an array of strings. You could then grab the beginning part of string #2.
The address 123 Dorapt Ave apt 239 Dorapt, KY 423658 might cause some issues 🙂
March 21, 2022 at 4:00 pm
Thanks and you bring up some great points. I might do a script component with some regex. And because of how this data is being input and there are not any validation checks there (and no, I do not have access to the code to change that UI), I will have to do it at this level.
March 21, 2022 at 4:09 pm
Yeah I would definitely use .net if we're talking about possibly unsanitized data. It's going to be a struggle as the term "apt" probably won't always be used. You might check around for some pre-built regex that does address parsing or validation. Maybe part of this process could be to kick out bad data back to the business for correction or standardization. Just a thought. Good luck!
March 21, 2022 at 9:28 pm
I agree with TangoVictor on this one - I would be looking to have the business sanitize the data before allowing it to be handled by SSIS. one user puts it in as "apt", the next does "apt.", the next does "appt" due to a typo and so on. You are going to miss some use cases and that may render your CSV file unusable due to that.
As for access to the code to change the UI, would it be possible to make a new UI for address sanitization where you would take the string from the application and parse it as best as you can and then present it to the end user to validate? Then the end user would use the tool you created (rather than SSIS) to generate the CSV file. The advantage to this is the end user could review and modify the data as needed. If you have millions of records, this may not be reasonable, but this then allows you to push the process back to the business rather than having them push you to "fix the automation" when it gives them the wrong results due to bad data. Plus, in the event they need the SSIS package run outside of the schedule, they can now handle that. And if you build up the tool to handle command line arguments, you could even have it scheduled with windows task scheduler or a cron job if it is a linux tool. A bit more work than building an SSIS package, but better future proofing and easier to customize to fit the users experience... mind you that assumes that there are NOT that many rows in the table for them to review. If there are millions or billions and this needs to be generated daily, that's not going to end well...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 22, 2022 at 1:15 am
Even I don't try to "split" address. My recommendation would be for you to buy some C.A.S.S. Certification software. You spend less money than you will trying to develop something on your own and they get their data from the USPS or whatever your local mail delivery service is, which also means that they do it right and so no headaches after the fact.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply