Splitting a String in SQL - Help Please!

Question

Splitting a String in SQL - Help Please!

Viewing 15 posts - 16 through 30 (of 48 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 1

I count 8 columns in the first line and only 7 columns in the second line. Is that correct? And, the single quotes on each end are also actually in the file as well, correct?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 2

Single quotes are for test strings.

They are not from file, of course. 🙂

And, I believe, there are 7 columns in both strings.

1st one should be split into:

1Pick-up

2

3

41234

52008-08-24

6 "2008-08-24"

717", unknown brand

_____________
Code for TallyGenerator

Mike DiRenzo SSCrazy Points: 2013 More actions · Answer 3

From what I can tell so far, you are trying to parse a whole row of information into columns to form a single record where some of the columns themselves need sub-parsing.

As I mentioned earlier, Tally or numbers table approach would help but I can see it getting complex. The UDFs that I have created are designed for an RBAR approach.

With my approach, you pass in the whole line, determine the number of delimiters, use the # of delimiters as domain for a loop, do a string parse, return back a table fully hydrated with your desired output. In some cases, where you are going to sub-parse a column, recursivity would be needed.

I use a combination of three UDFs to do this:

fn_Delim_To_Table

fn_PatCount

fn_Repetitive_Str_Parse

Some of the code I found right here in Sql Server Central. Efficiency is OBVIOUSLY lacking in this approach where brute force reigns supreme.

I will continue to monitor this thread because if there is better solution, and I believe there is, I am wide open for suggestion.

Good luck,

Mike

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 4

Sergiy (9/14/2008)
Single quotes are for test strings.
They are not from file, of course. 🙂
And, I believe, there are 7 columns in both strings.
1st one should be split into:
1Pick-up
2
3
41234
52008-08-24
6 "2008-08-24"
717", unknown brand

Heh... just making sure... I had to ask because I just got done splitting a similar "properly formed" CSV that actually had the single quotes in them like this.

I've got a couple of "obligations" in the real world that I have to take care of today, Sergiy... I'll try to get back to this soon because these types of things are so much fun 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 5

Sorry for the delay...

Absolutely the easiest way to read a "properly formed" CSV file is using OpenRowSet. If the file is called "ImportTest.csv" and it's stored in "C:\Temp", and the file has no headers, then the following does a dandy job...

SELECT *

FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',

'Text;Database=C:\Temp;HDR=NO;FMT=Delimited',

'SELECT * FROM ImportTest.csv')

This is just one way to import the file. Of course, it can easily be turned into an INSERT/SELECT or a SELECT/INTO.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 6

Jeff Moden (9/15/2008)
Sorry for the delay...
Absolutely the easiest way to read a "properly formed" CSV file is using OpenRowSet. If the file is called "ImportTest.csv" and it's stored in "C:\Temp", and the file has no headers, then the following does a dandy job...
SELECT *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Text;Database=C:\Temp;HDR=NO;FMT=Delimited',
'SELECT * FROM ImportTest.csv')
This is just one way to import the file. Of course, it can easily be turned into an INSERT/SELECT or a SELECT/INTO.

Well, I heard about this approach 😉

But there is a little problem.

We need to process about 40 different file formats, all with different number of columns, some have different number of columns in "Header" and "Body" rows.

And we expect the number of formats to grow.

Files happened to have all sorts of cra..(shut up, Sergiy!) inside, including broken lines (e.g. CR/LF in "Description" fields), badly formatted dates, missing quotes around values with delimiters inside, missing columns in some lines, etc.

That's why I do BULK INSERT the file into temp table with 2 columns: INT Row Identity and TEXT Row ?ontents, and then parse it into "vertical" table having columns RowID, ColID, Value.

Then I can count columns, make sure that there are no missing or not expected ones, validate data in each column without dynamic SQL mess, check file structure according expected schema, and, if everything is right, load values from specified set of ColID to destination tables (pivoting it back, of course)

And, if anything is wrong, I can automatically report what exactly and where is happened to be wrong and which file cannot be processed.

I'm not sure I can achieve it with Microsoft.Jet.OLEDB.4.0 driver. It will not provide me with specific enough error mesages.

And because some of our customers on another side of the Globe I'm trying to avoid answerring urgent calls about missing crucial files at 2am. :w00t:

I have this thing working, and working OK.

But because I cannot figure out the way to do it with Tally I loop through every row.

Not bad, when the file is up to 100 rows, but some of them have more then 50k - and it hurts.

:sick:

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 7

Heh... understood... and I frequently do the same thing if the file is crap...

... but that's not what you asked.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 8

Jeff Moden (9/15/2008)
Heh... understood... and I frequently do the same thing if the file is crap...
... but that's not what you asked.

I asked about splitting of bunch of quoted delimited strings without loop.

So far I saw only option without considering quotes.

Tried to make it myself - and abandoned when it has become overcomplicated.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 9

You want the rows you posted to be split instead of imported?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 10

BTW... as you already know, the rows you originally posted import just fine because they are, despite appearances, properly formatted CSV's. The delimiters also line up and could be imported using a BCP Format File with \" to mark the quotes in the delimeters. These files typical for you or are you having to put up with cannon fodder from people that aren't qualified to provide data? 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 11

If I try to import file with wrong formatting (broken rows, unquoted delimiters in strings, missing header or header row where it not supposes to be, etc.) I'll end up with some error mesage about "ODBC driver error" or "Binary string truncated" or something like that.

If my import process will report such error to a customer 11 hours away from me I'll get a call at 2am: "What the heck is that?"

It's not gonna make me happy, even if I'm not asleep and sitting with friends drinking beer or whatever has left by the time.:hehe:

That's why I import file rows as a whole thing (bullet proof, especially when it's imported into TEXT column) and then I split them in order to process futher.

Of course, doing that split I need to consider quotation rules used fot delimited files.

Can you suggest any another way?

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 12

The delimiters also line up and could be imported using a BCP Format File with \" to mark the quotes in the delimeters.

BCP is too vulnerable to format errors.

And it does not let you know what exactly was wrong with the file.

These files typical for you or are you having to put up with cannon fodder from people that aren't qualified to provide data? 😀

I put up with manual extracts from Excel, sometimes manually "corrected" by users before sending to us.

Is it enough to get the picture?

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 13

Heh... yeah, I wouldn't want to take a call in the middle of a cold one, either.

Split functions always work, especially on a single delimiter. Obviously, your stuff has several delimiters that look like some of the text. So, going back to rule one of a single delimiter... you could split on the results of this...

DECLARE @String1 VARCHAR(8000)

--SET @String1 = '"Pick-up",,"",1234,"2008-08-24", "2008-08-24","17"", unknown brand"'

SET @String1 = '"""Next Day"" delivery",5.00,"9,5"" box",10,"2008-09-01", "2008-09-11","9"" nails"'

SELECT

REPLACE(

SUBSTRING(@String1,2,LEN(@String1)-2)

,'" ,"','þ')

,'", "','þ')

,'","' ,'þ')

,'",' ,'þ')

,',"' ,'þ')

,'"þ' ,'"')

,'þ"' ,'"')

,'""' ,'"')

Heh, yeah, I know... that code is a real thorn in the side... 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 14

Ummm... all those bloody REPLACE's are gonna be slow. I just thought of another way using our friends the Tally table and the "quirky update". It'll take me a bit to put together if you think it'll be worth it. Not sure I can make it "read minds" and do necessary in-flight corrections, but it might be a lot faster.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 15

Even worse - those REPLACEs don't work.

For another string :

'"Pick-up",,"",1234,"2008-08-24", "2008-08-24","17"", unknown brand"'

it fails.

_____________
Code for TallyGenerator