September 16, 2008 at 12:21 am
It sure does...
... but the OpenRowSet thing I showed does it right.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 12:45 am
What happened to the OP?
"Keep Trying"
September 16, 2008 at 1:05 am
Jeff Moden (9/16/2008)
It sure does...
It "eats" the comma from the last value.
... but the OpenRowSet thing I showed does it right.
I explained why it's not an option.
_____________
Code for TallyGenerator
September 16, 2008 at 4:30 am
Sorry for not having posted sooner. I am still reading the thread. Aside from the RBAR UDFs I suggested earlier, perhaps you can preprocess your data outside of SQL Server using C, C++, or C#. Afterward you can employ Jeff's or Sergiy's approaches without issue.
Another option with a preprocess approach (Source to Stage) would be to create file structures for each of your sources, e.g.:
File Structure Alpha
===============
Field1 10
Field2 20
Field3 15
Field4 30
File Structure Beta
===============
Field1 5
Field2 25
Field3 10
Field4 10
Using the structures as a template against your source data affords you greater flexibility to clean/massage prior to dumping into a temp or stage table. Plus, you can apply each template to a particular data source over and over again. Preprocessing the data outside of SQL Server will get the job done faster and allow you to employ downstream routines as Jeff and Sergiy have suggested.
But there is a point of diminishing returns with every approach. I am familiar with what you are trying to accomplish, and clearly everyone else who contributed posts to this thread is too. Being biased towards routines that I know work, I am still leaning on the side of a brute-force RBAR approach using 100% SQL Server (UDFs). Like the preprocess template approach using tools outside of SQL Server, the routines that I spoke of in an earlier thread can be tailored toward each data source and reused over and over.
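For what it's worth, a fixed-width template like "File Structure Alpha" above can also be expressed inside SQL Server as a non-XML bcp format file, which BULK INSERT can use to carve each line into columns. This is only a sketch: the column names come from the example above, and the staging table, file path, and collation are assumptions.

```
9.0
4
1   SQLCHAR   0   10   ""       1   Field1   SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   20   ""       2   Field2   SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   15   ""       3   Field3   SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   30   "\r\n"   4   Field4   SQL_Latin1_General_CP1_CI_AS
```

It would then be applied with something like `BULK INSERT StageAlpha FROM 'C:\data\alpha.txt' WITH (FORMATFILE = 'C:\data\alpha.fmt')`, giving you a reusable per-source template without leaving T-SQL.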
-Mike
September 16, 2008 at 6:20 am
Getting back to my initial suggestion... aside from it being potentially slow and reliant on a '-' as a delimiter, is there any reason it wouldn't have worked? That's my standard practice when I need to split a field for display purposes, such as a name field that looks like 'John Smith (123-456-789) ext. 1234'. I know it'd be more efficient to do it in the app doing the display, but that isn't always an option here.
September 16, 2008 at 6:33 am
You'd better split it for data storage purposes.
And (if needed) concatenate for display purposes.
It would save you from a lot of problems.
Google "Normalization Rules" - sometimes it's useful to know essential things about stuff you're working with.
😛
_____________
Code for TallyGenerator
September 16, 2008 at 6:40 am
We do... for any fields that we are able to change. However, some fields we can't change without getting our front end or middle layer modified. That one in particular we have both ways (the concatenated format, plus fields for first name, last name, phone, and extension), but there are a lot of things that still reference the old version. You could argue that it shouldn't have been that way in the first place, and I'd completely agree, but I wasn't here at that point.
September 16, 2008 at 1:57 pm
If I were you, I'd create a small, efficient trigger on that table that updates another, properly formatted table (or tables) whenever the value is updated.
Then I'd gradually move the SELECTs from the old structure to the new one.
And I'd bombard my boss with articles about SQL injection to make him forbid ad-hoc SQL queries against the database. Then I'd have a chance to populate the new structure without triggers.
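A minimal sketch of such a sync trigger, with all table and column names invented purely for illustration (the legacy table, the normalized table, and the key column are assumptions):

```sql
-- Hypothetical tables: dbo.Contacts holds the legacy data;
-- dbo.ContactsNormalized is the new, properly split structure.
CREATE TRIGGER trg_Contacts_Sync ON dbo.Contacts
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Update rows that already exist in the normalized copy.
    UPDATE n
       SET n.FirstName = i.FirstName,
           n.LastName  = i.LastName,
           n.Phone     = i.Phone,
           n.Extension = i.Extension
      FROM dbo.ContactsNormalized n
      JOIN inserted i ON i.ContactID = n.ContactID;

    -- Add rows that the normalized copy has not seen yet.
    INSERT INTO dbo.ContactsNormalized
           (ContactID, FirstName, LastName, Phone, Extension)
    SELECT i.ContactID, i.FirstName, i.LastName, i.Phone, i.Extension
      FROM inserted i
     WHERE NOT EXISTS (SELECT 1
                         FROM dbo.ContactsNormalized n
                        WHERE n.ContactID = i.ContactID);
END;
```

Because the trigger works off the `inserted` pseudo-table, it handles multi-row updates set-wise, which matters in a database as trigger-heavy as the one described later in this thread.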
It's all about making professional decisions.
_____________
Code for TallyGenerator
September 16, 2008 at 3:02 pm
Due to the nature of our setup, involving a completely custom-built front-end application that communicates only through a custom-built middle layer (which we have no access to modify in any way) and several hundred tables that I have very limited ability to alter (without affecting the front/middle layers), this isn't about simply deciding to be more uniform and implementing a process to get there. The entire database/coding structure is probably one of the worst examples of RBAR logic (to borrow the term) imaginable. Literally thousands of triggers, many of which are several thousand lines long, and levels of sp/udf nesting that would take a week to map out completely.
That said, I wasn't wondering how to make what I was suggesting unnecessary for the future, so much as I was suggesting it for the case of the OP, who also has data that should be normalized into 3 separate fields, but for reasons he did not specify, is not.
September 16, 2008 at 3:46 pm
More than one way to 'skin' a string! 😉
CREATE TABLE #proditem (item1 VARCHAR(255))
INSERT INTO #proditem values ('CSCAR009/0001/001-PASPBD-Pedestrian automatic sliding door')
INSERT INTO #proditem values ('CSCAR009/0001/002-PASPBD2-Pedestrian automatic sliding double door')
INSERT INTO #proditem values ('CSCAR009/0001/003-PASPBDADA-Pedestrian automatic sliding door for disabilities')
SELECT LEFT(item1, PATINDEX('%-%', item1) - 1) AS firstID,
       LEFT(SUBSTRING(item1, PATINDEX('%-%', item1) + 1, LEN(item1)),
            PATINDEX('%-%', SUBSTRING(item1, PATINDEX('%-%', item1) + 1, LEN(item1))) - 1) AS secondID
FROM #proditem
firstID            secondID
CSCAR009/0001/001  PASPBD
CSCAR009/0001/002  PASPBD2
CSCAR009/0001/003  PASPBDADA
DROP TABLE #proditem
September 16, 2008 at 5:40 pm
Sergiy (9/16/2008)
Jeff Moden (9/16/2008)
It sure does...It "eats" the comma from the last value.
... but the OpenRowSet thing I showed does it right.
I explained why it's not an option.
You could re-export what you think is good and re-import, letting the split occur. The only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.
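A rough schema of that character-split idea, with illustrative names only (a `#raw` staging table of imported lines and a `dbo.Tally` numbers table are assumed):

```sql
-- Explode each line into one row per character, carrying along a running
-- count of the double quotes seen up to each position.
SELECT r.RowID,
       t.N                        AS CharPos,
       SUBSTRING(r.Line, t.N, 1) AS Ch,
       -- Doubling each quote and comparing lengths counts the quotes
       -- in the prefix of the line up to this character.
       LEN(REPLACE(LEFT(r.Line, t.N), '"', '""'))
         - LEN(LEFT(r.Line, t.N)) AS QuotesSoFar
  INTO #chars
  FROM #raw r
  JOIN dbo.Tally t ON t.N <= LEN(r.Line);

-- A comma is a genuine field delimiter only when the number of quotes
-- seen so far is even, i.e. we are not inside a quoted value.
SELECT RowID, CharPos
  FROM #chars
 WHERE Ch = ','
   AND QuotesSoFar % 2 = 0
 ORDER BY RowID, CharPos;
```

The delimiter positions found this way could then feed a SUBSTRING-based split back into the three target columns; this is only the quote-counting skeleton, not a finished splitter.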
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 6:56 pm
Jeff Moden (9/16/2008)
Could re-export what you think is good and reimport letting the split occur. Only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.
Did not quite get it.
Can you please show what kind of update you're talking about?
No need for a working code, just some schema.
_____________
Code for TallyGenerator
September 16, 2008 at 7:27 pm
By the way, when I said "It sure does", I meant it sure does fail... and in exactly the manner you suggested. Sorry for the confusion. 🙂
I'll be back... I'm not sure I can pull this one off... the guys that wrote OpenRowSet were a whole lot smarter than me. 😀
I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought. If it were me, I'd be tempted to import like you do now, figure out which rows are worth saving, export them, and read them with OpenRowSet.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 7:44 pm
Jeff Moden (9/16/2008)
I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought.
As I said, I eventually did it, but unfortunately using a loop.
That's why I need to use a cursor when it comes to populating that long 3-column table.
Everybody around is happy because I manage to upload 100k+ row files in less than 10 minutes (including all data checks and validations) when the previous version took about 3 hours.
But I know how it should be...
If you ever come up with an idea for how to split a quoted string using a Tally table - you know who's gonna appreciate it the most. 😉
_____________
Code for TallyGenerator
September 16, 2008 at 7:47 pm
Heh... I might have it, ol' friend... I can't explain it, I just gotta do it. I'll be back. And you're right about the table... 3 columns. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.