September 16, 2008 at 12:21 am
It sure does...
... but the OpenRowSet thing I showed does it right.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 12:45 am
What happened to the OP?
"Keep Trying"
September 16, 2008 at 1:05 am
Jeff Moden (9/16/2008)
It sure does...
It "eats" the comma from the last value.
... but the OpenRowSet thing I showed does it right.
I explained why it's not an option.
_____________
Code for TallyGenerator
September 16, 2008 at 4:30 am
Sorry for not having posted sooner. I am still reading the thread. Aside from the RBAR UDFs I suggested earlier, perhaps you can preprocess your data outside of SQL Server using C, C++, or C#. Afterward you can employ Jeff's or Sergiy's approaches without issue.
Another option with a preprocess approach (Source to Stage) would be to create file structures for each of your sources, e.g.:
File Structure Alpha
===============
Field1 10
Field2 20
Field3 15
Field4 30
File Structure Beta
===============
Field1 5
Field2 25
Field3 10
Field4 10
Using the structures as a template against your source data affords you greater flexibility to clean/massage prior to dumping into a temp or stage table. Plus, you can apply each template to a particular data source over and over again. Preprocessing the data outside of SQL Server will get the job done faster and allow you to employ downstream routines as Jeff and Sergiy have suggested.
But there is a point of diminishing returns with every approach. I am familiar with what you are trying to accomplish, and clearly everyone else who contributed posts to this thread is too. Being biased towards routines that I know work, I am still leaning on the side of a brute-force RBAR approach using 100% SQL Server (UDFs). Like the preprocess template approach using tools outside of SQL Server, the routines that I spoke of in an earlier thread can be tailored toward each data source and reused over and over.
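For what it's worth, a fixed-width template like "File Structure Alpha" above can also be expressed inside SQL Server as a non-XML bcp format file, which BULK INSERT can use to carve each line into columns. This is only a sketch: the column names come from the example above, and the staging table, file path, and collation are assumptions.

```
9.0
4
1   SQLCHAR   0   10   ""       1   Field1   SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   20   ""       2   Field2   SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   15   ""       3   Field3   SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   30   "\r\n"   4   Field4   SQL_Latin1_General_CP1_CI_AS
```

It would then be applied with something like `BULK INSERT StageAlpha FROM 'C:\data\alpha.txt' WITH (FORMATFILE = 'C:\data\alpha.fmt')`, giving you a reusable per-source template without leaving T-SQL.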
-Mike
September 16, 2008 at 6:20 am
Getting back to my initial suggestion... aside from it being potentially slow and reliant on a '-' as a delimiter, is there any reason it wouldn't have worked? That's my standard practice when I need to split a field for display purposes, such as a name field that looks like 'John Smith (123-456-789) ext. 1234'. I know it'd be more efficient to do it in the app doing the display, but that isn't always an option here.
September 16, 2008 at 6:33 am
You'd better split it for data storage purposes.
And (if needed) concatenate for display purposes.
It would save you from a lot of problems.
Google "Normalization Rules" - sometimes it's useful to know essential things about stuff you're working with.
😛
_____________
Code for TallyGenerator
September 16, 2008 at 6:40 am
We do... for any fields that we are able to change. However, some fields we can't change without getting our front end or middle layer modified. That one in particular we have both ways (the concatenated format, plus fields for first name, last name, phone, and extension), but there are a lot of things that still reference the old version. You could argue that it shouldn't have been that way in the first place, and I'd completely agree, but I wasn't here at that point.
September 16, 2008 at 1:57 pm
If I were you, I'd create a small, efficient trigger on that table that updates another, properly formatted table (or tables) whenever the value is updated.
Then I'd gradually move the SELECTs from the old structure to the new one.
And I'd bombard my boss with articles about SQL injection to make him forbid ad-hoc SQL queries against the database. Then I'd have a chance to populate the new structure without triggers.
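A minimal sketch of such a sync trigger, with all table and column names invented purely for illustration (the legacy table, the normalized table, and the key column are assumptions):

```sql
-- Hypothetical tables: dbo.Contacts holds the legacy data;
-- dbo.ContactsNormalized is the new, properly split structure.
CREATE TRIGGER trg_Contacts_Sync ON dbo.Contacts
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Update rows that already exist in the normalized copy.
    UPDATE n
       SET n.FirstName = i.FirstName,
           n.LastName  = i.LastName,
           n.Phone     = i.Phone,
           n.Extension = i.Extension
      FROM dbo.ContactsNormalized n
      JOIN inserted i ON i.ContactID = n.ContactID;

    -- Add rows that the normalized copy has not seen yet.
    INSERT INTO dbo.ContactsNormalized
           (ContactID, FirstName, LastName, Phone, Extension)
    SELECT i.ContactID, i.FirstName, i.LastName, i.Phone, i.Extension
      FROM inserted i
     WHERE NOT EXISTS (SELECT 1
                         FROM dbo.ContactsNormalized n
                        WHERE n.ContactID = i.ContactID);
END;
```

Because the trigger works off the `inserted` pseudo-table, it handles multi-row updates set-wise, which matters in a database as trigger-heavy as the one described later in this thread.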
It's all about making professional decisions.
_____________
Code for TallyGenerator
September 16, 2008 at 3:02 pm
Due to the nature of our setup, involving a completely custom-built front-end application that communicates only through a custom-built middle layer (which we have no access to modify in any way) and several hundred tables that I have very limited ability to alter (without affecting the front/middle layers), this isn't about simply deciding to be more uniform and implementing a process to get there. The entire database/coding structure is probably one of the worst examples of RBAR logic (to borrow the term) imaginable. Literally thousands of triggers, many of which are several thousand lines long, and levels of sp/udf nesting that would take a week to map out completely.
That said, I wasn't wondering how to make what I was suggesting unnecessary for the future, so much as I was suggesting it for the case of the OP, who also has data that should be normalized into 3 separate fields, but for reasons he did not specify, is not.
September 16, 2008 at 3:46 pm
More than one way to 'skin' a string! 😉
CREATE TABLE #proditem (item1 VARCHAR(255))
INSERT INTO #proditem values ('CSCAR009/0001/001-PASPBD-Pedestrian automatic sliding door')
INSERT INTO #proditem values ('CSCAR009/0001/002-PASPBD2-Pedestrian automatic sliding double door')
INSERT INTO #proditem values ('CSCAR009/0001/003-PASPBDADA-Pedestrian automatic sliding door for disabilities')
SELECT LEFT(item1, PATINDEX('%-%', item1) - 1) AS firstID,
       LEFT(SUBSTRING(item1, PATINDEX('%-%', item1) + 1, LEN(item1)),
            PATINDEX('%-%', SUBSTRING(item1, PATINDEX('%-%', item1) + 1, LEN(item1))) - 1) AS secondID
FROM #proditem
firstID            secondID
CSCAR009/0001/001  PASPBD
CSCAR009/0001/002  PASPBD2
CSCAR009/0001/003  PASPBDADA
DROP TABLE #proditem
September 16, 2008 at 5:40 pm
Sergiy (9/16/2008)
Jeff Moden (9/16/2008)
It sure does...It "eats" the comma from the last value.
... but the OpenRowSet thing I showed does it right.
I explained why it's not an option.
You could re-export what you think is good and re-import, letting the split occur. The only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.
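A rough schema of that character-split idea, with illustrative names only (a `#raw` staging table of imported lines and a `dbo.Tally` numbers table are assumed):

```sql
-- Explode each line into one row per character, carrying along a running
-- count of the double quotes seen up to each position.
SELECT r.RowID,
       t.N                        AS CharPos,
       SUBSTRING(r.Line, t.N, 1) AS Ch,
       -- Doubling each quote and comparing lengths counts the quotes
       -- in the prefix of the line up to this character.
       LEN(REPLACE(LEFT(r.Line, t.N), '"', '""'))
         - LEN(LEFT(r.Line, t.N)) AS QuotesSoFar
  INTO #chars
  FROM #raw r
  JOIN dbo.Tally t ON t.N <= LEN(r.Line);

-- A comma is a genuine field delimiter only when the number of quotes
-- seen so far is even, i.e. we are not inside a quoted value.
SELECT RowID, CharPos
  FROM #chars
 WHERE Ch = ','
   AND QuotesSoFar % 2 = 0
 ORDER BY RowID, CharPos;
```

The delimiter positions found this way could then feed a SUBSTRING-based split back into the three target columns; this is only the quote-counting skeleton, not a finished splitter.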
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 6:56 pm
Jeff Moden (9/16/2008)
Could re-export what you think is good and reimport letting the split occur. Only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.
Did not quite get it.
Can you please show what kind of update you're talking about?
No need for a working code, just some schema.
_____________
Code for TallyGenerator
September 16, 2008 at 7:27 pm
By the way, when I said "It sure does", I meant it sure does fail... and in exactly the manner you suggested. Sorry for the confusion. 🙂
I'll be back... I'm not sure I can pull this one off... the guys that wrote OpenRowSet were a whole lot smarter than me. 😀
I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought. If it were me, I'd be tempted to import like you do now, figure out which rows are worth saving, export them, and read them with OpenRowSet.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2008 at 7:44 pm
Jeff Moden (9/16/2008)
I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought.
As I said, I eventually did it, but unfortunately using a loop.
That's why I need to use a cursor when it comes to populating that long 3-column table.
Everybody around is happy because I manage to upload 100k+ row files in less than 10 minutes (including all data checks and validations) when the previous version took about 3 hours.
But I know how it should be...
If you ever come up with an idea for how to split a quoted string using a Tally table - you know who's gonna appreciate it the most. 😉
_____________
Code for TallyGenerator
September 16, 2008 at 7:47 pm
Heh... I might have it, ol' friend... I can't explain it, I just gotta do it. I'll be back. And you're right about the table... 3 columns. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.