January 24, 2011 at 10:35 am
Hi Craig
Well spotted on the placeholder piece, yes I've already hit the problem you mentioned and have not found a solution yet.
Here's ParseFile:
CREATE Function ParseFile(
@STRINGnvarchar(max),
@SEPARATORnchar(1)
)
Returns @PARSEDSTRING table(STRING nvarchar(max))
As
Begin
Declare@POSITION int = 1
Select@STRING = @STRING + @SEPARATOR
While charindex(@SEPARATOR,@STRING,@POSITION) != 0
Begin
Insert@PARSEDSTRING(STRING)
SelectSubString(@STRING,@POSITION,CharIndex(@SEPARATOR,@STRING,@POSITION) - @POSITION)
Select@POSITION = CharIndex(@SEPARATOR,@STRING,@POSITION) + 1
End
Return
End
January 24, 2011 at 11:12 am
Try swapping:
PatIndex('%"%,%"%',THE_LINE)),
-- with this:
PatIndex('%,"%,%",%',THE_LINE)),
Note where the additional commas are to detect beginning and ending doublequotes.
I may have an alternative for you but it's going to be a busy day. Will try to get back this afternoon.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 24, 2011 at 11:53 am
Craig Farrell (1/24/2011)
Try swapping:
PatIndex('%"%,%"%',THE_LINE)),
-- with this:
PatIndex('%,"%,%",%',THE_LINE)),
Note where the additional commas are to detect beginning and ending doublequotes.
I may have an alternative for you but it's going to be a busy day. Will try to get back this afternoon.
Tried that, no joy :crying:
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 2:14 pm
This solution uses a temp table to add a unique id to the data
It needs to be repeated until no updates are left
(sorry about the short names but it was getting late and I got fed up with long typing)
CREATE TABLE #tmp (I int,X varchar(1000))
INSERT #tmp VALUES (1,'1,2,3,"abc,def",4,5,"ghi,jkl",6,7,"8,"""",x,y"""",",9')
INSERT #tmp VALUES (2,'1,2,3,"abcdef",4,5,"ghijkl",6')
WITH b AS (
SELECT I,
ROW_NUMBER() OVER(PARTITION BY I ORDER BY I) AS [P],t.N
FROM #tmp x
JOIN Tally t ON t.N BETWEEN 1 AND LEN(X)
WHERE ((t.N=1 AND SUBSTRING(X,t.N,1) = '"') OR (SUBSTRING(X,t.N-1,2) LIKE '[^"]"'))
AND SUBSTRING(X,t.N-1,3) LIKE '[^"]"[^"]'
)
UPDATE a SET a.X=STUFF(a.X,b.N+CHARINDEX(',',SUBSTRING(a.X,b.N,c.N-b.N))-1,1,'|')
FROM #tmp a
JOIN b b ON b.I=a.I AND P%2=1
JOIN b c ON c.I=b.I AND c.P=b.P+1
WHERE SUBSTRING(a.X,b.N,c.N-b.N) LIKE '%,%'
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2011 at 3:12 pm
Anyone have some code to generate a million row table for this particular problem? My brain isn't quite functional at the moment, but I may have a solution that needs further testing.
January 24, 2011 at 3:17 pm
Just in case someone wants to play with my code, here it is.
create function dbo.ChangeDelimiters (
@pLine varchar(max),
@pSDim char(1),
@pRDim char(1)
) returns table
as return
with
a1 as (select 1 as N union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1),
a2 as (select
1 as N
from
a1 as a
cross join a1 as b),
a3 as (select
1 as N
from
a2 as a
cross join a2 as b),
Tally as (select top (len(@pLine))
row_number() over (order by a.N) as N
from
a3 as a
cross join a2 as b),
ItemSplit(
ItemOrder,
Item
) as (
SELECT
N,
SUBSTRING(@pLine,N,1)
FROM
Tally
WHERE
N <= LEN(@pLine)
),
QuoteSplit as (
select
row_number() over (order by ItemOrder) as ItemID,
Item
from
ItemSplit),
QuoteRep as (
select
((ROW_NUMBER() over (order by ItemID)+1)/2) as QuotePos,
ItemID,
Item
from
QuoteSplit
where
Item = '"'
),
QuoteChange as (
select
QuotePos,
min(ItemID) as StartPos,
max(ItemID) as EndPos
from
QuoteRep
group by
QuotePos
),
ReJoin as (
select
case when qc.QuotePos is not null and its.Item = @pSDim then @pRDim else its.Item end as Item
from
ItemSplit its
left outer join QuoteChange qc
on (its.ItemOrder >= qc.StartPos
and its.ItemOrder <= qc.EndPos)
)
select
Rec = ((select '' + Item from ReJoin for xml path(''),TYPE).value('.','varchar(max)'))
;
go
declare @TestTab table (
--RecID int identity(1,1),
Rec varchar(max)
);
insert into @TestTab (Rec)
select '"1,2,3",4,5,6,"7,8",9' union all
select '123,456,"101,102,103",432,654,"909,908,907"';
select
*
from
@TestTab;
select
*
from
@TestTab
cross apply dbo.ChangeDelimiters(Rec,',','~')
;
update tt set
tt.Rec = cd.Rec
from
@TestTab tt
cross apply dbo.ChangeDelimiters(Rec,',','~') cd
;
select
*
from
@TestTab
;
January 24, 2011 at 8:27 pm
See how this works for you:
Sample Data:
IF OBJECT_ID( 'DeFielder') IS NOT NULL
DROP TABLE DeFielder
CREATE TABLE DeFielder
(RowID INT IDENTITY( 1, 1),
FileTypeID INT,
FileData VARCHAR(MAX)
)
ON [PRIMARY]
GO
INSERT INTO DeFielder( FileTypeID, FileData)
SELECT1, '1,2,3,4,5,"abc,def",6,7,"yui,asd"' UNION ALL
SELECT1, '1,"qrs",3,4,5,32,6,7,81' UNION ALL
SELECT1, '1,"qrs",3,"xyz,abc,def,boo",5,32,6,7,81'
GO
Now, for the meat of the process. This is generic and I built it a bit ago, but you should be able to adapt it as needed. It transforms the data into a rowset (I know, not what you're looking for), and then all you need to do is Pivot it back to the way you want. See the adaptation on the end.
DECLARE @FileData VARCHAR(MAX),
@nextDelim INT,
@nextQuote INT,
@RowCounter INT,
@FieldCounter INT,
@Delim CHAR(1)
SELECT @RowCounter = 0,
@FieldCounter = 0,
@nextDelim = 0,
@nextQuote = 0,
@Delim = ','-- Change this to pipe, comma, or what you need. Will accept TAB via CHAR()
DECLARE curFileData Cursor FOR
SELECT FileData FROM DeFielder WHERE FileTypeID = 1
DECLARE @targetTable TABLE
(RowNum INT,
FieldPosition INT,
colValue VarChar(50) -- Make sure this fits your largest expected field. We're shoving everything in it.
)
OPEN curFileData
FETCH NEXT FROM curFileData INTO @FileData
WHILE @@FETCH_STATUS = 0
BEGIN
SET @FileData = @FileData + @Delim
Set @RowCounter = @RowCounter + 1
SET @FieldCounter = 1
WHILE @FileData <> ''
BEGIN
SET @nextDelim = CHARINDEX( @delim, @FileData)
SET @nextquote = CHARINDEX( '"', @FileData)
-- If 0 on quote skip the quoted logic... no more quotes in string, make it think it's
-- past next comma.
IF @nextQuote = 0 SET @nextQuote = 999999
IF @nextDelim = @nextQuote
RAISERROR('Serious problem in defielder, please check it', 16, 1)
PRINT @nextDelim
PRINT @nextQuote
-- Next Delimeter is hit before the next record is quoted.
IF @nextDelim < @nextQuote
BEGIN
INSERT INTO @targetTable VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 1, @nextDelim - 1))
SET @FileData = STUFF( @FileData, 1, @nextDelim, '')
END
IF @nextQuote = 1
BEGIN
--If it's not the first character for the first field, something's gone wrong
-- Always working on the 'front' field, so skip the quote. Skip it on the end too.
INSERT INTO @targetTable
VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 2, CHARINDEX( '"', @fileData, @nextQuote + 1) -2 ))
SET @FileData = STUFF( @FileData, 1, CHARINDEX( '"', @fileData, @nextQuote + 1) +1 /*add the delim*/, '')
END
PRINT @FileData
SET @FieldCounter = @FieldCounter + 1
END
FETCH NEXT FROM curFileData INTO @FileData
END
CLOSE curFileData
DEALLOCATE curFileData
SELECT * FROM @targetTable
And for the pivot:
SELECT
RowNum,
[1], [2], [3], [4], [5], [6], [7], [8], [9]
FROM
@targetTable AS tt
PIVOT
( MIN( ColValue )
FOR FieldPosition IN ( [1], [2], [3], [4], [5], [6], [7], [8], [9])
) AS pvt
This was a chunk of code I was working on to try out procedures passing table variables around, still haven't gotten around to it but it seemed like a worthy candidate. There may be a faster, more efficient way to do this, but I know this functions. It will get you out of the hole until someone comes up with better stuff. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 24, 2011 at 8:56 pm
Million Row Testbed:
INSERT INTO DeFielder (FileTypeID, FileData)
SELECT
1,
CASE WHEN t.N%2 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%3 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%4 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%5 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%6 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%7 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%8 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%9 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END + ',' +
CASE WHEN t.N%10 = 0 THEN '"' + CONVERT( VARCHAR(150), NEWID()) + ',' + CONVERT( VARCHAR(150), NEWID()) + '"' ELSE CONVERT( VARCHAR(15), t.N) END AS FileData
FROM
tempdb..Tally AS t,
tempdb..Tally AS t2
WHERE
t.N <= 10000
AND t2.N <= 100
Modified code. Use this, not the above. VARCHAR(4000) makes sure you don't bang your head on the wall, and I left some print statements in the above that should have been removed after I tested on the new sample data to make sure I had grabbed the finished one.
SET NOCOUNT ON
DECLARE @FileData VARCHAR(MAX),
@nextDelim INT,
@nextQuote INT,
@RowCounter INT,
@FieldCounter INT,
@Delim CHAR(1)
SELECT @RowCounter = 0,
@FieldCounter = 0,
@nextDelim = 0,
@nextQuote = 0,
@Delim = ','-- Change this to pipe, comma, or what you need. Will accept TAB via CHAR()
DECLARE curFileData Cursor FOR
SELECT FileData FROM DeFielder WHERE FileTypeID = 1
DECLARE @targetTable TABLE
(RowNum INT,
FieldPosition INT,
colValue VarChar(4000) -- Make sure this fits your largest expected field. We're shoving everything in it.
)
OPEN curFileData
FETCH NEXT FROM curFileData INTO @FileData
WHILE @@FETCH_STATUS = 0
BEGIN
SET @FileData = @FileData + @Delim
Set @RowCounter = @RowCounter + 1
SET @FieldCounter = 1
WHILE @FileData <> ''
BEGIN
SET @nextDelim = CHARINDEX( @delim, @FileData)
SET @nextquote = CHARINDEX( '"', @FileData)
-- If 0 on quote skip the quoted logic... no more quotes in string, make it think it's
-- past next comma.
IF @nextQuote = 0 SET @nextQuote = 999999
IF @nextDelim = @nextQuote
RAISERROR('Serious problem in defielder, please check it', 16, 1)
-- Next Delimeter is hit before the next record is quoted.
IF @nextDelim < @nextQuote
BEGIN
INSERT INTO @targetTable VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 1, @nextDelim - 1))
SET @FileData = STUFF( @FileData, 1, @nextDelim, '')
END
IF @nextQuote = 1
BEGIN
--If it's not the first character for the first field, something's gone wrong
-- Always working on the 'front' field, so skip the quote. Skip it on the end too.
INSERT INTO @targetTable
VALUES ( @RowCounter, @FieldCounter, SUBSTRING( @FileData, 2, CHARINDEX( '"', @fileData, @nextQuote +1) -2 ))
SET @FileData = STUFF( @FileData, 1, CHARINDEX( '"', @fileData, @nextQuote + 1) +1 /*add the delim*/, '')
END
SET @FieldCounter = @FieldCounter + 1
END
FETCH NEXT FROM curFileData INTO @FileData
END
CLOSE curFileData
DEALLOCATE curFileData
This dies in the million row test. Time to complete:
6:30 without the pivot, 7:45 with the pivot, on my desktop.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 25, 2011 at 4:56 am
Hi.
Wow - the help on here is fantastic. I'm going to have to start offering my help to others more - I'm really starting to feel guilty 🙂
Lynn I'm currently working through your solution, which I love due to the lack of loops. A big thanks to everyone who has offered solutions and help!
Brian
January 25, 2011 at 5:02 am
Lynn, won't QuoteSplit and ItemSplit always contain the same data? Isn't QuoteSplit redundant here, and QuoteRep could really just pull from ItemSplit?
Or have I missed something? 🙂
January 25, 2011 at 5:21 am
My query had a problem and for the record I post a revised version
Based on a million rows of my test data
'1,2,3,"abc,def",4,5,"ghi,jkl",6,7,"8,"""",x,y"""",",9'
on my desktop, it took 7 iterations taking between 52 and 66 seconds per iteration
WITH b AS (
SELECT x.I,
ROW_NUMBER() OVER(PARTITION BY x.I ORDER BY x.I,t.N) AS [P],t.N
FROM TestTab x
JOIN Tally t ON t.N BETWEEN 1 AND LEN(X)
WHERE ((t.N=1 AND SUBSTRING(X,t.N,1) = '"') OR (SUBSTRING(X,t.N-1,2) LIKE '[^"]"'))
AND (t.N>1 AND SUBSTRING(X+' ',t.N-1,3) LIKE '[^"]"[^"]')
)
UPDATE a SET a.X=STUFF(a.X,b.N+CHARINDEX(',',SUBSTRING(a.X,b.N,c.N-b.N))-1,1,'|')
FROM TestTab a
JOIN b b ON b.I=a.I AND P%2=1
JOIN b c ON c.I=b.I AND c.P=b.P+1
WHERE SUBSTRING(a.X,b.N,c.N-b.N) LIKE '%,%'
Far away is close at hand in the images of elsewhere.
Anon.
January 25, 2011 at 7:45 am
Hi everyone
Once again thanks to everyone for all the suggestions. I have implemented Lynn's suggestion with one or two minor tweaks, and it's working like a charm 🙂
Thank you again to everyone who responded to my post, and got me out of this hole. Particular thanks to Lynn for the solution with no loops at all ! That is some VERY nice SQL! I'm 16 years working with T-SQL and I can honestly say I would never have been able to come up with that!
Brian
January 25, 2011 at 8:52 am
Brian, would you please post your final solution? I would like to see what you tweaked.
I usually make things more complicated than they need to be to start, so this would help me learn where I went off-track a bit.
January 25, 2011 at 9:03 am
Hi
I removed QuoteSplit as it is redundant, and I added an A4, to give me more than 10000 rows available to the tally table. That's it really, apart from layout/naming conventions to suit our company etc!
Before I paste the code, how do you create those code sample boxes that everyone uses here which retain the formatting and layout?
Brian
January 25, 2011 at 9:44 am
Brian McGee-355525 (1/25/2011)
HiI removed QuoteSplit as it is redundant, and I added an A4, to give me more than 10000 rows available to the tally table. That's it really, apart from layout/naming conventions to suit our company etc!
Before I paste the code, how do you create those code sample boxes that everyone uses here which retain the formatting and layout?
Brian
Over on the left, while posting, there's IfCode shortcuts. Highlight your code and then click the one that says code="sql".
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply