February 23, 2011 at 6:46 am
JJH82 (2/22/2011)
hi kumar20, the script cannot process some name characters, e.g. '&', '<'',2 Column_Names "FIELD_NAME" "READ_FLG" ,2 Old_Column_Values "Rural & Remote" "N" ,2 New_Column_Values "" ""'
'+2 Column_Names "ATTRIB_VALUE" "SCORE" +2 New_Column_Values "<5 volunteers:1 staff" "0" +2 Old_Column_Values "" ""'
is there a way to overcome that?
Sorry I didn't get to the code last night. The above smacks of another question that I need to ask... will the values ever have embedded commas? Will they ever have embedded double quotes?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2011 at 3:33 pm
There will not be any comma in the strings, but the strings are wrapped in double quotes
Jeff Moden (2/23/2011)
JJH82 (2/22/2011)
hi kumar20, the script cannot process some name characters, e.g. '&', '<'',2 Column_Names "FIELD_NAME" "READ_FLG" ,2 Old_Column_Values "Rural & Remote" "N" ,2 New_Column_Values "" ""'
'+2 Column_Names "ATTRIB_VALUE" "SCORE" +2 New_Column_Values "<5 volunteers:1 staff" "0" +2 Old_Column_Values "" ""'
is there a way to overcome that?
Sorry I didn't get to the code last night. The above smacks of another question that I need to ask... will the values ever have embedded commas? Will they ever have embedded double quotes?
February 23, 2011 at 3:38 pm
Ok... thanks. I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2011 at 8:49 pm
Dang... I've just not been able to devote the time to this problem. Do you have a solution, yet, or do you still need help on this one?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2011 at 10:17 pm
JJH82 (2/17/2011)
a table called CLOB contains ROW_ID (pk)PARSE_CLOB (where the strings sit)
Are you using OPENROWSET BULK SINGLE_CLOB to load the initial table from a file? If so, best performance would come from combining the reading and decoding into a single step. If you can provide a sample text file and a table definition to hold the final parsed output, I'd be happy to provide a solution.
February 25, 2011 at 5:12 am
SQLkiwi (2/24/2011)
JJH82 (2/17/2011)
a table called CLOB contains ROW_ID (pk)PARSE_CLOB (where the strings sit)
Are you using OPENROWSET BULK SINGLE_CLOB to load the initial table from a file? If so, best performance would come from combining the reading and decoding into a single step. If you can provide a sample text file and a table definition to hold the final parsed output, I'd be happy to provide a solution.
That'll be worth the price of admission... please see attached. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 5:31 pm
to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.
I created a loop to process 1000 Records at a time.
any comments or suggestions please?
SET NOCOUNT ON
DECLARE @T_Data TABLE
(
ID INT IDENTITY(1, 1), Data NVARCHAR(MAX)
)
INSERT INTO @T_Data
SELECT ',2 Column_Name "Col1" "Col2" , 2 New_Column_Value "A" "B", 2 Old_Column_Value "a" "b"'
UNION
SELECT ',2 Field_Name "F1" "F2", 2 New_Field_Name "F1" "F2", 2 Old_Field_Name "f1" "f2" ,1 Column_Name "Col3" ,1 New_Column_Value "C", 1 Old_Column_Value "c"'
UNION
SELECT ',1 Field_Name "F3", 1 New_Field_Name "F3", 1 Old_Field_Name "f3" ,1 Column_Name "Col6" ,1 New_Column_Value "1<", 1 Old_Column_Value "2&"'
DECLARE @T_FirstLevel TABLE
(
Lvl1ID INT IDENTITY(1, 1), ID INT, Lvl1Data NVARCHAR(MAX)
)
INSERT INTO @T_FirstLevel
SELECT ID, LTRIM(SUBSTRING(Data, number+1, CHARINDEX(',', Data, number+1)-number - 1) ) AS 'Data'
FROM (SELECT ID, Data + ',' AS Data FROM @T_Data) A
INNER JOIN master..spt_values N ON N.number < LEN(A.Data)
AND N.type = 'p'
WHERE LTRIM(SUBSTRING(Data, number+1, CHARINDEX(',', Data, number+1)-number - 1) ) <> ''
AND SUBSTRING(Data, number, 1) = ','
DECLARE @T_SecondLevel TABLE
(
Lvl2ID INT IDENTITY(1, 1), Lvl1ID INT, ID INT, Lvl2Data NVARCHAR(MAX),
ColRow INT
)
INSERT INTO @T_SecondLevel
SELECT Lvl1ID, ID, LTRIM(SUBSTRING(Lvl1Data, number+1, CHARINDEX(' ', Lvl1Data, number+1)-number - 1) ) AS 'Data'
, ROW_NUMBER() OVER (PARTITION BY Lvl1ID ORDER BY Lvl1ID, ID) AS ColRow
FROM (SELECT Lvl1ID, ID, ' ' + Lvl1Data + ' ' AS Lvl1Data FROM @T_FirstLevel) A
INNER JOIN master..spt_values N ON N.number < LEN(A.Lvl1Data)
AND N.type = 'p'
WHERE LTRIM(SUBSTRING(Lvl1Data, number+1, CHARINDEX(' ', Lvl1Data, number+1)-number - 1) ) <> ''
AND SUBSTRING(Lvl1Data, number, 1) = ' '
DECLARE @T_Final TABLE
(
Lvl2ID INT, Lvl1ID INT, ID INT, Lvl2Data NVARCHAR(MAX), ColRow INT,
Cnt INT, GrpCnt INT, Batch INT
)
INSERT INTO @T_Final
SELECT T.*, (N.number + 1) AS GrpCnt, NULL AS Batch
FROM (SELECT
Lvl2ID, Lvl1ID, ID, Lvl2Data
, CASE WHEN ColRow > 3 THEN 3 ELSE ColRow END AS ColRow
, COUNT(CASE WHEN PATINDEX('%"%', Lvl2Data) > 0 THEN 1 ELSE NULL END) OVER (PARTITION BY Lvl1ID) AS Cnt
FROM @T_SecondLevel) T
LEFT JOIN master..spt_values N ON (N.type = 'p' AND N.number <= (T.Cnt - 1) AND T.ColRow < 3)
ORDER BY T.Lvl2ID
--create a table for collection final results
IF EXISTS (
SELECT *
FROM sysobjects
WHERE id = object_id(N'[dbo].T_Final'))
drop table [dbo].T_Final
CREATE Table T_Final
(
RowID int,
FieldType nvarchar(8),
FieldName nvarchar(4000),
NewValue nvarchar(4000),
OldValue nvarchar(4000)
)
--creating a loop to run 1000 records at a time
DECLARE @y int
DECLARE @x int
DECLARE @Count int
DECLARE @start int
DECLARE @end int
SET @x = 1000
SET @Count = 0
set @y = (select max(ID) from @T_Final)
--print @y
--print @y/@x+1
while (@Count < @y/@x+1)
begin
set @start = @x*@Count+1
if @x*(@Count+1)>@y
set @end = @y
else
set @end = @x*(@Count+1)
--print 'Start from '+convert(nvarchar(12),@start)+' End at '+convert(nvarchar(12),@end)
--
--set @Count = @Count+1
--end
SET @T1 = getdate()
UPDATE F1
SET GrpCnt = F2.Row
FROM @T_Final F1
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY Lvl1ID ORDER BY Lvl2ID) AS Row
FROM @T_Final WHERE GrpCnt IS NULL ) F2 ON F1.Lvl2ID = F2.Lvl2ID
WHERE F1.GrpCnt IS NULL AND F1.ID between @start AND @end
/*
UPDATE F1
SET Batch = F2.Lvl2Data
FROM @T_Final F1
INNER JOIN @T_Final F2 ON F1.Lvl1ID = F2.Lvl1ID
AND F2.ColRow = 1
*/
SET @T2 = getdate()
print 'Runtime of 1st updating @T_Final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'
SET @T1 = getdate()
UPDATE F1
SET Batch = N.Number
FROM @T_Final F1
INNER JOIN master..spt_values N ON F1.Lvl1ID BETWEEN N.number AND (N.number * 3)
WHERE N.number <> 0 AND F1.ID between @start AND @end
--SELECT * FROM @T_Final
SET @T2 = getdate()
print 'Runtime of 2nd updating @T_Final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'
SET @T1 = getdate()
INSERT INTO T_Final
SELECT F1.ID
, MAX(CASE WHEN F1.Lvl2Data LIKE 'Column%' THEN 'Column'
WHEN F1.Lvl2Data LIKE 'Field%' THEN 'Field' END) as 'FieldType'
, MAX(CASE WHEN F1.Lvl2Data IN ('Column_Names', 'Field_Names') AND F2.ColRow = 3 THEN F2.Lvl2Data END) as 'FieldName'
, MAX(CASE WHEN F1.Lvl2Data LIKE 'New%'/*IN ('New_Column_Values', 'New_Field_Names')*/ AND F2.ColRow = 3 THEN F2.Lvl2Data END) AS 'NewValue'
, MAX(CASE WHEN F1.Lvl2Data Like 'Old%'/*IN ('Old_Column_Values', 'Old_Field_Names')*/ AND F2.ColRow = 3 THEN F2.Lvl2Data END) AS 'OldValue'
FROM @T_Final F1
INNER JOIN @T_Final F2 ON F1.Lvl1ID = F2.Lvl1ID AND F1.GrpCnt = F2.GrpCnt AND F1.Batch = F2.Batch AND F1.ID between @start AND @end
GROUP BY F1.ID, F1.GrpCnt, F1.Batch
ORDER BY 1, 2
SET @T2 = getdate()
print 'Runtime of producing final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'
set @Count = @Count+1
end
February 27, 2011 at 7:39 pm
JJH82 (2/27/2011)
to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.I created a loop to process 1000 Records at a time. any comments or suggestions please?
Are you able to use CLR code on this server? The task at hand is procedural by nature, and would benefit from being written in a language that is optimized for that. The reason I haven't responded with a solution so far is that the majority of the cost is in decoding the records, not bulk loading the CLOB in the first place - the optimization I had in mind just combines the bulk load with the decoding.
February 27, 2011 at 7:47 pm
yes, i believe so
February 27, 2011 at 8:14 pm
JJH82 (2/27/2011)
yes, i believe so
Specifically, I am asking if it is OK to use a SQLCLR stored procedure or function on this SQL Server. Do you have any experience in a .NET language like C#?
February 27, 2011 at 8:21 pm
yes, I can create or use stored procedures on the server.
:crazy:but i have not .Net programming experience
February 27, 2011 at 8:58 pm
JJH82 (2/27/2011)
to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.I created a loop to process 1000 Records at a time.
any comments or suggestions please?
Just out of curiosity, what was the maximum length of any of those records in bytes?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 9:01 pm
nvarchar(4000) is set to the column, so I assume that no string is longer than 4000.:-)
February 27, 2011 at 9:04 pm
JJH82 (2/27/2011)
nvarchar(4000) is set to the column, so I assume that no string is longer than 4000.:-)
Very cool. I'm sure that Paul will come up with something great on the CLR side of the house. It's been proven many times that a properly written CLR will beat any other kind of splitter/parser in T-SQL.
Still, I'll give it a try on the T-SQL side.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 9:06 pm
thanks guys
Viewing 15 posts - 16 through 30 (of 40 total)
You must be logged in to reply to this topic. Login to reply