May 31, 2006 at 10:52 am
I am looking over a database that I have inherited, and there is one field that I need to work with that is being a pain.
This is a Text field with multiple "records" with multiple values in each appended into it. These records / values are of the format:
AAAA;BBBB;CCCC|aaa;BBBBBBB;xxxxx|aaaaaaa;bbb;xx
Each "record" is separated by a pipe character, and each field is separated by a semicolon.
Now, in cases where the Datalength of the field is 8k or less, this is relatively simple to parse. However, parsing more than 8k is difficult, and made more so by the fact that the READTEXT function will not place the text read into a variable. So, my question is:
How would you parse this data, when more than 8k exists in the TEXT field, using only T-SQL.
Thanks in advance,
Brendt
May 31, 2006 at 2:04 pm
This is the best I have come up with so far. The first version works for a single row, the second is a version which processes a whole rowset. The code assumes an element length < 100, but this could be up to 8000. It also only splits data once, so you would have to add code to do the second step (splitting the row elements into individual data values) - but that would probably be easier since (we hope) rows will be <8001 chars.
@separator varchar(3)
@retval table(item_order int identity, val varchar(100))
@text table(txt text)
@id int
@id = 8, @separator = ' '
@text select DisplayText from dbo.Diary_Entry where reference = @id
@i = 1, @len = datalength(txt), @curpos = 1 from @text
@curpos <= @len
select @lastpos=@curpos
,@patindex=case @patindex when 0 then @len-@lastpos else @patindex end
,@val=substring(txt,@lastpos,@patindex)
,@curpos=@lastpos+@patindex+1
* from @retval order by item_order
@separator varchar(3)
@retval table(rowid int, item_order int, val varchar(100))
@text table(txt text)
@val varchar(100), @lastpos int, @len int, @patindex int, @curpos int, @i int, @j-2 int
@separator = ' ', @j-2 = 0, @i = 1
@vars(Vcurpos, Vlen, Vkey, Vtext)
select 1, datalength(de.displaytext), de.reference, de.displaytext from dbo.Diary_Entry de
@i > 0
update v
,@patindex=case @patindex when 0 then @len-@lastpos else @patindex end
,@curpos=@lastpos+@patindex+1
,Vcurpos=@curpos
* from @retval order by rowid, item_order
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
May 31, 2006 at 3:06 pm
Unfortunately, the rows include some with Text of over 30kb size. So, although this would work for about 65% of the rows, I am still stymied as to how to handle the remainder of the rows.
Thanks for your efforts, though. I appreciate them.
May 31, 2006 at 3:19 pm
Looks like you have some import files stuck in your DB. You want to extract them and use an import process: bcp out, bcp in. Or perhaps ask the DTS fiddlers forum for newly wrapped ways of doing it.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
May 31, 2006 at 5:53 pm
Just out of interest, is the problem with the 30K text values just a performance issue, are your embedded 'rows' > 8K, or is a variable overflowing, or none of the above?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
May 31, 2006 at 7:25 pm
Data in text column is not actually data. You need to make data from it before you can use it.
Use this procedure to recover your data from the text field and place into appropriate tables.
IF NOT EXISTS (SELECT name FROM sysobjects WHERE name = N'ListDelimitedLong' AND type = 'P')
EXEC (' CREATE PROCEDURE dbo.ListDelimitedLong
@List text,
@Delim char(1)
AS
SELECT @Delim')
GO
ALTER PROC dbo.ListDelimitedLong
@List text, -- A delimiter-separated list of values, this parameter may be up to 2GB in length
@Delim char(1)
AS
SET NOCOUNT ON
DECLARE @ptrval binary(16)
DECLARE @One int
SET @One = 1
DECLARE @LeftPointer int, @RightPointer int, @SubStr varchar(50)
SET @LeftPointer = 0
DECLARE @val TABLE (IntVal bigint )
CREATE TABLE #TXTList (List text)
INSERT INTO #TXTList
SELECT @List
SELECT @ptrval = TEXTPTR(List) FROM #TXTList
WHILE @RightPointer < (SELECT DATALENGTH(List) FROM #TXTList) OR @RightPointer IS NULL
BEGIN
SELECT @RightPointer = CHARINDEX(@Delim, SUBSTRING(List, @LeftPointer+@One, DATALENGTH(List)) )
+ @LeftPointer - @One FROM #TXTList
IF @RightPointer <= @LeftPointer
SELECT @RightPointer = DATALENGTH(List) FROM #TXTList
SELECT @SubStr = SUBSTRING(List, @LeftPointer + @One, @RightPointer-@LeftPointer)
FROM #TXTList
SELECT @LeftPointer = @RightPointer + DATALENGTH(@Delim)
END
SELECT DISTINCT IntVal
FROM @val
GO
_____________
Code for TallyGenerator
May 31, 2006 at 7:43 pm
But you can't do it setwise, of course.
MS have deprecated use of READTEXT with text pointer saying it will be phased out, and recommended using SUBSTRING instead. Does anyone know any more about this?
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 8:16 am
You can use Substring to access blocks of text from a text column (or ntext). The blocks are limited by the maximum varchar sizes, which are varchar=8000 and nvarchar=4000.
The simplest way would be to loop until the amount of text retrieve with the substring() function is less than the block size (i.e. the length of the text requested in the substring function). I use Datalength to do this because Len() doesn't count trailing spaces.
The problem with using Substring() in a loop is that the data may be variable length, and thus a "record" could span two blocks of text. You could use CharIndex to locate the pipes symbols and incorporate that into the loop.
Anyway, here's a simple loop that prints out the text in query analyzer (I have set the maximum number of characters per columns in Tools|Options |Results to 8000):
DECLARE @buffer varchar(4000)
, @blocksize int
, @blockPtr int
SET @blocksize = 4000
SET @buffer = ''
SET @blockPtr = 1
WHILE 1=1
BEGIN
SELECT @buffer = Substring(TextColumn, @blockPtr, @blockSize)
FROM MyTable
WHERE pkey= <A_Key_Value>
PRINT @buffer
IF DataLength(@buffer) < @blocksize
BEGIN
BREAK
END
SET @blockPtr = @blockPtr + @blocksize
END --WHILE
June 1, 2006 at 10:54 am
This one will fetch a single field value at a time and sort out the row numbering in the process. So as long as none of your data values (+ trailing separator!) exceed 8000 chars, you'll be OK on that score.
--These two lines:
,@patindexF=patindex('%'+@separatorF+'%',substring(vtext,@lastpos,8000))
,@patindexR=patindex('%'+@separatorR+'%',substring(vtext,@lastpos,8000))
--require altering to allow a full 8000 characters in your data values.
More than that in a single field, and the SQL approach is reaching its limits.
You don't get a recordset per input row; the data is unpivoted into sourcerecordID, recordID, fieldID, value. You would need to create appropriate tables and pivot the data into those.
@separatorF = ' ', @separatorR = '.', @j-2 = 0, @i = 1, @seplenF = datalength(@separatorF), @seplenR = datalength(@separatorR)
@vars(Vcurpos, Vlen, Vkey, Vtext, vFieldCtr, vRowCtr, vNewRow)
@i > 0
* from @retval order by sourcerowid, recordid, field_order
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply