T-SQL text manipulation and recordset iteration

Question

T-SQL text manipulation and recordset iteration

Viewing 15 posts - 31 through 45 (of 94 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

rjbirkett (6/29/2012)
Bill, I'm sorry, I was not ignoring you, I was trying to figure out if I could use that function. There are 45 separate fields once the string is split, I would need to edit that to get it to work I think. I appreciate ALL the help from all who take the trouble to reply. I am a real newbie and trying something I probably should have spent more time learning.
Lynn,
MsgDate is in the original syslog table along with the CDR detail record. in C# I'm selecting all the records by MsgDate and processing them all at once. Sorry for the confusion. So, I'm goint to select * from dbo.syslogd where MsgDate = today and then run your script to process the data into a newly created sandbox and then process THAT data and insert it into dbo.CDR so wish me luck:-) I'm going to need it. What else am I going to do this weekend? It's July 4th celebrations, I'm in Chicago, and i'm ENGLISH! One of the rare time I keep my mouth shut:-)
Rob

Could you post the DDL (CREATE TABLE statement) for this table? Right click on the table in Object Explorer, click on Script Table, then CREATE to, then clipboard. Then all you have to do is paste it in a post on this thread.

rjbirkett SSC-Addicted Points: 488 More actions · Answer 2

Lynn,

How am I going to process each record without looping through? For/Next? Next n? you have to iterate somehow, right?

Rob

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Bill Talada (6/29/2012)
I can save you a ton of trouble. Pass your string to this function and it will return a table of all the tokens in order. Then you can select directly from the tokens table into a typed table.
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'dbo.StringParse') AND xtype IN (N'FN', N'IF', N'TF'))
DROP FUNCTION dbo.StringParse
GO
CREATE FUNCTION dbo.StringParse(
@STR varchar(max),
@delimiter varchar(20),
@emptytonull bit)
returns @tokens TABLE (rownbr int PRIMARY KEY NOT NULL, token varchar(max) NULL)
AS
-- written BY Bill Talada
BEGIN
/*
split a string INTO tokens AND RETURN them IN a TABLE.
empty string returns one NULL row
a string WITH only a delimiter returns two NULL rows, one before it AND one after it.
a delimiter at the beginning means a NULL value appears before it.
two delimiters IN a row implies a NULL value BETWEEN them.
a delimiter at the END OF string implies a NULL value after it.
*/
DECLARE
@delimiterlength int,
@currptr int,
@nextptr int,
@token varchar(max),
@rownbr int
SET @delimiterlength = datalength(@delimiter)
SET @rownbr = 0
SET @currptr = 1 - @delimiterlength
SET @nextptr = charindex(@delimiter, @STR, @currptr+@delimiterlength)
--SELECT @currptr, @nextptr, @delimiterlength
WHILE @nextptr > 0
BEGIN
SET @rownbr = @rownbr + 1
SET @token = substring(@str, @currptr+@delimiterlength, @nextptr - (@currptr+@delimiterlength))
IF len(@token) = 0
BEGIN
IF @emptytonull=1
SET @token = NULL
ELSE
SET @token = ''
END
INSERT INTO @tokens VALUES (@rownbr, @token)
SET @currptr=@nextptr
SET @nextptr=charindex(@delimiter, @STR, @currptr+@delimiterlength)
END
-- last row
SET @rownbr = @rownbr + 1
SET @token = substring(@str, @currptr + @delimiterlength, datalength(@str) - @currptr + @delimiterlength)
IF len(@token) = 0
BEGIN
IF @emptytonull=1
SET @token = NULL
ELSE
SET @token = ''
END
INSERT INTO @tokens VALUES (@rownbr, @token)
RETURN
END
go

There are a few small issues with this particular split function. One, it uses a while loop. Two, it is a mutli-statement table valued function.

I would be more than willing to put Jeff's DelimitedSplit8K function up against it in a performance test. I will do that tonight when I can do it on my PC at home.

rjbirkett SSC-Addicted Points: 488 More actions · Answer 4

I assume you are looking for this. The table is being updated by the syslog service so the names ar based on what syslog required. Not pretty but it does work.

Rob

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

rjbirkett (6/29/2012)
Lynn,
How am I going to process each record without looping through? For/Next? Next n? you have to iterate somehow, right?
Rob

Take a look at the latest code I posted. I added a second record and changed nothing. What I was doing for one record is what I want to do to all the records. Set-based processing, not record-based processing (or Row By Agonizing Row (RBAR)).

SQL Server is meant to process sets of data.

rjbirkett SSC-Addicted Points: 488 More actions · Answer 6

Lynn,

Coffee not working. Forgot to paste the SQL in:-/

USE [SYSLOG]

GO

/****** Object: Table [dbo].[Syslogd] Script Date: 06/29/2012 14:23:29 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[Syslogd](

[MsgDate] [date] NULL,

[MsgTime] [time](7) NULL,

[MsgPriority] [text] NULL,

[MsgHostname] [text] NULL,

[MsgText] [text] NULL

) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

rjbirkett (6/29/2012)
I assume you are looking for this. The table is being updated by the syslog service so the names ar based on what syslog required. Not pretty but it does work.
Rob

Not seeing what you think I should see.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

rjbirkett (6/29/2012)
Lynn,
Coffee not working. Forgot to paste the SQL in:-/
USE [SYSLOG]
GO
/****** Object: Table [dbo].[Syslogd] Script Date: 06/29/2012 14:23:29 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Syslogd](
[MsgDate] [date] NULL,
[MsgTime] [time](7) NULL,
[MsgPriority] [text] NULL,
[MsgHostname] [text] NULL,
[MsgText] [text] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

Could you run this on your table:

select max(datalength(MsgText)) from dbo.Syslogd;

rjbirkett SSC-Addicted Points: 488 More actions · Answer 9

rjbirkett

SSC-Addicted

Points: 488

June 29, 2012 at 1:37 pm

#1507295

Lynn,

It came back with 1024 as the max length.

Rob

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

rjbirkett (6/29/2012)
Lynn,
It came back with 1024 as the max length.
Rob

Good, then the DelimitedSplit8K will work.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 11

rjbirkett (6/29/2012)
Lynn,
How am I going to process each record without looping through? For/Next? Next n? you have to iterate somehow, right?
Rob

In case Lynn's busy, the nature of the T-SQL script he gave you will process all the rows at once in an internally optimized loop instead of one you build yourself.

What you're fighting with is the iterative concepts from front end coding (working with a single instance at a time) instead of working with the entire collection in a single pass. Don't worry, it's a huge hurdle. There's an 'internal loop' in SQL that is optimized to its engine that will allow you to work with entire recordsets at once. That's why Lynn keeps trying to get you to provide a set of 5-10 rows from your original table so we can show you how to work with them in bulk instead of one at a time. Don't worry, you're not the first to feel like you're having a 'bang head here' moment on the transition. 😉

SQL's optimized to work against the collection, though you can force it to work with the internal instances. It's not QUITE the correct parlance, but it'll get you closer to the idea.

So you understand the earlier RBAR comment, it's a phrase we toss around here that stands for "Row-By-Agonizing-Row". Basically, when you're looping a single record at a time when you don't need to.

So in the end, no, you don't iterate yourself. Generally when you're working with log-feed data (my own name, but it gets the point) there's a few general steps you do.

1) Get the data somewhere you can use it. In this case, you're done, you've gotten it to a local table.

2) Transform the data that you haven't processed yet to a local staging table. This is the first query that Lynn's already shown you, that occurs on all the records and transforms them into columns and datatypes that you can function with inside the engine. This is usually done in a single pass (with caveats) from all the unprocessed data that you've stored so far. You'll mark the records that you 'pushed' to the staging as completed somehow as well. I usually push them to an archive table so they're not even part of the next run's concern.

3) Load the staging table data into all the necessary 'real' tables that are normalized through your system, such as adding new addresses and ids, new customers, etc, as well as loading the normalized log entry into the real logging table.

4) Cleanup the staging once complete so the next run has a 'blank slate' to work from.

What you're doing here falls under what we call ETL (Extract, Transform, Load). The rabbit hole can go deep but we can help you with your specific case, but we're going to end up needing a lot more information from you.

Like Lynn's signature, my signature contains a similar link. If you can provide us with sample schema/data, the schema of the target databases, and help us out as Lynn and others work with you we can get you end to end on this with pretty tight performance.

@bill:

RBAR has its place, but it's when you have cross-row dependencies that previously occurring logic affects further logic. Running Totals is an example of where that can come into play, another is the old mainframe file organization where associated sub-rows are loaded below a primary row and you need to trap the ID for the sub-rows off the leader row in the data. This isn't one of those cases.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

rjbirkett SSC-Addicted Points: 488 More actions · Answer 12

Lynn, you have me at a disadvantage. RBAR I understand, a set of data I sort of understand, but I'm old school SQL. I'm assuming that by "set of data" you mean put all the data in a table after the function call, then update all the fields in one shot? Am I getting this straight or still fumbling around in the dark looking for the bra clasp:-)

Rob

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Take a close look at the CTE part of this code. You will see that I have written it to select all records for yesterday from dbo.Syslogd.

with SampleData as (

select

MsgText as SampleRec

from

dbo.Syslogd

where

MsgDate = dateadd(dd, -1, cast(getdate() as date))

)

select

--sd.SampleRec,

--max(case ds.ItemNumber

--when 1 then ds.Item else null end),

max(case ds.ItemNumber

when 2 then ds.Item else null end) [call_status],

max(case ds.ItemNumber

when 3 then ds.Item else null end) [channel_num],

max(case ds.ItemNumber

when 4 then ds.Item else null end) [sipcall_ident],

max(case ds.ItemNumber

when 5 then ds.Item else null end) [trunk_num],

max(case ds.ItemNumber

when 6 then ds.Item else null end) [b_channel],

max(case ds.ItemNumber

when 7 then ds.Item else null end) [sipconf_id],

max(case ds.ItemNumber

when 8 then ds.Item else null end) [trunk_group_num],

max(case ds.ItemNumber

when 9 then ds.Item else null end) [endpoint_type],

max(case ds.ItemNumber

when 10 then ds.Item else null end) [call_originator],

max(case ds.ItemNumber

when 11 then ds.Item else null end) [source_IP],

max(case ds.ItemNumber

when 12 then ds.Item else null end) [destination_IP],

max(case ds.ItemNumber

when 13 then ds.Item else null end) [sourcephone_numtype],

max(case ds.ItemNumber

when 14 then ds.Item else null end) [sourcephone_numplan],

max(case ds.ItemNumber

when 15 then ds.Item else null end) [sourcephone_number],

max(case ds.ItemNumber

when 16 then ds.Item else null end) [sourcephone_nummap],

max(case ds.ItemNumber

when 17 then ds.Item else null end) [destphone_numtype],

max(case ds.ItemNumber

when 18 then ds.Item else null end) [destphone_numplan],

max(case ds.ItemNumber

when 19 then ds.Item else null end) [destphone_number],

max(case ds.ItemNumber

when 20 then ds.Item else null end) [destphone_nummap],

max(case ds.ItemNumber

when 21 then ds.Item else null end) [call_duration],

max(case ds.ItemNumber

when 22 then ds.Item else null end) [call_codec],

max(case ds.ItemNumber

when 23 then ds.Item else null end) [packet_intervall],

max(case ds.ItemNumber

when 24 then ds.Item else null end) [rtp_IP],

max(case ds.ItemNumber

when 25 then ds.Item else null end) [rtp_port],

max(case ds.ItemNumber

when 26 then ds.Item else null end) [call_release],

max(case ds.ItemNumber

when 27 then ds.Item else null end) [termination_reason],

max(case ds.ItemNumber

when 28 then ds.Item else null end) [fax_data],

max(case ds.ItemNumber

when 29 then ds.Item else null end) [packets_in],

max(case ds.ItemNumber

when 30 then ds.Item else null end) [packets_out],

max(case ds.ItemNumber

when 31 then ds.Item else null end) [packets_lost],

max(case ds.ItemNumber

when 32 then ds.Item else null end) [header_id],

max(case ds.ItemNumber

when 33 then ds.Item else null end) [rtp_id],

max(case ds.ItemNumber

when 34 then convert(datetime,replace(substring(right(ds.Item, 12),5,3) + left(right(ds.Item, 12),4) + right(right(ds.Item, 12),5) + ' ' + left(ds.Item, 13),'.',':'),113) else null end) [callsetup_datetime],

max(case ds.ItemNumber

when 35 then convert(datetime,replace(substring(right(ds.Item, 12),5,3) + left(right(ds.Item, 12),4) + right(right(ds.Item, 12),5) + ' ' + left(ds.Item, 13),'.',':'),113) else null end) [callconnect_datetime],

max(case ds.ItemNumber

when 36 then convert(datetime,replace(substring(right(ds.Item, 12),5,3) + left(right(ds.Item, 12),4) + right(right(ds.Item, 12),5) + ' ' + left(ds.Item, 13),'.',':'),113) else null end) [callrelease_datetime],

max(case ds.ItemNumber

when 37 then ds.Item else null end) [rtp_delay],

max(case ds.ItemNumber

when 38 then ds.Item else null end) [rtp_jitter],

max(case ds.ItemNumber

when 39 then ds.Item else null end) [localrtp_source],

max(case ds.ItemNumber

when 40 then ds.Item else null end) [remotertp_source],

max(case ds.ItemNumber

when 41 then ds.Item else null end) [redirect_reason],

max(case ds.ItemNumber

when 42 then ds.Item else null end) [redirect_number],

max(case ds.ItemNumber

when 43 then ds.Item else null end) [redirect_numplan]--,

--max(case ds.ItemNumber

--when 44 then ds.Item else null end),

--max(case ds.ItemNumber

--when 45 then ds.Item else null end)

from

SampleData sd

cross apply dbo.DelimitedSplit8K(SampleRec,'|') ds

group by

sd.SampleRec;

rjbirkett SSC-Addicted Points: 488 More actions · Answer 14

Yes, it does work. I think I get the picture. Use the script to put the data into dbo.CDR directly, dump all my useless code, and have access pull reports from that database. The data "looks" OK, I'm just wondering if dbo.CDR will accept it as is or if I should create a new table and see what field typoes SQL arbitrarily designates. You should see what it does if you try to move an Access table into SQL. All kinds of crazy s*&^ happens:-)

Anyway. I think you have done enough already. I'm sure you have better things to do than waste time talking to an idiot. I have really enjoyed this. Odd I know, but learning is always fun for me. Have a great weekend. I will let you know how I get on.

Robert.

rjbirkett SSC-Addicted Points: 488 More actions · Answer 15

Yes, it does work. I think I get the picture. Use the script to put the data into dbo.CDR directly, dump all my useless code, and have access pull reports from that database. The data "looks" OK, I'm just wondering if dbo.CDR will accept it as is or if I should create a new table and see what field typoes SQL arbitrarily designates. You should see what it does if you try to move an Access table into SQL. All kinds of crazy s*&^ happens:-)

Anyway. I think you have done enough already. I'm sure you have better things to do than waste time talking to an idiot. I have really enjoyed this. Odd I know, but learning is always fun for me. Have a great weekend. I will let you know how I get on.

Robert.