May 13, 2013 at 7:29 am
My team has a flat file from another system that we need to parse and import into 2 tables. We're currently importing it into a staging table, then parsing the staging table RBAR-style in a WHILE loop. As you can imagine, this is causing us an extraordinary amount of pain. It's taking several days to process one file, and the files keep getting bigger...
Each line in the file has a record ID: 1 (the header), 2 (the vendor info), and 3 (the details). A vendor can have 1 to N type-3 records listed after it, but (and here's the kicker) none of the detail records have any identifying information connecting them to the vendor info in the type-2 record. The only way we know they are connected is by the order.
Example Data:
1MyFile05122013
2VendorID123 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
2VendorID456 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
2VendorID789 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
So all the type-3 records belong to the vendor record above them. But again, there's no identifying information linking the 3 records to the 2 records. To add to this burden, the report is a rolling 3-month report of all records, so we can't truncate the tables and start over from scratch because it's not inception-to-date. Also, we have to delete out the current month (though I'm working to get that changed).
Our Staging table looks like this:
CREATE TABLE [dbo].[Staging](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Extract_Record] [varchar](2000) NULL,
[UNID] [int] NULL
) ON [PRIMARY]
GO
Because we can't immediately tell what record is what and we don't want to mess up the order, we insert all the lines into the Extract_Record column and parse everything out later with substrings. We use UNID to create a unifying ID for all these records and insert them into their tables.
Forgive me for scrubbing the heck out of all my column names, but I'm erring on the side of protecting our business. Here's what our code looks like:
BEGIN
DECLARE @i INT,
@max INT,
@rec SMALLINT,
@newid INT,
@TransactionID INT,
@unid INT;
--Below code sets unid to identify complete record sets
SELECT @i=1,@max=MAX(id),@newid=0
FROM dbo.Staging;
WHILE @i<=@max
BEGIN
SELECT @rec=CONVERT(SMALLINT,LEFT(Extract_Record,1))
FROM dbo.Staging
WHERE id=@i;
IF @rec>1 AND @rec<4
BEGIN
IF @rec=2
BEGIN
SET @newid=@newid+1;
UPDATE dbo.Staging
SET unid=@newid
WHERE id=@i;
END
ELSE
BEGIN
UPDATE dbo.Staging
SET unid=@newid
WHERE id=@i;
END
END
SET @i=@i+1;
END
SELECT @i=1,@unid=NULL, @rec=NULL;
WHILE @i<=@max
BEGIN
SELECT @unid=unid
FROM dbo.Staging
WHERE unid IS NOT NULL and id=@i;
IF @unid IS NOT NULL
BEGIN
SELECT @rec=CONVERT(SMALLINT,LEFT(Extract_Record,1))
FROM dbo.Staging
WHERE id=@i;
IF @rec=2
BEGIN
--SELECT @ri=Substring(extract_RI, 2,10)
--FROM dbo.Staging
--WHERE id=@i;
--Below code deletes current month data. Reporting team only requires previous months data
WITH CurrentMonth AS
(select * from dbo.Staging
WHERE LTRIM(RTRIM(Substring(Extract_Record, 97,6))) = Substring(CONVERT(varchar,GETDATE(),112),1,6))
DELETE ser
FROM dbo.Staging ser
INNER JOIN CurrentMonth cm
ON ser.UNID = cm.UNID;
--Below code loads differential data from staging table to final reporting table
INSERT INTO Summary (Col1,
Col2,
Col3,
Col4,
Col5,
Col6,
Col7,
Col8,
Col9
)
SELECT
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE id=@i
AND rit.Col1 IS NULL
AND rit.Col2 IS NULL
AND rit.Col3 IS NULL
AND rit.Col4 IS NULL
AND rit.Col5 IS NULL
AND rit.Col6 IS NULL
AND rit.Col7 IS NULL
AND rit.Col8 IS NULL
AND rit.Col9 IS NULL;
SELECT @TransactionID=@@IDENTITY;
END
ELSE IF @rec=3
BEGIN
--Below code loads differential data from staging table to final reporting table
INSERT INTO Detail (TransactionID,
Col1,
Col2,
Col3,
Col4,
Col5,
Col6,
Col7,
Col8,
Col9,
Col10,
Col11,
Col12,
Col13,
Col14,
Col15,
Col16)
SELECT @TransactionID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE id=@i
AND rit2.Col1 IS NULL
AND rit2.Col2 IS NULL
AND rit2.Col3 IS NULL
AND rit2.Col4 IS NULL
AND rit2.Col5 IS NULL
AND rit2.Col6 IS NULL
AND rit2.Col7 IS NULL
AND rit2.Col8 IS NULL
AND rit2.Col9 IS NULL
AND rit2.Col10 IS NULL
AND rit2.Col11 IS NULL
AND rit2.Col12 IS NULL
AND rit2.Col13 IS NULL
AND rit2.Col14 IS NULL
AND rit2.Col15 IS NULL
AND rit2.Col16 IS NULL;
END
END
SET @i=@i+1;
END
END
So: messy, slow, and very, very annoying. I've got a few thoughts on how we can start to fix it, but before I share I wanted to see what everyone else was thinking. I'm hoping someone has an epiphany that can help.
So, any ideas?
Just an FYI: there's no fixing the input file in our immediate future. Just getting this much information was like pulling teeth and the other team takes months upon months (if even that soon) to work changes through their SDLC. So I have to do what I can to mitigate the load issue now with the file that I have.
May 13, 2013 at 7:51 am
Can you use SSIS?
I've done something similar recently with SSIS, so if you can use it I think you have a good option.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
May 13, 2013 at 7:57 am
Also, could you mock-up some sample data and provide a destination for the data? Could help figure out how to approach the problem.
May 13, 2013 at 8:42 am
Stefan Krzywicki (5/13/2013)
Can you use SSIS? I've done something similar recently with SSIS, so if you can use it I think you have a good option.
Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.
I've got a few meetings, but after those I'll dummy up some data and sanitize the destination tables and post them in a bit.
May 13, 2013 at 9:13 am
Brandie Tarvin (5/13/2013)
Stefan Krzywicki (5/13/2013)
Can you use SSIS? I've done something similar recently with SSIS, so if you can use it I think you have a good option.
Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.
I've got a few meetings, but after those I'll dummy up some data and sanitize the destination tables and post them in a bit.
I have two files I need to process every week that have a similar format. Descriptive lines that I don't need to import and multiple header rows that are interspersed with corresponding data rows. I use a conditional split to check the line type and divert it to the proper parsing and insert, which I think you do with the IF @rec=2 line.
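For anyone staying in T-SQL, the same conditional-split idea can be sketched against the Staging table posted above (the temp-table names here are purely illustrative):

```sql
-- Illustrative sketch only: route staging rows by record type,
-- the T-SQL analogue of an SSIS conditional split.
SELECT ID, Extract_Record
INTO #VendorRows            -- the "2" records
FROM dbo.Staging
WHERE LEFT(Extract_Record, 1) = '2';

SELECT ID, Extract_Record
INTO #DetailRows            -- the "3" records
FROM dbo.Staging
WHERE LEFT(Extract_Record, 1) = '3';
```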
May 13, 2013 at 10:08 am
Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:
/****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:50 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Detail]') AND type in (N'U'))
DROP TABLE [dbo].[Detail]
GO
/****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:50 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Summary]') AND type in (N'U'))
DROP TABLE [dbo].[Summary]
GO
/****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Detail](
[TransactionDetailID] [int] IDENTITY(1,1) NOT NULL,
[TransactionID] [int] NULL,
[Col1] [varchar](4) NOT NULL,
[Col2] [varchar](4) NOT NULL,
[Col3] [varchar](20) NULL,
[Col4] [varchar](8) NULL,
[Col5] [varchar](24) NULL,
[Col6] [varchar](8) NULL,
[Col7] [varchar](8) NULL,
[Col8] [varchar](24) NOT NULL,
[Col9] [varchar](10) NOT NULL,
[Col10] [varchar](4) NOT NULL,
[Col11] [varchar](10) NOT NULL,
[Col12] [varchar](4) NOT NULL,
[Col13] [varchar](24) NOT NULL,
[Col14] [varchar](3) NOT NULL,
[Col15] [varchar](17) NOT NULL,
[Col16] [varchar](5) NOT NULL,
CONSTRAINT [PK_Detail_TransactionDetailID] PRIMARY KEY NONCLUSTERED
(
[TransactionDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Summary](
[TransactionID] [int] IDENTITY(1,1) NOT NULL,
[Col1] [varchar](4) NOT NULL,
[Col2] [varchar](10) NOT NULL,
[Col3] [varchar](3) NOT NULL,
[Col4] [varchar](20) NOT NULL,
[Col5] [varchar](8) NOT NULL,
[Col6] [varchar](10) NOT NULL,
[Col7] [varchar](10) NOT NULL,
[Col8] [varchar](30) NULL,
[Col9] [varchar](8) NULL,
CONSTRAINT [PK_Summary_TransactionID] PRIMARY KEY NONCLUSTERED
(
[TransactionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
May 13, 2013 at 10:20 am
Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 13, 2013 at 10:58 am
WayneS (5/13/2013)
Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)
Yes, I am using 2008. Plain vanilla Enterprise Edition 2008.
NOT R2, BTW.
May 13, 2013 at 11:31 am
Yo Jeff, sounds like a candidate for the "Quirky Update"?
I added a second UNID column and ran this:
DECLARE @RollingCounterLevel1 int
, @RollingCounterLevel2 int
UPDATE Staging
SET @RollingCounterLevel1 = UNID = CASE LEFT(Extract_Record, 1) WHEN '1' THEN ID ELSE @RollingCounterLevel1 END
, @RollingCounterLevel2 = UNID2 = CASE LEFT(Extract_Record, 1) WHEN '1' THEN NULL WHEN '2' THEN ID ELSE @RollingCounterLevel2 END
It gave me the correct 1st and 2nd level IDs, you might need to tweak it depending if you want the 2nd level value to be itself or the 1st level value.
As I remember there are several prerequisites to make sure the Quirky Update works (hints, parallelism, indexes, etc.) and MS doesn't "guarantee" it will always work, but Jeff couldn't find a situation where it didn't work as he expected.
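The prerequisites usually cited for the quirky update are a clustered index that matches the update order, an exclusive table lock, and MAXDOP 1 to rule out parallelism. A hedged sketch of how that might look here (the index name is made up, and none of this is guaranteed behavior by Microsoft):

```sql
-- Sketch of the commonly cited quirky-update safeguards.
-- IX_Staging_ID is a hypothetical index name.
CREATE UNIQUE CLUSTERED INDEX IX_Staging_ID ON dbo.Staging (ID);

DECLARE @RollingCounter int;
UPDATE s WITH (TABLOCKX)
SET @RollingCounter = UNID =
        CASE LEFT(Extract_Record, 1)
            WHEN '2' THEN ID          -- each vendor row starts a new group
            ELSE @RollingCounter      -- detail rows inherit the running value
        END
FROM dbo.Staging s
OPTION (MAXDOP 1);
```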
Chad
May 13, 2013 at 11:35 am
Brandie,
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line, so I made up my own. This solution assumes that you can get the rows into a table in the correct order, parse the rows into columns, and once you can assign an identifying number to the rows that belong together, you can do what you need to do with them. This takes three full scans of the table to work, but with some divide-and-conquer (temp tables instead of CTEs, etc.) supported by proper indexing, you may be able to get it to perform acceptably.
In the output of this code, rows with the same listNbr belong together.
CREATE TABLE dbo.fileParseTest (rowNbr INT IDENTITY(1,1), rowType int, data varchar(20));
INSERT INTO dbo.fileParseTest(rowType, data)
VALUES (1, 'HEADER')
,(2, 'VENDOR123')
,(3, 'CONTRACTX')
,(3, 'CONTRACTY')
,(3, 'CONTRACT1')
,(3, 'CONTRACT2')
,(3, 'CONTRACT3')
,(3, 'CONTRACT4')
,(3, 'CONTRACT5')
,(3, 'CONTRACT6')
,(3, 'CONTRACT7')
,(3, 'CONTRACT8')
,(3, 'CONTRACT9')
,(2, 'VENDOR456')
,(3, 'CONTRACTZ')
,(2, 'VENDOR789')
,(3, 'CONTRACTA')
,(3, 'CONTRACTB')
,(2, 'VENDORXYZ')
,(3, 'CONTRACT1')
;
WITH cteA AS (
SELECT ROW_NUMBER() OVER (ORDER BY rowNbr) as listNbr
,rowNbr
FROM dbo.fileParseTest
WHERE rowType = 2
)
,cteB AS (
SELECT c1.listNbr, c1.rowNbr AS rowNbrStart, ISNULL(c2.rowNbr - 1, 100000) AS rowNbrEnd
FROM cteA c1
LEFT OUTER JOIN cteA c2
ON c1.listNbr + 1 = c2.listNbr
)
SELECT b.listNbr, fpt.rowNbr, fpt.rowType, fpt.data
FROM cteB b
LEFT OUTER JOIN dbo.fileParseTest fpt
ON fpt.rowNbr BETWEEN b.rowNbrStart and b.rowNbrEnd
listNbr  rowType  data
1        2        VENDOR123
1        3        CONTRACTX
1        3        CONTRACTY
1        3        CONTRACT1
1        3        CONTRACT2
1        3        CONTRACT3
1        3        CONTRACT4
1        3        CONTRACT5
1        3        CONTRACT6
1        3        CONTRACT7
1        3        CONTRACT8
1        3        CONTRACT9
2        2        VENDOR456
2        3        CONTRACTZ
3        2        VENDOR789
3        3        CONTRACTA
3        3        CONTRACTB
4        2        VENDORXYZ
4        3        CONTRACT1
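The same grouping could also be computed in a single query with a correlated subquery instead of the self-joined CTEs; a sketch against the same test table (how it performs at real volumes would need testing):

```sql
-- For each row, find the rowNbr of the nearest preceding type-2 row;
-- rows sharing that value belong to the same vendor group.
SELECT fpt.rowNbr,
       fpt.rowType,
       fpt.data,
       (SELECT MAX(v.rowNbr)
        FROM dbo.fileParseTest v
        WHERE v.rowType = 2
          AND v.rowNbr <= fpt.rowNbr) AS groupStart
FROM dbo.fileParseTest fpt
WHERE fpt.rowType IN (2, 3)
ORDER BY fpt.rowNbr;
```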
Jason Wolfkill
May 13, 2013 at 11:39 am
Chad Crawford (5/13/2013)
Yo Jeff, sounds like a candidate for the "Quirky Update"?
I don't think that the QU is necessary here, though it would work.
May 13, 2013 at 11:52 am
Brandie Tarvin (5/13/2013)
So, any ideas?
Brandie, Try this out:
-- Let's add a column to the staging table to make deletes easier. You could even index this if you want to.
-- However, this isn't necessary.
IF NOT EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.Staging','U') AND name = 'DateField')
ALTER TABLE dbo.Staging ADD DateField DATE;
-- this is necessary only to prevent any other updates to the Summary table until all of the inserts have taken place.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- get the vendor records assigned first, and assign sequential numbers for the UNID
WITH cte AS
(
SELECT ID, UNID, Extract_Record, DateField, ROW_NUMBER() OVER (ORDER BY ID) AS RN
FROM dbo.Staging
WHERE Extract_Record LIKE '2%' -- get the vendor data
)
UPDATE cte
SET UNID = RN,
DateField=LTRIM(RTRIM(Substring(Extract_Record, 97,6)))+'01';
-- now assign the details to the same UNID for the vendor record < this current record
-- if you don't want to add the DateField column to the Staging table, you can remove
-- the 2nd CROSS APPLY and the update to that column.
UPDATE t1
SET UNID = t2.UNID,
DateField = t3.DateField
FROM dbo.Staging t1
CROSS APPLY (SELECT MAX(UNID) FROM dbo.Staging WHERE ID < t1.ID) t2(UNID)
CROSS APPLY (SELECT DateField FROM dbo.Staging WHERE UNID = t2.UNID) t3
WHERE t1.Extract_Record LIKE '3%';
-- delete the current month:
-- Again, if you don't want to use the DateField column, just use your existing delete statement
-- DELETE FROM dbo.Staging where DateField = CONVERT(DATETIME, CONVERT(char(6),GETDATE(),112) + '01')
BEGIN TRANSACTION
DECLARE @max INT;
SELECT @max=ISNULL(MAX(TransactionID),0) FROM dbo.Summary;
-- the repeatable read prevents any other transaction from inserting data that would change the max setting
-- I need to control the value of the identity column, so that I can associate the identity value with the UNID value for the summary record.
-- The association is the @max from above with the UNID already calculated for this record.
SET IDENTITY_INSERT dbo.Summary ON;
INSERT INTO dbo.Summary( TransactionID, Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9)
SELECT @max+UNID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE rit.TransactionID IS NULL
-- TransactionID is a PK -->> NOT NULL. A null here means that no matching record was found, so it's okay to insert this record.
-- No need to compare every column.
AND ri.Extract_Record LIKE '2%'; -- just the "2" records
-- don't forget to turn IDENTITY INSERT off!
SET IDENTITY_INSERT dbo.Summary OFF;
INSERT INTO Detail (TransactionID,
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,
Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16)
SELECT @max+UNID, -- since we already have the UNID matching in the table, and we inserted UNID+@max into the summary, use this same calc for the detail.
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE rit2.TransactionDetailID IS NULL
-- TransactionDetailID is PK -->> NOT NULL. Null here means no matching record was found, so it's okay to insert.
-- No need to compare every column
AND ri2.Extract_Record LIKE '3%' -- just the "3" rows
COMMIT TRANSACTION
SELECT * FROM dbo.Staging ORDER BY ID;
SELECT * FROM dbo.Summary;
SELECT * FROM dbo.Detail;
May 13, 2013 at 12:13 pm
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line
Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.
I need a solution that addresses my current setup, including the inconsistent data.
May 13, 2013 at 12:16 pm
Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.
May 13, 2013 at 12:40 pm
Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line
Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.
I need a solution that addresses my current setup, including the inconsistent data.
Handling the inconsistent data is my favorite part of doing it in SSIS, but the other approaches put forward here look good too.