May 13, 2013 at 7:29 am
My team has a flat file from another system that we need to parse and import into 2 tables. We're currently importing it into a staging table, then parsing the staging table RBAR-style in a WHILE loop. As you can imagine, this is causing us an extraordinary amount of pain. It's taking several days to process one file, and the files keep getting bigger...
Each line in the file has a record ID: 1 (the header), 2 (the vendor info), and 3 (the details). A vendor can have 1 to N type-3 records listed after it, but (and here's the kicker) none of the detail records have any identifying information connecting them to the vendor info in the type-2 record. The only way we know they are connected is by the order.
Example Data:
1MyFile05122013
2VendorID123 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
2VendorID456 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
2VendorID789 StartDate Terminated AgreementNum AnotherCol AnotherCol2
3ContractNum SaleDate Product Region Col1 Col2 Col3
3ContractNum SaleDate Product Region Col1 Col2 Col3
So all the type-3 records belong to the vendor record above them. But again, there's no identifying information linking the 3 records to the 2 records. To add to this burden, the report is a rolling 3-month report of all records, so we can't truncate the tables and start over from scratch because it's not inception-to-date. Also, we have to delete out the current month (though I'm working to get that changed).
Our Staging table looks like this:
CREATE TABLE [dbo].[Staging](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Extract_Record] [varchar](2000) NULL,
[UNID] [int] NULL
) ON [PRIMARY]
GO
Because we can't immediately tell what record is what and we don't want to mess up the order, we insert all the lines into the Extract_Record column and parse everything out later with substrings. We use UNID to create a unifying ID for all these records and insert them into their tables.
Forgive me for scrubbing the heck out of all my column names, but I'm erring on the side of protecting our business. Here's what our code looks like:
BEGIN
DECLARE @i INT,
@max INT,
@rec SMALLINT,
@newid INT,
@TransactionID INT,
@unid INT;
--Below code sets unid to identify complete record sets
SELECT @i=1,@max=MAX(id),@newid=0
FROM dbo.Staging;
WHILE @i<=@max
BEGIN
SELECT @rec=CONVERT(SMALLINT,LEFT(Extract_Record,1))
FROM dbo.Staging
WHERE id=@i;
IF @rec>1 AND @rec<4
BEGIN
IF @rec=2
BEGIN
SET @newid=@newid+1;
UPDATE dbo.Staging
SET unid=@newid
WHERE id=@i;
END
ELSE
BEGIN
UPDATE dbo.Staging
SET unid=@newid
WHERE id=@i;
END
END
SET @i=@i+1;
END
SELECT @i=1,@unid=NULL, @rec=NULL;
WHILE @i<=@max
BEGIN
SELECT @unid=unid
FROM dbo.Staging
WHERE unid IS NOT NULL and id=@i;
IF @unid IS NOT NULL
BEGIN
SELECT @rec=CONVERT(SMALLINT,LEFT(Extract_Record,1))
FROM dbo.Staging
WHERE id=@i;
IF @rec=2
BEGIN
--SELECT @ri=Substring(extract_RI, 2,10)
--FROM dbo.Staging
--WHERE id=@i;
--Below code deletes current month data. Reporting team only requires previous months data
WITH CurrentMonth AS
(select * from dbo.Staging
WHERE LTRIM(RTRIM(Substring(Extract_Record, 97,6))) = Substring(CONVERT(varchar,GETDATE(),112),1,6))
DELETE ser
FROM dbo.Staging ser
INNER JOIN CurrentMonth cm
ON ser.UNID = cm.UNID;
--Below code loads differential data from staging table to final reporting table
INSERT INTO Summary (Col1,
Col2,
Col3,
Col4,
Col5,
Col6,
Col7,
Col8,
Col9
)
SELECT
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE id=@i
AND rit.Col1 IS NULL
AND rit.Col2 IS NULL
AND rit.Col3 IS NULL
AND rit.Col4 IS NULL
AND rit.Col5 IS NULL
AND rit.Col6 IS NULL
AND rit.Col7 IS NULL
AND rit.Col8 IS NULL
AND rit.Col9 IS NULL;
SELECT @TransactionID=@@IDENTITY;
END
ELSE IF @rec=3
BEGIN
--Below code loads differential data from staging table to final reporting table
INSERT INTO Detail (TransactionID,
Col1,
Col2,
Col3,
Col4,
Col5,
Col6,
Col7,
Col8,
Col9,
Col10,
Col11,
Col12,
Col13,
Col14,
Col15,
Col16)
SELECT @TransactionID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE id=@i
AND rit2.Col1 IS NULL
AND rit2.Col2 IS NULL
AND rit2.Col3 IS NULL
AND rit2.Col4 IS NULL
AND rit2.Col5 IS NULL
AND rit2.Col6 IS NULL
AND rit2.Col7 IS NULL
AND rit2.Col8 IS NULL
AND rit2.Col9 IS NULL
AND rit2.Col10 IS NULL
AND rit2.Col11 IS NULL
AND rit2.Col12 IS NULL
AND rit2.Col13 IS NULL
AND rit2.Col14 IS NULL
AND rit2.Col15 IS NULL
AND rit2.Col16 IS NULL;
END
END
SET @i=@i+1;
END
END
So: messy, slow, and very, very annoying. I've got a few thoughts on how we can start to fix it, but before I share I wanted to see what everyone else was thinking. I'm hoping someone has an epiphany that can help.
So, any ideas?
Just an FYI: there's no fixing the input file in our immediate future. Just getting this much information was like pulling teeth and the other team takes months upon months (if even that soon) to work changes through their SDLC. So I have to do what I can to mitigate the load issue now with the file that I have.
May 13, 2013 at 7:51 am
Can you use SSIS?
I've done something similar recently with SSIS, so if you can use it I think you have a good option.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
May 13, 2013 at 7:57 am
Also, could you mock-up some sample data and provide a destination for the data? Could help figure out how to approach the problem.
May 13, 2013 at 8:42 am
Stefan Krzywicki (5/13/2013)
Can you use SSIS? I've done something similar recently with SSIS, so if you can use it I think you have a good option.
Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.
I've got a few meetings, but after those I'll dummy up some data and sanitize the destination tables and post them in a bit.
May 13, 2013 at 9:13 am
Brandie Tarvin (5/13/2013)
Stefan Krzywicki (5/13/2013)
Can you use SSIS? I've done something similar recently with SSIS, so if you can use it I think you have a good option.
Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.
I've got a few meetings, but after those I'll dummy up some data and sanitize the destination tables and post them in a bit.
I have two files I need to process every week that have a similar format. Descriptive lines that I don't need to import and multiple header rows that are interspersed with corresponding data rows. I use a conditional split to check the line type and divert it to the proper parsing and insert, which I think you do with the IF @rec=2 line.
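For anyone staying in T-SQL, the same conditional-split idea can be sketched against the Staging table posted above (the temp-table names here are purely illustrative):

```sql
-- Illustrative sketch only: route staging rows by record type,
-- the T-SQL analogue of an SSIS conditional split.
SELECT ID, Extract_Record
INTO #VendorRows            -- the "2" records
FROM dbo.Staging
WHERE LEFT(Extract_Record, 1) = '2';

SELECT ID, Extract_Record
INTO #DetailRows            -- the "3" records
FROM dbo.Staging
WHERE LEFT(Extract_Record, 1) = '3';
```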
May 13, 2013 at 10:08 am
Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:
/****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:50 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Detail]') AND type in (N'U'))
DROP TABLE [dbo].[Detail]
GO
/****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:50 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Summary]') AND type in (N'U'))
DROP TABLE [dbo].[Summary]
GO
/****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Detail](
[TransactionDetailID] [int] IDENTITY(1,1) NOT NULL,
[TransactionID] [int] NULL,
[Col1] [varchar](4) NOT NULL,
[Col2] [varchar](4) NOT NULL,
[Col3] [varchar](20) NULL,
[Col4] [varchar](8) NULL,
[Col5] [varchar](24) NULL,
[Col6] [varchar](8) NULL,
[Col7] [varchar](8) NULL,
[Col8] [varchar](24) NOT NULL,
[Col9] [varchar](10) NOT NULL,
[Col10] [varchar](4) NOT NULL,
[Col11] [varchar](10) NOT NULL,
[Col12] [varchar](4) NOT NULL,
[Col13] [varchar](24) NOT NULL,
[Col14] [varchar](3) NOT NULL,
[Col15] [varchar](17) NOT NULL,
[Col16] [varchar](5) NOT NULL,
CONSTRAINT [PK_Detail_TransactionDetailID] PRIMARY KEY NONCLUSTERED
(
[TransactionDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Summary](
[TransactionID] [int] IDENTITY(1,1) NOT NULL,
[Col1] [varchar](4) NOT NULL,
[Col2] [varchar](10) NOT NULL,
[Col3] [varchar](3) NOT NULL,
[Col4] [varchar](20) NOT NULL,
[Col5] [varchar](8) NOT NULL,
[Col6] [varchar](10) NOT NULL,
[Col7] [varchar](10) NOT NULL,
[Col8] [varchar](30) NULL,
[Col9] [varchar](8) NULL,
CONSTRAINT [PK_Summary_TransactionID] PRIMARY KEY NONCLUSTERED
(
[TransactionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
May 13, 2013 at 10:20 am
Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 13, 2013 at 10:58 am
WayneS (5/13/2013)
Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)
Yes, I am using 2008. Plain vanilla Enterprise Edition 2008.
NOT R2, BTW.
May 13, 2013 at 11:31 am
Yo Jeff, sounds like a candidate for the "Quirky Update"?
I added a second UNID column and ran this:
DECLARE @RollingCounterLevel1 int
, @RollingCounterLevel2 int
UPDATE Staging
SET @RollingCounterLevel1 = UNID = CASE LEFT(Extract_Record, 1) WHEN '1' THEN ID ELSE @RollingCounterLevel1 END
, @RollingCounterLevel2 = UNID2 = CASE LEFT(Extract_Record, 1) WHEN '1' THEN NULL WHEN '2' THEN ID ELSE @RollingCounterLevel2 END
It gave me the correct 1st and 2nd level IDs, you might need to tweak it depending if you want the 2nd level value to be itself or the 1st level value.
As I remember there are several prerequisites to make sure the Quirky Update works (hints, parallelism, indexes, etc.) and MS doesn't "guarantee" it will always work, but Jeff couldn't find a situation where it didn't work as he expected.
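The prerequisites usually cited for the quirky update are a clustered index that matches the update order, an exclusive table lock, and MAXDOP 1 to rule out parallelism. A hedged sketch of how that might look here (the index name is made up, and none of this is guaranteed behavior by Microsoft):

```sql
-- Sketch of the commonly cited quirky-update safeguards.
-- IX_Staging_ID is a hypothetical index name.
CREATE UNIQUE CLUSTERED INDEX IX_Staging_ID ON dbo.Staging (ID);

DECLARE @RollingCounter int;
UPDATE s WITH (TABLOCKX)
SET @RollingCounter = UNID =
        CASE LEFT(Extract_Record, 1)
            WHEN '2' THEN ID          -- each vendor row starts a new group
            ELSE @RollingCounter      -- detail rows inherit the running value
        END
FROM dbo.Staging s
OPTION (MAXDOP 1);
```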
Chad
May 13, 2013 at 11:35 am
Brandie,
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line, so I made up my own. This solution assumes that you can get the rows into a table in the correct order, parse the rows into columns, and once you can assign an identifying number to the rows that belong together, you can do what you need to do with them. This takes three full scans of the table to work, but with some divide-and-conquer (temp tables instead of CTEs, etc.) supported by proper indexing, you may be able to get it to perform acceptably.
In the output of this code, rows with the same listNbr belong together.
CREATE TABLE dbo.fileParseTest (rowNbr INT IDENTITY(1,1), rowType int, data varchar(20));
INSERT INTO dbo.fileParseTest(rowType, data)
VALUES (1, 'HEADER')
,(2, 'VENDOR123')
,(3, 'CONTRACTX')
,(3, 'CONTRACTY')
,(3, 'CONTRACT1')
,(3, 'CONTRACT2')
,(3, 'CONTRACT3')
,(3, 'CONTRACT4')
,(3, 'CONTRACT5')
,(3, 'CONTRACT6')
,(3, 'CONTRACT7')
,(3, 'CONTRACT8')
,(3, 'CONTRACT9')
,(2, 'VENDOR456')
,(3, 'CONTRACTZ')
,(2, 'VENDOR789')
,(3, 'CONTRACTA')
,(3, 'CONTRACTB')
,(2, 'VENDORXYZ')
,(3, 'CONTRACT1')
;
WITH cteA AS (
SELECT ROW_NUMBER() OVER (ORDER BY rowNbr) as listNbr
,rowNbr
FROM dbo.fileParseTest
WHERE rowType = 2
)
,cteB AS (
SELECT c1.listNbr, c1.rowNbr AS rowNbrStart, ISNULL(c2.rowNbr - 1, 100000) AS rowNbrEnd
FROM cteA c1
LEFT OUTER JOIN cteA c2
ON c1.listNbr + 1 = c2.listNbr
)
SELECT b.listNbr, fpt.rowNbr, fpt.rowType, fpt.data
FROM cteB b
LEFT OUTER JOIN dbo.fileParseTest fpt
ON fpt.rowNbr BETWEEN b.rowNbrStart and b.rowNbrEnd
listNbr  rowType  data
1        2        VENDOR123
1        3        CONTRACTX
1        3        CONTRACTY
1        3        CONTRACT1
1        3        CONTRACT2
1        3        CONTRACT3
1        3        CONTRACT4
1        3        CONTRACT5
1        3        CONTRACT6
1        3        CONTRACT7
1        3        CONTRACT8
1        3        CONTRACT9
2        2        VENDOR456
2        3        CONTRACTZ
3        2        VENDOR789
3        3        CONTRACTA
3        3        CONTRACTB
4        2        VENDORXYZ
4        3        CONTRACT1
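The same grouping could also be computed in a single query with a correlated subquery instead of the self-joined CTEs; a sketch against the same test table (how it performs at real volumes would need testing):

```sql
-- For each row, find the rowNbr of the nearest preceding type-2 row;
-- rows sharing that value belong to the same vendor group.
SELECT fpt.rowNbr,
       fpt.rowType,
       fpt.data,
       (SELECT MAX(v.rowNbr)
        FROM dbo.fileParseTest v
        WHERE v.rowType = 2
          AND v.rowNbr <= fpt.rowNbr) AS groupStart
FROM dbo.fileParseTest fpt
WHERE fpt.rowType IN (2, 3)
ORDER BY fpt.rowNbr;
```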
Jason Wolfkill
May 13, 2013 at 11:39 am
Chad Crawford (5/13/2013)
Yo Jeff, sounds like a candidate for the "Quirky Update"?
I don't think that the QU is necessary here, though it would work.
May 13, 2013 at 11:52 am
Brandie Tarvin (5/13/2013)
So, any ideas?
Brandie, Try this out:
-- Let's add a column to the staging table to make deletes easier. You could even index this if you want to.
-- However, this isn't necessary.
IF NOT EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.Staging','U') AND name = 'DateField')
ALTER TABLE dbo.Staging ADD DateField DATE;
-- this is necessary only to prevent any other updates to the Summary table until all of the inserts have taken place.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- get the vendor records assigned first, and assign sequential numbers for the UNID
WITH cte AS
(
SELECT ID, UNID, Extract_Record, DateField, ROW_NUMBER() OVER (ORDER BY ID) AS RN
FROM dbo.Staging
WHERE Extract_Record LIKE '2%' -- get the vendor data
)
UPDATE cte
SET UNID = RN,
DateField=LTRIM(RTRIM(Substring(Extract_Record, 97,6)))+'01';
-- now assign the details to the same UNID for the vendor record < this current record
-- if you don't want to add the DateField column to the Staging table, you can remove
-- the 2nd CROSS APPLY and the update to that column.
UPDATE t1
SET UNID = t2.UNID,
DateField = t3.DateField
FROM dbo.Staging t1
CROSS APPLY (SELECT MAX(UNID) FROM dbo.Staging WHERE ID < t1.ID) t2(UNID)
CROSS APPLY (SELECT DateField FROM dbo.Staging WHERE UNID = t2.UNID) t3
WHERE t1.Extract_Record LIKE '3%';
-- delete the current month:
-- Again, if you don't want to use the DateField column, just use your existing delete statement
-- DELETE FROM dbo.Staging where DateField = CONVERT(DATETIME, CONVERT(char(6),GETDATE(),112) + '01')
BEGIN TRANSACTION
DECLARE @max INT;
SELECT @max=ISNULL(MAX(TransactionID),0) FROM dbo.Summary;
-- the repeatable read prevents any other transaction from inserting data that would change the max setting
-- I need to control the value of the identity column, so that I can associate the identity value with the UNID value for the summary record.
-- The association is the @max from above with the UNID already calculated for this record.
SET IDENTITY_INSERT dbo.Summary ON;
INSERT INTO dbo.Summary( TransactionID, Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9)
SELECT @max+UNID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE rit.TransactionID IS NULL
-- TransactionID is a PK -->> NOT NULL. A null here means that no matching record was found, so it's okay to insert this record.
-- No need to compare every column.
AND ri.Extract_Record LIKE '2%'; -- just the "2" records
-- don't forget to turn IDENTITY INSERT off!
SET IDENTITY_INSERT dbo.Summary OFF;
INSERT INTO Detail (TransactionID,
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,
Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16)
SELECT @max+UNID, -- since we already have the UNID matching in the table, and we inserted UNID+@max into the summary, use this same calc for the detail.
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE rit2.TransactionDetailID IS NULL
-- TransactionDetailID is PK -->> NOT NULL. Null here means no matching record was found, so it's okay to insert.
-- No need to compare every column
AND ri2.Extract_Record LIKE '3%' -- just the "3" rows
COMMIT TRANSACTION
SELECT * FROM dbo.Staging ORDER BY ID;
SELECT * FROM dbo.Summary;
SELECT * FROM dbo.Detail;
May 13, 2013 at 12:13 pm
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line
Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.
I need a solution that addresses my current setup, including the inconsistent data.
May 13, 2013 at 12:16 pm
Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.
May 13, 2013 at 12:40 pm
Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line
Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.
I need a solution that addresses my current setup, including the inconsistent data.
Handling the inconsistent data is my favorite part of doing it in SSIS, but the other approaches put forward here look good too.