June 23, 2010 at 2:55 am
I have an unusual requirement and am hoping somebody here can suggest something to take the pain out of it - please bear with the painful explanation.
I've been asked to supply a CSV file to represent hierachical data - please don't point out the folly of this as it's only too obvious to me that we should be using XML, but anyway....
To work around being able to express a hierarchy in CSV we have a "record types" flag - A, B or C. A is for the customer, B and C are items of clothing the customer bought where C is footwear and B is anything else. An A record has three fields - record type, customer id, name. B and C records have 5 fields - record type, customer id, clothing type, colour, price. Currently I have a SSIS package which outputs the record types seperately - all the A records, then B then C, so that if I concatenate the results I get e.g:
A, 1, Tom
A, 2, Bill
B, 1, Shirt, Blue, 19.99
B, 1, Jeans, Grey, 24.99
B, 2, Hat, Black, 9.99
C, 2, Socks, Grey, 3.99
But the format requested is that all of the same customers records are sorted together so that the CSV output should be:
A, 1, Tom
B, 1, Shirt, Blue, 19.99
B, 1, Jeans, Grey, 24.99
A, 2, Bill
B, 2, Hat, Black, 9.99
C, 2, Socks, Grey, 3.99
I'm looking for ideas on the simplest way of doing this, bearing in mind that this is very simplified example data and that really I'm dealing with thousands of records. Any help or suggestions greatly appreciated.
June 23, 2010 at 3:02 am
Simply foolish choice.
Anyway, you could pre-process the files splitting them into three files depending on the A/B/C flag.
-- Gianluca Sartori
June 23, 2010 at 4:12 am
I would opt for using 3 files as suggested, but if this not possible then you could put the results of your current package into a single staging table and then select fromt that ordering by CustomerId, Flag
What is the purpose of this file? i can not really see this format being useful for any data processing systems..
June 24, 2010 at 1:22 am
Crazy as it seems apparently this is the format the destination system wants. Let me explain further - at the moment I already have the three files produced by my package, each the result of a separate query. The last step of my package is to literally concatenate the three, but this obviously produces a file ordered according to record type whereas the request is to have records grouped by customer.
I already thought of using a staging table and maybe there is no better way - of course this nearly doubles the processing time which is also a factor.
I was just hoping somebody out there might have some inspiring new angle.
June 24, 2010 at 8:14 am
You might find that using a staging table actually improves the performance of the package since SQL server is more efficent at joining/concatenating large sets of data
June 24, 2010 at 10:20 am
Taking the idea of a staging table to the next level, how does this work out for you?
declare @test-2 table (data varchar(100), cust_id int null)
insert into @test-2 (data)
select 'A, 1, Tom' UNION ALL
SELECT 'A, 2, Bill' UNION ALL
SELECT 'B, 1, Shirt, Blue, 19.99' UNION ALL
SELECT 'B, 1, Jeans, Grey, 24.99' UNION ALL
SELECT 'B, 2, Hat, Black, 9.99' UNION ALL
SELECT 'C, 2, Socks, Grey, 3.99'
;with CTE as
(
select cust_id, data, pos = CHARINDEX(' ', data)
from @test-2
)
update cte
set cust_id = SUBSTRING(data, pos+1, charindex(',', data, pos+1) - pos -1)
select data
from @test-2
order by cust_id, data
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 25, 2010 at 1:43 am
Thanks everyone and WayneS in particular for reminding me of that possibility - I may well base my solution around that idea.
June 25, 2010 at 2:41 pm
I've seen this sort of format before. It seems to be common for transferring data to mainframe and midrange based systems that may not use a RDBMS. I've seen this in interfaces to SAP as well, where the record type field is crucial to proper processing.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply