Sort CSV with uneven columns

Question

Post reply

Sort CSV with uneven columns

ohack

SSCrazy

Points: 2043
More actions
June 23, 2010 at 2:55 am

#140875

I have an unusual requirement and am hoping somebody here can suggest something to take the pain out of it - please bear with the painful explanation.
I've been asked to supply a CSV file to represent hierachical data - please don't point out the folly of this as it's only too obvious to me that we should be using XML, but anyway....
To work around being able to express a hierarchy in CSV we have a "record types" flag - A, B or C. A is for the customer, B and C are items of clothing the customer bought where C is footwear and B is anything else. An A record has three fields - record type, customer id, name. B and C records have 5 fields - record type, customer id, clothing type, colour, price. Currently I have a SSIS package which outputs the record types seperately - all the A records, then B then C, so that if I concatenate the results I get e.g:
A, 1, Tom
A, 2, Bill
B, 1, Shirt, Blue, 19.99
B, 1, Jeans, Grey, 24.99
B, 2, Hat, Black, 9.99
C, 2, Socks, Grey, 3.99
But the format requested is that all of the same customers records are sorted together so that the CSV output should be:
A, 1, Tom
B, 1, Shirt, Blue, 19.99
B, 1, Jeans, Grey, 24.99
A, 2, Bill
B, 2, Hat, Black, 9.99
C, 2, Socks, Grey, 3.99
I'm looking for ideas on the simplest way of doing this, bearing in mind that this is very simplified example data and that really I'm dealing with thousands of records. Any help or suggestions greatly appreciated.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

Simply foolish choice.

Anyway, you could pre-process the files splitting them into three files depending on the A/B/C flag.

-- Gianluca Sartori

steveb. SSC-Forever Points: 46733 More actions · Answer 2

I would opt for using 3 files as suggested, but if this not possible then you could put the results of your current package into a single staging table and then select fromt that ordering by CustomerId, Flag

What is the purpose of this file? i can not really see this format being useful for any data processing systems..

ohack SSCrazy Points: 2043 More actions · Answer 3

Crazy as it seems apparently this is the format the destination system wants. Let me explain further - at the moment I already have the three files produced by my package, each the result of a separate query. The last step of my package is to literally concatenate the three, but this obviously produces a file ordered according to record type whereas the request is to have records grouped by customer.

I already thought of using a staging table and maybe there is no better way - of course this nearly doubles the processing time which is also a factor.

I was just hoping somebody out there might have some inspiring new angle.

steveb. SSC-Forever Points: 46733 More actions · Answer 4

You might find that using a staging table actually improves the performance of the package since SQL server is more efficent at joining/concatenating large sets of data

WayneS SSC Guru Points: 95460 More actions · Answer 5

Taking the idea of a staging table to the next level, how does this work out for you?

declare @test-2 table (data varchar(100), cust_id int null)

insert into @test-2 (data)

select 'A, 1, Tom' UNION ALL

SELECT 'A, 2, Bill' UNION ALL

SELECT 'B, 1, Shirt, Blue, 19.99' UNION ALL

SELECT 'B, 1, Jeans, Grey, 24.99' UNION ALL

SELECT 'B, 2, Hat, Black, 9.99' UNION ALL

SELECT 'C, 2, Socks, Grey, 3.99'

;with CTE as

(

select cust_id, data, pos = CHARINDEX(' ', data)

from @test-2

)

update cte

set cust_id = SUBSTRING(data, pos+1, charindex(',', data, pos+1) - pos -1)

select data

from @test-2

order by cust_id, data

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

ohack SSCrazy Points: 2043 More actions · Answer 6

Thanks everyone and WayneS in particular for reminding me of that possibility - I may well base my solution around that idea.

Ross McMicken SSCarpal Tunnel Points: 4438 More actions · Answer 7

I've seen this sort of format before. It seems to be common for transferring data to mainframe and midrange based systems that may not use a RDBMS. I've seen this in interfaces to SAP as well, where the record type field is crucial to proper processing.