January 11, 2016 at 10:16 am
bpowers (1/11/2016)
Well after submitting the SSIS file for testing I was notified that the Header and Detail must be grouped together. Ughhh... If you still have a bcp option you'd be willing to work with me I would greatly appreciate it. Also, xp_cmdshell can be used as I will simply turn it on and off during the execution. Thanks
Jeff Moden, you have centre stage :-P
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 11, 2016 at 10:25 am
I can see a way of doing this in SSIS, if the output is a text file with one or two columns.
What I'm visualizing is rows in the data flow that have both the header and detail data in each DF row. Toward the end of the data flow, there would be something to detect whether the current row has a new header or not. If it's a new header, follow the path that has the header output logic. If it's not a new header, follow the path that has the detail output logic.
Each row written to file would be all the required data fields, concatenated with the desired delimiters.
Sounds simple? Implementation might be a bit harder than you might expect.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
January 11, 2016 at 10:54 am
Alvin Ramard (1/11/2016)
I can see a way of doing this in SSIS, if the output is a text file with one or two columns.
What I'm visualizing is rows in the data flow that have both the header and detail data in each DF row. Toward the end of the data flow, there would be something to detect if the current row has a new header or not. If it's a new header, follow the path that has the header output logic. If it's not a new header, follow the path that has the details output logic.
Each row written to file would be all the required data fields, concatenated with the desired delimiters.
Sounds simple? Implementation might be a bit harder than you might expect.
Yes, I can see how this would work. Hours of debugging fun await.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 11, 2016 at 11:02 am
For SSIS I was thinking more along the lines of this:
1. Execute SQL Task gets all IDs needing to go into the file and stores in SSIS variable of type Object (an ADO.NET DataSet under the covers).
2. Precedence Constraint to Foreach Loop Container iterates over SSIS Variable just populated. In Foreach:
2.1. Data Flow Task writes header data to file utilizing the ID from the outer loop.
2.2. Precedence Constraint from last Data Flow Task to new Data Flow Task that writes details rows to file utilizing the ID from the outer loop. The destination will use a different File Connection Manager pointed to same physical file.
Done.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 11:07 am
Orlando Colamatteo (1/11/2016)
For SSIS I was thinking more along the lines of this:
1. Execute SQL Task gets all IDs needing to go into the file and stores in SSIS variable of type Object (an ADO.NET DataSet under the covers).
2. Precedence Constraint to Foreach Loop Container iterates over SSIS Variable just populated. In Foreach:
2.1. Data Flow Task writes header data to file utilizing the ID from the outer loop.
2.2. Precedence Constraint from last Data Flow Task to new Data Flow Task that writes details rows to file utilizing the ID from the outer loop. The destination will use a different File Connection Manager pointed to same physical file.
Done.
This is an elegant idea. There is a chance, I suspect, that the destination file will get locked by DF (header) such that DF2 (detail) fails. But if not, this is tidy.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 11, 2016 at 11:27 am
If using SSIS, I think it would be easier to first dump the headers and details into separate files then using a script task to merge the two files together.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
January 11, 2016 at 11:32 am
Phil Parkin (1/11/2016)
Orlando Colamatteo (1/11/2016)
For SSIS I was thinking more along the lines of this:
1. Execute SQL Task gets all IDs needing to go into the file and stores in SSIS variable of type Object (an ADO.NET DataSet under the covers).
2. Precedence Constraint to Foreach Loop Container iterates over SSIS Variable just populated. In Foreach:
2.1. Data Flow Task writes header data to file utilizing the ID from the outer loop.
2.2. Precedence Constraint from last Data Flow Task to new Data Flow Task that writes details rows to file utilizing the ID from the outer loop. The destination will use a different File Connection Manager pointed to same physical file.
Done.
This is an elegant idea. There is a chance, I suspect, that the destination file will get locked by DF (header) such that DF2 (detail) fails. But if not, this is tidy.
Thanks Phil. My design makes two assumptions:
1. The two DFTs* will not execute in parallel.
2. The Flat File Connection Managers will close their respective file handles once they are done writing to the file.
Re: #1 The DFTs should run in sequence because of the Precedence Constraint I have placed between them.
Re: #2 In the first DFT the file will be opened for writing, the header line will be written and then the file will be closed. Immediately after that the second DFT will open the file, write the detail lines associated with the header line and then the file will be closed again. That cycle would repeat until all IDs were processed.
* Just in case: DFT = Data Flow Task; DF = Data Flow, and is sometimes used as shorthand for Data Flow Task
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 11:41 am
Alvin Ramard (1/11/2016)
If using SSIS, I think it would be easier to first dump the headers and details into separate files then using a script task to merge the two files together.
That could definitely work but extends the scope of required knowledge into C# development.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 11:44 am
Orlando Colamatteo (1/11/2016)
Alvin Ramard (1/11/2016)
If using SSIS, I think it would be easier to first dump the headers and details into separate files then using a script task to merge the two files together.
That could definitely work but extends the scope of required knowledge into C# development.
Gained knowledge is not a bad thing. It could easily be the quickest way to get this done.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
January 11, 2016 at 11:48 am
Orlando Colamatteo (1/11/2016)
Phil Parkin (1/11/2016)
Orlando Colamatteo (1/11/2016)
For SSIS I was thinking more along the lines of this:
1. Execute SQL Task gets all IDs needing to go into the file and stores in SSIS variable of type Object (an ADO.NET DataSet under the covers).
2. Precedence Constraint to Foreach Loop Container iterates over SSIS Variable just populated. In Foreach:
2.1. Data Flow Task writes header data to file utilizing the ID from the outer loop.
2.2. Precedence Constraint from last Data Flow Task to new Data Flow Task that writes details rows to file utilizing the ID from the outer loop. The destination will use a different File Connection Manager pointed to same physical file.
Done.
This is an elegant idea. There is a chance, I suspect, that the destination file will get locked by DF (header) such that DF2 (detail) fails. But if not, this is tidy.
Thanks Phil. My design makes two assumptions:
1. The two DFTs* will not execute in parallel.
2. The Flat File Connection Managers will close their respective file handles once they are done writing to the file.
Re: #1 The DFTs should run in sequence because of the Precedence Constraint I have placed between them.
Re: #2 In the first DFT the file will be opened for writing, the header line will be written and then the file will be closed. Immediately after that the second DFT will open the file, write the detail lines associated with the header line and then the file will be closed again. That cycle would repeat until all IDs were processed.
* Just in case: DFT = Data Flow Task; DF = Data Flow, and is sometimes used as shorthand for Data Flow Task
I agree that it should work. But even if it does, repeatedly opening, writing and saving the target file for a lot of IDs must take its toll on performance.
I sense Jeff waiting in the wings, munching his popcorn.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 11, 2016 at 11:50 am
Yet another option that still uses SSIS, but puts more of the focus on the T-SQL selection logic, is to generate a resultset in which the header and detail rows are already interleaved, with a single wide column that looks exactly how each line of the file should look, except for maybe some right-space-padding. Similar to what Phil said in his initial post on this thread. This way the SSIS Package would simply be:
1. Data Flow Task selects data where the header and detail lines are already interleaved and each row contains a single wide column holding one line of the file. The data would then be laid down in the file as it comes out of the database. If the header and detail lines were different total widths, e.g. headers are 200 characters wide and details are 300 characters wide, then the Flat File Connection Manager would need to be set up with Ragged Right formatting.
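As a rough illustration of that interleaved resultset, a T-SQL sketch along these lines could feed the single-column Data Flow. All table and column names here are invented for the example, and the widths are arbitrary; the real query would use the actual schema and layout spec:

```sql
-- Hypothetical OrderHeader/OrderDetail tables; adjust names and widths.
-- Each branch builds one fixed-width output line; the ORDER BY makes
-- every header sort immediately before its own detail lines.
SELECT s.OutputLine
FROM (
    SELECT h.OrderID,
           0 AS LineSort, -- headers sort first within each OrderID
           CAST('H'
                + RIGHT(REPLICATE('0', 10) + CAST(h.OrderID AS varchar(10)), 10)
                + LEFT(h.CustomerName + REPLICATE(' ', 189), 189)
                AS varchar(300)) AS OutputLine
    FROM dbo.OrderHeader AS h
    UNION ALL
    SELECT d.OrderID,
           d.LineNumber,  -- details follow their header in line order
           CAST('D'
                + RIGHT(REPLICATE('0', 10) + CAST(d.OrderID AS varchar(10)), 10)
                + LEFT(d.ItemCode + REPLICATE(' ', 20), 20)
                + RIGHT(REPLICATE('0', 9) + CAST(d.Quantity AS varchar(9)), 9)
                AS varchar(300)) AS OutputLine
    FROM dbo.OrderDetail AS d
) AS s
ORDER BY s.OrderID, s.LineSort;
```

If the header and detail lines end up different widths, only the Ragged Right setting on the Flat File Connection Manager changes; the query stays the same.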
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 11:57 am
No matter which output route you take, why not simply create a set with
key (order id?)
line (order line number/row_number())
data (nvarchar(max) fixed width data)
Insert the headers, all with [line] = 0
Insert the details, all with [line] > 0
Output [data] to text file, order by [key], [line]
If you need file headers or order trailers/file trailer, they are simple enough to include in the set.
Oh, and archive this set with the date/time stamp so that you can recreate the files they accidentally lose without having to re-query the source data.
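That set could be sketched in T-SQL something like this. All object names are invented for illustration, and the line-building expressions stand in for the real fixed-width formatting:

```sql
-- One archived row per output line; LineNo 0 = header, > 0 = detail.
CREATE TABLE dbo.FileExportLine (
    ExportKey int           NOT NULL,  -- e.g. order id
    LineNo    int           NOT NULL,  -- 0 = header, > 0 = detail
    LineData  nvarchar(max) NOT NULL,  -- fixed-width output line
    CreatedAt datetime2(0)  NOT NULL DEFAULT SYSDATETIME()  -- for re-creating lost files
);

-- Insert the headers, all with LineNo = 0
INSERT dbo.FileExportLine (ExportKey, LineNo, LineData)
SELECT h.OrderID, 0,
       LEFT(CAST(h.OrderID AS nvarchar(10)) + h.CustomerName
            + REPLICATE(N' ', 200), 200)
FROM dbo.OrderHeader AS h;

-- Insert the details, all with LineNo > 0
INSERT dbo.FileExportLine (ExportKey, LineNo, LineData)
SELECT d.OrderID,
       ROW_NUMBER() OVER (PARTITION BY d.OrderID ORDER BY d.LineNumber),
       LEFT(d.ItemCode + CAST(d.Quantity AS nvarchar(10))
            + REPLICATE(N' ', 200), 200)
FROM dbo.OrderDetail AS d;

-- Output to the text file in file order
SELECT LineData
FROM dbo.FileExportLine
ORDER BY ExportKey, LineNo;
```

File headers or trailers would just be extra rows with LineNo values that sort before or after everything else.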
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 11, 2016 at 11:59 am
Phil Parkin (1/11/2016)
Orlando Colamatteo (1/11/2016)
Phil Parkin (1/11/2016)
Orlando Colamatteo (1/11/2016)
For SSIS I was thinking more along the lines of this:
1. Execute SQL Task gets all IDs needing to go into the file and stores in SSIS variable of type Object (an ADO.NET DataSet under the covers).
2. Precedence Constraint to Foreach Loop Container iterates over SSIS Variable just populated. In Foreach:
2.1. Data Flow Task writes header data to file utilizing the ID from the outer loop.
2.2. Precedence Constraint from last Data Flow Task to new Data Flow Task that writes details rows to file utilizing the ID from the outer loop. The destination will use a different File Connection Manager pointed to same physical file.
Done.
This is an elegant idea. There is a chance, I suspect, that the destination file will get locked by DF (header) such that DF2 (detail) fails. But if not, this is tidy.
Thanks Phil. My design makes two assumptions:
1. The two DFTs* will not execute in parallel.
2. The Flat File Connection Managers will close their respective file handles once they are done writing to the file.
Re: #1 The DFTs should run in sequence because of the Precedence Constraint I have placed between them.
Re: #2 In the first DFT the file will be opened for writing, the header line will be written and then the file will be closed. Immediately after that the second DFT will open the file, write the detail lines associated with the header line and then the file will be closed again. That cycle would repeat until all IDs were processed.
* Just in case: DFT = Data Flow Task; DF = Data Flow, and is sometimes used as shorthand for Data Flow Task
I agree that it should work. But even if it does, repeatedly opening, writing and saving the target file for a lot of IDs must take its toll on performance.
It could but it may not matter much in the grand schema of things. The good news is that the solution should scale linearly.
I sense Jeff waiting in the wings, munching his popcorn.
Quite possibly 🙂 This is partially why I threw out the option of shaping the resultset entirely in T-SQL and just having SSIS write that to a flat file as-is. That solution requires a lot more effort in the T-SQL area but should perform admirably.
I'm happy to take SSIS, C# and T-SQL to compete with anyone using xp_cmdshell, T-SQL and bcp, any day 🙂
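For comparison, the xp_cmdshell/bcp route being referred to would look roughly like this, assuming the interleaved, pre-ordered lines already exist in a table or view. The server, database, table and path names are all made up, and xp_cmdshell must be enabled first:

```sql
-- bcp writes the pre-ordered lines straight to the target file.
-- -c = character mode, -T = trusted connection, -S = server name.
DECLARE @cmd varchar(4000) =
      'bcp "SELECT LineData FROM MyDb.dbo.FileExportLine '
    + 'ORDER BY ExportKey, LineNo" '
    + 'queryout "C:\Exports\HeaderDetail.txt" -c -T -S MyServer';
EXEC master.dbo.xp_cmdshell @cmd;
```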
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 12:03 pm
Oh, and archive this set with the date/time stamp so that you can recreate the files they accidentally lose without having to re-query the source data.
Why not just archive the generated file? I am thinking filer space would be much cheaper than database space for archiving and if a replacement file were needed it's right there. If there was a data correction you'd likely have to re-pull from the transactional source anyway.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2016 at 12:05 pm
bpowers (1/11/2016)
Well after submitting the SSIS file for testing I was notified that the Header and Detail must be grouped together. Ughhh... If you still have a bcp option you'd be willing to work with me I would greatly appreciate it. Also, xp_cmdshell can be used as I will simply turn it on and off during the execution. Thanks
Whether using SSIS or something else, now would be a great time to post some example header and detail rows as you'd like to see them. Also, when you say "flat file", do you really mean a CSV/TSV or fixed field format (with the understanding that header and detail rows will be different)?
--Jeff Moden
Change is inevitable... Change for the better is not.