February 24, 2011 at 10:43 am
Example:
declare @test2 table
(
col2 varchar(15), col3 numeric(12,2)
)
insert into @test2
select 'File1', 15 union all
select 'File1', 16 union all
select 'File1', 17 union all
select 'File2', 18 union all
select 'File2', 19 union all
select 'File2', 10 union all
select 'File3', 11 union all
select 'File3', 12 union all
select 'File3', 13 union all
select 'File3', 14 union all
select 'File4', 17
select * from @test2
Given this example (in reality it will end up being around 30k files), is there a simple (not ridiculously complicated) way to direct this single select into several files?
My first thought was cascading foreach loops, but that would take a huge amount of time to run, and as I'm new to the SSIS world, I'm kinda clueless on this.
Any thoughts appreciated.
--
Thiago Dantas
@DantHimself
February 24, 2011 at 10:46 am
Strange requirement!
Can you do it just once and then do a file copy to the others - rather than running the data transfer over and over again?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 24, 2011 at 11:01 am
If you're doing this in the Data Flow (which you should), then you can use the Multicast component to make "copies" of the data flow - one in, two (or more) out - and send those to independent destinations.
Todd McDermid - SQL Server MVP, MCTS (SQL 08 BI), MCSD.Net
My Blog - Dimension Merge SCD Component for SSIS - SSIS Community Tasks and Components
February 24, 2011 at 11:07 am
tmcdermid (2/24/2011)
If you're doing this in the Data Flow (which you should), then you can use the Multicast component to make "copies" of the data flow - one in, two (or more) out - and send those to independent destinations.
Sounds good, except I will need *many* destinations!
The way I see this, and please correct me if I'm wrong, it would mean hardcoding the destinations, which is not possible.
@Phil
You mean extracting the dataset to a single file and iterating over it to generate the other files? Not a bad idea, really.
Any more ideas? Anyone?
--
Thiago Dantas
@DantHimself
February 24, 2011 at 11:32 am
Pretty much what I meant.
Where are you getting the destinations from?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 24, 2011 at 12:10 pm
I apologize, I thought it was clear. The destinations come from the query.
Take a look at the example. What I need (with this example) is:
filename: file1.csv
contents:
File1,15.00
File1,16.00
File1,17.00
filename: file2.csv
contents:
File2,18.00
File2,19.00
File2,10.00
filename: file3.csv
contents:
File3,11.00
File3,12.00
File3,13.00
File3,14.00
filename: file4.csv
contents:
File4,17.00
--
Thiago Dantas
@DantHimself
February 24, 2011 at 12:14 pm
Ah, in that case you'll need a conditional split to split out the data flow to the right destinations.
However, if you have a lot of destinations, it can be a PITA to configure it all.
So maybe you can create some script task in .NET to handle it dynamically?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
February 24, 2011 at 12:23 pm
Hmmm, wasn't clear to me! But now it is and my suggestion was not valid.
My first idea would be to use an Execute SQL task to populate an object variable with the results of
Select distinct filename
from ...
Now you have something to work with.
Next task is a foreach loop which iterates around ('shreds') the object variable.
Inside the foreach loop is a dataflow task which creates the files one at a time.
Your source query within the dataflow will be an expression
select * from ... where filename = object variable (can't remember the syntax, sorry)
and the destination path will also be an expression using the object variable.
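Something along these lines - syntax not guaranteed, and the table and variable names are only placeholders:

-- Execute SQL task (Full result set) loading the list of file names into an object variable:
select distinct col2 as filename
from dbo.YourSourceTable

-- Expression for the source query inside the data flow (e.g. on the source's SqlCommand
-- property or on a string variable), using a variable filled by the foreach loop:
"select col2, col3 from dbo.YourSourceTable where col2 = '" + @[User::FileName] + "'"

-- Expression on the flat file connection manager's ConnectionString:
@[User::OutputFolder] + "\\" + @[User::FileName] + ".csv"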
This is quite advanced for an SSIS newbie 🙂
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 24, 2011 at 12:29 pm
Phil Parkin (2/24/2011)
Hmmm, wasn't clear to me! But now it is and my suggestion was not valid.
My first idea would be to use an Execute SQL task to populate an object variable with the results of
Select distinct filename
from ...
Now you have something to work with.
Next task is a foreach loop which iterates around ('shreds') the object variable.
Inside the foreach loop is a dataflow task which creates the files one at a time.
Your source query within the dataflow will be an expression
select * from ... where filename = object variable (can't remember the syntax, sorry)
and the destination path will also be an expression using the object variable.
This is quite advanced for an SSIS newbie 🙂
That was kinda how I thought of it at first, but isn't a foreach loop like this pretty slow?
Speed is not really required here, but I don't want it running for hours either.
--
Thiago Dantas
@DantHimself
February 24, 2011 at 12:36 pm
One way or another, you've got to create all of those files. Given that you need the destination to be dynamic, I can't think of an easy way to make it run in parallel - I guess you could break the files into ranges and have multiple foreach loops - one for each range - to introduce some parallelism. Not nice though.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 25, 2011 at 2:04 pm
The way I'm doing it right now is a foreach loop, only to loop through the 5 databases I need to query.
The query also returns parts of the destination folder, to ease up my work.
ADO.NET source with the source query and a Script Component as the destination.
Inside the Script Component, in the ProcessInputRow method, I create a new IO.StreamWriter with the assembled destination, build the row with a Text.StringBuilder, call WriteLine on the StreamWriter, and close it.
Anything else I can do to speed this up? (I'm even newer to this kind of programming, and the company develops in Java :-P)
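For reference, here is the general shape of the pattern as a simplified, stand-alone C# sketch (class name, folder path and column names are made up). The one change most likely to help is caching one StreamWriter per destination file instead of creating and closing a writer on every row:

// Sketch only: the writer-caching pattern, outside of the SSIS plumbing.
// In the Script Component you would call WriteRow from Input0_ProcessInputRow
// and CloseAll from PostExecute.
using System;
using System.Collections.Generic;
using System.IO;

public class CsvWriterCache
{
    private readonly string outputFolder;
    private readonly Dictionary<string, StreamWriter> writers =
        new Dictionary<string, StreamWriter>(StringComparer.OrdinalIgnoreCase);

    public CsvWriterCache(string outputFolder)
    {
        this.outputFolder = outputFolder;
    }

    // Reuse one StreamWriter per destination file instead of
    // opening and closing a writer for every incoming row.
    public void WriteRow(string fileName, string col2, decimal col3)
    {
        StreamWriter writer;
        if (!writers.TryGetValue(fileName, out writer))
        {
            writer = new StreamWriter(Path.Combine(outputFolder, fileName + ".csv"));
            writers.Add(fileName, writer);
        }
        writer.WriteLine("{0},{1:0.00}", col2, col3);
    }

    // Call once, after the last row, to flush and release the file handles.
    public void CloseAll()
    {
        foreach (StreamWriter writer in writers.Values)
        {
            writer.Close();
        }
        writers.Clear();
    }
}

If the source query were ordered by the filename column, you could instead close each writer as soon as the filename changes, which keeps only one file handle open at a time; the cache approach works either way.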
--
Thiago Dantas
@DantHimself
February 25, 2011 at 2:40 pm
Just got error 0xC0047020 (unable to process the data) on the ADO.NET source, followed by error 0xC02090F5 (something about the PrimeOutput method in the message).
I have no idea what these mean and Google hasn't been very useful =|
Does anyone know what these are?
--
Thiago Dantas
@DantHimself