May 12, 2017 at 5:00 am
I have a weird one today. Because my destination system can only accept files of a certain size, I need to dynamically create multiple files for a single day's feed. This means I need to export X number of rows per file, but the issue is each time I run this I don't know the total number of rows that will be in the table I'm creating my files from. So the number of files created will be dynamic based on the row count.
In this case, let's say the maximum row count per file is 750,000 (.csv file, not .xlsx). One day the table has 2 million rows, but another day it has 5 million. Or perhaps 1.5 million. The idea is that the ForEach Loop container will run 3 times on day 1 (creating 3 files), 7 times for the 5 million rows, or 2 times for the 1.5 million rows. I've got an Execute SQL Task right before the container to pull the number of files needed into a variable:
SELECT CEILING(COUNT(*) / 750000.0) AS FileCount FROM dbo.MyTable;
The above pushes to an @FileCount variable in the package. I was going to use this as the Foreach variable enumerator, but it wants a string pointing to an ADO object. As does the ADO enumerator. None of the other enumerators seem to be what I'm looking for.
Maybe I'm doing this wrong, but I built the table with a bit column to tell me when certain rows have been exported (meaning I update the column from 0 to 1 when I push them to one of the files and then go grab the next set of 750,000 still set to 0). Unfortunately, I can't figure out how to translate that schema over to the Foreach Loop Container to force it to run dynamically based on the number of result sets.
Any thoughts on how I can accomplish this?
May 12, 2017 at 5:08 am
Quick thought: use a row number or an id number, divide by the max row size, and wrap this in either a view or an inline table-valued function. Then you can pass each iteration number as a parameter.
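Something along these lines as a rough sketch (untested; the id column and the 750,000 batch size are assumptions):

CREATE FUNCTION dbo.GetExportBatch (@BatchNumber INT)
RETURNS TABLE
AS
RETURN
SELECT X.col1, X.col2
FROM
(
    SELECT col1, col2
         , (ROW_NUMBER() OVER (ORDER BY id) - 1) / 750000 AS BatchNumber -- 0-based bucket, one per 750k rows
    FROM dbo.MyTable
) AS X
WHERE X.BatchNumber = @BatchNumber;

The loop then passes 0, 1, 2... as the parameter, one call per file.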
😎
May 12, 2017 at 5:11 am
One possible solution (perhaps not the best) would be to have a query return one row for each iteration needed (using a numbers table, of course), then use the ADO Enumerator for the ForEach container.
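Something like this would do it (untested sketch; assumes an existing dbo.Numbers tally table):

DECLARE @FileCount INT;

SELECT @FileCount = CEILING(COUNT(*) / 750000.0)
FROM dbo.MyTable;

-- one row per file needed; this result set becomes the ADO object the enumerator loops over
SELECT N.Number AS FileNumber
FROM dbo.Numbers AS N
WHERE N.Number BETWEEN 1 AND @FileCount
ORDER BY N.Number;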
May 12, 2017 at 5:22 am
Thanks, guys. I'm going to watch this thread for a little bit more (I want to hear what Thom has to say) before digging back into this. I've been banging my head against the proverbial wall, so I need to step away and let these suggestions percolate (if you know what I mean).
May 12, 2017 at 5:28 am
Are you exporting the data from a table?
😎
May 12, 2017 at 5:47 am
If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 12, 2017 at 7:21 am
I was actually planning to use a Script Component and an ADO variable as the data source; however, when I get to the second iteration I get no results from the variable. It's as if SSIS drops the contents after you read it the first time... Little confused. I'm trying to troubleshoot it some more.
EDIT: So, apparently, using Fill empties the SSIS variable as the C# variable is filled. Which means, when I do my second iteration, it's empty. D'oh! >_<
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
May 12, 2017 at 11:33 am
I've done something similar in the past; here is some sample code I mocked up:
// Script Task Main; needs "using System.Data.SqlClient;" and "using System.IO;" at the top of the script
public void Main()
{
    try
    {
        var strQuery = "SELECT col1, col2 FROM dbo.TableName";
        var fileName = "";
        var recordCount = 0;
        var fileCount = 0;
        var recordNumberPerFile = 10000;
        // create connection using a connection manager in the package
        SqlConnection conn = (SqlConnection)Dts.Connections["Database"].AcquireConnection(null);
        SqlCommand cmd = new SqlCommand(strQuery, conn);
        SqlDataReader reader = cmd.ExecuteReader();
        StreamWriter writer = null;
        while (reader.Read())
        {
            // if there is no writer yet, or the current file has hit the per-file limit
            if (writer == null || recordCount == recordNumberPerFile)
            {
                recordCount = 0;
                // Close the previous file if it is open...
                if (writer != null)
                {
                    writer.Close();
                    writer.Dispose();
                }
                fileName = "c:\\edi\\FileName_" + (++fileCount).ToString() + ".txt";
                // Open the new file...
                writer = new StreamWriter(fileName);
            }
            // write each row to the current file (pseudo code - add delimiters to suit the file format)
            string recordSet = reader.GetValue(0).ToString() + reader.GetValue(1).ToString();
            writer.WriteLine(recordSet);
            recordCount++;
        }
        // Close the last file and clean up
        if (writer != null)
        {
            writer.Close();
            writer.Dispose();
        }
        reader.Close();
        conn.Close();
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception e)
    {
        Dts.Events.FireError(0, "Script Task Example", e.Message, String.Empty, 0);
    }
}
May 12, 2017 at 11:59 am
Eirikur Eiriksson - Friday, May 12, 2017 5:28 AM: Are you exporting the data from a table?
😎
Yes. I've got a table with 25+ columns in it, and the final column is ExportedToFile BIT NOT NULL (set to 0, meaning "not yet exported"). The idea is to skim the top 750,000 records off that, update them in the data flow as I'm pushing them to the file (Multicast is a wonderful thing), and then grab the next 750,000 for the next file and do the same thing.
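Roughly, the source query for each pass is something like this (sketch; the real table has the 25+ columns):

SELECT TOP (750000) Col1, Col2 -- ...and the rest of the columns
FROM dbo.MyTable
WHERE ExportedToFile = 0
ORDER BY Col1; -- some deterministic order so the batches don't overlap

One Multicast branch writes those rows to the flat file; the other feeds the update that flips ExportedToFile to 1 for the same rows.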
May 12, 2017 at 12:00 pm
Phil Parkin - Friday, May 12, 2017 5:47 AM: If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.
That doesn't help. I never know until the code is run how many rows will be in the table, and that number will not be consistent. Which means I can't have a fixed set of file destinations, because the number of files needed will hardly ever be the same.
EDIT: grammar fix
EDIT 2: Missed the C# link. Opening that up now. Thank you for the reference.
May 12, 2017 at 12:01 pm
FridayNightGiant - Friday, May 12, 2017 11:33 AM: I've done something similar in the past; here is some sample code I mocked up (code snipped; see the post above)
Hrm. I did not think about using a script for this process. Thank you. I'll look at this code and let you know how it works out for me.
May 12, 2017 at 12:06 pm
Thom A - Friday, May 12, 2017 7:21 AM: I was actually planning to use a Script Component and an ADO variable as the data source; however, when I get to the second iteration I get no results from the variable. It's as if SSIS drops the contents after you read it the first time... Little confused. I'm trying to troubleshoot it some more.
EDIT: So, apparently, using Fill empties the SSIS variable as the C# variable is filled. Which means, when I do my second iteration, it's empty. D'oh! >_<
Thanks for looking, Thom. I was actually having a similar problem, trying to figure out how to loop back to an Execute SQL Task that was outside the loop and providing a number of rows. I thought about nesting containers, but realized I still didn't have a good way of kicking off the outermost one, etc., and that's when I came here looking for additional guidance.
May 12, 2017 at 12:13 pm
Brandie Tarvin - Friday, May 12, 2017 12:00 PM: Phil Parkin - Friday, May 12, 2017 5:47 AM: If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.
That doesn't help. I never know until the code is run how many rows will be in the table, and that number will not be consistent. Which means I can't have a fixed set of file destinations, because the number of files needed will hardly ever be the same.
EDIT: grammar fix
EDIT 2: Missed the C# link. Opening that up now. Thank you for the reference.
Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.
E.g., if 1.5 million rows, you'd run an NTILE(2), breaking the data into two 750k groups.
Your conditional split would fire these rows to outputs 1 and 2, leaving the other 8 destinations unused.
The C# idea is clearly more flexible, but this solution may be able to use parallelism & therefore execute more quickly.
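The source query feeding the data flow would be something along these lines (untested sketch; assumes an id column to order by):

DECLARE @Files INT;

SELECT @Files = CEILING(COUNT(*) / 750000.0)
FROM dbo.MyTable;

-- every row gets a bucket number from 1 to @Files, each bucket roughly 750k rows
SELECT NTILE(@Files) OVER (ORDER BY id) AS FileBucket
     , col1
     , col2
FROM dbo.MyTable;

The conditional split expressions then just test FileBucket == 1, FileBucket == 2 and so on.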
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 15, 2017 at 8:10 am
Phil Parkin - Friday, May 12, 2017 12:13 PM: Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.
And as I'm trying to explain, there never will be an absolute maximum guarantee. Our business isn't consistent enough for me or anyone else to be able to predict it. Things slow down when there are recessions or depressions. Things pick up as people get more disposable income or as more small businesses open up and start doing business with us. My company would like to become an industry leader, which means doing business with more and more people. So I can only assume the data we are tracking will continue to grow, while accepting that sometimes it won't.
May 15, 2017 at 8:27 am
Brandie Tarvin - Monday, May 15, 2017 8:10 AM: Phil Parkin - Friday, May 12, 2017 12:13 PM: Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.
And as I'm trying to explain, there never will be an absolute maximum guarantee. Our business isn't consistent enough for me or anyone else to be able to predict it. Things slow down when there are recessions or depressions. Things pick up as people get more disposable income or as more small businesses open up and start doing business with us. My company would like to become an industry leader, which means doing business with more and more people. So I can only assume the data we are tracking will continue to grow, while accepting that sometimes it won't.
OK, got it. Then you need to find a dynamic way. The C# scripting method would work and is probably the simplest.
To achieve parallelism, it should be possible to create a Parent/Child package structure with as many occurrences of the child package as threads you wish to run in parallel.
The master package would have to pass things like target filename and source data range start/end to the child package.
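Something like this could generate the parameter set for the child packages (rough sketch; sys.all_objects is just a cheap stand-in for a numbers table):

DECLARE @BatchSize INT = 750000;
DECLARE @Files INT;

SELECT @Files = CEILING(COUNT(*) * 1.0 / @BatchSize)
FROM dbo.MyTable;

-- one row per child package: the file number plus the row range it should export
SELECT f.FileNumber
     , (f.FileNumber - 1) * @BatchSize + 1 AS StartRow
     , f.FileNumber * @BatchSize           AS EndRow
FROM
(
    SELECT TOP (@Files) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS FileNumber
    FROM sys.all_objects
) AS f
ORDER BY f.FileNumber;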
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin