July 26, 2021 at 2:14 pm
Dear Group:
Please note: due to limits placed on our team and environment, I can only use solutions that involve SSIS and/or stored procedures. As much as I understand and agree that there are better solutions, we are restricted to SSIS and commands that can be executed inside a stored procedure.
We are having an issue, and I am not exactly sure how to solve it, so I am hoping others can give me suggestions, pointers, etc. We have a process that runs and exports data to a file, and the file is 16 GB in size. (Please understand this isn't the best solution, but due to our environment, all we can do is create a file, place it on a server, and have other teams pick it up. We would like to have direct connections, ETL tools, etc., and are planning to implement them in the next budget year, but that doesn't help the immediate issue at hand.) The downside is that the team ingesting this file can only accept files up to 5 GB. As such, we need to break our current process out into multiple files.
Our current process is simply an SSIS package that runs "SELECT * FROM TABLE" and then writes the results to a file. I am having a terrible time figuring out how to break this process into multiple steps so we can create multiple files: for instance, run "SELECT TOP 1000000 * FROM TABLE", output that to a file, grab the next 1,000,000 records, output those to a file, and so on until all records in the table have been exported. We would also need to update the filename with each iteration: file_1, file_2, file_3, etc.
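Something like this stored procedure is what I picture each loop iteration calling, with the loop counter also driving the file name (all names here are made up, just to illustrate):

CREATE PROCEDURE dbo.ExportPage @PageNumber INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @RowsPerFile INT = 1000000;
    SELECT *
    FROM dbo.MyTable                  -- placeholder table name
    ORDER BY MyKeyColumn              -- a stable ordering column is required
    OFFSET (@PageNumber - 1) * @RowsPerFile ROWS
    FETCH NEXT @RowsPerFile ROWS ONLY;
END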
Any help and/or pointers to research would be greatly appreciated.
July 26, 2021 at 2:23 pm
The least disruptive way of doing this that I can think of is to add a Script Task at the end of your existing SSIS process which splits your output file into the required chunks.
I have not tried it, but take a look at Jon Skeet's answer here.
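Untested, but the splitting logic would go in the Script Task's Main() and might look something like this (the path and the rows-per-file value are placeholders to tune so each chunk stays under 5 GB; add using System.IO; at the top of the script):

public void Main()
{
    const string source = @"E:\export\bigfile.txt";   // placeholder: your 16 GB file
    const int maxLines = 1000000;                     // tune to keep each chunk under 5 GB
    int fileNumber = 0, lineCount = 0;
    StreamWriter writer = null;

    // File.ReadLines streams the file, so the 16 GB is never loaded at once
    foreach (string line in File.ReadLines(source))
    {
        if (writer == null || lineCount >= maxLines)
        {
            if (writer != null) writer.Close();
            fileNumber++;
            writer = new StreamWriter(Path.Combine(@"E:\export", "file_" + fileNumber + ".txt"));
            lineCount = 0;
        }
        writer.WriteLine(line);
        lineCount++;
    }
    if (writer != null) writer.Close();

    Dts.TaskResult = (int)ScriptResults.Success;
}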
It will take longer than an optimised process which writes out the files only once, of course.
I'll think about other possibilities.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 26, 2021 at 2:55 pm
Or, instead of using a file destination defined within SSIS, you could create and write to the file from a C# script component acting as a destination within your data flow.
There you can implement logic to close the current file and open a new one after X rows, while still using the same source component.
If you use a StreamWriter, it will be roughly as fast as the SSIS flat file destination.
A rudimentary sample is here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/82639c70-1954-4c15-9f6e-2ece1c69ef78/ssis-script-task-component-to-write-input0buffer-to-a-text-file?forum=sqlintegrationservices
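Untested, but the skeleton of such a component might look like this. ScriptMain, UserComponent, Input0Buffer and the overrides are what SSIS generates for a script component; the column name, path and rows-per-file value are placeholders:

using System.IO;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private StreamWriter writer;
    private int rowCount;
    private int fileNumber;
    private const int RowsPerFile = 1000000;                  // tune to stay under 5 GB
    private const string BasePath = @"\\server\share\file_";  // placeholder path

    // Close the current file (if any) and start the next one
    private void OpenNextFile()
    {
        if (writer != null) writer.Close();
        fileNumber++;
        writer = new StreamWriter(BasePath + fileNumber + ".txt");
        rowCount = 0;
    }

    public override void PreExecute()
    {
        base.PreExecute();
        OpenNextFile();
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (rowCount >= RowsPerFile) OpenNextFile();
        writer.WriteLine(Row.Line);   // "Line" is a placeholder input column
        rowCount++;
    }

    public override void PostExecute()
    {
        if (writer != null) writer.Close();
        base.PostExecute();
    }
}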
July 26, 2021 at 4:04 pm
When I have had these types of issues in the past, I broke the files out by a date. If there is any data available in that table that can be used to partition the data by date, then you can easily set up the process to export the data using that date.
The next thing is to identify the right date range for each file: daily, weekly, monthly or yearly.
If you cannot use a date, then find some other data in the table to partition the files on. Once you have that identified, you can create a procedure that returns a result set of the identified partitions (see the sketch below). Using a Foreach Loop container, you then loop over the partitions, grabbing the identifiers, passing them to the data flow task, and using those same parameters to dynamically build the output file name.
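As a rough sketch, assuming monthly partitions on a date column (all names made up):

CREATE PROCEDURE dbo.GetExportPartitions
AS
BEGIN
    SET NOCOUNT ON;
    -- One row per month present in the data; the Foreach Loop shreds this
    -- result set into variables for the source query and the file name.
    SELECT DISTINCT
           DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1) AS PartitionStart,
           EOMONTH(OrderDate) AS PartitionEnd
    FROM dbo.MyTable
    ORDER BY PartitionStart;
END

The source query in the data flow then becomes something like SELECT * FROM dbo.MyTable WHERE OrderDate >= ? AND OrderDate <= ?, with the two parameters mapped to the loop variables.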
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
July 30, 2021 at 12:07 pm
May not be the best idea, but you could also figure out the number of records that comes to about 5 GB, insert them into a work table, and then delete those records from your original table. Then have SSIS export the work table as file 1. Repeat until the original table is empty. If you have to preserve the original table, make a copy of it first and run the process against the copy. Sorry, just an idea that is more SQL-based than script-based.
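A rough sketch of the idea, with made-up names; DELETE ... OUTPUT moves each batch into the work table in a single statement:

SELECT * INTO dbo.SourceCopy FROM dbo.MyTable;              -- preserve the original
SELECT * INTO dbo.WorkTable FROM dbo.MyTable WHERE 1 = 0;   -- clone the schema, no rows

-- One batch: truncate, move, export; repeat until dbo.SourceCopy is empty
TRUNCATE TABLE dbo.WorkTable;

DELETE TOP (1000000)                                        -- tune to stay under 5 GB
FROM dbo.SourceCopy
OUTPUT deleted.* INTO dbo.WorkTable;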
August 2, 2021 at 12:16 pm
I would look at introducing a Conditional Split Transformation between the data source and the destination if you can define appropriate conditions on which to split the data. You can then add an additional destination for each of the conditions you add.
If you can't define any suitable conditions for the split, then you might have to go back to the table you are exporting and add a column which can be used for the split; you could use an identity column, or a column that specifies which file to output each row to. I guess it depends on whether any particular order is required for the output. If not, calculate how many files you want and how many rows each file should hold, and then assign that many rows to each file number, for example with SET ROWCOUNT and an UPDATE statement.
For example, ALTER TABLE tablename ADD FileNumber TINYINT NULL;
SET ROWCOUNT 100000;
UPDATE tablename SET FileNumber = 1 WHERE FileNumber IS NULL;
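To number every batch rather than just the first, an untested loop along these lines should work; UPDATE TOP is the non-deprecated alternative to SET ROWCOUNT:

DECLARE @FileNumber TINYINT = 0;
WHILE EXISTS (SELECT 1 FROM tablename WHERE FileNumber IS NULL)
BEGIN
    SET @FileNumber += 1;
    UPDATE TOP (100000) tablename
    SET FileNumber = @FileNumber
    WHERE FileNumber IS NULL;
END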
Do you have access to a Linux computer, or can you install the Cygwin utilities? Another option would be to use the split command on the output file you are already producing; for example, split -C 4G -d output.txt file_ caps each chunk at 4 GB while keeping whole lines intact and numbering the output files.