SSIS - Write table to multiple files

  • Hi there, everyone.

    I have an SSIS package where I need to write a large table out to a set of smaller CSV files. I have created a stored procedure, shown below, that is called by an OLE DB Source inside a Data Flow Task. Each result set then needs to be written to its own file. I'm very confused as to how to do this. Can anyone steer me in the right direction as to how to approach this, please?

    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE PROCEDURE [dbo].[GEO_FileExtract]
    AS
    BEGIN TRY

        DECLARE @TempTable TABLE (
            [IDNO]   [varchar](16) NULL,
            [GEO001] [varchar](15) NULL,
            [GEO002] [varchar](5)  NULL,
            [GEO003] [varchar](5)  NULL,
            [GEO004] [varchar](5)  NULL,
            [GEO005] [varchar](5)  NULL);

        WHILE EXISTS (SELECT TOP 1 *
                      FROM dbo.GEO AS INXG
                      WHERE IsProcessed = 0)
        BEGIN

            INSERT INTO @TempTable (IDNO, GEO001, GEO002, GEO003, GEO004, GEO005)
            SELECT TOP 50000
                INXG.IDNO,
                INXG.GEO001,
                INXG.GEO002,
                INXG.GEO003,
                INXG.GEO004,
                INXG.GEO005
            FROM dbo.GEO AS INXG
            ORDER BY INXG.IDNO;

            UPDATE INXG
            SET IsProcessed = 1
            FROM dbo.GEO AS INXG
            JOIN @TempTable AS TT
                ON INXG.IDNO = TT.IDNO
            WHERE INXG.IsProcessed = 0;

            SELECT
                TT.IDNO   AS IDENTIFIERNUMBER,
                TT.GEO001 AS EACODEOFCONSUMER,
                TT.GEO002 AS PROVINCEOFCONSUMER,
                TT.GEO003 AS DISTRICTOFCONSUMER,
                TT.GEO004 AS METROPOLITANOFCONSUMER,
                TT.GEO005 AS MAINPLACEOFCONSUMER
            FROM @TempTable AS TT;

        END

    END TRY
    BEGIN CATCH

        SELECT
            ERROR_NUMBER()    AS ErrorNumber,
            ERROR_STATE()     AS ErrorState,
            ERROR_SEVERITY()  AS ErrorSeverity,
            ERROR_PROCEDURE() AS ErrorProcedure,
            ERROR_LINE()      AS ErrorLine,
            ERROR_MESSAGE()   AS ErrorMessage;

    END CATCH
  • BCP (the Bulk Copy Program), a.k.a. bulk import/export, is one option.

  • I have a few quick comments.

    1. Using a table variable for such large volumes of data is usually not a great idea. A temp table will almost certainly perform better.
    2. Shouldn't your initial INSERT include "WHERE IsProcessed = 0" to avoid reselecting rows already selected?
    3. By doing a SELECT in your CATCH block, you are telling the proc not to fail on error, but merely to return a resultset. Is that what you want? If you want it to properly fail on error, put a THROW or RAISERROR in there.
    4. Use an OLE DB source to execute the proc and write to a file.
    5. Once the file is generated, use a script task or file system tasks to copy it to your various desired names/locations.
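    On point 3, a minimal sketch of a CATCH block that fails the caller instead of returning a resultset (assumes SQL Server 2012 or later for THROW):

        BEGIN CATCH
            -- Re-raise the original error so the calling SSIS task fails,
            -- rather than receiving the error details as an ordinary resultset.
            THROW;
        END CATCH

    On older versions, RAISERROR(ERROR_MESSAGE(), 16, 1) inside the CATCH is the usual substitute, though it loses the original error number.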

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Do you know the number of smaller files to be created from the table, or is it dynamic?

  • Hi there. Yes, it does include a WHERE IsProcessed = 0, or it should; it seems I forgot to put it back. I’ll use a temp table instead. Thanks for the advice.

  • Hi there, Brahmanand. The number of files is dynamic, since the row count changes. The maximum size per file should be around 100 MB. The files will be sent to Azure Blob Storage and copied into Snowflake.

  • I recommend the following steps:

    1. Identify the total row count and keep it in a variable [A].
    2. Identify the average row size (in bytes), derive the average number of rows that will fit in 100 MB, and keep it in a variable [B].
    3. Divide A by B (A/B) and apply the ceiling function to round up to the next integer [C]. This becomes the number of groups.
    4. Use the NTILE() function on the original table, supplying the variable (C) to NTILE. The rows will be divided into C groups.
    5. Each group then holds roughly 100 MB worth of rows.
    6. In SSIS, use a For Loop container and iterate from 1 to the number of groups (C). Store the iteration/group number in an SSIS variable.
    7. Add a Data Flow task in the loop container:

      1. The source should be an OLE DB/ADO.NET source. Write the query in the source and filter the data by group number.
      2. The destination should be a Flat File destination.
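
    A minimal T-SQL sketch of steps 1 to 4, using the dbo.GEO table and columns from the original post; @AvgRowBytes is an assumed placeholder value that you would measure yourself:

        DECLARE @A BIGINT, @B BIGINT, @C INT;
        DECLARE @AvgRowBytes INT = 60;                 -- assumed average row size in bytes

        SELECT @A = COUNT_BIG(*) FROM dbo.GEO;         -- total row count [A]
        SET @B = (100 * 1024 * 1024) / @AvgRowBytes;   -- rows that fit in 100 MB [B]
        SET @C = CEILING(1.0 * @A / @B);               -- number of groups/files [C]

        -- Assign each row to a group; the SSIS source query can then
        -- filter on GroupNumber = ? for each loop iteration.
        SELECT
            NTILE(@C) OVER (ORDER BY IDNO) AS GroupNumber,
            IDNO, GEO001, GEO002, GEO003, GEO004, GEO005
        FROM dbo.GEO;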

    You can refer to the following articles for dividing a dataset into equal groups.

    Divide the rows in equals batches

    Divide the rows of a column in equals batches in grid format

  • I misunderstood the requirement – you want to split the rows contained in the table across a number of files. My comment (5) therefore makes no sense whatsoever!

    The method described above looks like it will work. Another possibility is that you output to a single large file and then run a process afterwards which splits that file into multiple files.


  • Wow, awesome. Thank you!

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • matilda wrote:

    Thanks for this case. Very interesting solution!

    Sounds like a typical generic post as a precursor to SPAM.  "Santa is watching"!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
