Cross Join

  • Is it possible to perform a cross join is SSIS? I want to create a Cartesian product for all the rows in my dataflow against another table residing on a different server.

  • I don't see why it couldn't be done, but it might peform like an absolute dog, due to either network latency and/or just the sheer number of rows it generates. Cross joins between data on the same server is potential trouble enough, but between two different servers, I think you're asking for a nightmare, unless the total number of rows in the Cartesian product is relatively small, and would remain that way going forward.

    Two thoughts: Start by copying the table from the 2nd server to the first, and then do your cartesian product on one server, or, Select the appropriate subset of data from the 2nd server and copy it to the 1st. One of those is likely to help significantly.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • I had already considered using SSIS to move the smaller table to the other server and then performing the cross join in a OLEDB Source component. While this would work I just wondered if there were a native SSIS way of doing it or whether anyone knew of a custom component that did it.

  • You can’t do this directly but by adding a dummy column to both data sources you should be able to accomplish it with a Merge Join transform.

    If both of your sources are queries just add a dummy column with a value of 1. Then join both sets on that dummy column. Every input from the left side will then match every input from the right side.

    The following is a SQL example but it demonstrates the idea.

    -- 5 rows * 4 rows = 20 rows out

    SELECT

    a.RealData AS RealDataSet1

    ,b.RealData AS RealDataSet2

    FROM

    (

    SELECT 1 AS DummyColumn, 'a' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'b' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'c' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'd' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'e' AS RealData

    ) a

    INNER JOIN

    (

    SELECT 1 AS DummyColumn, 'w' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'x' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'y' AS RealData UNION ALL

    SELECT 1 AS DummyColumn, 'z' AS RealData

    ) b

    ON

    a.DummyColumn = b.DummyColumn

    My biggest concern would be the potential blocking effects of both queries against the source systems as SSIS multiplies the inputs to create the result set.

  • Clever stuff, might give that a try. Comments about performance taken on board. Thanks chaps.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply