January 21, 2009 at 3:43 am
Is it possible to perform a cross join is SSIS? I want to create a Cartesian product for all the rows in my dataflow against another table residing on a different server.
January 22, 2009 at 8:28 am
I don't see why it couldn't be done, but it might peform like an absolute dog, due to either network latency and/or just the sheer number of rows it generates. Cross joins between data on the same server is potential trouble enough, but between two different servers, I think you're asking for a nightmare, unless the total number of rows in the Cartesian product is relatively small, and would remain that way going forward.
Two thoughts: Start by copying the table from the 2nd server to the first, and then do your cartesian product on one server, or, Select the appropriate subset of data from the 2nd server and copy it to the 1st. One of those is likely to help significantly.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
January 22, 2009 at 8:53 am
I had already considered using SSIS to move the smaller table to the other server and then performing the cross join in a OLEDB Source component. While this would work I just wondered if there were a native SSIS way of doing it or whether anyone knew of a custom component that did it.
January 22, 2009 at 9:11 am
You canβt do this directly but by adding a dummy column to both data sources you should be able to accomplish it with a Merge Join transform.
If both of your sources are queries just add a dummy column with a value of 1. Then join both sets on that dummy column. Every input from the left side will then match every input from the right side.
The following is a SQL example but it demonstrates the idea.
-- 5 rows * 4 rows = 20 rows out
SELECT
a.RealData AS RealDataSet1
,b.RealData AS RealDataSet2
FROM
(
SELECT 1 AS DummyColumn, 'a' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'b' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'c' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'd' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'e' AS RealData
) a
INNER JOIN
(
SELECT 1 AS DummyColumn, 'w' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'x' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'y' AS RealData UNION ALL
SELECT 1 AS DummyColumn, 'z' AS RealData
) b
ON
a.DummyColumn = b.DummyColumn
My biggest concern would be the potential blocking effects of both queries against the source systems as SSIS multiplies the inputs to create the result set.
January 22, 2009 at 10:03 am
Clever stuff, might give that a try. Comments about performance taken on board. Thanks chaps.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply