Distinct Combinations

  • Hello. I'm working on a problem where I need to find the unique combinations of suppliers that supply various items.

    In my data I have Supplier, Item, Orders. An Item can be supplied by 1 or more suppliers. I'd like to find all combinations of suppliers that supply the same item(s) happens in my data.

    For example: Item 1 is supplied by A and B. Item 2 is supplied by B only and Item 3 is supplied by A only. In my results I'd like to see SupplierCombinationID 1 = A, SupplierCombinationID 2 = A and B (2 records with ID value 2). SupplierCombinationID 3 = B.

    It's easy enough to see for a particular item all of the suppliers that supply that item. I can't seem to figure out how to then see the distinct combination of suppliers regardless of the specific item they're supplying.

    Thanks for any help.

  • I'm not sure if you are looking to see multiple rows per item for each unique supplier or a distinct list of suppliers that have supplied any item or 1 row per item displaying all suppliers of that item.

    The first and second ones are pretty straight forward. The last one, 1 row per item displaying all suppliers of that item, is a little more advanced.

    Below are all three hopefully one of these are what you are looking for.

    For each of the three examples the table preperation is the same and is next:

    ------------------------------------------------------------------------------------------

    -- Create a test table named tblOrder

    ------------------------------------------------------------------------------------------

    IF NOT(OBJECT_ID(N'tblOrder') is null)

    DROP TABLE tblOrder

    CREATE TABLE tblOrder (iOrderId INT IDENTITY(1,1),iItemId INT,vcSupplierId VARCHAR(1))

    CREATE CLUSTERED INDEX idx1 ON tblOrder (iItemId,vcSupplierId)

    ------------------------------------------------------------------------------------------

    -- Query that returns unique list of suppliers by supplied item

    ------------------------------------------------------------------------------------------

    INSERT INTO tblOrder

    SELECT 1,'A'

    UNION SELECT 1,'B'

    UNION SELECT 2,'B'

    UNION SELECT 3,'A'

    UNION SELECT 4,'C'

    UNION SELECT 4,'A'

    UNION SELECT 4,'B'

    UNION SELECT 5,'B'

    UNION SELECT 5,'D'

    The first, multiple rows per item for each unique supplier, is achieved using "group by"

    SELECT iItemId,vcSupplierId

    FROM tblOrder

    GROUP BY iItemId,vcSupplierId

    ORDER BY iItemId,vcSupplierId

    The results of this query are:[font="Courier New"]

    iItemId vcSupplierId

    ----------- ------------

    1 A

    1 B

    2 B

    3 A

    4 A

    4 B

    4 C

    5 B

    5 D

    (9 row(s) affected)[/font]

    The second, list of unique suppliers that have supplied any item, is also achieved using "Group By"

    ------------------------------------------------------------------------------------------

    -- Query that returns unique list of suppliers)

    ------------------------------------------------------------------------------------------

    SELECT vcSupplierId

    FROM tblOrders

    GROUP BY vcSupplierId

    ORDER BY vcSupplierId

    The results of this query are:[font="Courier New"]

    vcSupplierId

    ------------

    A

    B

    C

    D

    (4 row(s) affected)[/font]

    The last, 1 row per item displaying all suppliers of that item, is achieved using the FOR XML PATH along with "Group By" to concatenate the individual rows into a single column on their respective row for the same iItemId and is a little more advanced.

    ------------------------------------------------------------------------------------------

    -- Query that returns items with the concatenated list of suppliers (delimited by ',')

    ------------------------------------------------------------------------------------------

    SELECT iItemId,LEFT(vcItemSuppliers,len(vcItemSuppliers)-1) vcItemSuppliers

    FROM (

    SELECT oq.iItemId

    ,(

    SELECT t1.vcSupplierId+','

    FROM tblOrder t1

    WHERE t1.iItemId = oq.iItemId

    GROUP BY t1.vcSupplierId

    FOR XML PATH('')

    ) vcItemSuppliers

    FROM tblOrder oq

    GROUP BY oq.iItemId

    ) SQ

    GROUP BY iItemId,vcItemSuppliers

    -- NOTE: The subquery named SQ is responsible for removing the trailing commas, the necessity of encapsulating these into a subquery depends upon the desired results.

    The results are:[font="Courier New"]

    iItemId vcItemSuppliers

    ----------- ----------------

    1 A,B

    2 B

    3 A

    4 A,B,C

    5 B,D

    (5 row(s) affected)[/font]

    -

  • Jason - that's an interesting result, thank you. I've never used For XML Path before, or anything with XML for that matter. From your results, I'm actually looking for a table where each vcItemSuppliers record becomes a group of records (grouped by an arbitrary integer) completely removed from the items (using items only for the basis from which to group the suppliers to begin with).

    Using your data, the results would be something like:

    Group / SupplierID

    1, A

    2, B

    3, A

    3, B

    4, A

    4, B

    4, C

    5, B

    5, D

    The combination of group and supplier is unique. A supplier is included in a group if that entire group of suppliers (A, B and C) for example supply the same product.

  • macwiz (6/2/2009)


    Jason - that's an interesting result, thank you. I've never used For XML Path before, or anything with XML for that matter. From your results, I'm actually looking for a table where each vcItemSuppliers record becomes a group of records (grouped by an arbitrary integer) completely removed from the items (using items only for the basis from which to group the suppliers to begin with).

    I read your posting in more detail and edited my post accordingly... unfortunately I didn't edit before you saw the first one. Now the original post should have what you're looking for (query 1).

    -

  • It's true that a basic group by gives the correct results on the sample data, but I have many more items supplied by the same group of suppliers. If I include the item in the group by I wind up with duplicate combinations of suppliers.

    Let's say Item 6 is supplied by B and D. With the group by (query 1) I would end up with 2 more rows returned for item 6 in addition to the two rows for item 5. Because items 5 and 6 are both supplied by B and D (and only B and D), what I'm looking for is one record for each of B and D (2 rows total) in group "Alpha" (to avoid confusion with items).

    Group Alpha is comprised of suppliers B and D. Essientially creating a new level of aggregation of common suppliers shared by one or more products.

  • This should do the trick for you... This yields the results you are looking but may not be the most efficient way to get those results. I didn't look into performance or optimizations much here (I just expanded upon the third query from the original post), at the very least this should get you started.

    ------------------------------------------------------------------------------------------

    -- Create a test table named tblOrder

    ------------------------------------------------------------------------------------------

    IF NOT(OBJECT_ID(N'tblOrder') is null)

    DROP TABLE tblOrder

    CREATE TABLE tblOrder (iOrderId INT IDENTITY(1,1),iItemId INT,vcSupplierId VARCHAR(1))

    CREATE CLUSTERED INDEX idx1 ON tblOrder (iItemId,vcSupplierId)

    ------------------------------------------------------------------------------------------

    -- Populate tblOrder with expected data results

    ------------------------------------------------------------------------------------------

    INSERT INTO tblOrder

    SELECT 1,'A'

    UNION SELECT 1,'B'

    UNION SELECT 2,'B'

    UNION SELECT 3,'A'

    UNION SELECT 4,'C'

    UNION SELECT 4,'A'

    UNION SELECT 4,'B'

    UNION SELECT 5,'B'

    UNION SELECT 5,'D'

    UNION SELECT 6,'B'

    UNION SELECT 6,'D'

    ------------------------------------------------------------------------------------------

    -- Query that returns items with the concatenated list of suppliers (delimited by ',')

    ------------------------------------------------------------------------------------------

    SELECT tbl2.iGroupId,tbl1.vcSupplierId

    FROM (

    SELECT ROW_NUMBER() OVER(ORDER BY vcItemSuppliers) iGroupId,LEFT(vcItemSuppliers,len(vcItemSuppliers)-1) vcItemSuppliers

    FROM (

    SELECT oq.iItemId

    ,(

    SELECT t1.vcSupplierId+','

    FROM tblOrder t1

    WHERE t1.iItemId = oq.iItemId

    FOR XML PATH('')

    ) vcItemSuppliers

    FROM tblOrder oq

    ) SQ

    GROUP BY vcItemSuppliers

    ) tbl2

    INNER JOIN tblOrder tbl1

    ON tbl2.vcItemSuppliers LIKE '%'+tbl1.vcSupplierId+'%'

    GROUP BY tbl2.iGroupId,tbl1.vcSupplierId

    The results look like this:[font="Courier New"]

    iGroupId vcSupplierId

    -------------------- ------------

    1 A

    2 A

    2 B

    3 A

    3 B

    3 C

    4 B

    5 B

    5 D

    (9 row(s) affected)[/font]

    -

  • Jason - thanks for the help. I'm running now. As long as it works optimization shouldn't be too much of an issue because I should only need to run rarely.

  • I was able to run this, although I had to essientially run the basic group by query as the insert into tblOrder for the actual query to have any chance to finish reasonably.

    I'm now stuck on the next step. For each Item at each Supplier some number (N) of units were sold. I'd like to populate the final result with the percent share that the supplier (vcSupplierID) for that record gave to the overall group (iGroupID). Modifying the select into tblOrder I have created:

    ------------------------------------------------------------------------------------------

    -- Create a test table named tblOrder

    ------------------------------------------------------------------------------------------

    IF NOT(OBJECT_ID(N'tblOrder') is null)

    DROP TABLE tblOrder

    CREATE TABLE tblOrder (iOrderId INT IDENTITY(1,1),iItemId INT,vcSupplierId VARCHAR(1), iUnits INT)

    CREATE CLUSTERED INDEX idx1 ON tblOrder (iItemId,vcSupplierId)

    ------------------------------------------------------------------------------------------

    -- Populate tblOrder with expected data results

    ------------------------------------------------------------------------------------------

    INSERT INTO tblOrder

    SELECT 1,'A', 2

    UNION SELECT 1,'B', 3

    UNION SELECT 2,'B', 4

    UNION SELECT 3,'A', 5

    UNION SELECT 4,'C', 6

    UNION SELECT 4,'A', 7

    UNION SELECT 4,'B', 8

    UNION SELECT 5,'B', 9

    UNION SELECT 5,'D', 10

    UNION SELECT 6,'B', 11

    UNION SELECT 6,'D', 12

    The added column in the results would be the decimal values in the column I've manually entered here:

    iGroupId vcSupplierId decShare

    -------------------- ------------ --------

    1 A 2/2=1

    2 A 2/5=.4

    2 B 3/5=.6

    3 A 7/21=.333

    3 B 8/21

    3 C 6/21

    4 B 4/4=1

    5 B 20/42

    5 D 22/42

    Mathematically the third column is the sum of units for that vcSupplierID across all iItemID's included in the iGroupID for that vcSupplierID divided by the sum for the iGroupID for all iItemID's in that iGroupID. I'm not sure how to get back to these units in order to accomplish the division.

    Thanks again for all of the assistance.

  • Look for OVER() clause with PARTITION BY clause and aggregating with SUM function in BOOKS ONLINE. Something like this should do.

    SELECT *, ( Units * 100 ) / SUM( Units ) OVER( PARTITION BY vcSupplierId ) AS Share FROM tblOrder

    --Ramesh


  • I think I've managed to get it. Here's the code I ended up using:

    SELECTtbl2.iGroupId,

    tbl1.vcSupplierId,

    sum(tbl1.units) / tbl2.GroupUnits as share

    FROM(SELECTROW_NUMBER() OVER(ORDER BY sq.vcItemSuppliers) iGroupId,

    LEFT(sq.vcItemSuppliers,len(sq.vcItemSuppliers)-1) vcItemSuppliers,

    sum(sq.itemunits) as GroupUnits

    FROM(SELECToq.iItemId,

    (SELECTt1.vcSupplierId+','

    FROMtblOrder t1

    WHEREt1.iItemId = oq.iItemId

    FOR XML PATH('')) vcItemSuppliers,

    cast(sum(units) as float) as itemunits

    FROMtblOrder oq

    group byoq.iItemID) SQ

    GROUP BYsq.vcItemSuppliers) tbl2

    INNER JOIN(SELECToq.iItemId,

    (SELECTt1.vcSupplierId+','

    FROMtblOrder t1

    WHEREt1.iItemId = oq.iItemId

    FOR XML PATH('')) vcItemSuppliers

    FROMtblOrder oq

    group byoq.iItemID) SQ2

    ONLEFT(sq2.vcItemSuppliers,len(sq2.vcItemSuppliers)-1) = tbl2.vcItemSuppliers

    INNER JOINtblOrder tbl1

    ONtbl2.vcItemSuppliers LIKE '%'+tbl1.vcSupplierId+'%'

    andtbl1.iItemID = SQ2.iItemID

    GROUP BYtbl2.iGroupId,

    tbl1.vcSupplierId,

    tbl2.GroupUnits

    order bytbl2.iGroupId,

    tbl1.vcSupplierId

    Thanks again!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply