A delicious dilemma

  • Hello,

    Given the below table, how would you go about finding bars that carry the same brands of scotch? You can break it up in any way you want, and handle orphans (scotch only served at one bar) any way you want, but they have to be represented.

    I've been wrestling with this one for most of today. I thought I had a few recursive cte solutions, but they're tough because there's no real anchor point; you end up with multiple parents in most cases. Ditto Jeff Moden's two articles on hierarchies (#1[/url]), (#2[/url]).

    USE tempdb

    CREATE TABLE #Wobbly (

    Bar VARCHAR(5),

    Scotch VARCHAR(50)

    )

    INSERT #Wobbly

    ( Bar, Scotch )

    SELECT 'Bar1','Laphroaig;Lagavulin;Macallan' UNION ALL

    SELECT 'Bar2','Talisker;Bowmore' UNION ALL

    SELECT 'Bar3','Ardbeg;Oban' UNION ALL

    SELECT 'Bar4','Speyburn;Dewars' UNION ALL

    SELECT 'Bar5','Glenmorangie;Dalmore' UNION ALL

    SELECT 'Bar6','Dalmore' UNION ALL

    SELECT 'Bar7','Glenlivet;Talisker;Balvenie;Glengoyne;Springbank' UNION ALL

    SELECT 'Bar8','Glenmorangie;Bruichladdich' UNION ALL

    SELECT 'Bar9','Laphroaig;Dewars;HighlandPark;Speyburn' UNION ALL

    SELECT 'Bar10','Laphroaig;Ardbeg;DunBheagan;' UNION ALL

    SELECT 'Bar11','Dalmore;Talisker' UNION ALL

    SELECT 'Bar12','Talisker;Oban;Tomintoul;Kilchomin' UNION ALL

    SELECT 'Bar13','Bruichladdich;Glenfiddich' UNION ALL

    SELECT 'Bar14','AnCnoc;Bruichladdich;Dalwhinnie;Tomintoul' UNION ALL

    SELECT 'Bar15','Dalmore;Teachers' UNION ALL

    SELECT 'Bar16','OldPulteney;Arberlour;Macallan' UNION ALL

    SELECT 'Bar17','Glenmorangie;CompassBox' UNION ALL

    SELECT 'Bar18','Longrow;Glenrothes;Dewars' UNION ALL

    SELECT 'Bar19','Tomintoul' UNION ALL

    SELECT 'Bar20','Glenmorangie;Bunnahabhain' UNION ALL

    SELECT 'Bar21','Dalmore;Tomintoul'

    Output would be something like:

    BarX 0 (starting point)

    BarY 1 (sells same scotch as BarX)

    BarZ 2 (sells same scotch as BarY but not BarX)

    BarA 2 (also sells same scotch as BarY but not BarX)

    BarB 3 (sells same scotch as BarA but not XYZ)

    BarC 99 (doesn't sell scotch sold elsewhere)

    Happy Drinking! Querying!

  • Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.

    😎

  • Eirikur Eiriksson (2/2/2015)


    Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.

    😎

    +10!

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Eirikur Eiriksson (2/2/2015)


    Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.

    😎

    That was one of the first things I did. There were about a dozen other columns in this load set that I had to do stuff with, but are irrelevant to this.

    I posted the data up like this because I thought it might make it a bit easier for you to consume, and because it made it a lot easier for me to mask 😉

    Anyway, I messed a bit with, after something like below, making a table of bars, and of scotch, and then a bunch of 'satellite' scotch per bar tables, all with unique IDs. I just got nowhere with this one.

    SELECT Bar, Scotch, Item as ScotchNorm

    INTO #WobblyNorm

    FROM #Wobbly AS w

    CROSS APPLY master.dbo.DelimitedSplit8K(Scotch, ';') AS dsk

    SELECT *

    FROM #WobblyNorm AS wn

  • does this get you started, by normalizing and doing a preliminary join?

    WITH BarsNormalized

    AS

    (

    select * from #Wobbly

    CROSS APPLY dbo.DelimitedSplit8K(Scotch,';') F1

    )

    SELECT T1.Bar ,T1.Item As ScothName,T2.Bar

    FROM BarsNormalized T1

    LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item

    WHERE T1.Bar <> T2.Bar

    ORDER BY T1.Bar

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi Lowell,

    I tried something similar, and my head ended up bouncing off it. The trick I was hoping to pull off was something like, provide any starting point, get back a list:

    1. Bar you chose

    2. Bar that also sells those scotches

    2. Bar that also sells that scotches

    3. Bar that sells same scotches as rank 2 bars but not starting bar

    Thanks

  • So you are doing a 6 degrees of Kevin Scotch on the tables?

    Also, does a match have to one scotch in common or all the scotches of the original bar? Maybe sample output based on a couple different starting bars?

  • Nevyn (2/2/2015)


    So you are doing a 6 degrees of Kevin Scotch on the tables?

    Also, does a match have to one scotch in common or all the scotches of the original bar? Maybe sample output based on a couple different starting bars?

    Hi Nevyn,

    Yes, that's about the size of it. Except maybe N degrees of Kevin Scotch.

    Just one in common is fine.

    Here's some example output, starting with Bar7

    Thanks

    EDIT: added remainder of output for Bar7.

    BarScotchScotchNormDegree

    --First degree

    Bar7Glenlivet;Talisker;Balvenie;Glengoyne;SpringbankTalisker0

    Bar7Glenlivet;Talisker;Balvenie;Glengoyne;SpringbankTalisker0

    Bar7Glenlivet;Talisker;Balvenie;Glengoyne;SpringbankTalisker0

    --Second degree

    Bar2Talisker;BowmoreTalisker1

    Bar11Dalmore;TaliskerTalisker1

    Bar12Talisker;Oban;Tomintoul;KilchominTalisker1

    --Third degree (note that bars from second degree are not considered here)

    Bar14AnCnoc;Bruichladdich;Dalwhinnie;TomintoulTomintoul2

    Bar15Dalmore;TeachersDalmore2

    Bar19TomintoulTomintoul2

    Bar21Dalmore;Tomintoul Dalmore2

    Bar21Dalmore;TomintoulTomintoul2 --Note you don't need duplicates matches here. This line can be discarded

    Bar3Ardbeg;ObanOban2

    Bar5Glenmorangie;DalmoreDalmore2

    Bar6DalmoreDalmore2

    Bar10Laphroaig;Ardbeg;DunBheagan; Ardbeg3

    Bar13Bruichladdich;GlenfiddichBruichladdich3

    Bar17Glenmorangie;CompassBox Glenmorangie3

    Bar20Glenmorangie;BunnahabhainGlenmorangie3

    Bar8Glenmorangie;BruichladdichBruichladdich3

    Bar8Glenmorangie;BruichladdichGlenmorangie3

    Bar1Laphroaig;Lagavulin;MacallanLaphroaig4

    Bar9Laphroaig;Dewars;HighlandPark;SpeyburnLaphroaig4

    Bar16OldPulteney;Arberlour;MacallanMacallan5

    Bar18Longrow;Glenrothes;DewarsDewars5

    Bar4Speyburn;DewarsDewars5

    Bar4Speyburn;DewarsSpeyburn5

  • This was actually sneakily trickier than it first appears, as showing the 0 level doesn't work with the recursion, and taking duplicates out is tough to work in with the recursion.

    This is my first pass, but its pretty messy:

    ;

    DECLARE @StartBar VARCHAR(5) = 'Bar7'

    ;

    WITH BarsNormalized

    AS

    (

    select * from #Wobbly

    CROSS APPLY [AdventureWorks2012].dbo.DelimitedSplit8K(Scotch,';') F1

    ), AllMatches AS (

    SELECT T1.Bar FirstBar ,T1.Item As ScotchName,T2.Bar SecondBar

    FROM BarsNormalized T1

    LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item

    WHERE T1.Bar <> T2.Bar

    ), RecursiveCTE AS (

    SELECT 1 MatchDegrees, FirstBar, ScotchName, SecondBar

    FROM AllMatches

    WHERE FirstBar = @StartBar

    UNION ALL

    SELECT MatchDegrees+1, a.FirstBar, a.ScotchName, a.Secondbar

    FROM RecursiveCTE r

    JOIN AllMatches a

    ON a.FirstBar = r.SecondBar

    AND a.ScotchName <> r.ScotchName

    AND a.SecondBar <> @StartBar

    AND MatchDegrees < 10

    ), Add0LevelAndUniqueRN AS (

    SELECT ROW_NUMBER() OVER (PARTITION BY SecondBar ORDER BY MatchDegrees ASC) rn,* FROM RecursiveCTE

    UNION ALL

    SELECT 1,0,'Bar7','Base Level','Bar7'

    )

    SELECT

    MatchDegrees,SecondBar, w.Scotch, ScotchName

    FROM Add0LevelAndUniqueRN a

    JOIN #Wobbly w

    ON a.SecondBar = w.Bar

    WHERE rn=1

    ORDER BY MatchDegrees,SecondBar

    So this one recursively goes up to 10 degrees through all possible matches, then ranks the appearance of each whiskey to get its first match. Of course, you can move the recursion all the way out to 99, but it will be way slower since it doesn't remove dups until the end.

    You can switch row_number() to rank() if you want the duplicate matches of a bar at a level.

    Of course, it would be way easier to write in a function/procedure, and run a loop until you added no new bars. Doing that, you can check against inserting duplicates as you go.

  • Nevyn (2/2/2015)


    This was actually sneakily trickier than it first appears, as showing the 0 level doesn't work with the recursion, and taking duplicates out is tough to work in with the recursion.

    This is my first pass, but its pretty messy:

    ;

    DECLARE @StartBar VARCHAR(5) = 'Bar7'

    ;

    WITH BarsNormalized

    AS

    (

    select * from #Wobbly

    CROSS APPLY [AdventureWorks2012].dbo.DelimitedSplit8K(Scotch,';') F1

    ), AllMatches AS (

    SELECT T1.Bar FirstBar ,T1.Item As ScotchName,T2.Bar SecondBar

    FROM BarsNormalized T1

    LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item

    WHERE T1.Bar <> T2.Bar

    ), RecursiveCTE AS (

    SELECT 1 MatchDegrees, FirstBar, ScotchName, SecondBar

    FROM AllMatches

    WHERE FirstBar = @StartBar

    UNION ALL

    SELECT MatchDegrees+1, a.FirstBar, a.ScotchName, a.Secondbar

    FROM RecursiveCTE r

    JOIN AllMatches a

    ON a.FirstBar = r.SecondBar

    AND a.ScotchName <> r.ScotchName

    AND a.SecondBar <> @StartBar

    AND MatchDegrees < 10

    ), Add0LevelAndUniqueRN AS (

    SELECT ROW_NUMBER() OVER (PARTITION BY SecondBar ORDER BY MatchDegrees ASC) rn,* FROM RecursiveCTE

    UNION ALL

    SELECT 1,0,'Bar7','Base Level','Bar7'

    )

    SELECT

    MatchDegrees,SecondBar, w.Scotch, ScotchName

    FROM Add0LevelAndUniqueRN a

    JOIN #Wobbly w

    ON a.SecondBar = w.Bar

    WHERE rn=1

    ORDER BY MatchDegrees,SecondBar

    So this one recursively goes up to 10 degrees through all possible matches, then ranks the appearance of each whiskey to get its first match. Of course, you can move the recursion all the way out to 99, but it will be way slower since it doesn't remove dups until the end.

    You can switch row_number() to rank() if you want the duplicate matches of a bar at a level.

    Of course, it would be way easier to write in a function/procedure, and run a loop until you added no new bars. Doing that, you can check against inserting duplicates as you go.

    Yeah! This problem is really weird to solve with SQL Server. It's probably simple with something that uses arrays, like one of those rinky-dink OSS RDBMSs, or one of those "programming" "languages" I've heard stirrings about 😉

    I'd be fine with encapsulating, just wanted to at least get a good logical direction down first. That all looks good to me. I'll spend some time with it tonight.

    Thanks

  • sqldriver (2/1/2015)


    Hello,

    Given the below table, how would you go about finding bars that carry the same brands of scotch?

    So I've got to ask, why do you need it in such a strange output format?

    Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?

    WITH Scotches AS

    (

    SELECT Bar, Scotch, ItemNumber, Item

    FROM #Wobbly

    CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')

    )

    SELECT Scotch=Item

    ,Bars=STUFF(

    (

    SELECT ', ' + Bar

    FROM Scotches b

    WHERE a.Item = b.Item

    ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)

    FOR XML PATH('')

    ), 1, 2, '')

    FROM Scotches a

    GROUP BY Item;

    Results:

    Scotch Bars

    AnCnoc Bar14

    Arberlour Bar16

    Ardbeg Bar3, Bar10

    Balvenie Bar7

    Bowmore Bar2

    Bruichladdich Bar8, Bar13, Bar14

    Bunnahabhain Bar20

    CompassBox Bar17

    Dalmore Bar5, Bar6, Bar11, Bar15, Bar21

    Dalwhinnie Bar14

    Dewars Bar4, Bar9, Bar18

    DunBheagan Bar10

    Glenfiddich Bar13

    Glengoyne Bar7

    Glenlivet Bar7

    Glenmorangie Bar5, Bar8, Bar17, Bar20

    Glenrothes Bar18

    HighlandPark Bar9

    Kilchomin Bar12

    Lagavulin Bar1

    Laphroaig Bar1, Bar9, Bar10

    Longrow Bar18

    Macallan Bar1, Bar16

    Oban Bar3, Bar12

    OldPulteney Bar16

    Speyburn Bar4, Bar9

    Springbank Bar7

    Talisker Bar2, Bar7, Bar11, Bar12

    Teachers Bar15

    Tomintoul Bar12, Bar14, Bar19, Bar21


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/2/2015)


    sqldriver (2/1/2015)


    Hello,

    Given the below table, how would you go about finding bars that carry the same brands of scotch?

    So I've got to ask, why do you need it in such a strange output format?

    Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?

    WITH Scotches AS

    (

    SELECT Bar, Scotch, ItemNumber, Item

    FROM #Wobbly

    CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')

    )

    SELECT Scotch=Item

    ,Bars=STUFF(

    (

    SELECT ', ' + Bar

    FROM Scotches b

    WHERE a.Item = b.Item

    ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)

    FOR XML PATH('')

    ), 1, 2, '')

    FROM Scotches a

    GROUP BY Item;

    Results:

    Scotch Bars

    AnCnoc Bar14

    Arberlour Bar16

    Ardbeg Bar3, Bar10

    Balvenie Bar7

    Bowmore Bar2

    Bruichladdich Bar8, Bar13, Bar14

    Bunnahabhain Bar20

    CompassBox Bar17

    Dalmore Bar5, Bar6, Bar11, Bar15, Bar21

    Dalwhinnie Bar14

    Dewars Bar4, Bar9, Bar18

    DunBheagan Bar10

    Glenfiddich Bar13

    Glengoyne Bar7

    Glenlivet Bar7

    Glenmorangie Bar5, Bar8, Bar17, Bar20

    Glenrothes Bar18

    HighlandPark Bar9

    Kilchomin Bar12

    Lagavulin Bar1

    Laphroaig Bar1, Bar9, Bar10

    Longrow Bar18

    Macallan Bar1, Bar16

    Oban Bar3, Bar12

    OldPulteney Bar16

    Speyburn Bar4, Bar9

    Springbank Bar7

    Talisker Bar2, Bar7, Bar11, Bar12

    Teachers Bar15

    Tomintoul Bar12, Bar14, Bar19, Bar21

    Hi Dwain,

    The format is an end user specification, which sort of makes sense, if you ever happen across it.

    That's a cool trick, re-de-normalizing the data, but, maybe I'm dense, how would you apply the next degree of relationships to that? I suppose I could just search the bar column for LIKE '%BarXX%', but it doesn't give me the flattened family tree-ish output I need, where secondary relationships are established, like in Nevyn's example.

    Thanks

  • sqldriver (2/2/2015)


    dwain.c (2/2/2015)


    sqldriver (2/1/2015)


    Hello,

    Given the below table, how would you go about finding bars that carry the same brands of scotch?

    So I've got to ask, why do you need it in such a strange output format?

    Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?

    WITH Scotches AS

    (

    SELECT Bar, Scotch, ItemNumber, Item

    FROM #Wobbly

    CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')

    )

    SELECT Scotch=Item

    ,Bars=STUFF(

    (

    SELECT ', ' + Bar

    FROM Scotches b

    WHERE a.Item = b.Item

    ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)

    FOR XML PATH('')

    ), 1, 2, '')

    FROM Scotches a

    GROUP BY Item;

    Results:

    Scotch Bars

    AnCnoc Bar14

    Arberlour Bar16

    Ardbeg Bar3, Bar10

    Balvenie Bar7

    Bowmore Bar2

    Bruichladdich Bar8, Bar13, Bar14

    Bunnahabhain Bar20

    CompassBox Bar17

    Dalmore Bar5, Bar6, Bar11, Bar15, Bar21

    Dalwhinnie Bar14

    Dewars Bar4, Bar9, Bar18

    DunBheagan Bar10

    Glenfiddich Bar13

    Glengoyne Bar7

    Glenlivet Bar7

    Glenmorangie Bar5, Bar8, Bar17, Bar20

    Glenrothes Bar18

    HighlandPark Bar9

    Kilchomin Bar12

    Lagavulin Bar1

    Laphroaig Bar1, Bar9, Bar10

    Longrow Bar18

    Macallan Bar1, Bar16

    Oban Bar3, Bar12

    OldPulteney Bar16

    Speyburn Bar4, Bar9

    Springbank Bar7

    Talisker Bar2, Bar7, Bar11, Bar12

    Teachers Bar15

    Tomintoul Bar12, Bar14, Bar19, Bar21

    Hi Dwain,

    The format is an end user specification, which sort of makes sense, if you ever happen across it.

    That's a cool trick, re-de-normalizing the data, but, maybe I'm dense, how would you apply the next degree of relationships to that? I suppose I could just search the bar column for LIKE '%BarXX%', but it doesn't give me the flattened family tree-ish output I need, where secondary relationships are established, like in Nevyn's example.

    Thanks

    As you've pointed out, Nevyn's solution seems to give you the format you need (I tested it and so it appeared to me). I sort of figured you'd come back with "end-user specification" as the answer to my question.

    But personally, I don't really see the point of this "secondary relationships" thing. My suggestion was more along the lines of trying to understand why the user needs this information and offering alternatives that are easier to produce. If you've got hundreds of bars and/or types of scotch (imagine the bar-crawl that!), an rCTE isn't going to do wonders for your server's performance.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • One other thing. If you do have lots of rows (more than in the sample data), you might want to experiment with breaking Nevyn's solution into a divide-and-conquer approach, or possibly even using a WHILE loop to do the recursive part.

    It might be a bit faster for you.

    When you've got a rCTE in amongst several cascaded CTEs (Nevyn's has 3), sometimes what I've suggested causes the optimizer to have to do a bit less work to get a workable query plan from it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Ah, yeah, it's not much more rows than this. I guess the point of the secondary relationships is 'route choosing' of sorts. Like, if you start with BarX, these are your next possible choices (scotch in common) and then these are your next choices (secondary scotch in common) and then these are your next choices (tertiary scotch in common) and so on, until either all possible unique choices (bars) are exhausted, or all are exhausted and un-matched choices (bars) are displayed. If you picture that as a single list column that allows someone to choose a 'search path', it may make more sense.

    Thanks

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply