A delicious dilemma

Question

A delicious dilemma

sqldriver

SSChampion

Points: 10440
More actions
February 1, 2015 at 9:23 pm

#314203

Hello,
Given the below table, how would you go about finding bars that carry the same brands of scotch? You can break it up in any way you want, and handle orphans (scotch only served at one bar) any way you want, but they have to be represented.
I've been wrestling with this one for most of today. I thought I had a few recursive cte solutions, but they're tough because there's no real anchor point; you end up with multiple parents in most cases. Ditto Jeff Moden's two articles on hierarchies (#1[/url]), (#2[/url]).
USE tempdb
CREATE TABLE #Wobbly (
Bar VARCHAR(5),
Scotch VARCHAR(50)
)
INSERT #Wobbly
( Bar, Scotch )
SELECT 'Bar1','Laphroaig;Lagavulin;Macallan' UNION ALL
SELECT 'Bar2','Talisker;Bowmore' UNION ALL
SELECT 'Bar3','Ardbeg;Oban' UNION ALL
SELECT 'Bar4','Speyburn;Dewars' UNION ALL
SELECT 'Bar5','Glenmorangie;Dalmore' UNION ALL
SELECT 'Bar6','Dalmore' UNION ALL
SELECT 'Bar7','Glenlivet;Talisker;Balvenie;Glengoyne;Springbank' UNION ALL
SELECT 'Bar8','Glenmorangie;Bruichladdich' UNION ALL
SELECT 'Bar9','Laphroaig;Dewars;HighlandPark;Speyburn' UNION ALL
SELECT 'Bar10','Laphroaig;Ardbeg;DunBheagan;' UNION ALL
SELECT 'Bar11','Dalmore;Talisker' UNION ALL
SELECT 'Bar12','Talisker;Oban;Tomintoul;Kilchomin' UNION ALL
SELECT 'Bar13','Bruichladdich;Glenfiddich' UNION ALL
SELECT 'Bar14','AnCnoc;Bruichladdich;Dalwhinnie;Tomintoul' UNION ALL
SELECT 'Bar15','Dalmore;Teachers' UNION ALL
SELECT 'Bar16','OldPulteney;Arberlour;Macallan' UNION ALL
SELECT 'Bar17','Glenmorangie;CompassBox' UNION ALL
SELECT 'Bar18','Longrow;Glenrothes;Dewars' UNION ALL
SELECT 'Bar19','Tomintoul' UNION ALL
SELECT 'Bar20','Glenmorangie;Bunnahabhain' UNION ALL
SELECT 'Bar21','Dalmore;Tomintoul'
Output would be something like:
BarX 0 (starting point)
BarY 1 (sells same scotch as BarX)
BarZ 2 (sells same scotch as BarY but not BarX)
BarA 2 (also sells same scotch as BarY but not BarX)
BarB 3 (sells same scotch as BarA but not XYZ)
BarC 99 (doesn't sell scotch sold elsewhere)
Happy ~~Drinking!~~ Querying!

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182910 More actions · Answer 1

Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.

😎

Phil Parkin SSC Guru Points: 246956 More actions · Answer 2

Eirikur Eiriksson (2/2/2015)
Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.
😎

+10!

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

sqldriver SSChampion Points: 10440 More actions · Answer 3

Eirikur Eiriksson (2/2/2015)
Quick thought, the problem itself is straightforward, the issue is more the schema, normalize it and the solution is easy.
😎

That was one of the first things I did. There were about a dozen other columns in this load set that I had to do stuff with, but are irrelevant to this.

I posted the data up like this because I thought it might make it a bit easier for you to consume, and because it made it a lot easier for me to mask 😉

Anyway, I messed a bit with, after something like below, making a table of bars, and of scotch, and then a bunch of 'satellite' scotch per bar tables, all with unique IDs. I just got nowhere with this one.

SELECT Bar, Scotch, Item as ScotchNorm

INTO #WobblyNorm

FROM #Wobbly AS w

CROSS APPLY master.dbo.DelimitedSplit8K(Scotch, ';') AS dsk

SELECT *

FROM #WobblyNorm AS wn

Lowell SSC Guru Points: 323518 More actions · Answer 4

does this get you started, by normalizing and doing a preliminary join?

WITH BarsNormalized

AS

(

select * from #Wobbly

CROSS APPLY dbo.DelimitedSplit8K(Scotch,';') F1

)

SELECT T1.Bar ,T1.Item As ScothName,T2.Bar

FROM BarsNormalized T1

LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item

WHERE T1.Bar <> T2.Bar

ORDER BY T1.Bar

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

sqldriver SSChampion Points: 10440 More actions · Answer 5

Hi Lowell,

I tried something similar, and my head ended up bouncing off it. The trick I was hoping to pull off was something like, provide any starting point, get back a list:

1. Bar you chose

2. Bar that also sells those scotches

2. Bar that also sells that scotches

3. Bar that sells same scotches as rank 2 bars but not starting bar

Thanks

Nevyn SSChampion Points: 14210 More actions · Answer 6

So you are doing a 6 degrees of Kevin Scotch on the tables?

Also, does a match have to one scotch in common or all the scotches of the original bar? Maybe sample output based on a couple different starting bars?

sqldriver SSChampion Points: 10440 More actions · Answer 7

Nevyn (2/2/2015)
So you are doing a 6 degrees of Kevin Scotch on the tables?
Also, does a match have to one scotch in common or all the scotches of the original bar? Maybe sample output based on a couple different starting bars?

Hi Nevyn,

Yes, that's about the size of it. Except maybe N degrees of Kevin Scotch.

Just one in common is fine.

Here's some example output, starting with Bar7

Thanks

EDIT: added remainder of output for Bar7.

BarScotchScotchNormDegree

--First degree

Bar7Glenlivet;Talisker;Balvenie;Glengoyne;SpringbankTalisker0

--Second degree

Bar2Talisker;BowmoreTalisker1

Bar11Dalmore;TaliskerTalisker1

Bar12Talisker;Oban;Tomintoul;KilchominTalisker1

--Third degree (note that bars from second degree are not considered here)

Bar14AnCnoc;Bruichladdich;Dalwhinnie;TomintoulTomintoul2

Bar15Dalmore;TeachersDalmore2

Bar19TomintoulTomintoul2

Bar21Dalmore;Tomintoul Dalmore2

Bar21Dalmore;TomintoulTomintoul2 --Note you don't need duplicates matches here. This line can be discarded

Bar3Ardbeg;ObanOban2

Bar5Glenmorangie;DalmoreDalmore2

Bar6DalmoreDalmore2

Bar10Laphroaig;Ardbeg;DunBheagan; Ardbeg3

Bar13Bruichladdich;GlenfiddichBruichladdich3

Bar17Glenmorangie;CompassBox Glenmorangie3

Bar20Glenmorangie;BunnahabhainGlenmorangie3

Bar8Glenmorangie;BruichladdichBruichladdich3

Bar8Glenmorangie;BruichladdichGlenmorangie3

Bar1Laphroaig;Lagavulin;MacallanLaphroaig4

Bar9Laphroaig;Dewars;HighlandPark;SpeyburnLaphroaig4

Bar16OldPulteney;Arberlour;MacallanMacallan5

Bar18Longrow;Glenrothes;DewarsDewars5

Bar4Speyburn;DewarsDewars5

Bar4Speyburn;DewarsSpeyburn5

Nevyn SSChampion Points: 14210 More actions · Answer 8

This was actually sneakily trickier than it first appears, as showing the 0 level doesn't work with the recursion, and taking duplicates out is tough to work in with the recursion.

This is my first pass, but its pretty messy:

;

DECLARE @StartBar VARCHAR(5) = 'Bar7'

;

WITH BarsNormalized

AS

(

select * from #Wobbly

CROSS APPLY [AdventureWorks2012].dbo.DelimitedSplit8K(Scotch,';') F1

), AllMatches AS (

SELECT T1.Bar FirstBar ,T1.Item As ScotchName,T2.Bar SecondBar

FROM BarsNormalized T1

LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item

WHERE T1.Bar <> T2.Bar

), RecursiveCTE AS (

SELECT 1 MatchDegrees, FirstBar, ScotchName, SecondBar

FROM AllMatches

WHERE FirstBar = @StartBar

UNION ALL

SELECT MatchDegrees+1, a.FirstBar, a.ScotchName, a.Secondbar

FROM RecursiveCTE r

JOIN AllMatches a

ON a.FirstBar = r.SecondBar

AND a.ScotchName <> r.ScotchName

AND a.SecondBar <> @StartBar

AND MatchDegrees < 10

), Add0LevelAndUniqueRN AS (

SELECT ROW_NUMBER() OVER (PARTITION BY SecondBar ORDER BY MatchDegrees ASC) rn,* FROM RecursiveCTE

UNION ALL

SELECT 1,0,'Bar7','Base Level','Bar7'

)

SELECT

MatchDegrees,SecondBar, w.Scotch, ScotchName

FROM Add0LevelAndUniqueRN a

JOIN #Wobbly w

ON a.SecondBar = w.Bar

WHERE rn=1

ORDER BY MatchDegrees,SecondBar

So this one recursively goes up to 10 degrees through all possible matches, then ranks the appearance of each whiskey to get its first match. Of course, you can move the recursion all the way out to 99, but it will be way slower since it doesn't remove dups until the end.

You can switch row_number() to rank() if you want the duplicate matches of a bar at a level.

Of course, it would be way easier to write in a function/procedure, and run a loop until you added no new bars. Doing that, you can check against inserting duplicates as you go.

sqldriver SSChampion Points: 10440 More actions · Answer 9

Nevyn (2/2/2015)
This was actually sneakily trickier than it first appears, as showing the 0 level doesn't work with the recursion, and taking duplicates out is tough to work in with the recursion.
This is my first pass, but its pretty messy:
;
DECLARE @StartBar VARCHAR(5) = 'Bar7'
;
WITH BarsNormalized
AS
(
select * from #Wobbly
CROSS APPLY [AdventureWorks2012].dbo.DelimitedSplit8K(Scotch,';') F1
), AllMatches AS (
SELECT T1.Bar FirstBar ,T1.Item As ScotchName,T2.Bar SecondBar
FROM BarsNormalized T1
LEFT OUTER JOIN BarsNormalized T2 ON T1.ITem = T2.Item
WHERE T1.Bar <> T2.Bar
), RecursiveCTE AS (
SELECT 1 MatchDegrees, FirstBar, ScotchName, SecondBar
FROM AllMatches
WHERE FirstBar = @StartBar
UNION ALL
SELECT MatchDegrees+1, a.FirstBar, a.ScotchName, a.Secondbar
FROM RecursiveCTE r
JOIN AllMatches a
ON a.FirstBar = r.SecondBar
AND a.ScotchName <> r.ScotchName
AND a.SecondBar <> @StartBar
AND MatchDegrees < 10
), Add0LevelAndUniqueRN AS (
SELECT ROW_NUMBER() OVER (PARTITION BY SecondBar ORDER BY MatchDegrees ASC) rn,* FROM RecursiveCTE
UNION ALL
SELECT 1,0,'Bar7','Base Level','Bar7'
)
SELECT
MatchDegrees,SecondBar, w.Scotch, ScotchName
FROM Add0LevelAndUniqueRN a
JOIN #Wobbly w
ON a.SecondBar = w.Bar
WHERE rn=1
ORDER BY MatchDegrees,SecondBar
So this one recursively goes up to 10 degrees through all possible matches, then ranks the appearance of each whiskey to get its first match. Of course, you can move the recursion all the way out to 99, but it will be way slower since it doesn't remove dups until the end.
You can switch row_number() to rank() if you want the duplicate matches of a bar at a level.
Of course, it would be way easier to write in a function/procedure, and run a loop until you added no new bars. Doing that, you can check against inserting duplicates as you go.

Yeah! This problem is really weird to solve with SQL Server. It's probably simple with something that uses arrays, like one of those rinky-dink OSS RDBMSs, or one of those "programming" "languages" I've heard stirrings about 😉

I'd be fine with encapsulating, just wanted to at least get a good logical direction down first. That all looks good to me. I'll spend some time with it tonight.

Thanks

Dwain Camps SSC Guru Points: 86948 More actions · Answer 10

sqldriver (2/1/2015)
Hello,
Given the below table, how would you go about finding bars that carry the same brands of scotch?

So I've got to ask, why do you need it in such a strange output format?

Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?

WITH Scotches AS

(

SELECT Bar, Scotch, ItemNumber, Item

FROM #Wobbly

CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')

)

SELECT Scotch=Item

,Bars=STUFF(

(

SELECT ', ' + Bar

FROM Scotches b

WHERE a.Item = b.Item

ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)

FOR XML PATH('')

), 1, 2, '')

FROM Scotches a

GROUP BY Item;

Results:

Scotch Bars

AnCnoc Bar14

Arberlour Bar16

Ardbeg Bar3, Bar10

Balvenie Bar7

Bowmore Bar2

Bruichladdich Bar8, Bar13, Bar14

Bunnahabhain Bar20

CompassBox Bar17

Dalmore Bar5, Bar6, Bar11, Bar15, Bar21

Dalwhinnie Bar14

Dewars Bar4, Bar9, Bar18

DunBheagan Bar10

Glenfiddich Bar13

Glengoyne Bar7

Glenlivet Bar7

Glenmorangie Bar5, Bar8, Bar17, Bar20

Glenrothes Bar18

HighlandPark Bar9

Kilchomin Bar12

Lagavulin Bar1

Laphroaig Bar1, Bar9, Bar10

Longrow Bar18

Macallan Bar1, Bar16

Oban Bar3, Bar12

OldPulteney Bar16

Speyburn Bar4, Bar9

Springbank Bar7

Talisker Bar2, Bar7, Bar11, Bar12

Teachers Bar15

Tomintoul Bar12, Bar14, Bar19, Bar21

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

sqldriver SSChampion Points: 10440 More actions · Answer 11

dwain.c (2/2/2015)
sqldriver (2/1/2015)
Hello,
Given the below table, how would you go about finding bars that carry the same brands of scotch?
So I've got to ask, why do you need it in such a strange output format?
Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?
WITH Scotches AS
(
SELECT Bar, Scotch, ItemNumber, Item
FROM #Wobbly
CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')
)
SELECT Scotch=Item
,Bars=STUFF(
(
SELECT ', ' + Bar
FROM Scotches b
WHERE a.Item = b.Item
ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)
FOR XML PATH('')
), 1, 2, '')
FROM Scotches a
GROUP BY Item;
Results:
Scotch Bars
AnCnoc Bar14
Arberlour Bar16
Ardbeg Bar3, Bar10
Balvenie Bar7
Bowmore Bar2
Bruichladdich Bar8, Bar13, Bar14
Bunnahabhain Bar20
CompassBox Bar17
Dalmore Bar5, Bar6, Bar11, Bar15, Bar21
Dalwhinnie Bar14
Dewars Bar4, Bar9, Bar18
DunBheagan Bar10
Glenfiddich Bar13
Glengoyne Bar7
Glenlivet Bar7
Glenmorangie Bar5, Bar8, Bar17, Bar20
Glenrothes Bar18
HighlandPark Bar9
Kilchomin Bar12
Lagavulin Bar1
Laphroaig Bar1, Bar9, Bar10
Longrow Bar18
Macallan Bar1, Bar16
Oban Bar3, Bar12
OldPulteney Bar16
Speyburn Bar4, Bar9
Springbank Bar7
Talisker Bar2, Bar7, Bar11, Bar12
Teachers Bar15
Tomintoul Bar12, Bar14, Bar19, Bar21

Hi Dwain,

The format is an end user specification, which sort of makes sense, if you ever happen across it.

That's a cool trick, re-de-normalizing the data, but, maybe I'm dense, how would you apply the next degree of relationships to that? I suppose I could just search the bar column for LIKE '%BarXX%', but it doesn't give me the flattened family tree-ish output I need, where secondary relationships are established, like in Nevyn's example.

Thanks

Dwain Camps SSC Guru Points: 86948 More actions · Answer 12

sqldriver (2/2/2015)
dwain.c (2/2/2015)
sqldriver (2/1/2015)
Hello,
Given the below table, how would you go about finding bars that carry the same brands of scotch?
So I've got to ask, why do you need it in such a strange output format?
Isn't this what you're looking for (according to a simplistic interpretation of your original requirement)?
WITH Scotches AS
(
SELECT Bar, Scotch, ItemNumber, Item
FROM #Wobbly
CROSS APPLY dbo.DelimitedSplit8K(Scotch, ';')
)
SELECT Scotch=Item
,Bars=STUFF(
(
SELECT ', ' + Bar
FROM Scotches b
WHERE a.Item = b.Item
ORDER BY CAST(SUBSTRING(Bar, 4, 2) AS INT)
FOR XML PATH('')
), 1, 2, '')
FROM Scotches a
GROUP BY Item;
Results:
Scotch Bars
AnCnoc Bar14
Arberlour Bar16
Ardbeg Bar3, Bar10
Balvenie Bar7
Bowmore Bar2
Bruichladdich Bar8, Bar13, Bar14
Bunnahabhain Bar20
CompassBox Bar17
Dalmore Bar5, Bar6, Bar11, Bar15, Bar21
Dalwhinnie Bar14
Dewars Bar4, Bar9, Bar18
DunBheagan Bar10
Glenfiddich Bar13
Glengoyne Bar7
Glenlivet Bar7
Glenmorangie Bar5, Bar8, Bar17, Bar20
Glenrothes Bar18
HighlandPark Bar9
Kilchomin Bar12
Lagavulin Bar1
Laphroaig Bar1, Bar9, Bar10
Longrow Bar18
Macallan Bar1, Bar16
Oban Bar3, Bar12
OldPulteney Bar16
Speyburn Bar4, Bar9
Springbank Bar7
Talisker Bar2, Bar7, Bar11, Bar12
Teachers Bar15
Tomintoul Bar12, Bar14, Bar19, Bar21
Hi Dwain,
The format is an end user specification, which sort of makes sense, if you ever happen across it.
That's a cool trick, re-de-normalizing the data, but, maybe I'm dense, how would you apply the next degree of relationships to that? I suppose I could just search the bar column for LIKE '%BarXX%', but it doesn't give me the flattened family tree-ish output I need, where secondary relationships are established, like in Nevyn's example.
Thanks

As you've pointed out, Nevyn's solution seems to give you the format you need (I tested it and so it appeared to me). I sort of figured you'd come back with "end-user specification" as the answer to my question.

But personally, I don't really see the point of this "secondary relationships" thing. My suggestion was more along the lines of trying to understand why the user needs this information and offering alternatives that are easier to produce. If you've got hundreds of bars and/or types of scotch (imagine the bar-crawl that!), an rCTE isn't going to do wonders for your server's performance.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86948 More actions · Answer 13

One other thing. If you do have lots of rows (more than in the sample data), you might want to experiment with breaking Nevyn's solution into a divide-and-conquer approach, or possibly even using a WHILE loop to do the recursive part.

It might be a bit faster for you.

When you've got a rCTE in amongst several cascaded CTEs (Nevyn's has 3), sometimes what I've suggested causes the optimizer to have to do a bit less work to get a workable query plan from it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

sqldriver SSChampion Points: 10440 More actions · Answer 14

Ah, yeah, it's not much more rows than this. I guess the point of the secondary relationships is 'route choosing' of sorts. Like, if you start with BarX, these are your next possible choices (scotch in common) and then these are your next choices (secondary scotch in common) and then these are your next choices (tertiary scotch in common) and so on, until either all possible unique choices (bars) are exhausted, or all are exhausted and un-matched choices (bars) are displayed. If you picture that as a single list column that allows someone to choose a 'search path', it may make more sense.

Thanks